

X-ray Microscopy

Written by a pioneer in the field, this text provides a complete introduction to x-ray microscopy, providing all of the technical background required to use, understand, and even develop x-ray microscopes. Starting from the basics of x-ray physics and focusing optics, it goes on to cover imaging theory, tomography, chemical and elemental analysis, lensless imaging, computational methods, instrumentation, radiation damage, and cryomicroscopy, and includes a survey of recent scientific applications. Designed as a “one-stop” text, it provides a unified notation, and shows how computational methods in different areas are linked with one another. Including numerous derivations, and illustrated with dozens of examples throughout, this is an essential text for academics and practitioners across engineering, the physical sciences, and the life sciences who use x-ray microscopy to analyze their specimens, as well as those taking courses in x-ray microscopy.

Chris Jacobsen is Argonne Distinguished Fellow at Argonne National Laboratory, and Professor of Physics and Astronomy at Northwestern University. He is also a Fellow of the American Association for the Advancement of Science, the American Physical Society, and the Optical Society of America.

Advances in Microscopy and Microanalysis

Microscopic visualization techniques range from atomic imaging to visualization of living cells at near nanometer spatial resolution, and advances in the field are fueled by developments in computation, image detection devices, labeling, and sample preparation strategies. Microscopy has proven to be one of the most attractive and progressive research tools available to the scientific community, and remains at the forefront of research in many disciplines, from nanotechnology to live cell molecular imaging. This series reflects the diverse role of microscopy, defining it as any method of imaging objects of micrometer scale or less, and includes both introductory texts and highly technical and focused monographs for researchers and practitioners in materials and the life sciences.

Series Editors
Patricia Calarco, University of California, San Francisco
Michael Isaacson, University of California, Santa Cruz

Series Advisors
Bridget Carragher, The Scripps Research Institute
Wah Chiu, Baylor College of Medicine
Christian Colliex, Université Paris Sud
Ulrich Dahmen, Lawrence Berkeley National Laboratory
Mark Ellisman, University of California, San Diego
Peter Ingram, Duke University Medical Center
J. Richard McIntosh, University of Colorado
Giulio Pozzi, University of Bologna
John C. H. Spence, Arizona State University
Elmar Zeitler, Fritz-Haber Institute

Books in Series Published:
Heide Schatten, Scanning Electron Microscopy for the Life Sciences
Frances Ross, Liquid Cell Electron Microscopy
Joel Kubby, Sylvain Gigan, and Meng Cui, Wavefront Shaping for Biomedical Imaging
Chris Jacobsen, X-Ray Microscopy

X-ray Microscopy

CHRIS JACOBSEN
Argonne National Laboratory, Illinois
Northwestern University, Illinois

University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025, India
79 Anson Road, #06–04/06, Singapore 079906

Cambridge University Press is part of the University of Cambridge. It furthers the University’s mission by disseminating knowledge in the pursuit of education, learning, and research at the highest international levels of excellence.

www.cambridge.org
Information on this title: www.cambridge.org/9781107076570
DOI: 10.1017/9781139924542

© Chris Jacobsen 2020

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2020
Printed in Singapore by Markono Print Media Pte Ltd

A catalogue record for this publication is available from the British Library.

ISBN 978-1-107-07657-0 Hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

Contents

Contributors
Foreword

1 X-ray microscopes: a short introduction
  1.1 How to read this book
  1.2 Online appendices
  1.3 Key mathematical symbols and formulae

2 A bit of history
  2.1 Röntgen and the discovery of X rays
  2.2 Einstein and mirrors
  2.3 Cold War microscopes
  2.4 Zone plates
  2.5 Synchrotrons and lasers
  2.6 Lensless microscopes
  2.7 The dustbin of history
  2.8 Concluding limerick

3 X-ray physics
  3.1 The Bohr model, energy levels, and x-ray shells
    3.1.1 X-ray fluorescence and Auger emission
    3.1.2 X-ray transitions: fluorescence nomenclature
    3.1.3 Beyond the core: the Fermi energy, valence electrons, and plasmon modes
  3.2 Atomic interactions, scattering, and absorption
    3.2.1 Scattering by a single electron
    3.2.2 Scattering by an atom
  3.3 The x-ray refractive index
    3.3.1 Electromagnetic waves in media
    3.3.2 The great frequency divide and the refractive index
    3.3.3 X-ray linear absorption coefficient
    3.3.4 The Born and Rytov approximations
    3.3.5 Oscillator density in molecules, compounds, and mixtures
  3.4 Anomalous dispersion: life on the edge
    3.4.1 The Kramers–Kronig relations
  3.5 X-ray refraction
  3.6 X-ray reflectivity
  3.7 Concluding limerick

4 Imaging physics
  4.1 Waves and rays
    4.1.1 Adding up waves
    4.1.2 Rayleigh quarter wave criterion
    4.1.3 Connecting waves and rays
  4.2 Gratings and diffraction
    4.2.1 Slits and plane gratings
    4.2.2 Volume gratings and Bragg’s law
    4.2.3 Bragg’s law and crystals
    4.2.4 Synthetic multilayer mirrors
    4.2.5 Momentum transfer and the Ewald sphere
  4.3 Wavefield propagation
    4.3.1 The Huygens construction
    4.3.2 Fraunhofer approximation
    4.3.3 Fourier transforms: analytical and discrete
    4.3.4 Power spectra of images
    4.3.5 Fraunhofer diffraction
    4.3.6 Fresnel propagation by integration, and by convolution
    4.3.7 Fresnel propagation, distances, and sampling
    4.3.8 Propagation and diffraction in circular coordinates
    4.3.9 Multislice propagation
  4.4 Imaging systems
    4.4.1 Field of view
    4.4.2 Optical system via propagators
    4.4.3 Diffraction and lens resolution
    4.4.4 Beating the diffraction limit in light microscopy
    4.4.5 Cylindrical (1D by 1D) optics
    4.4.6 Coherence, phase space, and focal spots
    4.4.7 Transfer functions
    4.4.8 Deconvolution: correcting for the transfer function
    4.4.9 Depth resolution and depth of field
  4.5 Full-field imaging
    4.5.1 TXM condensers, STXM detectors, and reciprocity
  4.6 Dark-field imaging
  4.7 Phase contrast
    4.7.1 Phase contrast in coherent imaging methods
    4.7.2 Propagation-based phase contrast
    4.7.3 Zernike phase contrast imaging
    4.7.4 Differential phase contrast
    4.7.5 Grazing incidence imaging
  4.8 Image statistics, exposure, and dose
    4.8.1 Photon statistics and the contrast parameter Θ
    4.8.2 Minimum detection limits
    4.8.3 Signal to noise and resolution from experimental images
    4.8.4 Estimating the required photon exposure
    4.8.5 Imaging modes and diffraction
  4.9 From exposure to radiation dose
    4.9.1 Dose versus resolution
  4.10 Comparison with electron microscopy and microanalysis
    4.10.1 Elemental mapping
    4.10.2 Transmission electron microscopy
    4.10.3 A comparison of transmission imaging with electrons and with X rays
  4.11 See the whole picture
  4.12 Concluding limerick

5 X-ray focusing optics
  5.1 Refractive optics
    5.1.1 Compound refractive lenses
  5.2 Reflective optics
    5.2.1 Grazing incidence spheres and toroids
    5.2.2 Kirkpatrick–Baez and Montel mirrors
    5.2.3 Ellipsoidal and Wolter mirrors, and single capillaries
    5.2.4 Multilayer mirrors
    5.2.5 Non-imaging grazing incidence optics
  5.3 Diffractive optics
    5.3.1 Fresnel zone plates
    5.3.2 Focusing efficiency
    5.3.3 Order sorting
    5.3.4 Fabrication
    5.3.5 Making zone plates thicker
    5.3.6 Thick zone plates and multilayer Laue lenses
    5.3.7 Multilayer Laue lenses: practical considerations
  5.4 Combined optics
  5.5 Resolution over the years
  5.6 Concluding limerick

6 X-ray microscope systems
  6.1 Contact microscopy
  6.2 Point projection x-ray microscopes
  6.3 Full-field microscopes, or transmission x-ray microscopes
    6.3.1 Zone plate condensers
    6.3.2 Capillary condensers
  6.4 Scanning x-ray microscopes
  6.5 Electron optical x-ray microscopes (PEEM and others)
  6.6 Concluding limerick

7 X-ray microscope instrumentation
  7.1 X-ray sources
    7.1.1 Photometric measures
    7.1.2 Laboratory x-ray sources: electron impact
    7.1.3 Unconventional laboratory x-ray sources
    7.1.4 Synchrotron light sources
    7.1.5 Bending magnet sources
    7.1.6 Undulator sources
    7.1.7 Inverse Compton scattering sources
    7.1.8 X-ray free-electron lasers (FELs)
  7.2 X-ray beamlines
    7.2.1 Monochromators and bandwidth considerations
    7.2.2 Coherence and phase space matching
    7.2.3 Slits and shutters
    7.2.4 Radiation shielding
    7.2.5 Thermal management
    7.2.6 Vacuum issues, and contamination and cleaning of surfaces
  7.3 Nanopositioning systems
  7.4 X-ray detectors
    7.4.1 Detector statistics
    7.4.2 Detector statistics: dead time
    7.4.3 Detector statistics: charge integration
    7.4.4 Pixelated area detectors
    7.4.5 Semiconductor detectors
    7.4.6 Sensor chips for direct x-ray conversion
    7.4.7 Scintillator detectors: visible-light conversion
    7.4.8 Gas-based detectors
    7.4.9 Superconducting detectors
    7.4.10 Energy-resolving detectors
    7.4.11 Wavelength-dispersive detectors
    7.4.12 Energy-dispersive detectors
  7.5 Sample environments
    7.5.1 Silicon nitride windows
  7.6 Concluding limerick

8 X-ray tomography
  8.1 Tomography basics
    8.1.1 The Crowther criterion: how many projections?
    8.1.2 Backprojection, filtered backprojection, and gridrec
  8.2 Algebraic (matrix-based) reconstruction methods
    8.2.1 Numerical optimization
    8.2.2 Maximum likelihood and estimation maximum
  8.3 Analysis of reconstructed volumes
  8.4 Tomography in x-ray microscopes
    8.4.1 Tomographic mapping of crystalline grains
    8.4.2 Tensor tomography
  8.5 Complications in tomography
    8.5.1 Projection alignment
    8.5.2 Limited tilt angles and laminography
    8.5.3 Pixel intensity errors and ring artifacts
    8.5.4 Beam hardening
    8.5.5 Self-absorption in fluorescence tomography
  8.6 Limiting radiation exposure via dose fractionation
  8.7 Concluding limerick

9 X-ray spectromicroscopy
  9.1 Absorption spectromicroscopy
    9.1.1 Elemental mapping using differential absorption
    9.1.2 Living near the edge: XANES/NEXAFS
    9.1.3 Carbon XANES
    9.1.4 XANES in magnetic materials
    9.1.5 XANES in phase contrast
    9.1.6 Errors in XANES measurements
    9.1.7 Wiggles in spectra: EXAFS
  9.2 X-ray fluorescence microscopy
    9.2.1 Details of x-ray fluorescence spectra
    9.2.2 Fluorescence detector geometries
    9.2.3 Elemental detection limits using x-ray fluorescence
    9.2.4 Fluorescence self-absorption
    9.2.5 Fluorescence tomography
  9.3 Matrix mathematics and multivariate statistical methods
    9.3.1 Principal component analysis
    9.3.2 Cluster analysis and optimization methods
  9.4 Concluding limerick

10 Coherent imaging
  10.1 Diffraction: crystals, and otherwise
  10.2 Holography
    10.2.1 In-line or Gabor holography
    10.2.2 Off-axis or Fourier transform holography
    10.2.3 Holography, ankylography, and 3D imaging
  10.3 Coherent diffraction imaging and phase retrieval
    10.3.1 X-ray speckle and object size
    10.3.2 Coherent versus incoherent diffraction
    10.3.3 Iterative phase retrieval algorithms
    10.3.4 Coherent diffraction imaging with X rays
    10.3.5 CDI geometry and notation
    10.3.6 Iterative phase retrieval algorithm details
    10.3.7 Focus and resolution in CDI
    10.3.8 Bragg CDI
  10.4 Ptychography
    10.4.1 Ptychography geometry and resolution gain Gp
    10.4.2 Ptychography reconstruction algorithms
    10.4.3 Focus and resolution in ptychography
    10.4.4 Ptychography experiments
    10.4.5 Bragg ptychography
    10.4.6 Beyond strict Nyquist sampling
  10.5 Coherent imaging beyond the pure projection approximation
  10.6 CDI at XFELs: diffract before destruction
  10.7 Concluding limerick

11 Radiation damage and cryo microscopy
  11.1 Specimen heating
    11.1.1 Anti-Goldilocks and the “no-fly zone”
    11.1.2 Heating and ionization with short, intense pulses
  11.2 Radiation damage
    11.2.1 Radiation damage in soft materials
    11.2.2 Radiation damage in water and in hydrated organic materials
    11.2.3 Radiation risk in humans
    11.2.4 Radiation damage in initially living specimens
    11.2.5 Dose rate effects
    11.2.6 Specimen size effects
    11.2.7 Low-dose strategies
  11.3 Cryo microscopy
    11.3.1 Vitrification and amorphous ice
    11.3.2 Radiation damage to organics at cryogenic temperatures
    11.3.3 Bubbling in frozen hydrated specimens
    11.3.4 Radiation damage limits to resolution in cryo x-ray microscopy
  11.4 Radiation damage in hard materials
  11.5 Concluding limerick

12 Applications, and future prospects
  12.1 Life science
  12.2 Geoscience and environmental science
  12.3 Astrobiology
  12.4 Materials science
  12.5 Cultural heritage
  12.6 Future prospects
  12.7 Concluding limerick

Appendix A X-ray data tabulations

References
Index

Contributors

Janos Kirz, Lawrence Berkeley National Laboratory (retired)
Malcolm Howells, Lawrence Berkeley National Laboratory, USA (retired)
Michael Feser, Lyncean Technologies, USA
Doğa Gürsoy, Advanced Photon Source, Argonne National Laboratory, USA
Adam Hitchcock, McMaster University, Canada

Foreword

X-ray microscopy is an interdisciplinary topic, both in terms of its technical details and in terms of the scientific and engineering problems it is applied to. While there are a number of books that provide excellent coverage of certain aspects of x-ray physics, optics, and microscopy, it is my opinion that there has not been a single book that one can hand to someone new in the field of x-ray microscopy to give them an introduction to most of the key aspects they should know about. This book is an attempt to fill that need.

Are you a new PhD student entering a research group who will use x-ray microscopy for part of your research? If so, you have probably had at least a year or so of university physics during your studies. You are whom I have written the book for! At times I may push you a bit further in mathematics or physics than what you have learned thus far, but if you are in a PhD program you are a serious enough student that this should be OK. Besides, you can always skim over some of the more detailed points.

Are you an established researcher or engineer who is new to x-ray microscopy? This book is also for you! Your expertise might be with microscopes using other radiation, or on materials you hope to understand better using x-ray microscopy. What I hope to do in this book is to give you a feel for the fundamental ideas that come into play in a variety of x-ray microscopy approaches and applications, and to do so with enough detail to allow you to go off and invent new approaches of your own. I look forward to seeing your contributions to x-ray microscopy!

What do I mean by x-ray microscopy? I have decided to focus on imaging at a spatial resolution of a few micrometers down to nanometers. This is not a book on medical radiology at 0.1 mm resolution as limited by acceptable radiation exposure, and it is not a book on crystallography. I consider X rays to be photons with an energy well above the plasmon resonance (20–50 eV for most solids) and in particular above about 100 eV, and I tend to concentrate on energies below 20 keV since at higher energies the fine structure that one hopes to see in a microscope has reduced contrast. While much useful research is done in an approach where X rays illuminate an area and magnetic or electrostatic lenses image the electrons that come off of the surface, these photoelectron emission microscopes (PEEM and its variations) are based on electron, not x-ray, optics so they are given only brief treatment in Section 6.5. However, I do discuss x-ray microscopy approaches where one uses the properties of x-ray scattering to recover images without the use of lenses in Chapter 10, and I also include the combination of x-ray microscopy with absorption and fluorescence-based spectroscopy in Chapter 9. I discuss three-dimensional imaging or tomography as a natural extension of two-dimensional microscopy in Chapter 8. Chapter 7 covers what I consider to be essential points on x-ray microscope instrumentation. X rays are ionizing radiation, so Chapter 11 is devoted to radiation damage as well as cryo microscopy methods that can help in minimizing damage. While Chapter 12 discusses applications of x-ray microscopy, these applications ultimately involve detailed knowledge in their respective scientific specialties, which may be undergoing rapid development. Therefore the coverage here is rather brief, while pointing out recent review papers when possible.

I expect that I have made many sins of commission, and of omission. Cambridge University Press has a web page www.cambridge.org/Jacobsen associated with this book (one can also reach this web page with www.cambridge.org/9781107076570). This web page will host errata, as well as online Appendices B and C.

This book was originally undertaken as a team effort with one of my favorite people in the world: Janos Kirz, who is one of the real pioneers in x-ray microscopy. However, the book has taken longer to complete than we had hoped, and Janos has rightfully been enjoying his retirement more completely as of late. His fingerprints are all over the earliest chapters, and he has provided valuable feedback on the entire tome. However, as the book has grown and developments in later chapters have motivated rewrites of earlier ones, all of the warts and blemishes in what remains have become my fault alone. Therefore at Janos’ request he is no longer listed as a coauthor – which means, I guess, that you can’t blame him for anything that’s wrong or incomplete!

A number of other people have provided wonderful input. Some are listed as contributors to specific chapters, in which case I will not thank them again here. But people like Marc Allain, Elke Arenholz, Lahsen Assoufid, Anton Barty, Anna Bergamaschi, Sylvain Bohic, Anibal Boscoboinik, Virginie Chamard, Henry Chapman, Si Chen, Yong Chu, Marine Cotte, Björn De Samber, Peter Fischer, Manuel Guizar-Sicairos, Mirko Holler, Young Pyo Hong, Xiaojing Huang, Sarah Köster, Florian Meirer, Nino Miceli, Günter Schmahl (1936–2018), Xianbo Shi, Pierre Thibault, Stephen Urquhart, Ivan Vartanyants, Pablo Villanueva-Perez, Stefan Vogt, Michael Wojcik, Russell Woods, and Hanfei Yan have taken the time to read various sections of the book and give important critical comments and suggestions, or contributed figures. Several of my Northwestern University PhD students (Sajid Ali, Ming Du, and Saugat Kandel in particular) have given me great feedback on specific sections. Joshua Zachariah made early versions of several figures. Again, you can’t blame any of the above for my mistakes, but you can thank them for reducing their number.

One can only undertake the project of writing a book like this with lots of support. The Advanced Photon Source at Argonne National Laboratory (a U.S. Department of Energy Office of Science user facility) has generously supported me in devoting considerable time to this effort, since x-ray microscopy is one of its widely used methods. My wife, Holly, has been patient with me in so many ways, and has helped keep me in balance as the project progressed by joining me on many activities, adventures, and travels that have kept me refreshed and enthusiastic! Some income from this book is being directed to a student prize at the international conference series on x-ray microscopy.

1 X-ray microscopes: a short introduction

X-ray microscopes are systems in which an x-ray beam is used to illuminate a specimen, and some sort of x-ray image is obtained with a spatial resolution δr of micrometers to nanometers.¹ Some microscopes use an x-ray lens such as a Fresnel zone plate to produce a magnified image (Fig. 1.1), while others use an x-ray lens to produce a small beam spot through which the sample is raster-scanned while image data are collected (Fig. 1.2).

Figure 1.1 Schematic of a transmission x-ray microscope, delivering a full-field image. An x-ray source illuminates a specimen (often by using a condenser lens to image the source onto the specimen), and the transmitted wavefield is imaged by an objective lens onto a pixelated detector. The numerical aperture (N.A.) of the objective lens is indicated (lens radius divided by focal length; see Eq. 4.172).

Figure 1.2 Schematic of a scanning x-ray microscope, delivering a scanned image. An x-ray source such as an undulator at a synchrotron light source is (optionally) monochromatized, and an objective lens with numerical aperture N.A. images this source (or an illuminated aperture) to produce a small focal spot through which a specimen is scanned. One detector might record the transmitted signal (either measuring the total signal, or measuring its redistribution such as with a pixelated detector), and other detectors such as an energy-resolving detector for x-ray fluorescence signals can be used. Figure modified from [de Jonge 2010a].

When using microscopes, one of the first questions asked is this: “What is the magnification?” This is an eminently sensible question to ask when one is looking at an image through the eyepieces of a visible light microscope, and indeed the objective lenses and eyepiece lenses in visible light microscopes are usually labeled in terms of their respective magnification, such as 40× and 10× to yield a net magnification of 400×. However, we do not recommend that you somehow contrive to make an x-ray eyepiece! Your eye does not directly register x-ray images, and in any case you do not wish to expose any part of yourself to such high doses of X rays (this will be discussed in Chapter 11).

Because we are likely to view the same image at vastly different magnifications (ranging from printed images in journal papers, to images on computer screens, to very large images projected in conference rooms), it is far more convenient to instead talk about the spatial resolution δr of images (see Section 4.4.3), and a field of view which is the viewable width and height at the specimen’s location. (A common practice that we strongly recommend is to place a scale bar on the image, which shows how large some defined distance would appear; see for example Fig. 4.60.) That is, an image might have a resolution of δr = 20 nanometers (or nm), and a field of view of 10 micrometers (or μm) on a side. The continuous intensity variations I(x, y) in Cartesian coordinates are almost always sampled onto a regular discrete array I[ix, iy] with spacing Δx and array indices ix = 0, 1, ..., (Nx − 1), and corresponding values in y. Thus one might encounter an image with a picture element or pixel size of Δx,y = 10 nm, and Nx = 1024 and Ny = 768 pixels, giving a field of view of 10.24 × 7.68 μm. The extension to 3D imaging involves volume elements or voxels, and the z direction.

Of course the image is just a particular representation of the object under study; for incoherent brightfield imaging, the image intensity I(x, y) represents the magnitude squared of the wavefield at a particular plane, which hopefully is the downstream side of the specimen so that one obtains a pixel-by-pixel mapping of x-ray absorption in the specimen. In fact the image usually consists of this signal S due to the presence of contrast in the specimen, and noise N due to stray light, statistical fluctuations, or other causes; the signal-to-noise ratio (SNR) weighs their relative contributions (this is discussed in Section 4.8). As we shall see, some x-ray microscopes deliver multiple image signals, such as simultaneous absorption and phase contrast, or energy-dependent signals as will be discussed in Chapter 9.

¹ We follow the convention of the American Institute of Physics style guide, so that the noun is “X ray” and the adjective is “x-ray.”
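To make the pixel bookkeeping above concrete, here is a minimal sketch in Python (our own illustration, not code from the book; the function name is ours) that turns the pixel size and the pixel counts Nx, Ny from the example above into a field of view:

```python
# Minimal sketch (not from the book): field of view from pixel size and pixel
# counts, using the example values quoted above (10 nm pixels, 1024 x 768 array).

def field_of_view_um(pixel_size_nm, nx, ny):
    """Return the (width, height) field of view in micrometers."""
    return nx * pixel_size_nm / 1000.0, ny * pixel_size_nm / 1000.0

fov_x, fov_y = field_of_view_um(pixel_size_nm=10.0, nx=1024, ny=768)
print(f"Field of view: {fov_x:.2f} x {fov_y:.2f} micrometers")  # 10.24 x 7.68
```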

1.1 How to read this book

Of course you should really read every word of this book, or even memorize it! But not all readers will need to be concerned with every detail, so here are a few suggestions:

• If you just want to get a feel for what x-ray microscopes can do, look at Chapter 12, which summarizes a number of recent scientific applications of x-ray microscopy while providing representative images. You can then put this book on your shelf and pull it down when you need to read up on certain details.
• If you are wondering what type of x-ray microscope to use for a specific application, Chapter 6 provides an overview.
• If your main interest is in using x-ray absorption spectroscopy combined with imaging, see Section 9.1. If you are mostly interested in fluorescence imaging of the distribution of chemical elements in a specimen, see Section 9.2.
• Be aware of limitations due to ionizing radiation damage, as discussed in Chapter 11.

Many of the later chapters refer back to discussions of the fundamentals of x-ray physics in Chapter 3 and imaging physics in Chapter 4. In most cases there will be a cross-reference to the exact section or equation number.

1.2 Online appendices

Two short appendices available online at www.cambridge.org/Jacobsen² supplement this book:

• Online Appendix B contains further detail on how to calculate the visible and x-ray refractive index, and properties derived from it.
• Online Appendix C provides examples of the many different ways that the key formulae for maximum likelihood and estimation maximum approaches are written in the literature.

These are short enough to print out, if so desired.

² See also www.cambridge.org/9781107076570

1.3 Key mathematical symbols and formulae

One of the beauties of physics is that there is widespread agreement on the basic notation: we all know what F = ma means, for example. This is mostly but not completely true in x-ray optics and microscopy. Therefore we list our notation for some of the most important mathematical terms in x-ray microscopy in Table 1.1. There are certain instances where key terms and formulae are written in different ways within the community. Some write the x-ray refractive index as n = 1 − δ + iβ whereas we use n = 1 − δ − iβ, as discussed in Box 3.4. Our usage of the terms “magnitude” and “amplitude” is discussed in Box 4.1. The definition of momentum transfer q varies in the literature, as discussed in Box 4.2. We discuss depth resolution δz and depth of field (DOF) in Box 4.7.


Table 1.1 Key mathematical symbols and their meaning in this book, along with the section where they first are described. See also Box 4.1 for our usage of the words “magnitude” and “amplitude.” Additional photometric quantities are shown in Section 7.1.1.

Symbol | Meaning | Location
λ, E | X-ray wavelength λ and photon energy E, related by λ = hc/E with Planck’s constant h (Eq. 3.2) multiplied by the speed of light c (Eq. 3.55) | Eq. 3.7
na and ne | Atom (na) and electron (ne) number density | Eqs. 3.21 and 3.22
Λ and σ | Mean free path (Λ) and cross section (σ) | Eq. 3.25
(f1 + i f2) | Complex number of oscillator modes per atom, which varies with x-ray energy E | Eq. 3.42
α | As used in n = 1 − αλ²(f1 + i f2) | Eq. 3.66
μ | Linear absorption coefficient (μ) and inverse attenuation length (μ⁻¹) | Eqs. 3.45 and 3.75
μ | Mass absorption coefficient | Eqs. 3.78 and 9.3
n = 1 − δ − iβ | X-ray refractive index n with its phase-shifting part δ and absorptive part β | Eq. 3.67
θc | Critical angle for grazing incidence reflectivity | Eq. 3.115
δr | Spatial resolution | Eq. 4.173
N.A. | Numerical aperture of an optic | Figs. 1.1 and 1.2, and Eq. 4.172
drN | Outermost zone width in a Fresnel zone plate | Eq. 5.27
δz | Depth resolution (depth of field is DOF = 2δz; see Eqs. 4.214 and 4.215) | Eq. 4.213
Δr | Pixel size (picture element size) at the object; the subscript r can be thought of as referring to real space rather than Fourier space, or a vector coordinate r with components in x̂ and ŷ | Eq. 4.87
Δu | Pixel size in Fourier space | Eq. 4.92
Δdet | Size of a pixel on an area detector | Eq. 4.93
N | Number of image pixels (as in Nx and Ny) | Eq. 4.87
ux, uy | Spatial frequencies, or wavelength-normalized diffraction angles ux = θx/λ, uy = θy/λ | Eqs. 4.32 and 4.88
F | Fourier transform, as in G(ux) = F{g(x)} | Eq. 4.80
SNR | Signal-to-noise ratio, or S/N | Section 4.8.1
DQE | Detective quantum efficiency | Eq. 7.34
Θ | Contrast parameter | Eq. 4.238
Φ | Flux, in photons/second | Section 7.1.1
F | Fluence, in photons/area or photons/m² | Section 7.1.1
I | Intensity | Eq. 4.3
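As a small numerical illustration of the n = 1 − δ − iβ convention in Table 1.1 (a sketch of our own, with made-up values of δ and β rather than tabulated ones): a slab of thickness t shifts the phase of the transmitted wave by 2πδt/λ, and attenuates its intensity by exp(−μt) with the standard relation μ = 4πβ/λ.

```python
import math

# Sketch (not from the book): effect of delta and beta in n = 1 - delta - i*beta
# on a beam crossing a slab of thickness t. Phase advance is 2*pi*delta*t/lambda;
# intensity transmission is exp(-mu*t) with mu = 4*pi*beta/lambda.
# All numerical values below are illustrative, not tabulated material data.

wavelength_nm = 2.4            # a soft x-ray wavelength (illustrative)
delta, beta = 1.0e-3, 3.0e-4   # illustrative refractive index decrements
t_nm = 500.0                   # slab thickness

phase_rad = 2.0 * math.pi * delta * t_nm / wavelength_nm
mu_per_nm = 4.0 * math.pi * beta / wavelength_nm
transmission = math.exp(-mu_per_nm * t_nm)

print(f"phase shift = {phase_rad:.2f} rad, intensity transmission = {transmission:.3f}")
```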

2 A bit of history

“Those who cannot remember the past are condemned to repeat it.”
– George Santayana, Reason in Common Sense (Vol. 1 in The Life of Reason), 1905

Janos Kirz contributed to this chapter.

2.1 Röntgen and the discovery of X rays

The words of discovery are rarely those of Archimedes’ legendary shout of “Eureka!” or “I have found it!” as he supposedly leaped naked from his bathtub (good thing there weren’t webcams in those days!). Instead, the words of discovery are more likely to be along the lines of “hmm... that’s odd.” Such is the case of the discovery of X rays [Glasser 1933, Mould 1993].

At the time of their discovery, many investigators were carrying out experiments with various types of cathode ray tubes, but it was only Wilhelm Conrad Röntgen, Professor and Director of the Physical Institute at the University of Würzburg, who noticed some curious phenomena and decided to investigate further. Röntgen was 50 years old at the time, with a reputation for care in experiments even though his research in the physics of gases and fluids was not particularly cutting-edge. Cathode rays (which we would now call electron beams) were all the rage at the time, so Röntgen decided to investigate whether they would exit thin-walled Hittorf–Crookes tubes. To make it easier to use a phosphor to try to observe this, he surrounded a tube with black paper and worked in a darkened room. While setting up the experiment late on a Friday afternoon (November 8, 1895), he noticed that the phosphor was flickering in synchrony with the fluctuations of the glowing filament in the tube – even though the phosphor was some distance away, and with black paper in between!

The odd phenomenon immediately captured Röntgen’s attention, to the point that he did not notice an assistant entering the room later on to retrieve some equipment. When Röntgen’s wife Bertha finally succeeded in getting a servant to coax him upstairs to their apartment on the top floor of the Institute, Röntgen ate little of his supper and spoke even less before returning that evening to the puzzle in the lab.


Over the course of the following weeks, Röntgen did precisely what a careful investigator would do: while suspecting that he was seeing a new type of radiation, he tried to rule out alternative explanations for his curious observation, and he explored with great care the characteristics of the phenomenon. He noticed that the “rays” seemed to travel in straight lines with no Fresnel fringes, no mirror reflection, and no refraction, and that they were absorbed preferentially by materials with higher atomic numbers. At one point during his experiments while he was picking up a lead brick, he noticed something startling: he could see the bones of his fingers on his phosphor screen! Along with his discovery that these “rays” could darken photographic plates, he soon brought Bertha into the lab and recorded a famous radiograph of her hand with her wedding ring, though she is said to have remarked unhappily “Now I have seen my death!” [Mould 1993].

With quite some trepidation, Röntgen produced a handwritten manuscript summarizing his observations in 17 numbered sections which included his observation that “if one holds the hand between the discharge apparatus and the screen, one sees the darker shadows of the bones within the much fainter picture of the hand itself” (translation by Glasser [Glasser 1933]). The title of his manuscript was “Über eine neue Art von Strahlen (Vorläufige Mitteilung)” or “On a new kind of rays (preliminary message),” and within the manuscript he stated “Der Kürze halber möchte ich den Ausdruck ‘Strahlen’ und zwar zur Unterscheidung von anderen den Namen ‘X-Strahlen’ gebrauchen” or “for the sake of brevity I would like to use the term ‘rays,’ and to distinguish them from others I will use the name ‘X-rays’.” This manuscript [Röntgen 1895], which contained no images, was submitted to the Sitzungsberichte der Physikalisch-medicinischen Gesellschaft zu Würzburg or the Transactions of the Würzburg Physical Medical Society on Saturday, December 28, 1895, or seven feverish weeks after that fateful afternoon.

We must now pause to consider the notion that our internet-connected world is frenetic compared to a stately past. Can you imagine what might happen if you were to submit a paper in the Americas or in Europe on the Saturday between Christmas and New Year’s day, and how long it would be before it appeared in print even if it didn’t have to go through peer review? If this is within your experience, you may find the following sequence hard to believe. Starting from Röntgen’s handwritten manuscript delivered on Saturday, December 28, the journal issue became available on New Year’s Day (Wednesday, January 1, 1896) at which time Röntgen was able to pick up printed copies of his article. He then mailed copies to several respected physicists for their comment, along with photographs (not part of the publication) including the famous one of his wife’s hand. Since Röntgen was still not confident that he had not overlooked a more mundane explanation for his observations, he is said to have stated “Now the devil will be to pay.” Perhaps instead the angels danced a jig!

One of the recipients of Röntgen’s preprint was Franz Exner in Vienna, who showed it to some friends at a party, including Ernst Lecher, who was the son of the editor of Die Presse in Vienna. Thus it was that the sensational news came to appear on the front page of the Sunday edition of the newspaper on January 5, 1896, and in the grand tradition of media echo chambers the Daily Chronicle in London reported this on the following day:


The noise of the war’s alarm should not distract attention from the marvelous triumph of science which is reported from Vienna. It is announced that Prof. Routgen [sic] of the Wurzburg University has discovered a light which for the purpose of photography will penetrate wood, flesh, cloth, and most other organic substances. The professor has succeeded in photographing metal weights which were in a closed metal case, also a man’s hand which showed only the bones, the flesh being invisible.

This was followed by brief commentaries in the New York Electrical Engineer on January 8, in the New York Medical Record on January 11, in Nature on January 16 (p. 253), and in Science on January 24 (p. 131). An English translation of the paper was printed in Nature on January 23 [Röntgen 1896a], and in Science on February 24 [Röntgen 1896b] – both of which included a reprint of the photo of Bertha’s hand. The New York Times started out with skepticism, first commenting on January 19 in a page 1 report about Röntgen’s “alleged discovery of how to photograph the invisible” but then on January 26 the report on page 1 stated “Röntgen’s photographic discovery increasingly monopolizes scientific attention. Already numerous successful applications of it to surgical difficulties are reported from various countries...” [Bakalar 2009].

In the meantime, Röntgen had been summoned to appear in Berlin before Emperor Wilhelm II on Monday, January 13 (he remarked, “I hope I shall have ‘Kaiser luck’ with this tube, for these tubes are very sensitive and are often destroyed in the very first experiment, and it takes about four days to evacuate a new one”). The tube worked, and the Kaiser awarded Röntgen with the Prussian Order of the Crown, Second Class.

In an age of multiple conferences to present new results to, it is remarkable to think of how many public lectures Röntgen gave on his results: one! This was at his institute on Thursday, January 23; the talk was introduced by the anatomist Albert von Kölliker, whose hand Röntgen recorded a radiograph of during the lecture. The response was rousing, of course, and von Kölliker responded at the end of the talk by leading three cheers and saying that the rays should not be called “X rays” as Röntgen had written in his paper, but “Röntgenstrahlung” (this term remains in use in Germany today). Röntgen refused additional opportunities to present his results in public, including a request to make a presentation to the Reichstag and even at his ceremony for receiving the first Nobel Prize in Physics in 1901! (Alfred Nobel passed away on December 10, 1896, but it took until 1900 for the Nobel Foundation to be formally established.)

Since Röntgen had not observed significant refraction or reflection of his new rays, there seemed to be no optics to deliver magnified images (we now know about compound refractive lenses; see Section 5.1.1). The lack of optics would seem to thwart the development of x-ray microscopes. How to see finer detail? One early approach was to use a visible light microscope to magnify the x-ray radiograph recorded on micrometer-grain-size photographic emulsions [Ranwez 1896].

X rays found application before their nature was understood. They even found early misapplication: an example is x-ray irradiation of the brain as a treatment for epilepsy in experiments by Mihran Kassabian in Philadelphia in 1903–1904. Two patients died, though symptoms of epilepsy were reduced... Kassabian lost several fingers due to radiation burns from handling tubes in operation and is said to have used an assumed name when checking into a hospital for treatment so as not to cast a dark light on X rays [Brown 1995]. He died in 1910.

Röntgen published only two more incremental observations on the nature of his new rays on March 9, 1896 (including the observation that they ionize air and other gases) and on March 10, 1897. However, later on he had an indirect role in solving the puzzle of their nature. Röntgen was lured away in 1900 to be the Chair of Experimental Physics at the Ludwig Maximilians University of Munich, where in 1906 he helped recruit Arnold Sommerfeld to be the Chair for Theoretical Physics. Sommerfeld became convinced that X rays were pulsed electromagnetic waves and estimated their wavelength to be of order 1 Å based on diffraction by a slit, but the evidence was considered to be inconclusive so that even Sommerfeld himself wrote in 1905 “it is a shame that, ten years after Röntgen’s discovery, one still doesn’t know what Röntgen rays really are” [Authier 2013].

Meanwhile, Sommerfeld had recruited Max von Laue. Convinced that X rays must be a short-wavelength version of visible light, von Laue drew upon the presence in Munich of mineralogist Paul von Groth (who was a disciple of Auguste Bravais’ notion that crystals are formed from atoms organized in a lattice of unit cells) to propose that the regular spacings of atoms in a crystal might act like the regular bars of a diffraction grating for light, and set Paul Knipping and Walter Friedrich to carry out the experiment. On April 23, 1912, they found success, with a diffraction pattern that by modern standards shows barely discernible diffraction blobs rather than spots,¹ though the experimental results were immensely improved by the time publications appeared in the literature [Friedrich 1912, Friedrich 1913, Laue 1912, Laue 1913]. This provided firm evidence of the wavelength of X rays, and led to von Laue receiving the 1914 Nobel Prize in Physics. By November 1912, the father–son team of William Henry and William Lawrence Bragg in Leeds had worked out [Bragg 1913a] the simple relationship for diffraction from atomic planes that we now refer to as Bragg’s law (Eq. 4.33); they were jointly awarded the Nobel Prize in Physics in 1915.

Earlier studies by Charles Barkla had shown that X rays are polarized (1904 in Liverpool) and that X rays emitted from different targets include characteristic radiation (1906), a component of radiation with a penetration dependent on the atomic number of the material. Barkla went on to find two series of such radiation, which he first labeled A and B [Barkla 1909] but then labeled K and L [Barkla 1911] in case there might be a series before A (Barkla received the 1917 Nobel Prize in Physics). The work of Barkla, as well as that of the Braggs, inspired Henry Moseley of Oxford to build in 1913 a crystal spectrometer and find that the energy of Barkla’s characteristic X rays scaled as atomic number squared [Moseley 1913], as we shall see in Eqs. 3.11–3.13. If Moseley had not been killed by a sniper in 1915 while serving with the British Army in Gallipoli, one can imagine that he might have shared the Nobel Prize honors with Barkla.

¹ See for example http://www.iucr.org/publ/50yearsofxraydiffraction/full-text/laues-discovery


2.2 Einstein and mirrors

The next important conceptual insight into the nature of X rays came from none other than Albert Einstein [Einstein 1918]. He speculated that bright regions at the edges of radiographs of human limbs made by A. Köhler of Wiesbaden might be due to grazing incidence x-ray reflection. Einstein then stated:²

According to the classical theory of dispersion we have to expect that the index of refraction n for x-rays is close to 1 but in general different from 1. Whether n will be greater or less than 1 will depend on whether the electrons dominating the dispersion have eigenfrequencies smaller or larger than the frequencies of the x-rays. The difficulty in determining n lies in the fact that (n − 1) is very small (about 10⁻⁶). But it is obvious that for almost grazing incidence there must be detectable total reflection for x-rays in the case n < 1.

The radiograph was not shown. It might have actually displayed Fresnel fringes from a non-contact radiograph, as Einstein himself suggested in a note added in proof. Still, as we will see in Section 3.3.2, the refractive index n for X rays is indeed slightly less than 1. To our knowledge this is the first time the x-ray refractive index was expressed as n = 1 − ε with ε ≪ 1, though of course Einstein was surely aware of the Drude model [Drude 1902] for the visible refractive index, and Charles Galton Darwin (grandson of the naturalist Charles Robert Darwin) had predicted some years earlier [Darwin 1914] that the x-ray refractive index would be n = 1 + ε with ε ≈ 10⁻⁶.

It also seems that Einstein’s brief comment was missed by Compton, who wrote a more widely read paper [Compton 1923] concluding that the x-ray refractive index is less than 1 based on two experimental results:

1. The 1919 PhD dissertation [Stenström 1919] of Wilhelm Stenström in Lund, Sweden, who was the first to demonstrate experimentally a refractive correction to Bragg’s law and write it in terms of n = 1 − δ.
2. Additional measurements on a deviation from Bragg’s law, from Duane and Patterson at Harvard [Duane 1920], who were in turn aware of Stenström’s results.

This led Compton to modify Bragg’s equation to the form shown in Eq. 4.34. A refractive index of n < 1 implies a wave velocity faster than the speed of light in vacuum c, and it is curious that Einstein, author of the theory of special relativity, did not comment on this fact. Did he feel it was obvious that while the phase velocity of X rays in media is faster than c (Eq. 3.56), the group velocity would turn out (Eq. 3.73) to be less than c?

² In the original German: “Nach der klassischen Dispersionstheorie müssen wir erwarten, daß der Brechungsexponent n für Röntgenstrahlen nahe an 1 liegt, aber im allgemeinen doch von 1 verschieden ist. n wird kleiner bzw. größer als 1 sein, je nachdem der Einfluß derjenigen Elektronen auf die Dispersion überwiegt, deren Eigenfrequenz kleiner oder größer ist als die Frequenz der Röntgenstrahlen. Die Schwierigkeit einer Bestimmung von n liegt darin, daß (n − 1) sehr klein ist (etwa 10⁻⁶). Es ist aber leicht einzusehen, daß bei nahezu streifender Inzidenz der Röntgenstrahlen im Falle n < 1 eine nachweisbare Totalreflexion auftreten muß.” The translation shown is due to Dr. Angelika Osanna.

In any case, as we will see in Section 5.2, an index of refraction of the form n = 1 − δ implies a critical angle for grazing incidence reflectivity of √(2δ), a result that was first calculated and demonstrated by Compton [Compton 1923] and is in line with Einstein’s suggestion. This makes it possible to make mirror optics, including focusing optics, with curved surfaces. The problem is that it is very hard to make grazing incidence optics with sufficiently smoothly polished surfaces and exactly the desired figure. The first to begin to succeed and generate excitement was Paul Kirkpatrick, a professor at Stanford University. Together with his Mexican-American student Albert Baez (father of Joan, who became a famous folk-singer in the USA), he developed a scheme of using two orthogonal cylindrical mirrors (now known as Kirkpatrick–Baez or simply KB mirrors [Kirkpatrick 1948a, Kirkpatrick 1948b, Baez 1950]; see Fig. 2.1 and Section 5.2.2) to achieve 2D focusing. Kirkpatrick even published an article in Scientific American in 1949 entitled “The X-ray microscope” [Kirkpatrick 1949b], where the synopsis in the table of contents proudly announced

It would be a big improvement on microscopes using light or electrons, for X-rays combine short wavelengths, giving fine resolution, and penetration. The main problems standing in the way have now been solved.

Figure 2.1 The crossed-1D-lenses focusing scheme developed by Kirkpatrick and Baez using elliptical profile mirrors [Kirkpatrick 1948a]. At top left is an illustration from Kirkpatrick’s 1949 contribution to Scientific American [Kirkpatrick 1949b]. At bottom is a three-mirror system constructed by Kirkpatrick and Pattee [Kirkpatrick 1953], and at top right is an improved, concentric-cylinder mounting scheme developed in 1953 by Pattee (picture from a 1983 letter by Pattee to M. Howells). More recent focusing schemes involving compound refracting lenses (Section 5.1.1) and multilayer Laue lenses (Section 5.3.6) also sometimes use the crossed-1D-lens focusing geometry. Adapted from a photograph sent in 1983 from Howard Pattee to Malcolm Howells; used with permission from Pattee, and also shown in [Kirz 2009].

This work opened the floodgates. A variety of groups, especially in the USA, England, Sweden, and Germany, began developing x-ray microscopes. As it turned out, there were indeed still a few more problems to be solved. The Kirkpatrick–Baez design has serious off-axis aberrations, so that it is usually not used as a lens for full-field imaging; however, it does work well for imaging a collimated beam to a small spot in scanning microscopes, as will be described in Section 5.2. In addition, the surfaces of grazing incidence optics must be made very smooth, as will be shown in Eq. 3.124. One particularly notable set of advances dating back to the late 1980s [Mori 1987, Higashi 1989] has been made by a group led by Kazuto Yamauchi at the University of Osaka, leading to several landmarks [Yamamura 2003, Mimura 2010] in x-ray nanofocusing with grazing-incidence reflective optics. Their work is summarized in a recent book chapter [Yamauchi 2016], and they have helped found the Japanese company JTEC, which sells mirrors that have been used by others for their own spectacular results [da Silva 2017].

A conceptually simple alternative to using two optics is to use ellipsoids of revolution to produce a small focal spot from one optic. Given the need for grazing incidence, the shape of the ellipsoid is like a very skinny cigar. The group of Kunz built the first microscope using this scheme [Voss 1992a], but the mirror, painstakingly ground and polished, was nowhere close to delivering the specifications [Kunz 1995]. Many years later, Bilderback came up with the idea of shaping glass capillaries in an oven to form the right optics, and his team was able to demonstrate a sub-micron focus [Bilderback 1994b, Bilderback 1995]. The idea was also developed by others, including Xradia/Carl Zeiss X-ray Microscopy, who use it as a condenser system in many of their microscopes [Zeng 2008].

In 1952, Hans Wolter in Kiel, Germany developed aberration-free designs for full-field imaging [Wolter 1952], as shown in Fig. 5.10. One of these schemes involves a confocal pairing of a paraboloid and a hyperboloid. However, these designs were much too difficult to implement at the time. (Incidentally, Wolter was the first to articulate the potential of the “water window” spectral range between the carbon and oxygen K edges at 290 eV and 540 eV respectively; see Fig. 2.5.) Nested Wolter mirrors are in use today in state-of-the-art x-ray telescopes orbiting the Earth, but Wolter mirrors have not had much impact in microscopy in spite of considerable effort [Onuki 1992, Aoki 1992, Aoki 1998].

To overcome the challenges of grazing incidence optics, Eberhard Spiller of IBM Research decided in 1972 to coherently combine the individually weak reflectivity of many refractive interfaces via multilayer mirrors [Spiller 1972, Spiller 1974], as will be discussed in Section 4.2.4. Spiller’s approach was to use electron beam evaporation to deposit the layers. Very successful multilayer mirrors were produced using sputtering soon after at Stanford and CalTech by Underwood and Barbee for x-ray astronomy applications [Underwood 1979, Underwood 1981a, Underwood 1981b]. For normal incidence reflectivity, the layers must be about half a wavelength apart, so clearly this can only be achieved for very long wavelength X rays. Microscope objectives based on two concentric spherical mirrors operating at near-normal incidence had been developed by Schwarzschild for visible light [Schwarzschild 1905]. Spiller attempted to use this geometry with multilayer coatings to build a microscope for soft X rays [Spiller 1980, Spiller 1984], but it did not perform very well.

This did not slow down the USA space agency NASA, which in 1992 gave their NASA Scientist of the Year Award [Hoover 1992, NASA 1993] to Richard Hoover for inventing, according to NASA, a “revolutionary new microscope [that] should enable researchers to see in great detail high contrast x-ray images of proteins, chromosomes and other tiny carbon structures inside living cells. Resolution of the microscope could be so high that it may produce detailed images of the building blocks of life—tiny DNA molecules.” Several of us got together and sent a letter to the head of NASA, pointing out that radiation damage would make it impossible to do what was being claimed, as will be discussed in Chapter 11. We received an answer from the top lawyer at the agency, who assured us that no laws were violated (apparently the laws of nature were not of concern). Subsequent publications didn’t include any x-ray micrographs [Hoover 1993, Hoover 1994]. In any case, a successful multilayer mirror Schwarzschild XUV microscope was developed by Cerrina et al. [Ng 1990, Capasso 1991] for scanned imaging using photoelectrons as the imaging signal.
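As a quick numerical illustration of the critical angle relation θc = √(2δ) quoted above (a sketch of our own; the value of δ is illustrative rather than taken from tabulations):

```python
import math

# Sketch (not from the book): critical angle for total external reflection,
# theta_c = sqrt(2*delta), for an illustrative refractive index decrement delta.
delta = 5.0e-6
theta_c = math.sqrt(2.0 * delta)   # in radians
print(f"theta_c = {theta_c * 1e3:.2f} mrad = {math.degrees(theta_c):.3f} deg")
# about 3.2 mrad, or 0.18 degrees: hence the need for grazing incidence
```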

2.3 Cold War microscopes

Following an earlier demonstration using a pinhole [Sievert 1936], the development of microfocus x-ray sources during the 1930s and 1940s opened the way to the development of the “projection microscope” or “shadow microscope” (we refer to the modern version as a point projection x-ray microscope, as will be described in Section 6.2). By placing the specimen close to an x-ray point source, and an x-ray film at some distance, one obtains geometrical magnification with a resolution as fine as the size of the source. The real successes with this approach began to come about in work at the University of Cambridge in the 1950s [Cosslett 1951, Cosslett 1952], where researchers used an electron beam focused onto an approximately 1 μm thick tungsten film to obtain images of silver grids with a resolution of about 500 nm, along with few-micrometer resolution images of the head of the fruit fly Drosophila melanogaster, among numerous examples. Soon commercial instruments were produced and sold for a time by General Electric in the USA [Newberry 1956], Philips in Holland, and Microray Laboratories in England (see [Newberry 1987] for this and other aspects of the early history of x-ray microscopy). In more modern point projection instruments [Mayo 2003] the size of the point source is reduced by using the focus of a scanning electron microscope on a roughly 100 nm thick metal target (thus reducing the source size increase that would otherwise be caused by electron scattering) to generate the x-ray illumination.

The Cambridge group was led by Vernon Ellis Cosslett, FRS, a leading early electron microscopist, and William C. Nixon, who together wrote a book summarizing the field at the time [Cosslett 1960]. Later on, the group included Theodore Hall, who wrote a book on medical applications of x-ray microscopy [Hall 1972]. It became known much later that back in the 1930s Cosslett “joined the under-cover University Communist Group, which included the German refugee Klaus Fuchs who was later imprisoned for passing sensitive nuclear physics from AERE Harwell to the Soviet Union” [Mulvey 1994]. Hall, as it turned out, was the second Soviet spy working on the Manhattan Project at Los Alamos [Albright 1997]. You should draw a lesson from this: be suspicious (very, very suspicious!) of x-ray microscopists.

2.4 Zone plates

The development of the idea of holography by Gabor [Gabor 1948] inspired several people to think of approaches to x-ray microscopy that would be free of the resolution limits of Kirkpatrick–Baez mirrors available at the time. Inspired by Gordon Rogers' analogy between in-line holograms and Fresnel zone plates [Rogers 1950], Baez considered the recording medium resolution limits of in-line holography [Baez 1952a], and this led him to demonstrate the use of Fresnel zone plates as x-ray lenses [Baez 1960]. The basic properties of zone plates had been established more than 70 years earlier, and it was clear that they would work for any kind of wave. However, using them for X rays posed particular challenges due to the short wavelength and high penetrating power. The basic zone plate is a circular diffraction grating, consisting of concentric rings with radially increasing ring density. Alternate rings should be opaque (or should reverse the phase) and transparent. The width of the finest (outermost) ring determines the numerical aperture, and thus the resolution (see Section 5.3.1). But how to make fine rings that are sufficiently thick to absorb, or reverse the phase, yet narrow enough to provide sub-micron resolution? Baez's first free-standing metal zone plate [Baez 1960], made by optical lithography, had just 19 zones with a finest zone width of 20 μm: certainly not a high-resolution x-ray optic by today's standards! It was not until 1969 that Günter Schmahl (1936–2018) and Dietbert Rudolph of the University of Göttingen came up with holographic approaches to zone plate fabrication [Schmahl 1969]; their original motivation was x-ray astronomy, but they immediately realized the potential of the approach for x-ray microscopy. By 1974, the Göttingen group (including Bastian Niemann) had demonstrated 2 μm resolution zone plate imaging using a laboratory source [Niemann 1974], and in 1976 they produced the first zone plate images using synchrotron radiation [Niemann 1976] (Fig. 2.2). For some time their efforts faced an uphill battle; no less a figure than Ernst Ruska, who pioneered the development of the transmission electron microscope in the 1930s, wrote in 1979 [Ruska 1979]:

Röntgen had however already shown experimentally in his first communication that a large number of solid and liquid materials did not appreciably refract the new rays, so that lenses made from these materials would not appreciably affect the trajectories of these beams. In the meantime it had been recognised that no such materials are possible. Furthermore, one could hardly hope that the very weak interaction of x-rays with the atoms of the material irradiated would be adequate to render visible, with sufficient contrast, particles of sub light-microscope dimensions.

(The quote is from a translation by Mulvey [Ruska 1980].) Ruska's towering stature for his pioneering developments would soon be recognized by his receiving the 1986 Nobel Prize in Physics (along with Binnig and Rohrer, who developed the scanning tunneling microscope in 1981), so his lack of enthusiasm for x-ray microscopy carried some influence. It is somewhat ironic that phase contrast is the dominant contrast mechanism in electron microscopy (Section 4.10.2), yet Ruska probably did not consider the possibilities of phase contrast in x-ray microscopy (nor did anyone else until Schmahl and Rudolph made its potential clear in 1987, as will be discussed in Section 4.7).

An alternative approach for zone plate fabrication was suggested by David Sayre in 1972 [Sayre 1972]: to use the newly developed electron-beam fabrication technology, available at IBM where he was employed at the time, to produce the required fine linewidth for high resolution. It was his seminar on zone plates that got Janos Kirz interested in x-ray microscopy. The seminar took place in Oxford in 1972 when both were on sabbatical stays, and was the origin of a decades-long collaboration. In fact, Sayre lived in St. James on Long Island in New York, just a few miles from Stony Brook University. Stony Brook was the long-time academic home of Kirz (and, later on, the author) – but it was in Oxford where Sayre and Kirz met for the first time. (Sayre's background included a foundational role in direct methods in x-ray crystallography, participation in the IBM team that developed FORTRAN as the first high-level language for mathematical computer calculations, and leadership of the IBM team that developed the first virtual memory operating system [Glusker 2012, Kirz 2012].) It took nearly a decade for Sayre's suggestion to be put into practice [Ceglio 1980, Shaver 1980, Ceglio 1983], but since that time it has been refined and implemented in several laboratories around the world. Today zone plates are the dominant focusing elements in x-ray microscopes, though reflective and refractive optics are playing increasingly important roles too (Chapter 5, with a historical trend of spatial resolution shown in Section 5.5).

2.5 Synchrotrons and lasers

The first microscope to use synchrotron radiation (Section 7.1.4) as the source was built by Horowitz and Howell at the Cambridge Electron Accelerator (CEA) in 1971 [Horowitz 1972, Horowitz 1978, Horowitz 2015] (Fig. 2.3). It used a micron-size pinhole to define the probe size and hence the resolution. The sample was mechanically scanned in a raster fashion to acquire the image. Unfortunately, this light source closed down shortly thereafter as the particle physics program at the CEA came to a close. Janos Kirz's group made a few early x-ray tests of a similar sort using soft X rays at the SPEAR ring at Stanford as the 1970s ended [Rarback 1980, Kirz 1980c]. By that time, an early dedicated synchrotron light source (the National Synchrotron Light Source or NSLS) was under construction at Brookhaven National Laboratory, which is close to Stony Brook. Soon Kirz, Sayre, and Malcolm Howells of Brookhaven were planning for x-ray microscopy experiments at NSLS.

Figure 2.2 The first zone plate TXMs (transmission x-ray microscopes) developed by Schmahl, Rudolph, and Niemann. At top left is shown an instrument operated at DESY in Hamburg in 1976 [Niemann 1976], while an instrument operated at ACO in Orsay in 1983 is shown at top right. The bottom image is an x-ray micrograph of a diatom obtained at Orsay [Schmahl 1982]. Images courtesy of the late G. Schmahl; also shown in [Kirz 2009].

In the meantime, researchers at IBM began a short-lived adventure in "contact microscopy" (Section 6.1). By using an x-ray sensitive polymer (such as poly(methyl methacrylate) or PMMA) rather than photographic film as the detector, the resolution could be improved significantly. The scheme involved placing the object to be imaged on the polymer, exposing to X rays, then "developing" the polymer and examining it

using a scanning electron microscope [Feder 1976, Feder 1977]. The fine detail was encoded as a surface-relief pattern on the polymer. The method attracted a number of practitioners, some of whom became a bit carried away with enthusiasm. The contact micrograph of a "live" blood platelet [Feder 1985] was featured on the front page of the Science section of the New York Times ("New tool captures cells alive," January 15, 1985). Unfortunately it turned out that the platelet had remained stuck on the polymer, so the image was not what it was advertised to be.

It was also around this time that the first visible-laser-pumped x-ray lasers were demonstrated [Matthews 1985, Suckewer 1985]. The euphoria over this led to another cover article (April 2, 1985) in the Science section of the New York Times that gushed "But aside from its weapons applications, the X-ray laser has excited biologists, chemists and physicists because of its possible use in a super microscope, an instrument that will perhaps be capable of taking holographic three-dimensional movies of the genetic code of a living cell." The excitement of the New York Times reporter was based not just on visible-laser-pumped x-ray lasers, but on x-ray lasers using the more intense pumping source that an exploding nuclear weapon might provide (see e.g. [Broad 1986]; a possible connection between x-ray optics expertise and thermonuclear weapons design is noted in Box 2.1). Now it must be pointed out that several members of the weapons labs had given quite serious thought to the issues of radiation and hydrodynamic damage [Solem 1982b, Solem 1986], as will be discussed in Section 11.1.2, but reporters are not always bound by subtleties. Around this time, Stony Brook University hosted a workshop which included researchers from Livermore and Los Alamos laboratories. One memorable talk began with a statement that went something like "We are planning on carrying out x-ray holographic microscopy experiments with an intense but low-shot-rate x-ray laser source that I cannot describe" while the speaker put a viewgraph of the nuclear test craters at the Nevada Test Site (now the Nevada National Security Site) on the overhead projector!

Box 2.1 Grazing incidence mirrors and thermonuclear weapons
Nuclear weapons ("atom bombs") are based on achieving a rapid fission chain reaction in specific isotopes of uranium and plutonium. Thermonuclear weapons ("hydrogen bombs" or "H bombs") use the energy from a fission bomb to compress light materials to the temperature and pressure required to achieve nuclear fusion, boosting the energy released by a weapon considerably. As the fission bomb trigger is detonated, it becomes a very hot object with a blackbody radiation temperature (Eq. 7.5) sufficient to serve as an intense source of soft X rays. Being massless, the X rays are able to reach the fusion components (which often have another fission element at their core) more rapidly than neutrons can, so they can pre-heat and compress those components for efficient fusion once the neutrons arrive (the penetration power of X rays also helps to even out the heating and compression, a fact that was exploited in laser-driven inertial confinement fusion experiments dating back to the 1970s [Lindl 1995]). This x-ray approach was proposed in early 1951 by Edward Teller and Stanislaw Ulam (see [Ford 2015] for a discussion of the provenance of this idea), and it has led to H bombs with yields up to tens of megatons of dynamite equivalent from fission bomb triggers with yields of tens of kilotons. It may be advantageous to focus the X rays from the fission bomb trigger onto the fusion components; this can be accomplished using grazing incidence x-ray optics (Section 5.2), which could in turn be fabricated by figuring and polishing the inside surfaces of an exterior casing. This casing can be made of materials such as the heavier isotope of uranium, ²³⁸U, which is referred to as "depleted uranium" because it is what remains after separation of the slow-neutron-fissionable ²³⁵U isotope from natural uranium. This ²³⁸U "depleted uranium" can also undergo fission if flooded with the fast neutrons produced by the fission bomb trigger, though it cannot by itself sustain a chain reaction. (Depleted uranium is also very robust at high temperatures, making it well suited for use as a casing material in ballistic missile warheads, which suffer considerable heating upon atmospheric reentry.) The important role that soft X rays play in thermonuclear weapons design and testing might help explain the high level of expertise in grazing incidence optics manufacture that was developed at Lawrence Livermore and Los Alamos National Laboratories in the USA, and at equivalent laboratories in other countries that have developed high-yield H bombs.

Figure 2.3 The first synchrotron-based x-ray microprobe was developed by Horowitz and Howell at the Cambridge Electron Accelerator in 1972 [Horowitz 1972, Horowitz 1978, Horowitz 2015]. The microprobe used a deep pinhole fabricated by plating around a silicon whisker, and was able to take images of sulfur dust (bottom left) and silicon whiskers (bottom right). Unfortunately this promising instrument was short-lived, as the accelerator was shut down in 1973 due to the high-energy physics community moving on to bigger machines. Images from [Horowitz 1978].

During the 1980s several synchrotron light sources were built and commissioned. The brightness and tunability of these sources provided the opportunity to develop x-ray microscopes that went beyond the simple transmission image based on differential absorption (Section 9.1.1) from different regions of the object [Engström 1946b, Engström 1946a, Engström 1947]. Modern practitioners of scanning x-ray microscopy may be amused to know that in 1983 it took nearly an hour to record a modest pixel count image (Fig. 2.4; [Rarback 1984]) at 300 nm resolution, and with an energy resolution no better than 1 eV! The Göttingen group did extensive work at the original BESSY synchrotron light source in Berlin on developing full-field imaging with zone plates within Wolter's soft x-ray "water window" (Fig. 2.5). These developments included pioneering the x-ray version of Zernike phase contrast microscopy [Schmahl 1988, Schmahl 1994] and magnetic circular dichroism imaging [Fischer 1996] using the BESSY source in Berlin. The Stony Brook group used the NSLS at Brookhaven to develop spectromicroscopy [Ade 1990, Ade 1992, Zhang 1994] (see also Box 9.1, and Section 9.1). The Göttingen and Stony Brook groups and their collaborators both pioneered nanotomography [Haddad 1994, Lehr 1997] (Section 8.4) and cryo x-ray microscopy [Schneider 1998a, Schneider 1998b, Maser 1998, Maser 2000] (Section 11.3), as well as their combination [Weiß 2000, Wang 2000]. Other pioneering efforts were carried out by the King's College group at Daresbury, by Tsukuba University at the Photon Factory in Japan, at the Center for X-ray Optics at Lawrence Berkeley National Laboratory, and elsewhere.

In the mid-1990s, three large-scale synchrotron light sources began operation with higher electron beam energies: first 6 GeV at the European Synchrotron Radiation Facility (ESRF) in Grenoble, France; then 7 GeV at the Advanced Photon Source (APS) at Argonne Lab near Chicago, USA; and finally 8 GeV at SPring-8 near Himeji, Japan (there were technical reasons why each facility chose a particular electron beam energy, but it does look an awful lot like one-upmanship, doesn't it?). These facilities have led the way in advancing x-ray microscopy at multi-keV energies, including with fluorescence (as will be described in Section 9.2) and Bragg coherent diffraction imaging (as will be discussed in Section 10.3.8). Experiments at higher energies have also stimulated the development of compound refractive lenses (as will be discussed in Section 5.1.1). This happened first in experiments at the ESRF, which was soon joined by a strong effort at PETRA III at DESY in Hamburg, a 6 GeV light source that began operation in 2009. Many of these light source facilities now host several x-ray microscopy beamlines: for example, the Advanced Light Source (ALS) in Berkeley and the APS at Argonne host about half a dozen each. The numbers are similarly increasing at many other sources worldwide, yet demand for using the instruments has outstripped the availability of microscope time.

While there were several earlier demonstrations of laboratory-source x-ray microscopes (Section 6.3), around the turn of the century Wenbing Yun optimized a conventional microfocus x-ray source along with improved optics and detectors to deliver commercially available table-top x-ray microscopes with tomographic capabilities [Wang 2002]. He founded the company Xradia (now Carl Zeiss X-ray Microscopy), which has sold a large number of laboratory x-ray microscopes and synchrotron microscopes on four continents. Bruker soon joined the list of vendors of commercial synchrotron-based microscopes, with Axilon as a more recent spinoff (and Yun has now founded another company, Sigray).
Figure 2.4 The first STXM (scanning transmission x-ray microscope) using zone plate optics was constructed by Rarback, Kirz, and Kenney at bending magnet beamline U-15 at the NSLS at Brookhaven [Rarback 1984]. It used dr_N = 300 nm zone plates and operated with E/ΔE ≈ 300; the images shown took nearly an hour each to acquire. The scanning stage used leaf spring flexures to guide linear orthogonal motions, with linear variable differential transformers (LVDTs) for position readout. Before tests with Fresnel zone plates, Kirz's group at Stony Brook undertook scanning microscope tests at the Stanford synchrotron using pinhole optics [Kirz 1980c, Rarback 1980].

2.6 Lensless microscopes

In parallel with the evolution of microscopes based on zone plate lenses, alternative lensless schemes have also been developed. The first techniques to be demonstrated were Gabor or in-line holography [Aoki 1974, Howells 1987, Lindaas 1996] and Fourier transform holography [Kikuta 1972, Reuter 1976, McNulty 1992, Eisebitt 2004], as will be discussed in Section 10.2. In-line holography depends on a plane reference wave to interfere with the wave diffracted by the object; the resolution in this scheme is limited by the detector. In Fourier transform holography a spherical wave diverging from a "point source" is employed, and the resolution is limited by how small the source is (or, alternatively, the resolution at which the source properties are known). This involves a trade-off, in that the intensity of the reference source tends to diminish with decreasing size unless it is produced by a lens, in which case one has to judge whether lens-based imaging is more appropriate. Normally the advantage of the holographic technique is that it does not require optics, and that the reconstruction of the image is rather straightforward, since the reference wave encodes the phase of the diffraction pattern.

Figure 2.5 The "water window," which is an x-ray energy region between the K shell absorption edge energies of carbon at 290 eV and oxygen in water at 540 eV, has played a huge role in x-ray microscopy [Wolter 1952, Sayre 1977a]. In that energy range, water is relatively transparent while organic materials are not; for example, biological cells show especially favorable absorption contrast. The only other region in the electromagnetic spectrum with such favorable intrinsic contrast for high-resolution imaging of hydrated biological specimens is the visible light region to which our eyes are adapted (see Fig. 9.1). Also shown here are the mean free paths Λ for elastic and inelastic scattering of electrons, showing that intrinsic contrast is lower and that electron microscopy is better suited for studying specimens with a thickness of less than about 1 μm, as will be discussed in Section 4.10.

During the 1980s, David Sayre drew from his earlier thoughts on data sampling in crystallography [Sayre 1952a] to advocate a form of lensless imaging that relies on the detection of the intensity of the diffracted wave without any reference wave [Sayre 1980]. He pointed out that if the diffraction pattern is sampled finely enough, there should be enough information available to reconstruct the object. The algorithm to perform the reconstruction in the case of an isolated object was developed independently by Fienup [Fienup 1978], and the first experimental demonstration of what became known as coherent diffraction imaging (CDI; see Section 10.3) was performed [Miao 1999]. In the past decade a powerful new variant of the CDI idea, inspired by the electron microscopist Walter Hoppe [Hoppe 1969a], has been developed theoretically by Rodenburg and Faulkner [Rodenburg 2004, Rodenburg 2008], and implemented with X rays first at the Swiss Light Source [Rodenburg 2007a, Thibault 2008] and subsequently at several other synchrotron light sources. (A similar experiment with a slightly different reconstruction algorithm – also put forward by Rodenburg [Rodenburg 1992] – was

carried out earlier by Chapman [Chapman 1996b]). The technique, referred to as ptychography (Section 10.4), does not require that the object be isolated. It depends on the recording of many diffraction patterns from overlapping areas of the object. In ptychography the resolution is limited only by the wavelength of the X rays and the angular range over which the diffraction patterns can be recorded. Resolution as fine as 3 nm has already been demonstrated, and further improvements are expected. We are entering an era where the resolution of x-ray focusing optics is no longer a limitation!

Holography, CDI, and even scanning microscopes require coherent illumination, as will be discussed in Sections 4.4.6 and 10.3.2. Synchrotron radiation from real electron beam dimensions is generally not intrinsically fully coherent, and selecting out the coherent portion of the beam involves monochromators and spatial filters [Kondratenko 1977], resulting in a severe loss in intensity. The development of x-ray free-electron lasers (FELs) (Section 7.1.8) introduces a new dimension to x-ray microscopy with inherently coherent or nearly coherent beams. FELs operate with highly intense ultrashort pulses (generally on the order of 50 fs long). It was pointed out by Neutze et al. [Neutze 2000] that although a single pulse will vaporize the object, it may be possible to record the atomic resolution diffraction pattern before the parts of the object have a chance to move far enough to blur or distort the pattern. This "diffract before destruction" scheme (Section 10.6) was beautifully demonstrated by Chapman et al. [Chapman 2006a], and has led to considerable excitement. Radiation damage to the object is always a concern in high-resolution microscopy with ionizing radiation (see Chapter 11). Morphology and elemental content can often be preserved by keeping the sample near liquid nitrogen temperature. It is ironic that "diffract before destruction" provides a different way out of the radiation damage problem, as long as a single exposure is sufficient to collect all necessary information from each of a large number of identical objects.

2.7 The dustbin of history

Several other perspectives on the history of x-ray microscopy are available [Engström 1980, Baez 1989, Baez 1997, Kirz 2009]. A more complete view of the history of x-ray microscopy can be found by consulting the original literature. Conference proceedings provide snapshots of the state of the field at particular moments. In the Cold War era, there were three conferences: one hosted by Cosslett in Cambridge in 1956 [Cosslett 1957], one by Engström in Stockholm in 1959 [Engström 1960], and one by Kirkpatrick in Stanford in 1962 [Pattee 1963]. This third conference led into the series International Congress on X-ray Optics and Microanalysis (ICXOM), which for some years concentrated only on electron probe stimulated x-ray fluorescence. The topic of x-ray imaging systems emerged again in two New York Academy of Sciences conferences run by Donald Parsons in 1978 [Parsons 1978] and 1980 [Parsons 1980], and in a Rank Prize Funds meeting on scanning microscopy organized in 1980 by Eric Ash [Ash 1980].

Table 2.1 X-ray microscopy conferences in the synchrotron radiation phase of history. This listing is only of the dedicated international x-ray microscopy conferences; there have been numerous x-ray microscopy sessions at other larger conferences, and smaller workshops.

Year  Location                    Proceedings
1983  Göttingen, Germany          [Schmahl 1984]
1986  Brookhaven, New York, USA   [Sayre 1988]
1990  London, England             [Michette 1992]
1993  Chernogolovka, Russia       [Aristov 1994]
1996  Würzburg, Germany           [Thieme 1998b]
1999  Berkeley, California, USA   [Meyer-Ilse 2000]
2002  Grenoble, France            [Susini 2003]
2005  Himeji, Japan               [Aoki 2006]
2008  Zürich, Switzerland         [Pfeiffer 2009]
2010  Chicago, Illinois, USA      [McNulty 2011]
2012  Shanghai, China             [Xu 2013]
2014  Melbourne, Australia        [de Jonge 2016]
2016  Oxford, England             [Rau 2017]
2018  Saskatoon, Canada           [Urquhart 2018]
2020  Hsinchu, Taiwan
2022  Lund, Sweden

It is also interesting to see how the spatial resolution of x-ray optics has improved over the years; this will be discussed in Section 5.5. With the rise of zone plate microscopy groups in Göttingen, Stony Brook, King's College London, and elsewhere, a conference series was begun that continues through the present (Table 2.1). Most of these conference proceedings are relatively easy to come by. The proceedings of the September 20–24, 1993 conference held in Chernogolovka, Russia are a bit harder to find [Aristov 1994], but the conference was memorable: the Congress of People's Deputies was dissolved by President Boris Yeltsin on September 21. Rumors were rampant, the ruble decreased in value by 30 percent during the conference week, and the authorities kept changing their minds on whether Lenin's Tomb was open for viewing; the author was unable to examine the quality of the sample's aldehyde fixation, but others at the conference were. Most foreign participants had returned home before street riots and battles took place over September 28 to October 5.

2.8 Concluding limerick

Like any scientific specialty, x-ray microscopy has an interesting history, which we summarize in a limerick:

Röntgen discovered some rays
Which left him somewhat in a daze
It was not hocus-pocus
When they came into focus
Leading now to our microscope days


3 X-ray physics

Janos Kirz and Malcolm Howells contributed to this chapter.

X-ray microscopes rely on the characteristics of X rays and how they interact with materials. There are a number of books that go into much greater detail on certain aspects of x-ray physics [Compton 1935, Dyson 1973, James 1982, Michette 1993, Als-Nielsen 2011, Attwood 2017]; we describe here what we consider to be particularly important for x-ray microscopy.

3.1 The Bohr model, energy levels, and x-ray shells

The early years of the twentieth century saw a revolution in our understanding of the physics behind the atom. J. J. Thomson identified the electron as a lightweight particle with a specific charge-to-mass ratio, and Max Planck postulated quantized energies for electromagnetic radiation proportional to its frequency. In 1905, Albert Einstein had his annus mirabilis, which included his papers on Brownian motion and special relativity, and his paper on the photoelectric effect (for which he received his Nobel Prize in Physics in 1921), which put forward his description of light in terms of photons with specific energy and momentum. Soon after, Ernest Rutherford showed via alpha particle scattering that the atom has a dense, positively charged nucleus, leading to a solar system analogy with electrons being the planets that orbit a dense nucleus (the Sun). However, it was obvious that this model was incomplete, since classical electrodynamics would suggest that the electrons in orbit would radiate away energy at a rate such that they would crash into the nucleus in around a nanosecond – not a situation that favors a universe with stable atoms!

It was Niels Bohr (a young Dane who had made an extended visit to Rutherford's lab) who supplied in 1913 the first glimmers of an answer [Bohr 1913] by using a combination of scaling laws already noticed by others, Planck's quantization revolution, and intuition. Bohr arrived at a model of the atom in which electrons have discrete values of angular momentum nℏ, where n = 1, 2, 3, . . . is an integer (the principal quantum number) and

ℏ ≡ h/(2π)  (3.1)

is based on Planck's constant of

h = 6.626 × 10⁻³⁴ joule·seconds = 4.136 × 10⁻¹⁵ eV·seconds.  (3.2)

Figure 3.1 Bohr model of electron states indexed by n, and processes of transitions between them. For the very lightest atoms such as hydrogen, these processes take place at longer wavelengths; we show here the x-ray version appropriate for most atoms (a). Incident x-ray photons lead to photoelectric absorption (b; in this case, removing an n = 1 electron). Once an atom has a vacancy in an inner shell, an outer electron drops into that vacancy and the excess energy is emitted either via an x-ray fluorescence photon (c) or emission of an Auger (or Coster–Kronig) electron (d). The fraction of fluorescent photon events (as compared to electron emission events) is known as the fluorescence yield ω, such as ω_K for the case shown here. The notation of x-ray shells K, L, and M is discussed in Section 3.1.2.

This immediately led Bohr to a calculation of discrete binding energies of these electrons of

E_n = −E₀ Z²/n²  (3.3)

with E₀ as the Bohr energy of

E₀ = mₑe⁴/(8h²ε₀²) = 13.6 eV.  (3.4)

Here Z is the atomic number of the atom (given by the number of protons in the nucleus), mₑ is the mass of an electron (Eq. 3.29) and e is its charge, and ε₀ is the electric permittivity of free space. The Bohr energies are negative energies relative to the energy of a zero velocity, unbound electron (i.e., the continuum state), so that electrons can be thought of as falling from a flat surface into an energy well when they become bound to an atom.
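As a quick numerical illustration of Eqs. 3.3 and 3.4, the following is a minimal sketch (the function name and the choice of example ions are ours, not from the text):

# Bohr-model binding energies (Eq. 3.3): E_n = -E_0 Z^2 / n^2,
# with the Bohr energy E_0 = 13.6 eV (Eq. 3.4).
E0 = 13.6  # eV

def bohr_energy(Z, n):
    """Binding energy in eV of level n for a hydrogen-like (single-electron) atom."""
    return -E0 * Z**2 / n**2

print(bohr_energy(Z=1, n=1))  # hydrogen ground state: -13.6 eV
print(bohr_energy(Z=2, n=1))  # singly ionized helium (He+): -54.4 eV

Remember that this applies only to single-electron atoms; for multi-electron atoms, screening corrections such as Eq. 3.12 below are needed.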

Box 3.1 The Bohr model and de Broglie waves
Albert Einstein's special theory of relativity and explanation of the photoelectric effect led to a picture in which photons have a momentum p = E/c based on their energy E and the speed of light c (Eq. 3.55). This leads to λ = hc/E = h/p. In his 1924 PhD dissertation, Louis de Broglie proposed that this expression of

λ = h/p  (3.5)

might also apply to objects with a mass, so that matter might exhibit wave-like properties. This created a stir, for he showed that if you set the circumference of an electron's orbit in the Bohr model equal to an integer number of his postulated waves, or 2πr = nλ = nh/p, then in the non-relativistic limit of p = mv you arrive at mvr = nℏ. This is simply Bohr's postulate of quantization of the non-relativistic angular momentum mvr for a mass m traveling at a velocity v in a circular orbit of radius r. Is it more surprising that it took more than a decade to arrive at such a simple physical picture for Bohr's model, or that Bohr was courageous enough to put forward his radical notion of angular momentum quantization without such a clear physical model? In any case it must be noted that de Broglie's proposal of wave-like properties for matter demanded that a wave equation be found, which Erwin Schrödinger delivered in 1926 [Schrödinger 1926a], where it serves as a cornerstone of quantum mechanics.
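The circumference argument of Box 3.1 is easy to check numerically; the following minimal sketch uses standard constants, and the use of v₁ = αc for the n = 1 orbital speed is our own shortcut rather than anything from the text:

import math

h     = 6.626e-34   # Planck's constant (J s), Eq. 3.2
m_e   = 9.109e-31   # electron mass (kg)
c     = 2.998e8     # speed of light (m/s)
alpha = 1/137.036   # fine-structure constant
a_0   = 5.292e-11   # Bohr radius (m)

v_1 = alpha * c           # electron speed in the n = 1 Bohr orbit of hydrogen
lam = h / (m_e * v_1)     # de Broglie wavelength, Eq. 3.5

# Both print as ~3.32e-10 m: exactly one wavelength fits the n = 1 orbit.
print(lam, 2 * math.pi * a_0)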

In an era where Bohr's model of discrete electron energy states in atoms is presented as standard information to secondary school students, it is perhaps hard to appreciate how revolutionary it was. The simple physical model based on electron waves arrived more than a decade later (see Box 3.1), so Bohr's postulate originally had the flavor of an improvisation. But it was a successful improvisation! From the Bohr model, we find that it takes a discrete amount of energy

E_i→f = E_f − E_i = −E₀Z²(1/n_f² − 1/n_i²) = E₀Z²(1/n_i² − 1/n_f²)  (3.6)

to raise an electron from a more tightly bound initial state n_i to a less tightly bound final state n_f; it follows that the same amount of energy is released when an electron drops from the upper state to the lower one. Using the relationship between the energy E and wavelength λ of a photon of

λ = hc/E = (1239.84 eV·nm)/E,  (3.7)

it turns out that Eq. 3.6 perfectly explained the spectral lines that had already been observed in visible and ultraviolet spectroscopy experiments on hydrogen gas discharge tubes, and it helped explain how atoms could be excited by, and emit, electrons at specific energies (see Fig. 3.1). The Bohr energy of Eq. 3.3 provides a good guide to estimating the binding energies of single-electron atoms, though charge screening effects modify it, as discussed in Section 3.1.2 and Eq. 3.12; the best way to find out actual transition energies is to consult tabulations of electron binding energies (see Appendix A and Fig. 3.2).
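For example, applying Eqs. 3.6 and 3.7 to the n = 2 → 1 transition in hydrogen recovers the well-known Lyman-α line (a minimal sketch, with rounded constants):

E0 = 13.6      # Bohr energy in eV, Eq. 3.4
hc = 1239.84   # eV nm, Eq. 3.7

# Energy released when an electron drops from n_i = 2 to n_f = 1 in hydrogen (Z = 1):
Z, n_i, n_f = 1, 2, 1
E = E0 * Z**2 * (1/n_f**2 - 1/n_i**2)   # Eq. 3.6: 10.2 eV
print(E, hc / E)                        # Eq. 3.7: wavelength ~121.6 nm (Lyman-alpha)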

Figure 3.2 Electron binding energies for elements Z = 1–92, labeled by atomic shell using the notation of Table 3.1. The actual values for multi-electron atoms differ significantly from the Bohr energies of Eq. 3.3 due to screening (Eq. 3.12). The data shown here are from a recent tabulation [Elam 2002] that draws in part upon an older compilation [Bearden 1967]; however, more accurate tabulations now exist [Deslattes 2003], as discussed in Appendix A.

3.1.1 X-ray fluorescence and Auger emission

Let's begin with an atom in its ground state, or lowest energy configuration, and consider what can happen when an x-ray photon is incident upon it. In the Bohr model, the atom starts out in a state with electrons filling its orbitals, as shown in Fig. 3.1(a). Consider the case in which the incoming photon has sufficient energy so that it can remove an n = 1 electron from the atom to the continuum, as shown in Fig. 3.1(b); the threshold energy for this is referred to as the K shell ionization potential E_K, which is the binding energy of the electron (Eq. 3.3, Fig. 3.2), and the ejected electron is referred to as a photoelectron. The absorption coefficient for the atom increases dramatically at this energy, leading to an x-ray absorption edge as shown schematically in Fig. 3.3. Because atoms like to have all of their orbitals filled up in energy order (see Section 3.1.3), an atom with a core electron removed is thermodynamically unstable, and an electron from a less strongly bound orbital will want to drop down and fill the hole. However, the atom then has an excess of energy that it will want to release. There are three competing processes for releasing this excess energy:

Figure 3.3 Schematic of a K shell x-ray absorption edge. When the incident photon energy reaches the ionization potential for a particular atom (equal to the binding energy of a particular electron orbital; labeled here as E_K), absorption increases dramatically. The ratio of the absorption σ₊ at energies just above the ionization potential relative to σ₋ just below is known as the jump ratio r_K = σ₊/σ₋ (Eq. 3.8), with values as shown in Fig. 3.6.

1. The atom can emit a photon with an energy equal to the difference between the "filling" electron's initial state and the energy of the state from which the photoelectron was ejected (approximated by Eq. 3.6). This is the process of fluorescence, and if the photon's energy is above about 100 eV (as it is when core-level electron photoabsorption takes place with all but the few lightest elements) then we refer to the process as x-ray fluorescence. This process is illustrated in Fig. 3.1(c), and its fractional probability or yield is designated with ω_K in the case of a vacancy in the K shell.

2. The excess energy between the "filling" electron's initial state and the energy of the state from which the photoelectron was ejected can alternatively be released through the ejection of another electron, which is called an Auger¹ electron [Auger 1925] (see Fig. 3.1(d)). Its fractional probability or yield [Zschornack 2007, Eq. 2.78] is designated by a.

3. There can also be transitions between states with the same value of n, leading to transitions to higher subshells within n before the vacancy is filled by another transition and an electron is ejected (see [Zschornack 2007, Fig. 2.47]). These are known as Coster–Kronig transitions [Coster 1935, Bambynek 1972] with yield [Zschornack 2007, Eq. 2.79] f, and they become significant for transitions from L1, L2, and L3 shells for elements with Z ≳ 70 [Zschornack 2007]. Because these heavier elements are less frequently studied using x-ray microscopy, and because many tabulations take into account their subsequent effect on fluorescence yields [Bambynek 1972, Hubbell 1994], we will not consider Coster–Kronig transitions further.

¹ Note that Auger is pronounced oh-Zhay, though someone familiar with the hole-boring tools required for the fine Minnesota activity of ice fishing (CJ) might be tempted to say Aw-gerr.

Figure 3.4 Fluorescence emission energies as a function of the atomic number Z. We show here values from the stronger emission lines from one tabulation [Elam 2002]. The notation for the total fluorescence at each final atomic state is the IUPAC notation shown in Table 3.2.

Which process is more likely to occur? Lumping Coster–Kronig transitions together

with Auger electrons, this is characterized by the fluorescence yield ω, which is the fraction of time that fluorescence occurs. As Figs. 3.5 and 3.7 show, the fluorescence yield ω is quite low for lighter atoms and low x-ray fluorescence energies. When calculating expected fluorescence signals following x-ray absorption, one must account for not only the fluorescence yield ω but also the absorption jump ratio [Martin 1927, Compton 1935] r of [Zschornack 2007, Eq. 2.146]

r = σ₊/σ₋,  (3.8)

as shown in Fig. 3.3. Consider the example of silicon at the K edge, where the ionization potential for the K edge is E_K = 1839 eV. As one crosses this absorption edge, the x-ray absorption cross section σ_abs increases, and this is in turn proportional to the linear absorption coefficient (LAC) of μ, and by extension the imaginary part of the complex number of oscillation modes per atom f₂, as will be shown in Eq. 3.45. For silicon, f₂ jumps from 0.367 to 4.16 as one crosses the K edge, giving a jump ratio of

r_K,Si = 4.16/0.367 = 11.3  (3.9)

according to the tabulation of Henke et al. [Henke 1993] (other tabulations [Elam 2002] give a jump ratio of r_K,Si = 10.37). That is, the silicon atom is still absorptive at energies below E_K as one excites electrons in the L and M shells, but as one crosses E_K the fraction of absorption events that go towards creating K shell vacancies is given by (r_K − 1)/r_K, or (11.3 − 1)/11.3 = 91.2 percent.
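The silicon numbers quoted above are easy to check; this minimal sketch uses the f₂ values from the Henke et al. tabulation exactly as given in the text:

# Jump ratio across the Si K edge (Eqs. 3.8 and 3.9), using f2 just below
# and just above the edge as quoted from [Henke 1993].
f2_below, f2_above = 0.367, 4.16
r_K = f2_above / f2_below
print(r_K)               # ~11.3

# Fraction of absorption events above the edge that create K shell vacancies:
print((r_K - 1) / r_K)   # ~0.912, i.e., 91.2 percent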

Figure 3.5 X-ray fluorescence yields ω as a function of the atomic number Z. The fluorescence yield measures the fraction of the time a vacancy in the specified shell leads to x-ray fluorescence emission, rather than to the emission of Auger or Coster–Kronig electrons. We show here values for the stronger emission lines from one tabulation [Elam 2002], though other tabulations are available [Bambynek 1972, Krause 1979a, Hubbell 1994, Zschornack 2007]. The notation for the total fluorescence at each final atomic state is as shown in Table 3.2.

As a result, the net intensity of K line x-ray fluorescence emission I_K,F will be given from the intensity of x-ray absorption I_abs by both the fluorescence yield ω_K and the edge jump ratio r_K [Martin 1927, Compton 1935, Sherman 1955]. One can also distinguish emission into specific fluorescence lines within a given shell (the K shell in this case) with a factor F, and account for the creation of orbital vacancies or holes by the additional processes of Auger, Coster–Kronig, and radiative transitions with an electron–hole transfer factor T, which is unity for the case of K shell fluorescence [Sparks 1980, Bambynek 1972]. With all of these factors, the net fluorescence rate into the Kα1 line (as one example) is given by

I_Kα1(E) = ω_K F_Kα1 T_Kα1 (1 − 1/r_K) [1 − e^(−μ_Z(E) t_Z)] I₀(E)  (3.10)

where T_Kα1 = 1 in this case, and μ_Z(E) represents the linear absorption coefficient and t_Z the thickness of the fluorescing element Z. In order to use the jump ratio r of Eq. 3.8 in this way, one should ignore absorption cross section changes due to near-edge effects (such as XANES peaks and EXAFS wiggles) since they involve transitions to lower-energy electronic states (XANES; Section 9.1.2) or ejected photoelectron self-interference (EXAFS; Section 9.1.7). Jump ratios r calculated for various absorption edges are included in some tabulations of x-ray optical constants [Elam 2002, Schoonjans 2011a], and are shown in Fig. 3.6. An example of calculating an x-ray fluorescence rate is given in Section 9.2.3.
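A minimal sketch of Eq. 3.10 follows; the function name is ours, and the numerical inputs are illustrative placeholders only (a real calculation would take ω_K, F_Kα1, r_K, and μ_Z(E) from tabulations such as [Elam 2002], as is done in Section 9.2.3):

import math

def fluorescence_rate_Kalpha1(omega_K, F_Ka1, T_Ka1, r_K, mu_Z, t_Z, I0):
    """Net fluorescence rate into the K-alpha1 line, following Eq. 3.10.
    mu_Z and t_Z must use consistent length units (here 1/um and um)."""
    return omega_K * F_Ka1 * T_Ka1 * (1 - 1/r_K) * (1 - math.exp(-mu_Z * t_Z)) * I0

# Illustrative placeholder values for a hypothetical thin film:
rate = fluorescence_rate_Kalpha1(omega_K=0.05, F_Ka1=0.6, T_Ka1=1.0,
                                 r_K=11.3, mu_Z=0.05, t_Z=1.0, I0=1e9)
print(rate)   # photons/s into K-alpha1, for 1e9 incident photons/s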

Figure 3.6 X-ray absorption edge jump ratio r for various x-ray shells. The jump ratio r (Eq. 3.8) is the ratio of absorption above and below an x-ray absorption edge, as illustrated in Fig. 3.3. The data shown here come from one recent tabulation [Elam 2002]. Note the very high values of r_L3 and r_K.

Inner-shell electrons can be removed by X rays with sufficient photon energy as described above, or by electrons, protons, or other energetic particles. When charged particles are used, one must worry about continuum radiation or Bremsstrahlung backgrounds, as will be discussed in Section 4.10.1. Further details on electron-impact x-ray sources are provided in Section 7.1.2.

Figure 3.7 X-ray fluorescence yield ω as a function of x-ray emission energy, labeled with the names of some of the elements. This is an alternative way of viewing the same data as is shown in Fig. 3.5.

3.1.2 X-ray transitions: fluorescence nomenclature

The fact that different chemical elements produce X rays with different absorption properties was first characterized by Charles Barkla, as noted in Section 2.1; in the absence of a good physical model, he labeled the two series first A and B [Barkla 1909] and later K and L [Barkla 1911]. Within a few months of the publication of Bohr's model, Henry Moseley realized that it provided the way to explain Barkla's results, so he undertook a study to measure the x-ray fluorescence energies from a wide range of elements [Moseley 1913]. If electrons accelerated through a voltage have bombarded atoms in a target and removed some n = 1 electrons through inelastic collisions, then Eq. 3.6 suggests that some n = 2 electrons can subsequently drop down and release an energy of

E_K−L ≈ E₀Z²(1/1² − 1/2²) = (3/4)E₀Z²,  (3.11)

while electrons dropping down from n = 3 to n = 2 can release (5/36)E₀Z², and so on. If one writes Eq. 3.11 in terms of a photon's frequency f, one has hf = (3/4)E₀Z², or Z = 2√(h/(3E₀)) √f, which leads to a linear trend on a plot of Z versus √f. What Moseley found is that he had to use Z → (Z − Z_screen) with Z_screen ≈ 1 to get a good approximate fit for Barkla's first series of x-ray fluorescence lines, which he interpreted as a reflection of partial screening of the nuclear charge by another electron in the n = 1 shell; Moseley suggested an even larger (and, in hindsight, overly large) screening value of Z_screen ≈ 7.4 for Lα or L − M transitions. This would lead to a modification of the Bohr energy of Eq. 3.3 to

E = E₀ (Z − Z_screen)²/n².  (3.12)

In fact, in Rydberg's paper [Rydberg 1890] on the pattern of spectral lines that was written more than 20 years before Bohr's theory was developed, another empirical correction term to Eq. 3.3 was introduced which is now called a quantum defect term δ_ℓ, yielding

E = E₀ (Z − Z_screen)²/(n − δ_ℓ)²,  (3.13)

where in most cases the quantum defect δ_ℓ is quite small, but it can reach values of 0.40 (Li) to 4.06 (Cs) for the s orbitals of alkali atoms [Rau 1997]. For atoms other than the alkalis the quantum defect is small, so we can remain with Eq. 3.12 and arrive at a modified K − L transition energy using Eq. 3.11, which gives

E_K−L ≈ E₀(Z − Z_screen)²(1/1² − 1/2²) = (3/4)E₀(Z − Z_screen)²,  (3.14)

which is sufficient for most estimates. While Moseley's approximate treatment of nuclear charge screening is useful as a rough explanation and has therefore been reproduced in generations of physics textbooks, it is not quite proper [Whitaker 1999], so don't expect it to provide fully accurate results!
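As a worked example of Eq. 3.14 with Z_screen ≈ 1 (a minimal sketch; the measured value is quoted for comparison):

E0 = 13.6  # Bohr energy (eV), Eq. 3.4

def moseley_K_alpha(Z, Z_screen=1.0):
    """Screened estimate of the K-L (K-alpha) emission energy, Eq. 3.14, in eV."""
    return 0.75 * E0 * (Z - Z_screen)**2

# Copper (Z = 29): ~7997 eV from this estimate, versus a measured K-alpha1
# energy of about 8048 eV (good enough for rough planning, as the text warns).
print(moseley_K_alpha(29))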

As noted in Box 3.1, in early 1926 Erwin Schrödinger was inspired by de Broglie's electron wave hypothesis to come up with a corresponding wave equation [Schrödinger 1926a], which he immediately used to find a full quantum mechanical solution for electronic states in the hydrogen atom [Schrödinger 1926b]. Schrödinger's eigenstate solution is nicely described in a myriad of textbooks on quantum mechanics; see for example [Griffiths 2004]. It describes electron states in a single electron atom in terms of wavefunctions

ψ(r, θ, ϕ) = R_n,ℓ(r) Y_ℓ^{m_ℓ}(θ, ϕ),  (3.15)

where the radial part of the solution R_n,ℓ(r) depends on Bohr's principal quantum number n and a total angular momentum quantum number ℓ, while the angular part of the solution is described in terms of spherical harmonics Y_ℓ^{m_ℓ}(θ, ϕ) that depend on the total angular momentum indexed by ℓ and also on its ẑ-axis projection in the presence of an external field, indexed by m_ℓ. Wolfgang Pauli soon incorporated electron spin (postulated by Goudsmit and Uhlenbeck in 1925, and shown by Dirac to be required in a relativistic version of quantum mechanics) into the solution as an additional quantum number s [Pauli 1927], and this finally gave a solid theoretical footing for the notion of stable "closed shells" that Bohr had proposed in 1922 and for which Hund had articulated a pattern of orbital filling soon after [Hund 1925b, Hund 1925a, Kutzelnigg 1996]. When combined with the angular momentum quantum number ℓ, electron spin allows one to characterize the total spin of an atom with a quantum index

j = ℓ ± s.  (3.16)

We shall not re-create the full explanation here since it is described in quantum mechanics textbooks, but with these quantum numbers the set of allowed states for atoms can be found to begin with those shown in Table 3.1.

Table 3.1 Quantum states for the atom, corresponding to Schrödinger's solution for the hydrogen-like atom (Eq. 3.15), with n as the principal quantum number, ℓ as the orbital angular momentum quantum number and m_ℓ as its ẑ-axis projection, s as electron spin, and j as the total angular momentum. The occupancy of each state is indicated, along with its modern spectroscopic state name and the x-ray shell notation due to Siegbahn [Siegbahn 1925], since it predates the work of Schrödinger and Pauli. As one gets to higher quantum indices, the energy ordering is not as cleanly separated; for Z = 21 (Sc) through Z = 30 (Zn), there is interplay between the energies of the 3d and 4s states, or the M4, M5, and N1 shells. In this case, the Siegbahn notation for states coincides with the IUPAC convention [Jenkins 1991].

n  ℓ  m_ℓ         s     j    Occupancy  State  Siegbahn
1  0  0           ±1/2  1/2  2          1s     K
2  0  0           ±1/2  1/2  2          2s     L1
2  1  0           ±1/2  1/2  2          2p1/2  L2
2  1  −1, +1      ±1/2  3/2  4          2p3/2  L3
3  0  0           ±1/2  1/2  2          3s     M1
3  1  0           ±1/2  1/2  2          3p1/2  M2
3  1  −1, +1      ±1/2  3/2  4          3p3/2  M3
3  2  −1, 0, +1   ±1/2  3/2  4          3d3/2  M4
3  2  −2, +2      ±1/2  5/2  6          3d5/2  M5
4  0  0           ±1/2  1/2  2          4s     N1
4  1  0           ±1/2  1/2  2          4p1/2  N2
4  1  −1, +1      ±1/2  3/2  4          4p3/2  N3
4  2  −1, 0, +1   ±1/2  3/2  4          4d3/2  N4
4  2  −2, +2      ±1/2  5/2  6          4d5/2  N5
4  3  −2, 0, +2   ±1/2  5/2  6          4f5/2  N6
4  3  −3, +3      ±1/2  7/2  8          4f7/2  N7
5  0  0           ±1/2  1/2  2          5s     O1
5  1  0           ±1/2  1/2  2          5p1/2  O2
5  1  −1, +1      ±1/2  3/2  4          5p3/2  O3

In Tables 3.1 and 3.2, we confront the outcome of a series of unfortunate events, which is that the phenomenology of x-ray fluorescence lines and their shells was described [Barkla 1911, Siegbahn 1925] before a complete theory of quantum mechanics of the atom emerged. As a result, we are left with multiple conflicting notations, including another variant for x-ray fluorescence lines proposed by the International Union of Pure and Applied Chemistry (IUPAC) [Jenkins 1991]. We can wish it otherwise, but as

the Danish philosopher Søren Kierkegaard famously wrote, "Life can only be understood backwards, but it must be lived forwards." We therefore show in Fig. 3.8 the x-ray transitions that can be expected from a zinc atom (Z = 30) in the Siegbahn notation. These transitions allow for the selection rules [Agarwal 1991, Eq. 2.83] of

Δn > 0,  Δℓ = ±1,  Δj = 0, ±1  (3.17)

in quantum mechanics. These selection rules are based on the orthogonality of the spherical harmonics Y_ℓ^{m_ℓ}(θ, ϕ) of Eq. 3.15 when calculating transition rates using Fermi's Golden Rule of

Γ = (2π/ℏ) |⟨ψ_f| H |ψ_i⟩|² ρ_f  (3.18)

for the transition rate Γ between initial ψ_i and final ψ_f quantum states when coupled by a Hamiltonian H, where ρ_f gives the density of final states. Because multi-electron quantum states differ slightly from those given by Eq. 3.15, "disallowed" transitions can be weakly present (see for example [Agarwal 1991, Table 2.5]). In Table 3.2, we list several transitions in the various notations.

Table 3.2 X-ray fluorescence transitions written in several notations, including using initial and final quantum states, the x-ray fluorescence line identification of Siegbahn [Siegbahn 1925], and the IUPAC recommended notation [Jenkins 1991]. Transitions such as Kα3 and Lβ2 are forbidden by selection rules for single-electron wavefunctions, but they can be weakly present in multi-electron atoms (see for example [Agarwal 1991, Table 2.5]). Some x-ray databases [Elam 2002] provide x-ray fluorescence yields ω corresponding to specific final states such as ω_K, and within those final states they further indicate relative fluorescence intensity from various initial states.

Initial state  Final state  IUPAC  Siegbahn
2p3/2          1s           K-L3   Kα1
2p1/2          1s           K-L2   Kα2
2s             1s           K-L1   Kα3 (forbidden)
3p3/2          1s           K-M3   Kβ1
4p3/2          1s           K-N3   Kβ2I
4p1/2          1s           K-N2   Kβ2II
3p1/2          1s           K-M2   Kβ3
3d5/2          2p3/2        L3-M5  Lα1
3d3/2          2p3/2        L3-M4  Lα2
3d3/2          2p1/2        L2-M4  Lβ1
4d5/2          2p3/2        L3-N5  Lβ2 (forbidden)
4d3/2          2p1/2        L2-N4  Lγ1
4p1/2          2s           L1-N2  Lγ2
4f7/2          3d5/2        M5-N7  Mα1
4f5/2          3d5/2        M5-N6  Mα2

While a full and accurate calculation of x-ray fluorescence energies and yields corresponding to various x-ray transitions requires a quantum mechanical solution for multi-electron atoms, the experimental results are tabulated [Bambynek 1972, Krause 1979a, Elam 2002], including in computer-readable formats as described in Appendix A. Some of these tabulations include not only the overall fluorescence yield by final atom state (such as ω_K, ω_L2, ω_L3, and so on), but also the relative intensities from various initial states [Salem 1974, Elam 2002] as expressed by ratios like Kα2/Kα1 in Siegbahn notation (Table 3.2). Using the strongest of these transitions, we show fluorescence yields in Figs. 3.5 and 3.7, and fluorescence energies for the stronger emission lines in Fig. 3.4. The fluorescence energies show the general Z² trend as expected from Moseley's law, and it is also clear that fluorescence is small compared to Auger emission in lighter atoms.
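The selection rules of Eq. 3.17 can be checked mechanically against the entries of Table 3.2; in this minimal sketch, encoding states as (n, ℓ, j) tuples is our own choice rather than anything from the text:

def dipole_allowed(initial, final):
    """Apply the selection rules of Eq. 3.17 to states encoded as (n, l, j)."""
    (n_i, l_i, j_i), (n_f, l_f, j_f) = initial, final
    return (n_i - n_f) > 0 and abs(l_i - l_f) == 1 and abs(j_i - j_f) <= 1

K  = (1, 0, 0.5)   # 1s
L1 = (2, 0, 0.5)   # 2s
L3 = (2, 1, 1.5)   # 2p3/2

print(dipole_allowed(L3, K))  # True:  K-L3 (K-alpha1) is allowed
print(dipole_allowed(L1, K))  # False: K-L1 (K-alpha3) is forbidden, since delta-l = 0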


Figure 3.8 Electron energy levels and several x-ray fluorescence transitions for Zn, showing the correspondence of Barkla's notation for x-ray shells to electron state notation in quantum mechanics. Additional states are listed in Table 3.1, and additional transitions are listed in Table 3.2. In the transition metals from Z = 21 (Sc) to Z = 30 (Zn), there is interplay between the energies of the 3d and 4s states. We show here the energies of the various states as tabulated by Zschornack [Zschornack 2007], though they are similar to those of Bearden and Burr [Bearden 1967] and Deslattes et al. [Deslattes 2003].

3.1.3 Beyond the core: the Fermi energy, valence electrons, and plasmon modes

As much as an x-ray physicist might want to think otherwise, most of what happens in the world is driven not by the properties of core-level electrons in atoms and x-ray transitions with photon energies of 10² – 10⁵ eV, but instead by the outermost electrons, which have binding and transition energies of a few eV. This is the realm of chemical bonds and visible light interactions (see Box 3.2), and these electronic states can be accessed via near-absorption-edge resonances in x-ray spectromicroscopy (as will be discussed in Section 9.1.2). As a result, it is important for x-ray microscopists to step back from >100 eV photon chauvinism and begrudgingly acknowledge the importance of electrons in the outer, weakly bound states of atoms. When considering an ensemble of atoms, the Fermi–Dirac distribution function f_FD(T) of


f_FD(T) = 1 / (exp[(E − E_F)/(k_B T)] + 1)   (3.19)

describes the probability that a quantum state of energy E will be occupied by an electron; it involves Boltzmann's constant of

k_B = 8.62 × 10⁻⁵ eV/kelvin = 1.38 × 10⁻²³ J/K,   (3.20)

the absolute temperature T in kelvin, and the Fermi energy E_F of the atom. At zero temperature, electrons fill up all available quantum states until the Fermi energy E_F is reached, after which no higher-energy states are occupied; therefore, there is an energy gap between the last occupied state and the vacuum (the state of unbound electrons traveling with zero velocity). Electronic state occupancy distributions cease this all-or-nothing behavior at finite temperatures, but since Fermi energies are typically a few eV and room temperature corresponds to an energy k_B T ≈ 1/40 eV, the room-temperature Fermi–Dirac distribution function of Eq. 3.19 still has a fairly abrupt transition of occupancy going from 1 to 0.

What lies beyond the Fermi energy? In isolated atoms such as in a gas, the Bohr model tells us that there are available-but-unoccupied quantum states with higher values of the principal quantum number n, so there exist few-eV transitions that outer-shell electrons can make to these available states (energies that can be supplied or released via visible-light photons). In solids, symmetric and anti-symmetric interactions between electron states in neighboring atoms lead to bands of allowed energy states rather than the discrete levels shown in Fig. 3.8, and the relationship between these bands and the Fermi level² determines the electron transport characteristics of the material:

• In conductors, the Fermi level lies within the conduction band, so that there is essentially no energy cost (aside from losses associated with occasional electron inelastic scattering) to electron transport.
• In semiconductors, there are allowed electron states (bands) at energies only a few multiples of the thermal energy k_B T away, so that according to the Fermi–Dirac distribution of Eq. 3.19 there can be some population in these states and thus weak electrical conductivity. Dopants can shift the Fermi level relative to the band structure, thereby leading to large changes in conductivity.
• In insulators, the next available electron states (bands) might lie many multiples of k_B T away, so that unless the material is subjected to a very high electric field there is no opportunity for electrons to "jump" far enough above the Fermi level and become transported.

Fermi levels, band structure, and resulting material electronic properties are discussed further in introductory texts on solid state physics (see for example [Harrison 2011, Ashcroft 1976]). Chemical bonds between atoms involve electrons in the atom's last, most weakly

² The Fermi energy is properly defined only at zero temperature, while the Fermi level is the finite-temperature quantity that affects the distribution of Eq. 3.19.
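As a quick numerical illustration of how sharp the room-temperature occupancy transition of Eq. 3.19 is, here is a minimal sketch in Python (with hypothetical sample energies, not values from any tabulation):

```python
import math

# Fermi-Dirac occupancy of Eq. 3.19 at room temperature, showing the
# abrupt 1 -> 0 transition within a few k_B T of the Fermi level.

kB_T = 0.0259  # eV at T ~ 300 K, using k_B = 8.62e-5 eV/K (Eq. 3.20)

def f_FD(E_minus_EF):
    """Occupancy for a state at an energy E - E_F (in eV) relative to the Fermi level."""
    return 1.0 / (math.exp(E_minus_EF / kB_T) + 1.0)

for dE in (-0.2, -0.05, 0.0, 0.05, 0.2):
    print(dE, f_FD(dE))  # ~1 well below E_F, 0.5 at E_F, ~0 well above
```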


Box 3.2 Chemical bond dissociation energies

Consider the C–H bond involved in removing the first hydrogen atom when burning methane. The bond dissociation energy is 104 kilocalories/mole or 435 kilojoules/mole, so that the energy per molecule in terms of electron volts (eV) is

(435 × 10³ J/mole) · (1 eV / (1.602 × 10⁻¹⁹ J)) · (1 mole / (N_A = 6.02 × 10²³ molecules)) = 4.5 eV/molecule.

This is the energy of an ultraviolet photon with a wavelength (Eq. 3.7) of λ = (1240 eV·nm)/(4.5 eV) = 275 nm. Chemical bond dissociation energies span a range that includes 142 kJ/mol for the O–O peroxide bond and 1072 kJ/mol for dissociation of carbon monoxideᵃ (CO), corresponding to 1.5–11 eV or 110–840 nm.

ᵃ One can only hope that the state of Colorado in the USA won't dissociate, in spite of its two-letter abbreviation!
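The unit conversion in Box 3.2 is easy to script; here is a minimal sketch in Python (values taken from the box itself) for going from kJ/mol to eV per molecule and then to the equivalent photon wavelength:

```python
N_A = 6.022e23   # Avogadro's number, molecules/mole
eV = 1.602e-19   # joules per electron volt

def kJ_per_mol_to_eV(E_kJ_mol):
    """Energy per molecule in eV, given a molar energy in kJ/mol."""
    return E_kJ_mol * 1e3 / N_A / eV

E_CH = kJ_per_mol_to_eV(435.0)  # first C-H bond in methane (Box 3.2)
print(E_CH)                     # ~4.5 eV
print(1240.0 / E_CH)            # equivalent photon wavelength, ~275 nm (Eq. 3.7)
```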

bound occupied shell (the states just below, or at, the Fermi level). These are referred to as the atom's valence electrons. Chemical bonds tend to involve energies in the range 1.5–11 eV (see Box 3.2). The effect of chemical bonding on near-edge x-ray absorption spectra is discussed further in Section 9.1.2.

When atoms are in the close proximity that a solid provides, their valence electron states are altered by coupling with their neighbors. This can give rise to a collective oscillation mode known as the plasmon resonance. In order to calculate this, we first need to consider the number density of atoms n_a of

n_a = ρ N_A / A,   (3.21)

where ρ is the material's density (typically g/cm³), N_A = 6.02 × 10²³ is Avogadro's number, and A is the atomic weight (typically g/mole) of the atom type (mixtures of atom types are considered in Section 3.3.5). The electron density n_e in the material is then given by

n_e = Z n_a,   (3.22)

where Z is the element number, and thus the number of electrons per neutral atom. For non-delocalized electrons such as those in insulating solids (as opposed to the case of semiconductors and conductors, where some electrons move as if they had a reduced mass m*), the plasmon frequency ω_p is given by

ω_p = √(n_e e² / (m_e ε₀)),   (3.23)

which leads to a plasmon excitation energy of E_p = ℏω_p. In Box 3.3, we estimate the energy of the plasmon resonance in glass to be E_p = 30 eV, and most solids have strong collective excitation modes in the 8–50 eV energy range (see Fig. 3.15 for an example of the plasmon resonance in amorphous ice). This becomes important when we consider


Box 3.3 Plasmon mode energy in fused silica

In Section 3.3.5, we show that the mean electron density n̄_e for fused silica (amorphous SiO₂ with a density of 2.20 g/cm³) is n̄_e = 6.61 × 10²⁹ electrons/m³. This leads to a plasmon mode energy E_p, using Eq. 3.23, of

E_p = ℏω_p = (h/2π) √(n_e e² / (m_e ε₀))
    = (6.63 × 10⁻³⁴ J·s / 2π) √[(6.61 × 10²⁹ m⁻³) · (1.60 × 10⁻¹⁹ C)² / ((9.11 × 10⁻³¹ kg) · (8.85 × 10⁻¹² C²/(N·m²)))]
    = (4.83 × 10⁻¹⁸ joules) · (1 eV / 1.60 × 10⁻¹⁹ joules) = 30.2 eV.

This photon energy corresponds to a wavelength of λ = hc/E = 41 nm. Since the plasmon oscillations are quickly damped, their lifetime is short (with a standard deviation σ_t), so their standard deviation in energy σ_E is broad according to the Heisenberg uncertainty principle [Heisenberg 1927], which can be written as [Griffiths 2004, Eqs. 3.63 and 3.70]

σ_E σ_t ≥ ℏ/2.   (3.24)

Because of this broad energy distribution centered on something like 30 eV, photon absorption begins to increase significantly at photon energies above about 8 eV, or wavelengths below about λ = 160 nm (see Fig. 3.9).

the refractive index n of materials in Section 3.3; in particular, in Section 3.3.2 we will see that plasmon resonances set the great (strongly absorbing) divide between the low-frequency, visible-light form of the refractive index, and the high-frequency, x-ray form.
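The numbers in Box 3.3 above are easy to check numerically. The following minimal sketch in Python (rounded constants; the SiO₂ stoichiometry is used here ahead of the mixture-rule discussion of Section 3.3.5) computes n̄_e and then E_p = ℏω_p from Eq. 3.23:

```python
import math

# Mean electron density of fused silica from its stoichiometry and density,
# then the plasmon energy of Eq. 3.23 as in Box 3.3.

N_A = 6.022e23    # 1/mol
e = 1.602e-19     # C
m_e = 9.109e-31   # kg
eps0 = 8.854e-12  # C^2/(N m^2)
hbar = 1.055e-34  # J s

rho = 2.20                 # g/cm^3, fused silica
A_SiO2 = 28.09 + 2 * 16.00 # g/mol per SiO2 unit
Z_SiO2 = 14 + 2 * 8        # electrons per SiO2 unit

n_units = rho * N_A / A_SiO2       # SiO2 units per cm^3 (Eq. 3.21)
n_e = Z_SiO2 * n_units * 1e6       # electrons per m^3 (Eq. 3.22)
print(n_e)                         # ~6.6e29 m^-3

omega_p = math.sqrt(n_e * e**2 / (m_e * eps0))  # rad/s (Eq. 3.23)
print(hbar * omega_p / e)                       # E_p in eV, ~30 eV
```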

3.2 Atomic interactions, scattering, and absorption

Having discussed energy levels and transitions in atoms, we now wish to consider their more general interactions with photons. This will be done in terms of atomic cross sections σ, which are often written in units of barns, where 1 barn = 10⁻²⁴ cm² (legend has it that the young American physicists from farm country who worked on the Manhattan Project to develop nuclear weapons came to regard a cross section of this size as "big as a barn door"). The cross section σ is related to the mean free path Λ by³

Λ = 1/(σ n_a),   (3.25)

³ We have chosen to use Λ for mean free paths so as to reserve λ for describing the wavelength of X rays and other electromagnetic waves.

Figure 3.9 Linear absorption length μ⁻¹ for fused silica (a common optical glass) in the ultraviolet (UV) and extended ultraviolet (XUV) wavelength range. The linear absorption coefficient (LAC) μ leads to internal absorption of a beam within a medium of thickness t according to exp[−μt] (Eq. 3.76). This shows how visible light optics become strongly absorptive at wavelengths shorter than about 160 nm, with the large absorption resonance arising due to plasmon modes (collective oscillations of the electrons in the glass; see Section 3.1.3 and Box 3.3). The plasmon modes set the great dividing line between low- and high-frequency forms of the refractive index n. The data shown here were compiled from several sources [Kitamura 2007], and are available via an internet search for "pilon silica optical properties xls".

where n_a is found from Eq. 3.21. The fraction of particles that are removed from the "unaffected" category for a beam of intensity I over a thickness of material x is given by

dI/dx = −I n_a σ = −I/Λ,   (3.26)

so that the unaffected fraction of the beam declines as I = I₀ exp[−x/Λ]. Therefore the mean free path Λ represents the thickness over which the unaffected fraction of the beam declines to 1/e = 0.368 of its original value.

While photon cross sections with materials include phenomena such as pair production at much higher energies [Hubbell 1980], the main three interactions of interest to x-ray microscopists are:

Photoelectric absorption σ_abs: a photon is entirely absorbed by an atom. Following photoelectric absorption and emission of an electron, an atom can release its energy by either emission of a characteristic x-ray, or by emission of an Auger electron (see Section 3.1.1).

Elastic (coherent) scattering σ_el: a photon is scattered by the atom with no transfer of


energy (well, almost – see Box 4.3). This process was described by Rayleigh as one wherein electrons oscillate in the electric field of the incident photon, and re-radiate a wave at the same frequency. The elastically scattered photon is locked in phase with the incident photon (with a phase shift described by Eq. 3.53), which is why elastic scattering is sometimes called coherent scattering.

Inelastic (incoherent, Compton) scattering σ_inel: a photon is scattered inelastically by imparting kinetic energy to an electron. Conservation of momentum and energy leads to the Compton relationship between the incident wavelength λ, the scattered photon of wavelength λ′ at an angle θ relative to the incident photon, and the electron mass m_e of

λ′ = λ + (h/(m_e c)) (1 − cos θ),   (3.27)

from which one can find an expression for the energy decrease ΔE_Compton of the inelastically scattered photon with energy E₀ = hc/λ of

ΔE_Compton = (E₀²/(m_e c²)) (1 − cos θ),   (3.28)

where

m_e c² = 511 keV   (3.29)

for an electron. For 180° backscattering of a 10 keV photon, Eq. 3.28 gives ΔE_Compton = 0.39 keV. The cross section for this process was calculated by Klein and Nishina [Klein 1928, Klein 1929]. There can also be inelastic energy transfers to electron energy states in the atom, but the cross section for Compton scattering from valence electrons usually dominates. An inelastically scattered photon loses its phase relationship relative to the incident photon, which is why the process is sometimes called incoherent scattering.

The relative strength of these interactions for carbon is shown in Fig. 3.10. As can be seen, interactions in the E ≲ 30 keV energy range can be well described with a total cross section σ_tot composed of photoelectric absorption σ_abs and elastic scattering σ_el as

σ_tot = σ_abs + σ_el.   (3.30)
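The backscattering example quoted above is a one-line calculation; here is a minimal sketch in Python of the approximate Compton shift of Eq. 3.28:

```python
import math

# Approximate Compton energy loss of Eq. 3.28, in keV units, using the
# electron rest energy m_e c^2 = 511 keV of Eq. 3.29.

MEC2_KEV = 511.0

def compton_shift_keV(E0_keV, theta_deg):
    """Energy lost by a photon of energy E0 scattered through angle theta."""
    return E0_keV**2 * (1.0 - math.cos(math.radians(theta_deg))) / MEC2_KEV

print(compton_shift_keV(10.0, 180.0))  # ~0.39 keV, as quoted in the text
```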

In transmission x-ray microscopy of lighter materials below about 20 keV, we can largely ignore the effects of inelastic or Compton scattering. However, Compton scattering adds a background signal that must be taken into account when considering weak elastic scattering at high angles, or the detection of x-ray fluorescence as discussed in Section 9.2.

Examination of Fig. 3.10 reveals something important about x-ray interactions: there are no cloudy days in x-ray microscopy! What is meant by that sunny statement? In a cloud, or in fog, you have reasonably good light transmission but very little ability to form sharp images of objects. Of course the reason for this is that visible photons are

Figure 3.10 Photon cross sections in carbon as a function of energy, showing the contributions of different processes: photoelectric absorption σ_abs (with an absorption edge at ∼290 eV), elastic scattering σ_el, and Compton scattering as the dominant form of inelastic scattering σ_inel. This figure shows that, for x-ray microscopy of lighter materials, absorption dominates and plural scattering can be ignored at photon energies below ∼30 keV. Data from Hubbell [Hubbell 1980]; see also Fig. A.2.

multiply elastically scattered on their path from the object to your eye, so that the ray directions of the light are lost. With X rays, Fig. 3.10 shows that absorption dominates over scattering over an energy range up to about 20 keV (somewhat higher energies for higher Z materials; see Fig. A.2). As a result, if a photon is scattered, it is far more likely that any subsequent interaction of that photon will be an absorption event. This in turn means that multiple scattering events are very unlikely. The situation in electron microscopy is very different (electrons are never simply absorbed, but instead undergo both elastic and inelastic scattering), as will be discussed in Section 4.10.2. In electron microscopy, this leads to difficulties in interpreting signals from samples that have a thickness of many scattering mean free paths.

3.2.1 Scattering by a single electron

In order to examine scattering processes in greater detail, we begin by considering the scattering of radiation by a single electron. Following conventional treatments of scattering processes, we suppose that an x-ray amplitude ψ(r) must have an asymptotic form comprising a plane wave incident along the z axis and a spherical scattered wave with a dependence on the polar angle θ and azimuthal angle ϕ (Fig. 3.11) of

ψ(r) = exp[−ik₀z] + (exp[−ik₀r]/r) F(θ, ϕ),   (3.31)

where

k₀ = 2π/λ   (3.32)

Figure 3.11 Geometry for x-ray scattering, with θ as the deflection angle and ϕ as the azimuthal angle.

is the wave number k in vacuum, and λ the wavelength. (We have made here a particular choice of sign convention for forward-propagating waves; see Box 3.4.) The form factor F(θ, ϕ) is related to the differential cross section dσ/dΩ by [Eisberg 1964]

dσ/dΩ = |F(θ, ϕ)|².   (3.33)

When the scatterer is a single free electron at the origin, the process is simple Thomson scattering and, for radiation linearly polarized along ϕ = 0, we have [Jackson 1962]

F(θ, ϕ) = −r_e √(1 − sin²θ cos²ϕ),   (3.34)

where

r_e = 2.818 × 10⁻¹⁵ meters   (3.35)

is the classical radius of the electron. Integrating Eq. 3.33 with Eq. 3.34 over θ and ϕ leads to the Thomson total cross section for one electron:

σ_Thom = (8/3) π r_e².   (3.36)

Averaging Eq. 3.33 over all incident polarizations [Jackson 1999, Eq. 14.125] leads to a cross section for unpolarized radiation of

(dσ/dΩ)_electron = r_e² (1 + cos²θ)/2.   (3.37)
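As a check on the integration step between Eqs. 3.34 and 3.36, here is a minimal numerical sketch (Python with NumPy) that integrates the polarized differential cross section over the full sphere and compares it with the analytic Thomson result:

```python
import numpy as np

# Numerically integrate |F|^2 = re^2 (1 - sin^2(theta) cos^2(phi))
# (Eqs. 3.33-3.34) over all solid angle, to verify Eq. 3.36.

re = 2.818e-15  # classical electron radius, m (Eq. 3.35)
theta = np.linspace(0.0, np.pi, 2001)
phi = np.linspace(0.0, 2.0 * np.pi, 2001)
TH, PH = np.meshgrid(theta, phi, indexing="ij")

dsigma = re**2 * (1.0 - np.sin(TH)**2 * np.cos(PH)**2)
integrand = dsigma * np.sin(TH)  # solid-angle element sin(theta) dtheta dphi

sigma = np.trapz(np.trapz(integrand, phi, axis=1), theta)
print(sigma, (8.0 / 3.0) * np.pi * re**2)  # both ~6.65e-29 m^2 = 0.665 barn
```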

3.2.2 Scattering by an atom

For an assembly of electrons, it can be shown [Lipson 1958] that, within the first Born approximation (see Section 3.3.4), the x-ray amplitude G(q) diffracted in direction k by an electron density distribution ρ(r) relative to a single free electron at the origin is given by

G(q) = ∫ ρ(r′) exp[−iq · r′] d³r′,   (3.38)

with the integral taken over all space.


Here, k₀ and k represent incident and scattered wave vectors, and q = k₀ − k is the momentum transfer so that |q| = q = 2k₀ sin(θ/2) (see also Box 4.2). In the particular case that ρ(r) represents an atom, G(q) can be regarded as an atomic form factor f0(q). For the forward direction (θ = 0), the integral of Eq. 3.38 shows that f0(0) = Z. For a spherically symmetric atom, the integrals over the polar and azimuthal angles can be carried out so that the (in this case) real function f0(q) is given by

f0(q) = 4π ∫₀^∞ ρ(r) [sin(qr)/(qr)] r² dr.   (3.39)

Values of f0(q) can be calculated for particular wavefunctions, and are tabulated [Lonsdale 1962]. This procedure accounts for the spatial distribution of the atomic electrons but not for the fact that they are bound. That is, it applies for normal but not for anomalous dispersion (dispersion around an absorption edge, as discussed in Section 3.4). In order to represent the effects of anomalous dispersion, we replace f0(q) with a complex number f̃ representing the effective number of electrons per atom (more properly, the number of electron oscillation modes per atom, as will be described in Section 3.3), which is defined in view of Eqs. 3.33 and 3.34 by

F(θ, ϕ)|_atom = −f̃ r_e √(1 − sin²θ cos²ϕ).   (3.40)

In the x-ray scattering and crystallography communities, it is conventional to write f̃ as

f̃ = f0(q) + Δf′ + iΔf″,   (3.41)

where |f0(q)| ≤ Z, the atomic number and thus total number of electrons in the atom, while Δf′ and Δf″ are small and, for light elements, are independent of θ and ϕ [Lonsdale 1962]. In the soft x-ray optics and x-ray microscopy communities, the standard notation (see Eq. 3.65, and Henke [Henke 1981]) is

f̃ = f1 + if2,   (3.42)

where values of f1 and f2 are well tabulated (see Appendix A, and Fig. 3.16). When the atom is much smaller than the x-ray wavelength, as it is in the soft x-ray region, the amplitudes scattered by the individual electrons add coherently for all values of the polar angle. On the other hand, for shorter wavelengths, the coherent superposition is applicable only near the forward direction. Thus the f1 and f2 tables apply for all q vectors in the soft x-ray range, but only for q → 0 in the hard x-ray region (this leads to problems in reconciling different tabulations of x-ray optical constants, as will be discussed in Appendix A). The expression for F(θ, ϕ) given in Eq. 3.40 allows us to apply the optical theorem [Jackson 1999, Eq. 10.139] of σ = (4π/k) Im[f(0)] to write the total atomic cross section σ_T as

σ_T = 2λ Im[F(θ = 0)] = 2λ r_e f2,   (3.43)

where σ_T is equal to the sum of the cross sections for absorption and scattering. Because in most cases in x-ray microscopy we do not consider angular variations in scattering,

Figure 3.12 The range of secondary electrons in polystyrene, silicon (Si), and gold (Au) as a function of the energy of the primary electron [Ashley 1976, Ashley 1978] (more recent compilations are available [Tanuma 1988, Tanuma 2011]). This calculation is in the continuous slowing down approximation, so it is known as the CSDA range of electrons. This figure illustrates how x-ray absorption by one atom can lead to ionizing radiation damage to many other molecules in the vicinity, an effect that is also shown in Fig. 11.4. Note that the mean free path for inelastic scattering of individual electrons is shown in Fig. 6.9.

we assume that f1 and f2 do not depend on q so that the elastic scattering cross section can be found by integrating Eq. 3.40 over θ and ϕ (as was done to obtain Eq. 3.36 from Eq. 3.34), giving

σ_el = (8/3) π r_e² |f̃|² = (8/3) π r_e² (f1² + f2²),   (3.44)

which holds roughly for λ ≳ 1 nm.

The fact that F(θ, ϕ) is real in Eqs. 3.34 and 3.40 might suggest that the optical theorem will require that σ_T = 0 for a single free electron. This surely cannot be true; the explanation is that while the optical theorem is exact, Eqs. 3.34 and 3.40 are approximations which neglect the effect of the energy radiated on the motion of the driven electron. When this is taken into account [Heitler 1954], F(θ, ϕ) acquires an extra factor [1 − i4πr_e/(3λ)]. The value of the imaginary part of this factor is too small to be important for most purposes (although it was apparently known to Thomson), but it is exactly the factor needed to reproduce the correct Thomson total cross section σ_Thom = (8/3)πr_e² by application of the optical theorem.

Examination of Eqs. 3.43 and 3.44 allows some further conclusions (beyond the one that there are no cloudy days in x-ray microscopy!) concerning the magnitudes of σ_T and σ_el. The latter is much smaller because r_e ≪ λ in cases of interest to us. This is reflected in Fig. 3.10, which shows that σ_T is dominated by absorption at most energies of interest for x-ray microscopy. For free atoms, the dominant effect is photoelectric absorption, while the inelastic (Compton) cross section is negligible in the soft x-ray range. However, in spite of the relatively small size of the atomic elastic cross section,


it would be wrong to conclude that elastic scattering is unimportant in soft x-ray optical systems. If the amplitudes scattered by the atoms in a unit of matter are added coherently, the total scattered intensity scales with the square of the volume of the unit while the total absorbed intensity scales linearly with volume. Therefore, as the unit becomes larger, scattering is increasingly favored over absorption so long as the superposition continues to be coherent. This property is exploited in gratings and zone plates, and is considered further below. (Similar coherence arguments apply to other radiation, such as electrons, neutrons, or hard X rays, though the angular extent of the enhancement scales as the wavelength and is restricted to small angles in electron microscopy, for example.)

We will see in Section 3.3.3 that the linear absorption coefficient (LAC) [Jönsson 1928] for x-ray propagation in media can be written as

μ = 2λ r_e n_a f2 = n_a σ_abs,   (3.45)

which is consistent with Eq. 3.43 because, for uniform matter, the elastically scattered amplitudes add to zero in all directions except the forward. For observable scattering to take place there must therefore be some degree of nonuniformity – and a specimen without nonuniformity is a pretty boring specimen for x-ray microscopy! The consequence of photoelectric absorption is the generation of Auger and photoelectrons (along with x-ray fluorescence photons). These energetic electrons then undergo inelastic scattering to produce a cascade of lower-energy electrons, and the typical range of this electron shower as a function of the primary electron energy is shown in Fig. 3.12 (the inelastic mean free paths of low-energy electrons in several materials are shown in Fig. 6.9). Remember that chemical bonds have energies of only a few eV (Box 3.2), so one primary x-ray absorption event can lead to many electrons that damage many chemical bonds. One can use electron optics to collect the Auger and photoelectrons for sensitive surface microscopies with samples that are suitably vacuum compatible; this is what is done in scanning photoelectron emission microscopy (SPEM; Section 6.4) and photoemission electron microscopy (PEEM; Section 6.5).

3.3 The x-ray refractive index

We have described above the characteristics of how individual photons with specific energy and momentum interact with individual atoms. As the joke from quantum mechanics goes, we treat a photon as a particle on Mondays and Wednesdays, and as a wave on Tuesdays and Thursdays (since there are so few quanta in need of repair, apparently their mechanics are able to take three-day weekends – either that or they work in France). Since you're probably reading this section on a Tuesday or a Thursday, it's time to consider how electromagnetic waves interact with refractive media in order to understand more about x-ray physics. We will discuss the essence of the story here; further details are provided in Appendix B, available online at www.cambridge.org/Jacobsen.

Let's start with a short review of a simple physical system: the damped, driven harmonic oscillator. A typical experiment involves a cart with mass m on a low-friction


track, connected to an oscillating driving force through a spring; the cart might have a sail on it to provide a velocity-dependent damping force. With a periodic driving force of F₀ cos(ωt), where ω = 2πf is the angular frequency associated with an oscillation period T = 1/f, we can write the sum of the forces F = ma leading to acceleration a = d²x/dt² for a mass m as

m d²x/dt² = −kx − b dx/dt + F₀ cos(ωt),   (3.46)

with an opposing spring force −kx proportional to an offset from x = 0, and a velocity-dependent damping force −b dx/dt opposing the motion. This can be rearranged into a more convenient differential equation form as

d²x/dt² + γ dx/dt + ω₀² x = (F₀/m) e^{iωt},   (3.47)

involving a resonant frequency ω₀ of

ω₀ = √(k/m),   (3.48)

a damping coefficient γ = b/m, and notation based on complex numbers such that the real part Re[Ã e^{iθ}] indicates the observable quantity. The solution to the differential equation of Eq. 3.47 can be written in the form

x(t) = Re[|A(ω)| e^{i(ωt − δ(ω))}].   (3.49)

The driving-frequency-dependent magnitude response |A(ω)| is given by

|A(ω)| = (F₀/m) / √[(ω₀² − ω²)² + (γω)²] = (F₀/m) / √[(ω₀² − ω²)² + (ωω₀/Q)²],   (3.50)

and the phase retardation δ(ω) can be found from

tan[δ(ω)] = γω / (ω₀² − ω²),   (3.51)

with both quantities plotted together in Fig. 3.13. In the second form of Eq. 3.50, a quality factor Q ≡ ω₀/γ was used, in which case the magnitude response at the resonance frequency becomes

A(ω₀) = QF₀/k,   (3.52)

while the phase becomes

δ(ω₀) = π/2,   (3.53)

which describes the phase shift imparted to an elastically scattered photon.

Figure 3.13 Resonance in the classical damped, driven harmonic oscillator, shown for Q = 10. As the driving frequency ω is increased towards the resonant frequency ω₀, the amplitude A(ω) of the response increases (Eq. 3.50), as does the phase retardation δ(ω) found from Eq. 3.51. Above resonance, the amplitude response quickly decreases, while the phase retardation increases towards 180°.

It is easy to remind oneself of the properties of a damped, driven mechanical oscillator by using your hand as the driving force for a string on which hangs a mass as a pendulum harmonic oscillator:

• At frequencies well below the resonance, the motion is very nearly in phase with the driving force.


• At the resonance frequency, the motion of the object is exactly 90° behind the phase of the driving force (Eq. 3.53) but the magnitude of oscillation is at a maximum.
• Above the resonance frequency, the motion of the object approaches an opposite phase from the driving motion.
• And here's a bit more of a subtle point: the shape of the magnitude response curve |A(ω)| shown in Fig. 3.13 is a bit asymmetric, with a slightly higher response below the resonance than above.

All of these points give us a helpful mechanical model for understanding the refractive properties of media for electromagnetic waves of different wavelength.
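Eqs. 3.50 and 3.51 are simple enough to explore numerically. Here is a minimal sketch (Python with NumPy) of the magnitude and phase response, written in terms of the quality factor Q as in Fig. 3.13; the sample frequencies are arbitrary illustrative choices:

```python
import numpy as np

# Magnitude and phase response of the damped, driven oscillator
# (Eqs. 3.50-3.51), with the quality factor Q = omega0/gamma of Fig. 3.13.

def response(w, w0=1.0, Q=10.0, F0_over_m=1.0):
    gamma = w0 / Q
    A = F0_over_m / np.sqrt((w0**2 - w**2)**2 + (gamma * w)**2)  # Eq. 3.50
    delta = np.arctan2(gamma * w, w0**2 - w**2)                  # Eq. 3.51, in [0, pi]
    return A, delta

w = np.array([0.5, 1.0, 1.5])
A, delta = response(w)
print(A)                  # peaks near w = w0, where A = Q F0/k (Eq. 3.52)
print(np.degrees(delta))  # ~0 deg below, 90 deg at, ~180 deg above resonance
```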

3.3.1 Electromagnetic waves in media

One of the great triumphs of classical physics was the unification by Scottish physicist James Clerk Maxwell of several electromagnetic phenomena into a self-consistent set of equations, which Oliver Heaviside (who was self-taught) later simplified using vector calculus [Hunt 1991a] into the four equations we know collectively today as Maxwell's Equations (see Appendix B at www.cambridge.org/Jacobsen). Based on earlier work by Weber and Kohlrausch (see [Kirchner 1956, Kirchner 1957]), Maxwell found that electromagnetic waves in linear media should travel with a phase velocity of

v_p = λ/T = ω/k = 1/√(μ_m ε),   (3.54)

where μ_m is the magnetic permeability and ε is the electric permittivity of the medium. In the case of electromagnetic waves in a vacuum, the resulting phase velocity is the speed of light

c ≡ 1/√(μ₀ ε₀) = 2.9979 × 10⁸ m/s,   (3.55)


where μ₀ can be found by measuring (for example) the magnetic field produced by an electric current in a wire, and ε₀ can be found by measuring the capacitance between two plates in a vacuum. Maxwell noted with some triumph [Maxwell 1861] that

    The velocity of transverse undulations in our hypothetical medium, calculated from the electromagnetic experiments of M.M. Kohlrausch and Weber, agrees so exactly with the velocity of light calculated from the optical experiments of M. Fizeau, that we can scarcely avoid the inference that light consists in the transverse undulations of the same medium which is the cause of electric and magnetic phenomena. [Italics as in the original.]

With a linear medium other than vacuum, from Eq. 3.54 one can write the phase velocity as

v_p = c/n,   (3.56)

where

n ≡ √(μ_m ε / (μ₀ ε₀))   (3.57)

is the index of refraction n (equivalently, the refractive index n) of the medium. As is shown in detail in Appendix B at www.cambridge.org/Jacobsen, the magnetic component of electromagnetic waves is much weaker than the electric component, and the magnetic permeability

μ_m = μ₀ (1 + χ_m)   (3.58)

of most media is much closer to the value μ₀ in vacuum than is the case for the electric permittivity

ε = ε₀ (1 + χ_e)   (3.59)

in media relative to the value ε₀ in vacuum. That is, χ_m tends to be extremely small while χ_e tends to be somewhat small compared to unity (Appendix B), and furthermore with electromagnetic waves the magnetic fields tend to be small while the electric fields are appreciable (Appendix B.3). As a result, one can consider these electromagnetic waves in media as producing mainly a dielectric response due to the displacement of the atom's electron charge distribution from the nucleus, as shown in Fig. 3.14. This leads to the Drude model of the refractive index [Drude 1902], where one can write the refractive index n for waves of driving frequency ω propagating in a dielectric medium according to Eq. B.27 in Appendix B, or [Griffiths 1989, Eq. 9.170]

n ≡ k/k₀ = 1 − (n_a e² / (2m_e ε₀)) Σ_j g_j [(ω² − ω_j²) + iγ_j ω] / [(ω_j² − ω²)² + γ_j² ω²],   (3.60)

where g_j are the weights for each of the j electron oscillation modes with associated resonant frequencies ω_j, n_a is given by Eq. 3.21, m_e is the mass of an electron (Eq. 3.29), and each electron oscillation mode has a damping coefficient γ_j.


Figure 3.14 A simple model of inducing a dipole moment on an atom. With no applied electric field (left), an idealized atom has a positively charged point nucleus and a symmetrically distributed negative charge cloud due to the electrons. When placed in an electric field, the fact that the positively charged protons are nearly 2000 times more massive than the electrons means that the nucleus stays pretty much in place, while the electron cloud is displaced, leading to a dipole moment p for the atom.

3.3.2 The great frequency divide and the refractive index

In the general expression for the refractive index of Eq. 3.60, we have in the denominator a term (ω_j² − ω²) involving a particular resonant frequency ω_j and the driving frequency ω. When ω matches a particular oscillator mode's resonant frequency ω_j, that mode will contribute a local maximum in energy transfer (absorption) as well as a phase resonance reminiscent of the mechanical resonator case shown in Fig. 3.13. However, because an atom has many quantum states for its electrons and thus many oscillator modes, the response of one particular mode may be a rather minor contributor to the overall frequency-dependent polarization of the material and thus its refractive index. In order to better characterize the overall refractive index, and thereby see when the term (ω_j² − ω²) is predominantly positive or negative, we have to ask: what are the dominant oscillator modes of electrons in atoms? The answer is that the plasmon modes are the dominant modes, setting forth a great frequency divide between low- and high-frequency forms of the refractive index for materials.

In Section 3.1.3 we showed the expression of Eq. 3.23 for the plasmon frequency and its associated energy E_p = ℏω_p, and in Box 3.3 we estimated that E_p = 30 eV for fused silica, though we noted that the short lifetime of plasmon resonances can lead to a rather broad distribution about that energy due to the Heisenberg uncertainty principle (Eq. 3.24). In Fig. 3.9 we showed that this is indeed indicative of the broad optical absorption response of fused silica, which in fact has very good ultraviolet transmittance compared to many other glasses. The dominance of the plasmon response appears across a range of materials and measurement methods. As an example, we show in Fig. 3.15 a spectrum of inelastically scattered electrons in amorphous ice (Section 11.3.1), which shows that the relative probability for inelastic energy transfer is highest in the plasmon range, with a maximum at about 20 eV, which is what Eq. 3.23 gives using the electron density of ice.

Since the plasmon frequency ω_p is the great divide in the dielectric response of materials to electromagnetic waves, the general expression of Eq. 3.60 has different reduced approximations in the case of ω ≪ ω_p, or visible light, compared to the x-



Figure 3.15 The inelastic energy loss spectrum of 100 keV electrons in amorphous ice, showing the dominance of plasmon mode losses in the 10–40 eV energy range (not unlike the case for fused silica; see Fig. 3.9). Electron energy loss spectroscopy (EELS) data acquired in a 100 kV electron microscope by Richard Leapman of the National Institutes of Health, with the single-scatter spectrum calculated by the author using the Fourier-log deconvolution method [Johnson 1974, Wang 2009a]. Plural scattering effects in electron interactions will be discussed in Section 4.10; see also Fig. 4.78.

ray case of ω ≫ ω_p. For visible light, the driving frequency ω is lower than most of the oscillator mode resonant frequencies ω_j, so one arrives (see Appendix B.1 at www.cambridge.org/Jacobsen) at an expression for the refractive index of [Griffiths 1989, Eq. 9.173]

Re[n] ≈ 1 + (n_a e² / (2m_e ε₀)) Σ_j (g_j/ω_j²) + ω² (n_a e² / (2m_e ε₀)) Σ_j (g_j/ω_j⁴).   (3.61)

Recognizing that the wavelength in vacuum is given by λ = 2πc/ω, we can also write this as

Re[n] ≈ 1 + A (1 + B/λ²),   (3.62)

which is known by visible-light optical system designers as Cauchy's equation (see also Eq. B.33 in Appendix B.1). For crown glass, the coefficient of refraction is A = 0.5320, and the coefficient of dispersion is B = 8107 nm².

X rays are on the high side of the great frequency divide, so one arrives at a different expansion of Eq. 3.60. Ignoring smaller terms discussed in Appendix B.2, one arrives at an expression for the refractive index for X rays in dielectric media of

n = 1 − (n_a e² / (2m_e ε₀)) Σ_j g_j [(ω² − ω_j²) + iγ_j ω] / [(ω² − ω_j²)² + γ_j² ω²].   (3.64)


Box 3.4 Sign convention for ψ and n

It is an arbitrary choice to say that forward-propagating plane waves (Eq. 3.31) go as ψ = ψ₀ exp[−i(kz − ωt)] instead of ψ = ψ₀ exp[+i(kz − ωt)]. The space z and time t variations must have opposite signs to yield a positive phase velocity v_p [French 1966, Chap. 7] given by Eq. 3.72, but both of the above sign conventions meet the condition of the wave equation [Born 1999, Eq. 1.3.5] of

∂²ψ/∂x² = (1/v_p²) ∂²ψ/∂t².   (3.63)

Therefore there is no right or wrong choice for exp[−ikz] or exp[+ikz]. As pointed out by Attwood [Attwood 2017, Eq. 1.26 footnote], some of the early x-ray literature used exp[−ikz] [Compton 1927] while other x-ray literature used exp[+ikz] [James 1982, Als-Nielsen 2011, Attwood 2017], as does much of the optics literature [Born 1999, Goodman 2017]. One book [Cowley 1981, Cowley 1995] even switched sign conventions between editions! Our choice of ψ = ψ₀ exp[−ikz] affects several expressions which would appear differently with the choice ψ = ψ₀ exp[+ikz]:

• The refractive index of n = 1 − δ − iβ (Eq. 3.67) would become n = 1 − δ + iβ with exp[+ikz] (see for example [Attwood 2017, Eq. 1.26]).
• One would change Eq. 3.65 of n = 1 − αλ²(f1 + if2) to become n = 1 − αλ²(f1 − if2). (Some of this was noted some time ago [Ramaseshan 1975].)

That's all fine; physically the phase is still advanced in the medium, as demonstrated by the x-ray prism experiment shown in Fig. 3.19.

This is usually written in a much simpler form of

n = 1 − αλ² (f1 + if2)   (3.65)

with

α ≡ r_e n_a / (2π),   (3.66)

where once again r_e is the classical radius of the electron and n_a is given by Eq. 3.21. The refractive index is frequently written as

n = 1 − δ − iβ,   (3.67)

so that a wavefield that has propagated through a material of thickness t is modified according to

ψ = ψ₀ e^{−kβt} e^{+ikδt}   (3.68)

relative to a wave that has propagated a distance t in vacuum (and k = 2π/λ as in


Figure 3.16 Complex number of oscillator modes (f1 + if2) for carbon and gold as a function of x-ray energy, as tabulated by Henke et al. [Henke 1993]. In the regions near x-ray absorption edges, this tabulation is generally not valid due to near-edge effects, as discussed in Section 9.1.2. Note that at high energies, f1 → Z (Eq. B.43 in Appendix B at www.cambridge.org/Jacobsen), while the absorptive term f2 declines roughly as λ² relative to the phase shifting term f1. This makes phase contrast imaging especially favorable at higher x-ray energies, as will be discussed in Section 4.7.

Eq. 3.32). The expression of Eq. 3.67 uses the definitions of

δ ≡ αλ² f1   (3.69)

β ≡ αλ² f2,   (3.70)

and for 3D imaging of weakly absorbing objects one can relate δ to the electron density, as will be shown in Eq. 10.72. In Eq. 3.65, (f1 + if2) is the frequency-dependent number of oscillation modes per atom (Eq. 3.42), and it is natural that the sum of these modes tends to approach the atomic number Z for neutral atoms. As discussed in Appendix B.2 for the case of high frequencies (or higher photon energies, and shorter wavelengths), f2 will decline as λ² relative to f1, while f1 should approach Z (Eq. B.43 in the online appendix). This is indeed what is observed in experimental values of f1 and f2, such as those shown in Fig. 3.16 (and see Eq. 3.77 below for a discussion of the wavelength or energy scaling of the linear absorption coefficient). Finally, one should note that if one makes a different choice for the sign convention of forward-propagating waves, one arrives at n = 1 − δ + iβ instead, as discussed in Box 3.4.
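To make Eqs. 3.66 and 3.68–3.70 concrete, here is a minimal sketch in Python that computes δ and β and the resulting phase advance and attenuation for a slab. The values f1 ≈ 6.0 and f2 ≈ 0.006 are rough, assumed numbers for carbon near 10 keV used only for illustration; real work should read the tabulated Henke values (Appendix A):

```python
import math

# delta and beta of Eqs. 3.69-3.70 from (f1 + i f2), then the phase advance
# and attenuation of Eq. 3.68 for a slab of thickness t.

r_e = 2.818e-15                      # classical electron radius, m (Eq. 3.35)
N_A = 6.022e23
rho, A = 2.26, 12.011                # carbon: density (g/cm^3), atomic weight
n_a = rho * N_A / A * 1e6            # atoms per m^3 (Eq. 3.21)

lam = 1239.84e-9 / 10000.0           # wavelength in m for a 10 keV photon
alpha = r_e * n_a / (2.0 * math.pi)  # Eq. 3.66
f1, f2 = 6.0, 0.006                  # assumed illustrative values; f1 -> Z

delta = alpha * lam**2 * f1          # Eq. 3.69, ~5e-6
beta = alpha * lam**2 * f2           # Eq. 3.70, ~5e-9
k = 2.0 * math.pi / lam
t = 10e-6                            # a 10 micrometer thick slab
print(k * delta * t)                 # phase advance k*delta*t, in radians
print(math.exp(-k * beta * t))       # amplitude factor exp[-k*beta*t] (Eq. 3.68)
```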


Figure 3.17 The complex modulation of a transmitted wave as given by Eq. 3.71 leads to an Argand spiral (named after the French mathematician Jean-Robert Argand). The magnitude of the transmitted wave is reduced by exp[−kβt], and the phase is advanced by exp[+ikδt] as given by Eq. 3.71. This is shown here in the real and imaginary plane for 500 eV and 5 keV X rays being transmitted through various thicknesses t of carbon with ρ = 2.2 g/cm3 , using the data tabulation of Henke et al. [Henke 1993]. Phase contrast imaging methods may need to employ phase unwrapping algorithms [Goldstein 1988, Volkov 2003] to correctly interpret phases that go beyond π.

The x-ray refractive index tells us how an incident wavefield ψ₀(x, y) with wave number k₀ = 2π/λ (Eq. 3.32) traveling in the ẑ direction is modulated by an object with a thickness t and 2D refractive index distribution of n(x, y) = 1 − δ(x, y) − iβ(x, y). The wavefield transmitted by the object becomes

ψ(x, y) = ψ₀(x, y) exp[−ikt] exp[kt(iδ(x, y) − β(x, y))]
        = ψ₀(x, y) exp[−ikt] exp[ikδ(x, y)t] exp[−kβ(x, y)t],   (3.71)

where we have, for simplicity, written k₀ as k because the non-vacuum wave propagation characteristics are captured in δ + iβ for the medium. There is first of all a geometric phase exp[−ikt] according to propagation through a distance t in vacuum. On top of that, a wave traveling through a uniform medium with thickness t undergoes a net amplitude reduction of exp[−kβt] and a phase advance of exp[+ikδt], as shown in Fig. 3.17. This pure projection through the specimen's thickness works for the case where the specimen is within the depth of field limit, as will be discussed in Section 4.4.9. For thicker objects, one should use the multislice method, as will be discussed in Section 4.3.9.

At this point, you should feel a bit disturbed. (Oops – we don't mean to pry into your


Figure 3.18 Schematic representation of an x-ray wave traveling in media, showing phase advance and attenuation. Figure due to Benjamin Hornberger [Hornberger 2007a].

psychological status.) Why? Recall that the phase velocity for an electromagnetic wave in a refractive medium is shown in Eq. 3.56 to be v_p = c/n. In other words, since the real part of the refractive index is 1 − δ for X rays in a medium, one can use the binomial approximation on (1 − δ)⁻¹ to arrive at a phase velocity (Eq. B.44 in Appendix B at www.cambridge.org/Jacobsen) of

v_p ≈ c(1 + δ),   (3.72)

which is faster than the speed of light in a vacuum! Doesn't that violate special relativity, which is described as setting an absolute speed limit in the universe of c? What's even more curious is that, to our knowledge, the first person to point out this characteristic of x-ray propagation in media was Einstein himself [Einstein 1918], as we have discussed in Section 2.2 – and Einstein failed to comment on this point in his short paper! We suspect that the reason for this is that Einstein had already calculated both the phase and group velocities, since he already had some understanding of the nature of the x-ray refractive index. In Eq. B.46 we show that the group velocity is well approximated by

v_g ≈ c(1 − δ),   (3.73)

and the group velocity describes the speed at which the wave transmits energy. That is, the group velocity describes the speed of the main body of the wave, while wavefronts race ahead at the phase velocity (Fig. 3.18) until dispersion in the wave starts to reduce the energy it carries – in other words, wave attenuation! If wavefronts are faster in media than in vacuum, doesn't that imply that prisms refract X rays in the opposite direction from how visible light is refracted? Yes, it does! This was demonstrated already in 1924 [Larsson 1924] (see Fig. 3.19). There are additional curious consequences of the refractive index of X rays in media, such as the nature of x-ray reflectivity (Section 3.6) and the characteristics of refractive focusing lenses (Section 5.1).

Figure 3.19 X-ray refraction demonstration of Larsson et al. [Larsson 1924], showing that for X rays the refracted rays (Gebrochener Strahl, of several fluorescence lines from the x-ray tube used) go in the direction one would expect from n ≈ 1 − δ, which is opposite from the case with visible light. Also shown are the direct and reflected rays.

3.3.3 X-ray linear absorption coefficient

We found in Eq. 3.43 that the optical theorem gives a cross section for beam loss in the forward direction (that is, ignoring how scattering can redistribute energy directionally) of σ_T = 2λ r_e f2, and we related interaction cross sections to beam intensity losses in Eq. 3.26 of dI/dx = −I n_a σ. Together these expressions imply an intensity decrease in the forward direction of

dI/dz = −I 2r_e n_a λ f2 = −μI,   (3.74)

with a linear absorption coefficient (LAC) μ of

μ = 2r_e n_a λ f2 = 4παλ f2,   (3.75)

where in the latter form we have used α from Eq. 3.66. Integration of Eq. 3.74 with an initial beam intensity of I₀ leads to the well-known Lambert–Beer law of x-ray absorption through a material of thickness t of

I = I₀ exp[−μt].   (3.76)

This is often simply referred to as Beer's law, and this law is frequently celebrated in liquid form by some x-ray microscopists. The inverse μ⁻¹ is the absorption or attenuation length, which is the distance over which a beam is reduced in intensity by a factor of exp[−1] ≈ 0.37 due to absorption.

The linear absorption coefficient μ includes an explicit decrease at shorter wavelengths due to λ, and (as shown in Eq. B.42 online at www.cambridge.org/Jacobsen) we expect f2 to decrease with λ². Therefore we expect the linear absorption coefficient to scale as

μ ∝ λ³ ∝ E⁻³.   (3.77)

Finally, in Section 9.1 we'll also find it useful to consider the mass absorption coefficient μ̄ defined by

μ̄ ≡ μ/ρ,   (3.78)

Figure 3.20 Absorption length μ⁻¹ (which is the inverse of the linear absorption coefficient μ of Eq. 3.75) for X rays in carbon and gold, as tabulated by Henke et al. [Henke 1993]. This figure shows the general trend of μ⁻¹ to increase as λ⁻³, and the presence of x-ray absorption edges as shown in Fig. 3.3. The assumed densities were ρ = 2.26 g/cm³ for carbon, and 18.92 g/cm³ for gold.

which is typically expressed in units of cm²/g (see Eq. 9.3).

We can arrive at the same result in another way. In Eq. B.30, we show that waves in media have their amplitude attenuated according to the imaginary part of the refractive index as exp[k₀ Im[n] x], and in Eq. 3.65 we found that

Im[n] = −(r_e/(2π)) n_a λ² f2.   (3.79)

Since k₀ = 2π/λ, this leads to a wave amplitude reduction of

ψ = ψ₀ exp[k₀ Im[n] x] = ψ₀ exp[−(2π/λ) (r_e/(2π)) n_a λ² f2 x]   (3.80)

or an intensity reduction of

I = I₀ exp[−2kβx] = I₀ exp[−μx],   (3.81)

giving the relationship

μ = 2kβ   (3.82)

as another expression equivalent to Eq. 3.75 for the linear absorption coefficient, thus reproducing Beer's law of Eq. 3.76. This again confirms the consistency between the atomic scattering view of x-ray interactions described in Section 3.2, and the refractive index view described in the present section. Perhaps we don't have to use only particle views of X rays on some days, and wave views on other days!

In many cases in x-ray imaging it is useful to work with an image representation


that is linear with specimen thickness (for example, in nanotomography as discussed in Chapter 8, or spectromicroscopy as discussed in Chapter 9). For a transmission image I(x, y) based on absorption contrast, this means working with the optical density D(x, y). This is calculated from knowledge of the incident flux I₀ as

D(x, y) = −ln[I(x, y)/I₀] = μ t(x, y),   (3.83)

which is obtained from the Lambert–Beer law of Eq. 3.76 or Eq. 3.81, except that we now assume a single material with one value of linear absorption coefficient μ, and use t(x, y) to represent the thickness as projected onto each image pixel location (x, y). This linear treatment assumes that the specimen is within the depth of field DOF = 2δ_z of the imaging system (as will be given in Eq. 4.215), and that the first Born approximation applies – a topic that we now turn to.
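Eq. 3.83 is the workhorse of quantitative absorption imaging, so a minimal sketch in Python may be useful; the value of μ below is an assumed, illustrative number (consistent with the rough carbon-near-10-keV values used earlier), and real values come from tabulated f2 via Eq. 3.75:

```python
import math

# Recover a projected thickness from an absorption-contrast measurement
# via the optical density of Eq. 3.83: D = -ln(I/I0) = mu * t.

mu = 475.0  # 1/m, assumed illustrative linear absorption coefficient

def thickness_from_transmission(I, I0):
    """Projected thickness t (in m) from measured and incident intensities."""
    D = -math.log(I / I0)  # optical density (Eq. 3.83)
    return D / mu

print(thickness_from_transmission(0.9, 1.0))  # ~2.2e-4 m for 90% transmission
```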

3.3.4 The Born and Rytov approximations

In Box 3.4, it was noted that electromagnetic waves traveling in the x̂ direction must have the form ψ = ψ₀ exp[−i(kx − ωt)] so as to have positive phase velocity and to meet the condition of the wave equation of Eq. 3.63. If we instead separate out the space- and time-varying parts of the wave to have ψ = ψ(x) exp[iωt], Eq. 3.63 yields the condition

∂²ψ/∂x² = (1/v_p²)(−ω²)ψ
∂²ψ/∂x² + (ω/v_p)²ψ = 0
∇²ψ + k²n²ψ = 0,   (3.84)

where in arriving at the final form we have made use of v_p = c/n (Eq. 3.56), c = λf, and ω = 2πf while also generalizing the one-dimensional derivative ∂²/∂x² as the Laplacian ∇². Thus we obtain in Eq. 3.84 the well-known Helmholtz equation (see [Born 1999, Eqs. 8.3.2 and 13.1.4] and [Goodman 2017, Eq. 3.13]) for waves in a medium with refractive index n, or n(r) for an inhomogeneous medium.

We now consider the case where the wave ψ becomes a combination of a wave ψ₀ incident from vacuum onto a medium, and a scattered wave ψ_s that is formed while traversing an object with n(r). We therefore make the substitution

ψ → ψ₀ + ψ_s.   (3.85)

If we also make the substitution k²n² → k² + (n² − 1)k² in the Helmholtz equation (Eq. 3.84), and then use the result of ∇²ψ₀ + k²ψ₀ = 0 for the Helmholtz equation of the incident wave ψ₀ before it hits the refractive medium, we arrive at

∇²ψ_s + k²ψ_s = −[k²(n² − 1)ψ₀ + k²(n² − 1)ψ_s].   (3.86)

Now if we could neglect the term [k²(n² − 1)ψ_s], we would have a linear differential equation allowing us to calculate the inhomogeneous refractive index distribution n(r) (that is, the three-dimensional object) from the scattered wave ψ_s and the incident wave

58

X-ray physics

ψ0 . Neglecting the term [k2 (n2 − 1)ψ s ] is in fact what is done in the first Born approximation [Kaveh 1982], a simplification that was first applied for matter wave scattering in quantum mechanics [Born 1926] (one can use the first approximation solution to recursively yield higher-order approximations [Born 1999, Sec. 13.1.4]). In effect, one assumes that the incident wavefield ψ0 that reaches the downstream features in the object is the same as the wavefield ψ0 that illuminated the upstream features. It can be shown that the condition for satisfying this requirement is [Tatarski 1961, Eq. 7.5] |

ψs |  (n − 1) ψ0

(3.87)

and Eq. 3.71 tells us how strongly an incident wave ψ0 is modulated by a material of thickness t due to the x-ray refractive index of n = 1 − δ − iβ. Since δ and β are both small, the Born approximation is quite frequently satisfied when imaging thinner samples in x-ray microscopes, especially at high x-ray energies. Since Eq. 3.71 tells us that both the magnitude and phase modulations are exponential functions of the refractive index, an alternative approach due to Rytov [Rytov 1937, Chernov 1960] works in a logarithmic expansion of the wavefield, so that χ = ln(ψ).

(3.88)

One can then substitute ψ = eχ into the Helmholtz equation (Eq. 3.84) and arrive at a differential equation of [Kaveh 1982] ∇2 χ + ∇χ · ∇χ + k2 n2 = 0.

(3.89)

Making a substitution equivalent to Eq. 3.85 of χ → χ0 + χ s as well as χ1 = χ s ψ0 , one obtains   (3.90) ∇2 χ1 + k02 χ1 = −ψ0 k2 (n2 − 1) − k2 ψ0 ∇χ s · ∇χ s . When comparing the Rytov expansion of Eq. 3.90 against the Born expansion of Eq. 3.86, one sees that the term [k2 ψ0 ∇χ s · ∇χ s ] that is neglected is slightly different. It can be shown that the Rytov approximation makes two less-restrictive demands [Tatarski 1961, Eq. 7.15]: the first is (n2 − 1)  1,

(3.91)

which is quite easily satisfied for the x-ray refractive index; and the second is λ|∇ψ|  2π,

(3.92)

which effectively means that the wavefield ψ should have small changes over the distance of a wavelength λ. An additional limitation of the Born expansion in ψ is that its linear approximation runs into challenges with phase wrapping when the phase kδt approaches π. Because the Rytov approximation works on the logarithm of kδt, it does not suffer from the same phase wrapping problem [Kaveh 1982]. The Rytov approximation has certain advantages for single-step calculations of x-ray scattering from thicker objects [Sung 2013], although these advantages tend to disappear [Gureyev 2004] in the phase retrieval methods used in coherent diffraction imaging Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 01:55:39, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.004

3.3 The x-ray refractive index

59

(Chapter 10). If one instead models wave propagation through a thick object not in a single step but instead as applying to successive thin slabs of the object (the multislice method described in Section 4.3.9), the effects of upstream features on the illumination of downstream features are accounted for. In multislice methods one does not assume that the bracketed terms [] are zero in either the Born (Eq. 3.86) or Rytov (Eq. 3.90) expansions of the Helmholtz equation, respectively; instead, the actual combination of ψ0 + ψ s is calculated slice-by-slice and carried forward to illuminate the next slice.

3.3.5

Oscillator density in molecules, compounds, and mixtures Up until now we have discussed optical constant for single elements, including tabulations for ( f1 + i f2 ) for all of the elements at a variety of x-ray wavelengths (see for example [Henke 1993], and Appendix A). We now discuss the “mixture rule” [Deslattes 1969, McCullough 1975, Jackson 1981] for calculations involving collections of atoms, such as in a mixture, compound, or molecule. This mixture rule assumes that we can simply add up the net absorption and phase effects of all the atoms in proportion to their stoichiometric ratio, an assumption that holds well unless we discuss the finer details of x-ray absorption spectra (Sections 9.1.2 and 9.1.7). We start by considering an example for a mixture unit, which might be H2 O for one water molecule, or (C5 O2 H8 )n for the repeated monomer in poly(methyl methacrylate), or some other mixture. To provide a hard example, let’s consider a simplified borosilicate glass consisting of 80 percent SiO2 and 20 percent B2 O3 with a density of 2.23 g/cm3 . In this case the stoichiometric mixture unit can be written as 1 mixture unit = 0.8(SiO2 ) + 0.2(B2 O3 ) = B0.4 O2.2 Si0.8 ,

(3.93)

where the subscripts for each element refers to its stoichiometric weighting si in the mixture unit (that is, sB = 0.20 · 2 = 0.40 for boron, sO = 0.80 · 2 + 0.20 · 3 = 2.20 for oxygen, and sSi = 0.80 · 1 = 0.80 for silicon). We can then go on to calculate a number of properties of this mixture unit. Its total atomic weight A¯ is given by  si · Ai , (3.94) A¯ = Z





where Z means i=1,...,92 (for the 92 for naturally occurring elements), and a mole counts up Avogadro’s number NA of mixture units rather than of individual atoms. For a mixture unit of our example glass, we have A¯ = 0.40 · (10.81 = 61.99

g . mole

g g g ) + 2.20 · (15.999 ) + 0.80 · (28.085 ) mole mole mole

The atom number density na for a single material type is given by Eq. 3.21. Therefore the number density of mixture units is nm.u. = ρNA

1 1 = ρNA  A¯ Z si Ai

(3.95)

Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 01:55:39, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.004

60

X-ray physics

which for our example mixture unit of borosilicate glass is nm.u. = (2.23

1 100 cm 3 mixture units g )·( )·( ) ) · (6.02 × 1023 3 mole 61.99 g/mole 1m cm

or nm.u. = 2.17 × 1028 (mixture units)/m3 . Turning instead to atoms, the number density na,i for atom type i in the mixture is given by na,i = ρNA

si si , = ρNA  A¯ Z si Ai

(3.96)

so for boron in our example glass we obtain na,B = (2.23

100 cm 3 atoms g 0.4 )·( ) · ) · (6.02 × 1023 3 mole 1m 61.99 g/mole cm

or na,B = 8.66 × 1027 (boron atoms)/m3 . If we add up all the individual element atom densities, we obtain a total atom number density of   si Z si (3.97) = ρNA  Z n¯ a = ρNA ¯ s A Z i Ai which is n¯ a = 7.36 × 1028 atoms/m3 . The electron density ne,i for atom type i in the mixture is given by si Zi si Zi ne,i = ρNA (3.98) = ρNA  A¯ Z si Ai which for boron’s electrons in our example glass is ne,B = (2.23

0.4 · 5 100 cm 3 atoms g )·( )·( ) , ) · (6.02 × 1023 mole 61.99 g/mole 1m cm3

giving ne,B = 4.33 × 1028 (boron atom electrons)/m3 with an overall electron density n¯ e given by  si Zi , (3.99) n¯ e = ρNA  Z Z si Ai which is n¯ e = 6.67 × 1029 electrons/m3 for our borosilicate glass. Finally, the fractional density ρi for atom type i is given by ρi = ρ

si Ai si Ai , = ρ A¯ Z si Ai

(3.100)

which gives ρB = 0.156 g/cm3 for boron in our glass. To calculate the oscillator density f¯1 + i f¯2 for a mixture unit, we need to know the oscillator strengths f1 + i f2 for each of the atom types. Let’s do this at a photon energy of 999.7 eV (very close to 1 keV) using values tabulated by the Center for X-ray Optics at Lawrence Berkeley Lab (henke.lbl.gov/optical constants/): Element B O Si

f1 + i f2 at 999.7 eV 5.245 + i0.316 8.225 + i1.756 14.142 + i1.456.

Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 01:55:39, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.004

3.3 The x-ray refractive index

The mean oscillator strength for a mixture unit is then ( f¯1 + i f¯2 ) with  f¯1 = si f1,i

61

(3.101)

Z

f¯2 =



si f2,i

(3.102)

Z

or in our example f¯1 = 0.40 · 5.245 + 2.20 · 8.225 + 0.80 · 14.142 = 31.507 oscillator modes f¯2 = 0.40 · 0.316 + 2.20 · 1.756 + 0.80 · 1.456 = 5.154 oscillator modes. The net real and imaginary parts of the refractive index n = 1 − δ¯ − iβ¯ for the mixture unit draw upon Eqs. 3.65 and 3.66, leading to re nm.u. λ2 ( f¯1 + i f¯2 ) δ¯ + iβ¯ = 2π  re ρNA 2  si f1,i + i si f2,i ) = λ ( 2π A¯ z Z   1 re ρNA λ2  ( si f1,i + i si f2,i ) = 2π Z si Ai z Z

(3.103)

which for our glass at 1 keV (or λ = hc/E = (1240 eV · nm)/(1000 eV) = 1.24 nm using Eq. 3.7) gives 2.82 × 10−15 m ) · (2.17 × 1028 m−3 ) · (1.24 × 10−9 m)2 · (31.507) δ¯ = ( 2π = 4.71 × 10−4 2.82 × 10−15 m ) · (2.17 × 1028 m−3 ) · (1.24 × 10−9 m)2 · (5.154) β¯ = ( 2π = 7.70 × 10−5 . The net linear absorption coefficient μ¯ is found from Eq. 3.82 as

or from Eq. 3.75 as

μ¯ = 2kβ¯

(3.104)

 si  μ¯ = 2re nm.u. λ f¯2 = 2re ρNA  Z si f2,i Z si Ai Z

(3.105)

or in our example μ¯ = 2 · (2.82 × 10−15 m) · (2.17 × 1028

atoms ) · (1.24 × 10−9 m) · 5.154 m3

= 7.81 × 105 m−1 , giving an absorption length μ−1 of 1 106 μm = 1.28 μm. · m 7.81 × 105 m−1 Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 01:55:39, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.004

62

X-ray physics

For calculations at energies in between those where tabulated values are available, examination of Fig. 3.16 makes it clear that one should interpolate f1 on a linear scale, and f2 on a logarithmic scale. The “mixture rule” appears in older texts as well as more recent papers [Jackson 1981] in terms of a weighted sum of mass absorption coefficients (see Eqs. 3.78 and 9.3) as μ μ¯  = wi ( )i (3.106) ρ¯ ρ Z with wi =

si Ai . A¯

(3.107)

From Eqs. 3.21 and 3.75 we have NA μ = 2re λ f2 . ρ A

(3.108)

We can therefore rewrite Eq. 3.106 as 2re

NA ¯  si Ai NA λ f2 = (2re λ f2,i ) ¯ ¯ Ai A A Z

(3.109)

which, when cancelling the terms 2re NA λ/A¯ between right- and left-hand sides and Ai within the right-hand side, reduces to  f¯2 = si f2,i , (3.110) Z

which is really just a restatement of Eq. 3.102. Thus all is well with the universe.

3.4

Anomalous dispersion: life on the edge We began our discussion of refractive indices by invoking the experience of the damped, driven mechanical oscillator. This showed that one can expect strong phase shifts around an absorption resonance (Fig. 3.13). With the mechanical oscillator there is relatively little energy transfer when the driving frequency goes above the resonance frequency; with X rays on atoms the story is somewhat different because enhanced absorption remains “turned on” at energies above the threshold needed to remove an electron from a specific atomic state via photoelectric absorption (Fig. 3.3). Because of this, and the fact that in atoms we have not a single resonance but a set of available oscillation modes, we can expect phase resonances around absorption edges to differ considerably from the single mechanical oscillator case. We therefore briefly consider these anomalous characteristics of the x-ray refractive index, or anomalous dispersion.

Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 01:55:39, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.004

3.4 Anomalous dispersion: life on the edge

6

63

f2

Henk e

4

He

nk

e

f1, f2

2

0

f1

-2 270

280

290

300

310

320

Energy (eV) Figure 3.21 Comparison of the oscillator strength ( f1 + i f2 ) for graphite at the Carbon K

absorption edge. The crosses and boxes represent experimental measurements by Dambach et al. [Dambach 1998], while the light red and grey curves are from the tabulation of Henke et al. [Henke 1993] for f1 and f2 , respectively. The solid black curve shows a smoothed version of the experimental near-edge f2 data spliced into the longer-spectral-range tabulation of f2 of Henke et al., while the solid red curve shows a local calculation of f1 using the Kramers–Kronig expression of Eq. 3.111 based on the combined f2 curve [Jacobsen 2004].

3.4.1

The Kramers–Kronig relations Examination of high-frequency refractive index expression of Eq. 3.60 reveals that the same parameters factor into both the phase shifting and absorptive components of the x-ray refractive index: the resonance frequencies ω j and damping coefficients γ j appear in both the real and imaginary parts. This suggests that if one has made a complete measurement of the imaginary part of the refractive index (i.e., the absorption spectrum), one can calculate the phase-shifting response. It turns out that with a few basic assumptions (such as that the electric susceptibility χe goes to zero as the driving frequency ω goes towards infinity, and a requirement for causality in that charge displacements can lag but not lead the application of an electric field), one can relate the real and imaginary parts of the permittivity using the Kramers–Kronig relations (see e.g., [Nussenzveig 1972, Burge 1993], or [Attwood 2017, Sec. 3.8], or [Jackson 1999, Sec. 7.10]). For the purposes of x-ray optical interactions, these relations can be written [Henke 1981] in terms of the oscillator strength ( f1 + i f2 ) to give  2 ∞ 2 f2 (E) f1 (E) = Z + d − Δ fr , (3.111) E 0 E2 − 2 where Δ fr is a relativistic correction term that is negligible for soft X rays. Since one can find f2 (E) from the absorption spectrum μ(E)=− ln [I(E)/I0 (E)]/t in a material of

Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 01:55:39, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.004

64

X-ray physics

thickness t using Eqs. 3.75 and 3.76, one can then calculate the phase shifting spectrum using Eq. 3.111. This is in fact how various tabulations of the oscillator strength f1 (E) + i f2 (E) have been arrived at (see Appendix A). Because Eq. 3.111 involves an integration over all frequencies, one must have absorption spectra covering a very wide range of energies to obtain a reasonable approximation for f1 (E), though one can also splice a near-edge absorption spectrum into a larger-range absorption spectrum tabulation to obtain near-edge phase-shifting spectra [Palmer 1998, Jacobsen 2004, Yan 2013, Watts 2014]. X-ray spectra of atoms in solids, and of molecules, have fine structure near their absorption edges, as will be discussed in Section 9.1.2. Because this is dependent on the details of the chemical bonding of an atom, any tabulation of element-by-element oscillator strength ( f1 +i f2 ) will not accurately reflect the exact values exhibited by a particular material near an absorption edge. As an example, we show in Fig. 3.21 experimental values for ( f1 + i f2 ) for graphite near the carbon K edge obtained via interferometry experiments by Dambach et al. [Dambach 1998], along with the tabulated values of ( f1 + i f2 ) of Henke et al. [Henke 1993], and finally a calculation [Jacobsen 2004] of f1 from the Dambach f2 near-edge data spliced into the full Henke f2 tabulation. This illustrates the type of fine detail in the response of ( f1 +i f2 ) near absorption edges which is lost in element-by-element tabulations. A discussion of the possibilities of using nearedge structure in phase contrast ( f1 ) spectromicroscopy is given in Section 9.1.5.

3.5

X-ray refraction When electromagnetic waves cross a boundary where there is a change of refractive index from n1 to n2 , waves can be refracted according to Snell’s law of n1 sin θ1 = n2 sin θ2 ,

(3.112)

where θ is relative to the surface normal (Snell’s law can be found from Fermat’s principle, which is in turn described in Section 4.1.3). Rays exiting a higher refractive index medium 2 and encountering medium 1 are refracted exactly along the surface when the condition n1 sin(θ1 = 90◦ ) = n2 sin(θ2 = θc ) is met; that is, when n1 sin(θc ) = (3.113) n2 where θc is referred to as the critical angle (see Fig. 3.23). For visible light, n1 = 1 in air and n2 = 1.33 in water, so the critical angle becomes θc = sin−1 (1./1.33) = 49◦ (beyond the critical angle one has total internal reflection). As a result, if one is sitting in a lake looking up, one sees the above-water world refracted within a circle of light known as Fresnel’s window, and the underwater world reflected from the water surface at angles beyond 49◦ (Fig. 3.22). Now consider the case of X rays, where the refractive index is less than 1. This means that the region outside of the mirror has a higher refractive index (n ≡ 1) than the material inside mirror itself (n = 1 − δ). Therefore, while Fig. 3.23 showed the blue-shaded higher-index medium as representing water at the air–water visible-light Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 01:55:39, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.004

3.5 X-ray refraction

65

Dive boat “Double Trouble”

Critical angle

Total internal reflection from the depths...

Figure 3.22 Illustration of external refraction, and total internal reflection, at the air–water interface. Within Fresnel’s window (as indicated by the dashed yellow line), one can see a cloudless blue sky and the the scuba diving boat Double Trouble on the surface of Lake Huron. Beyond the critical angle of Eq. 3.113, one sees a reflection from the depths below, with little light present due to weak scattering of light from the water column back to the surface. Within the water, one also sees diver Tom Jones on his decompression safety stop about 5 meters below the water surface. The slight departures of Fresnel’s window from the indicated critical angle line are due to very weak waves on the water’s surface. Photo by the author, while diving the wreck of the Cedarville near Mackinaw City, Michigan.

Ƨ1

n1 n2 Ƨ2

Critical angle

Visible: vacuum (n1=1) medium (n2>1)

X rays: medium (n1n1

Figure 3.23 Total internal reflection for visible light and X rays. This figure illustrates the critical

angle (purple) for total internal reflection (Eq. 3.113) for the case where medium 2 (n2 ; shaded in blue) has a higher refractive index than medium 1 does (n1 ). With visible light, if medium 2 is water, a viewer looking up from under water can see a view such as that shown in Fig. 3.22.

interface, the blue area should represent vacuum at the vacuum–material interface for X rays. That is, external reflection by x-ray mirrors is really a manifestation of total internal reflection in the vacuum! Now with large refractive indices we express the critical angle θc of Eq. 3.113 relative to the surface normal; however, because the xray refractive index differs very little from 1, for X rays we shall instead refer to the complementary grazing angle θ as shown in Fig. 3.24. With this complementary angle, and with n2 = 1, the expression of Eq. 3.113 for the critical angle becomes in the x-ray Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 01:55:39, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.004

66

X-ray physics

n1

n2 Figure 3.24 Grazing incidence critical θc for x-ray reflectivity, where n2 is the vacuum and n1 is

the medium of the mirror material (see Fig. 3.23). The grazing angle θc is complementary to the referenced-to-normal critical angle θc .

case 1−δ (3.114) 1 1 − (θc )2 /2  1 − δ √ (3.115) giving θc  2δ. This can also be written as θc = λ 2α f1 using Eq. 3.65. Now α ∝ na (Eq. 3.66) and na ∝ ρ/A (Eq. 3.21), while f1 ∝ Z (see Fig. 3.16, as well as Eq. B.43 in Appendix B at www.cambridge.org/Jacobsen). Therefore the scaling of the critical angle with material type and x-ray wavelength can be written as (3.116) θc ∝ λ ρZ/A, cos(θc ) =

so the improvements by going to higher atomic number Z and density ρ materials are helpful but are offset somewhat by the 1/A atomic weight scaling and the square root dependence of these terms together.

3.6

X-ray reflectivity The existence of a grazing-incidence critical angle θ for total internal reflection implies that x-rays incident at grazing angles below this value will be externally reflected, and indeed this effect can be seen in Fig. 3.19 with a more detailed description arriving a few years later [Jentzsch 1929]. In general, waves are partially reflected from the boundaries between two materials with different refractive indices, as described by the Fresnel equations [Griffiths 1989, Sec. 9.3.2]. At normal incidence, the Fresnel reflectivity R⊥ for rays going from vacuum to a refractive medium is given by  1 − n 2  , R⊥ =  (3.117) 1+n which for X rays (where Eq. 3.67 describes the refractive index) leads to  1 − (1 − δ) 2  δ 2 δ2  =    . R⊥ =  1 + (1 − δ) 2−δ 4

(3.118)

Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 01:55:39, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.004

3.6 X-ray reflectivity

ơ/ƣ=0

1.0

ơ/ƣ=10 -2 ơ/ƣ =1 0 -1

ơ/ ƣ

0.4

.3 =0

0.6

ơ/ƣ =1

Reflectivity Rs

0.8

67

0.2

0.0 0.0

0.5

1.0

Ƨ’/Ƨ’c

1.5

2.0

2.5

Figure 3.25 Grazing incidence reflectivity of x-ray mirrors as a function of the ratio β/δ of the

absorptive to phase-shifting parts of the x-ray refractive index n = √ 1 − δ − iβ. Absorption leads to a softening of the reflectivity cutoff around the critical angle θc = 2δ. See also [Attwood 2017, Fig. 3.8].

Since δ is in the range of 10−3 –10−6 for X rays in media, normal incidence reflectivity from a single refractive interface is very weak. Crystals and layered synthetic multilayers can produce high reflectivity by using a coherent superposition of many weak individual reflected amplitudes (Section 4.2.3), but otherwise it is not practical to work with normal incidence reflective optics for X rays. The expression for grazing incidence reflectivity is considerably more complicated [Parratt 1954, Henke 1972, Henke 1981], though one can obtain similar results using finite difference methods [Fuhse 2006] or multislice propagation methods [Li 2017a] with n = 1 − δ − iβ (multislice methods are described in Section 4.3.9). X-ray reflectivity involves a factor a2 of   1 a2 ≡ (3.119) sin2 θ − δ + (sin2 θ − δ)2 + β2 2 where δ and β are from n = 1 − δ − iβ (Eq. 3.67) and θ is the grazing angle of incidence. The reflectivity Rσ (θ ) for X rays with the electric field vector oscillating parallel to the plane of reflection is then given by Rσ (θ ) =

4a2 (sin θ − a)2 + β2 . 4a2 (sin θ + a)2 + β2

(3.120)

In this expression, absorption in the mirror material leads to a “softening” of the reflectivity cutoff around the critical angle θc , as shown in Fig. 3.25. The ratio of reflectivity Rπ (θ ) for when the electric field oscillates perpendicular to the plane of reflection diDownloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 01:55:39, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.004

68

X-ray physics

Grazing angle Ƨ’ (degrees) 0.2

100

0.4

Grazing angle Ƨ’ (degrees)

0.8

Critical angle Ƨ’c

1.0

10.0

10-1

Reflectivity

60

Ir reflectivity at 10 keV

40 20 0

0.1 1

80

% reflectivity

0.6

10-2

Ir reflectivity at 10 keV

10-3 10-4

Critical angle Ƨ’c

10-5 10-6 0

5

10

Grazing angle Ƨ’ (mrad)

15

1

10

100 200

Grazing angle Ƨ’ (mrad)

Figure 3.26 Reflectivity of an iridium-coated mirror at 10 keV, on both a linear and a logarithmic scale. The critical angle for iridium at 10 keV is about 8.3 mrad, as given by √ θc = 2δ (Eq. 3.115).

vided by Rσ (θ ) is Rπ (θ ) 4a2 (a − cos θ cot θ )2 + β2 = , Rσ (θ ) 4a2 (a + cos θ cot θ )2 + β2 so of course the total reflectivity for an unpolarized beam is  1 Rπ (θ ) . R(θ ) = Rσ (θ ) 1 + 2 Rσ (θ )

(3.121)

(3.122)

Since bending magnet sources and most linear undulator sources at synchrotrons deliver radiation with the electric field in the horizontal direction, a vertically deflecting mirror involves Rσ (θ ) while a horizontally deflecting mirror involves Rπ (θ ), but in any case the polarization dependence is small, and observed mainly below 100 eV. As Eq. 3.115 shows, x-ray mirrors show strong reflectivity only at very shallow grazing angles, and reflectivity becomes quite low at angles beyond the critical angle, as shown in Fig. 3.26, or alternatively at energies above the energy at which the critical angle is approximately equal to the grazing angle; this is shown in Fig. 3.27. Therefore x-ray mirrors can be used as low-pass filters: x-rays below a certain energy will be reflected, while those above will not. This is illustrated in Fig. 3.27, which shows how a mirror can be used to block high diffraction orders from a grating monochromator (Section 7.2.1) and thus improve spectral purity of the beam used in an experiment. Grazing incidence mirrors are often used as first optics in higher-energy synchrotron light sources, where only a fraction of the beam power and essentially none of the harder X rays of the source (or gamma rays from Bremsstrahlung produced by electron scattering from residual gas in the storage ring) get deflected into the beamline where an x-ray microscope is located. Grazing incidence mirrors require incredibly smooth surfaces, as will be discussed in Section 5.2. To understand why, lay a flashlight on a hard surface in a darkened room and notice how prominent any dust and debris appear, or look out an airplane window Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 01:55:39, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.004

3.6 X-ray reflectivity

69

100 290 eV

% reflectivity

80

75.5%

580 eV

40 mrad

61.5% 60 60 mrad

40

33.8%

Fused silica

20 5.4% 0 0

200

400

600

800

1000

1200

Photon Energy (eV) Figure 3.27 Calculated soft x-ray reflectivity of a fused silica mirror at grazing angles of incidence of 40 and 60 mrad. For carbon edge spectromicroscopy, it is helpful to remove second-order light from grating monochromators (like 580 eV light when acquiring data at the carbon K edge around 290 eV). One way to do this is to use a fused silica mirror which has a change in its reflectivity around the oxygen K edge around 540 eV, so that the ratio of monochromator second- to first-order light is (5.4/61.5) = 8.8 percent at 60 mrad grazing angle versus (33.8/75.5) = 44.8 percent at 40 mrad grazing angle. Mirrors at a fixed grazing angle near the critical angle for a certain x-ray energy can act as low-pass spectral filters to remove from an x-ray beam at photon energies above a range of interest.

at the long shadows cast by mountains at sunrise or sunset. To quantify the reduction in mirror reflectivity that this leads to, we will travel ahead to grab two results: 1. Bragg’s law, which tells us that the optical path length difference from waves reflected from two partially reflecting surfaces separated by d is 2d sin θ , where θ is the grazing angle of incidence (see Fig. 4.9 and Eq. 4.33). 2. When a wavefield is subject to random phase errors characterized by a Gaussian deviation with square root variance Θ, the mean amplitude is reduced by a factor exp[−Θ2 /2] (Eq. 4.20) so the intensity is reduced by exp[−Θ2 ]. That is, for Gaussian-distributed random surface height errors characterized by a root mean square (RMS) roughness of σ, the mirror reflectivity will be reduced [Davies 1954] by a factor ησ of     2σ sin θ 2 ) = exp −(4πσ sin θ /λ)2 ησ = exp −(2π (3.123) λ so that to achieve 90 percent of the theoretical efficiency limit one must keep the RMS Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 01:55:39, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.004

70

X-ray physics

surface height errors down to a value of √ ln(1/0.9) λ σ≤ . (3.124) 4π sin θ For an iridium mirror at 6 mrad grazing incidence with 10 keV X rays, this means that one must have σ ≤ 0.53 nm. This emphasizes that x-ray mirrors must be quite smooth to have high reflectivity (see Fig. 5.5).

3.7

Concluding limerick We conclude our discussion of the physics of x-ray interactions with materials with a limerick: In atoms each electron will wait Sitting tight in its own quantized state They will all interact With a wave; that’s a fact Letting amplitudes find their own fate

Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 01:55:39, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.004

4

Imaging physics

4.1

Waves and rays In Chapter 3 we talked about the ways that X rays interact with individual atoms, and then with amorphous media as described by the refractive index. But x-ray microscopes are not able (yet?) to image individual atoms, and images of uniform amorphous media are not very interesting. We therefore turn to how x-ray wavefields interact with everything in between. In 1690, Christiaan Huygens’ Traite de la Lumi`ere put forward his wave theory of light which included a simple picture for the collective effect of a series of point emitters of spherical waves of light. As one goes some distance from the emitters, one sees a smooth wavefront from their collective effect, as shown in Fig. 4.1. As a result, by rearranging the set of emitters, or by altering the time delay from which they emit, one can generate wavefronts that can be thought of as the line of peak electric field at a moment in time for an electromagnetic wave. The local wave direction is then perpendicular to the wavefront. We can think of each Huygens point source as emitting a spherical wave with amplitude λ (4.1) ψ = ψ0 e−i(kr − ωt) + iϕ , r where k = 2π/λ is the wave number, ω = 2π f is the angular frequency, and ϕ is a phase advance. (Again, we have made a particular choice of sign convention for forwardpropagating waves as discussed in Box 3.4.) With many Huygens emitters arranged in a row, we can add them up (Eq. 4.64) to arrive at plane waves with a vector wave number k indicating the propagation direction, and three-dimensional positions x, so that a plane wave propagates as  (4.2) ψ = ψ0 e−i(k · x − ωt) + iϕ .

4.1.1

Adding up waves While the Huygens construction provides a great conceptual picture of how waves superimpose, let’s dive a bit more into the mathematics. The expressions of Eqs. 4.1 and 4.2 both involve complex exponentials of the form ψ = Aeiϕ as shown in Fig. 4.2; here, A is the magnitude of the vector, and ϕ the phase (see Box 4.1 regarding the terminology we use here). From the complex amplitude ψ = Aeiϕ representing the wavefield, the real

Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

72

Imaging physics

Figure 4.1 The Huygens construction for producing wavefronts from a series of point source

emitters of waves. A plane wave is shown at left, and a converging spherical wave is shown at right, along with wavefields created by summing up the contributions of Huygens point sources (see Eq. 4.64).

Box 4.1 Amplitude, magnitude, and phase Unfortunately there is some variation in how the words “amplitude” and “magnitude” are used. We prefer to say that magnitude A and phase ϕ are combined in the complex amplitude ψ = Aeiϕ , which when multiplied by its conjugate gives the intensity I = |ψ† ψ| = |A2 |. (We also refer to image contrast modes as absorption contrast and phase contrast in Section 4.7, rather than amplitude and phase contrast.) Some use “amplitude” to refer to A, but when they do so we like to quote Inigo Montoya from the movie The Princess Bride: “You keep using that word. I do not think it means what you think it means.”

part Re[Aeiϕ ] = A cos ϕ often represents some measurable quality, like the electric field when referring to electromagnetic waves. The wavefield ψ times its complex conjugate ψ† gives the wave intensity or I = ψ† · ψ.

(4.3)

One can think of the imaginary part Im[Aeiϕ ] = A sin ϕ as holding onto the momentarily “invisible” property of the wave (after all, though ϕ = π/2 gives a real part Re[Aeiϕ ] = 0, the wave is still in existence with its magnitude A unchanged). A time-dependent wave eiωt goes through a 2π phase change over a time period T (with frequency f in cycles per second, or angular frequency ω = 2π f in radians per second). Let us then think about the addition of several waves that oscillate at the same frequency ω. We can freeze a moment in time, and know that the sum at that moment is going to be the same as the sum at a later time t except for the common rotation of all waves in the complex plane of eiωt . The addition can be done graphically by placing the head of one wave’s vector at the toe of another. If we add N waves each described by Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

4.1 Waves and rays

Im

73

A

Re

Figure 4.2 Complex circle representation of the wave amplitude Aeiϕ as a phasor (represented by

an arrow in the complex plane). Here A is the magnitude and ϕ the phase, and real and imaginary parts are indicated by Re[Aeiϕ ] = A cos ϕ and Im[Aeiϕ ] = A sin ϕ respectively.

ψ j = A j eiϕ j , a little bit of trigonometry lets us find the intensity result R2 to be ⎞2 ⎛ ⎞2 ⎛ ⎟⎟⎟ ⎜⎜⎜ ⎟⎟⎟ ⎜⎜⎜ 2 A j sin ϕ j ⎟⎟⎠⎟ + ⎜⎜⎝⎜ A j cos ϕ j ⎟⎟⎠⎟ R = ⎜⎜⎝⎜ j

=

N  j=1

(4.4)

j

A2j + 2

N  N 

A j Ak cos(ϕ j − ϕk ).

k> j j=1

Now in the case where all waves have the same phase ϕ, the cosine term will always be 1 and one can then show that ⎞2 ⎛ N ⎜⎜⎜ ⎟⎟⎟ R2coherent = ⎜⎜⎜⎝ A j ⎟⎟⎟⎠ (4.5) j

so if all waves have the same magnitude A, one arrives at R2 = N 2 A2 .

(4.6)

However, if we have completely uncorrelated phases, uniformly distributed around the circle of Fig. 4.2, then the cos(ϕ j − ϕk ) terms will tend to be negative as often as positive and that part of the sum will drop out, leaving a net magnitude of    N  A2j (4.7) |Rincoherent | = j

so if all waves have the same magnitude A, one arrives at √ |R| = N|A|

(4.8)

and |R|2 = N|A|2 .

(4.9)

That is, the net result of adding random phases with a uniform √ distribution over all phase angles is to produce a vector with some non-zero length N|A|, but with a phase that Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

74

Imaging physics

Imaginary

+Ƨu Real

-Ƨu

Figure 4.3 Adding phasors with phases uniformly distributed over a restricted range of −θu to +θu . The resultant has a phase of zero, and a magnitude reduction ηu given by Eq. 4.13.

cannot be predicted. This is sometimes referred to as the drunken sailor problem: if a sailor takes N steps of equal length but √ each in a completely random direction, the sailor is likely to travel a distance of N but in a direction that neither we (nor the sailor!) can predict.1 Let’s consider something in between fully coherent, and fully incoherent, wave addition. Now the expectation value  f of a function f (x) modulated by a probability distribution P(X) is given by  ∞ f (x) P(x) dx. (4.10) f = −∞

Let’s apply this to the case of adding up waves when each one has an individual phase difference θ, or Aei(ϕ+θ) = ψeiθ . Let’s consider first the case where the phases are distributed uniformly over a range from −θu to +θu (see Fig. 4.3). The probability of having one particular value of θ is then P(θ) = 1/(2θu ), so the expectation value for the wave amplitude ψu can be found from Eq. 4.10 to be   θu  θu  θu 1 ψ ψu = ψeiθ dθ = cos(θ) dθ + i sin(θ) dθ . (4.11) 2θu 2θu −θu −θu −θu Now because sin(−θ) = − sin(θ), the integral of the imaginary part from −θu to 0 will cancel out the integral from 0 to +θu , so we are just left with  θu 1 ψ sin(θu ) [sin(θu ) − sin(−θu )] = ψ ψu = ψ cos(θ) dθ = . (4.12) 2θu −θu 2θu θu That is, the net amplitude for phases uniformly distributed between −θu and +θu is reduced by a factor ηu = ψu /ψ of ηu (θu ) =

sin(θu ) θu

(4.13)

which has a value of 1 when θu = 0, as can be shown using L’Hˆopital’s rule. The 1

It makes one want to cry out in song: What shall we do with a drunken sailor?

Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

4.1 Waves and rays

75

Relative probability

1.0

0.8

0.6

0.4

0.2

0.0 -3

-2

-1

0

x

1

2

3

Figure 4.4 The Gaussian or normal distribution function√with zero mean of exp[−x2 /(2σ2 )],

shown without its integral normalization factor of 1/(σ 2π). About two-thirds of the events (68.3 percent) occur within one standard deviation (−σ to +σ) of the mean, 95.5 percent fall within ±2σ, and 99.7 percent of the events occur within ±3σ. In addition, the full width at half-maximum is related to the standard deviation by FWHM = 2.35σ.

absolute value of the function |ηu (θu )| is shown in Fig. 4.5; the absolute value is shown because when θu goes beyond π one begins to have a preponderance of phasors on the left, or negative half, of the real plane. We will encounter this function again in Eq. 4.106 where we will give it the name of a “sinc” function sinc(θ) in Eq. 4.105. Now let’s consider phases which are distributed according to a Gaussian or normal probability distribution, which is usually written as   (x − u)2 1 . (4.14) P(x, u) = √ exp − 2σ2 σ 2π The shape of this function is shown without the normalizing factor in Fig. 4.4. In this case, the net wave amplitude ψn is found from    θσ 1 θ2 iθ e ψn = ψ (4.15) √ exp − 2 dθ 2θσ −θσ θσ 2π which can be expanded to   θσ  θσ 1 θ2 θ2 cos(θ) exp[− 2 ]dθ + i sin(θ) exp[− 2 ] dθ . ψn = ψ √ 2θσ 2θσ −θσ θσ 2π −θσ

(4.16)

Once again, the imaginary part is anti-symmetric about θ = 0 so the −θσ to 0 integrand cancels out the 0 to θσ integrand, leaving ψn = ψ

1 √

θσ 2π





cos(θ) exp[− −∞

θ2 ] dθ. 2θσ2

(4.17)

Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

76

Imaging physics

Using the standard result of  ∞

 cos(kx) e

−∞

−ax2

dx =

π −k2 /(4a) e , a

(4.18)

with a = 1/(2θσ2 ), we arrive at ψn = ψe−θσ /2 2

(4.19)

or an amplitude reduction factor ηn = ψn /ψ of ηn (θσ ) = e−θσ /2 = e−σθ /2 2

2

(4.20)

for Gaussian distributed phases characterized by a standard deviation σθ in radians; this function is shown in Fig. 4.5. The intensity goes as the square of the amplitude, or ηI = e−σθ , 2

(4.21)

which is a result well-known in the literature [Mar´echal 1947a, Mar´echal 1947b, Ruze 1952, Mahajan 1983]. The resulting reduction in the peak intensity versus the peak intensity with no errors present is referred to as the Strehl ratio [Strehl 1895, Strehl 1902]. Of course, when the errors are limited to σθ  π/2, one can approximate Eq. 4.21 with ηi  1 − σ2θ

(4.22)

which, when applied to aberrations across a lens, is known as the Mar´echal approximation [Mar´echal 1947a, Mar´echal 1947b].

4.1.2

Rayleigh quarter wave criterion The amplitude reduction factors ηu (θr ) of Eq. 4.13 and ηn (θσ ) of Eq. 4.20 are the basis behind an understanding articulated by John William Strutt (1842–1919), the British physicist whom we usually refer to as Lord Rayleigh due to his inheritance of a barony. Consider a distribution of errors in an optical system (such as index of refraction gradients due to thermal gradients in the atmosphere, or surface imperfections on mirrors or lenses). If those errors are kept below about λ/4, then the Rayleigh quarter wave criterion tells us that the performance reduction of the optical system will be modest. Errors within a total range of λ/4, or λ/8 on either side of the correct value, lead to phase variations in the sense of Fig. 4.3 of 1/8 of 2π or π/4. As can be seen from Fig. 4.5, ηu (π/4)  0.90 while ηn (π/4) = 0.73, so with either distribution one indeed has a relatively modest modification to the net amplitude (of course, these numbers should be squared when considering intensity reductions).

4.1.3

Connecting waves and rays The Rayleigh quarter wave criterion provides a good way to think about the connection between waves and straight-line ray paths. Fermat’s principle says that light travels along the path of least time, or lowest accumulated optical path length of optical path length = n

(4.23)

Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

4.1 Waves and rays

77

1.00 0.90

Magnitude reduction

0.75

0.73 0.63 Uniform:

0.50 Gaussian: 0.29 0.25

0.00 0.00

0.25

0.50

0.75

1.00

1.25

1.50

1.75

2.00

Figure 4.5 Illustration of the Rayleigh quarter wave criterion. When adding up the complex

amplitudes of many waves with some degree of randomness in their individual phases, the net amplitude is reduced by a factor η = ψ /ψ, which may be acceptable if close to unity. Shown here is |ηu | (Eq. 4.13) for a uniform phase distribution over a range −θu to +θu as shown in Fig. 4.3. Also shown is ηn (Eq. 4.20) for a Gaussian or normal distribution of phases characterized by a standard deviation θσ . For the uniform case, the net wavefield is reduced only slightly (ηu = 0.90) at θu = π/4, or a full path length range of λ/4. This is known as the Rayleigh quarter wave criterion.

where n is the index of refraction of the medium, and is the geometric distance along the path; the accumulated optical phase is then optical phase = 2π

n . λ

(4.24)

Let’s consider the possible paths involved in light traveling from Point A to Point B as shown in Fig. 4.6. Path 1 with length 1 = z will be the straight-line path that Fermat’s principle favors, while Path 2 involves light traveling through a point offset by a distance y at the midpoint of the straight line path with a distance of

  2 z

2y z 2 2 1+  z 1 + 2y2 /z2 , 2 = 2 ( ) + y = 2 2 2 z where we have made use of the binomial approximation (1 + a)m  1 + ma for a  1.

(4.25)

The optical phase difference θ between Path 2 and Path 1 is then θ = 2π

2 − 1 z(1 + 2y2 /z2 ) − z 4πy2 = 2π = . λ λ λz

(4.26)

Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

78

Imaging physics

Path 3

Path 2 Point A

z/2

y Path 1

z/2

Point B

Figure 4.6 Illustration of Fermat’s principle for optical paths.

If we were to consider the addition of all paths with offsets of −y to +y, we really have a set of phases bound from −θu to +θu , which is precisely what we solved for in our expression for ηu of Eq. 4.13. In other words, if we limit θu to ±π/4 as being within the Rayleigh quarter wave criterion, we have a limit yλ/4 of √ λz (4.27) yλ/4 = 4 within which all optical paths contribute coherently. This means that optical rays can be rather fat; for visible light with λ = 500 nm traveling across a room with dimension z = 10 m, we have yλ/4 = 560 μm while on a typical synchrotron light source beamline with λ = 1 nm and z = 70 m we have yλ/4 = 66 μm. As we’ll see in Section 4.11, even one photon explores all optical paths within the Rayleigh quarter wave criterion, so this example really does show us how light rays are not infinitesimally skinny (like a fashion model), but with some substantial width (like real people).

4.2

Gratings and diffraction The picture of the addition of many Huygens point sources provides a convenient way to understand diffraction and interference, which we will now explore.

4.2.1

Slits and plane gratings If we place a slit of width b next to an even row of in-phase Huygens point sources, a downstream plane will see only those point sources within the slit aperture. If we pair up each point source with one that is exactly half an aperture distance away, as shown in Fig. 4.7, there will be a certain angle θ with a λ/2 optical path length difference to these paired point sources. At that angle in the far field, each and every point source will perfectly cancel the wave amplitude contribution of its partner and there will be no light intensity; that is, the first minimum of the single slit diffraction pattern is at an angle given by b sin θ = λ.

(4.28)

With slits that are large compared to the wavelength (or b  λ), the angles are small so sin θ  θ (the small angle approximation). In this case we can write θ

λ b

(4.29)

Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

4.2 Gratings and diffraction

79

Ƨ b/2 Ƨ b/2 sinƧ

Figure 4.7 Schematic explaining the first minimum in the diffraction pattern of a slit. Each Huygens point source has a matching partner (in this case, with a matching color) which is exactly π out of phase, due to an optical path length difference of λ/2 when reaching to a distant measurement plane. We thus have the condition (b/2) sin θ = λ/2, or b sin θ = λ as the condition for the first minimum in intensity in the diffraction pattern.

as the angle to the first minimum of the single slit diffraction pattern. This will be important when we consider diffraction from lens apertures, and slits in beamlines. We will look in more detail at slit diffraction when we discuss Fraunhofer diffraction in Section 4.3.5. Let’s now consider a thin plane grating with period d, as shown in Fig. 4.8. If we now pair Huygens point sources from one grating aperture with the equivalent source in the next aperture, we will have the mth integer order of constructive interference (and a maximum in the diffraction pattern) when we meet the condition d [sin(θ1 ) + sin(θ2 )] = mλ.

(4.30)

Note that we have not yet said anything about the phase difference between one pairing of point sources within the open holes of the grating versus another pairing; one can even have single slit diffraction minima at the same angle of higher orders of plane grating diffraction maxima, thus canceling out interference maxima from the grating. When the incident wave is perpendicular to the grating and we consider the m = 1 or first order of diffraction only, we have θ1 = 0 so we can drop sin(θ1 ) from Eq. 4.30 and refer to θ2 simply as θ. In this case we can rewrite Eq. 4.30 as 1 sin(θ) θ =  , d λ λ

(4.31)

where the latter version is in the small angle approximation. When we discuss Fourier transforms in Section 4.3.3 and Fraunhofer diffraction in Section 4.3.5, we will introduce the concept of a spatial frequency u of u=

1 θ  d λ

(4.32)

Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

80

Imaging physics

d Ƨ1

Ƨ2 dsin(Ƨ1)

dsin(Ƨ2)

Figure 4.8 Constructive interference in a plane grating. When plane waves are incident at an

angle θ1 , one has constructive interference between Huygens point sources at a specified position within the open bars of the grating at angle θ2 when the condition d[sin(θ1 ) + sin(θ2 )] = mλ is met.

for a specified periodicity d. This allows us to equate a property of the grating (its inverse period 1/d) with a wavelength-scaled diffraction angle of u  θ/λ.

4.2.2

Volume gratings and Bragg’s law Now let’s consider the case of having two thin partially reflecting mirrors located a distance d apart from each other, as shown in Fig. 4.9. If a plane wave is incident on this structure at an angle θ relative to the surface plane, part of the wave amplitude will reflect off of the first surface while part will continue to the surface below, where a fraction of the wave amplitude will reflect again. Now when waves are incident at an angle from the surface greater than the critical angle θc of Eq. 3.114, we know that the xray reflectivity of a single mirror surface will be very low. However, when many weakly reflected wave amplitudes are added in perfect superposition, the net amplitude can become quite large and high reflectivity can result. For this to occur, the optical path length difference between waves reflected by subsequent surfaces must be an integer number m of wavelengths, or 2d sin(θ) = mλ,

(4.33)

which is known as Bragg’s law, after William Lawrence Bragg who worked with his father William Henry Bragg. A slight correction [Compton 1923, Eq. 2] to Bragg’s law can be made for refraction in the crystal, leading to  δ = mλ, (4.34) 2d sin(θ) 1 − sin2 θ and indeed this is how one of the first measurements of the phase shifting part δ of the x-ray refractive index n = 1 − δ − iβ was made [Stenstr¨om 1919]. In Fig. 4.10, we compare a plane grating of period dG where the constructive interDownloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

4.2 Gratings and diffraction

81

Wavefronts

ƧB

ƧB Ƨ

d

d sin(ƧB)

d sin(ƧB)

2ƧB

Partially reflecting planes

Figure 4.9 Bragg’s law for strong reflectivity from a series of weak partially reflecting surfaces

separated by a distance d. When the path length difference between a wavefront reflecting from a subsequent plane is longer by a distance 2d sin(θB ) = mλ where m is an integer, one has a coherent superposition of the partial waves reflected from each surface. The net deflection angle of the wave is 2θB .

ference condition of Eq. 4.30 becomes dG sin(θ) = mλ,

(4.35)

and a volume grating of period dB where Bragg’s law (Eq. 4.33) becomes 2dB sin(θB ) = mλ.

(4.36)

For the Bragg grating, the net deflection angle of the beam is θ = 2θB .

(4.37)

Since the Bragg grating spacing dB has a component dG = dB / cos(θB )

(4.38)

perpendicular to the direction of the incident wave, we can rewrite Eq. 4.36 as θ θ (4.39) 2dB sin(θB ) = 2dG cos(θB ) sin(θB ) = 2dG cos( ) sin( ) = mλ, 2 2 where in the last step we have used Eq. 4.37. From the trigonometric identity sin(2ϕ) = 2 cos(ϕ) sin(ϕ) and the substitution ϕ ≡ θ/2, we can rewrite Eq. 4.39 as dG sin(θ) = mλ,

(4.40)

which reproduces Eq. 4.35 provided dG = dG . That is, we see that a Bragg grating reproduces the diffraction condition of a plane grating perpendicular to the beam when viewing the Bragg grating period projected along that same perpendicular-to-the-beam direction (dG ). Of course the Bragg grating has a zˆ axis component to its periodicity, as is shown in Fig. 4.10 (something that we’ll return to in Section 4.2.5 when we discuss the Ewald sphere). Bragg’s law is applicable to a variety of situations. With visible light, one can create Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

82

Imaging physics

ƧB ƧB Ƨ

Ƨ h

h  h 

Figure 4.10 Schematic of diffraction from plane and volume (Bragg) gratings. Diffraction from a plane grating obeys dG sin(θ) = mλ (Eq. 4.30), while diffraction from a volume grating obey’s Bragg’s law of 2dB sin(θB ) = mλ (Eq. 4.33). For a Bragg grating, the periodicity dG of the edges of the grating planes viewed perpendicular to the incident beam direction is dG = dB / cos(θB ), allowing one to equate the plane and volume grating diffraction cases as shown in Eq. 4.40. The transition from plane to volume grating diffraction is characterized by the Klein–Cook parameter QK–C in Eq. 4.140.

volume gratings from layers of materials with alternating refractive indices (in which case one must modify Eq. 4.33 to incorporate the effect of the refractive index n of the alternating materials into the optical path length). Color holograms are made in this fashion, because for a fixed angle of incidence there is only one wavelength λ which will meet Bragg’s law from a particular periodicity. As a result, when the hologram is illuminated by broad-spectrum spatially coherent light, one can reflect different wavefields from different wavelengths of light, thus producing the desired real or virtual color image for viewing. The thickness at which one transitions from planar to volume diffraction can be described by the Klein–Cook parameter QK–C , given by Eq. 4.140.

4.2.3

Bragg’s law and crystals The Bragg son–father team was not thinking of color holograms in 1913, of course – holography would not be invented for another 35 years! Instead, they were trying to understand the diffraction characteristics of x-ray beams on crystals (more on this in Section 10.1). As we saw in Fig. 4.1, the Huygens construction allows one to create parallel wavefronts from the combination of coherent spherical waves from an array of closely spaced point sources. What the Braggs realized was that weak x-ray scattering from individual atoms in a crystalline lattice would behave in exactly the same way. In fact, in a regular arrangement of atoms in a crystal, one can have many different “planes” of atoms; these are denoted by their Miller index hkl , which gives the number of atomic units between planes in three dimensions as shown (in 2D) in Fig. 4.11. The number of planes that participate in a coherent superposition of reflected waves is determined by how far the x-ray beam penetrates into the crystal, and thus the narrowness of the angular range over which this superposition is maintained. In x-ray microscopy, one of the most common uses of crystal diffraction is for x-

Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

4.2 Gratings and diffraction

83

1



2

1 1

1

Figure 4.11 Bragg planes in crystals, and their Miller indices. Since this book is not printed on 3D paper, we show a 2D array of atoms and example Bragg planes with their corresponding 2D Miller indices hk .

ray monochromators at synchrotron light sources (see Section 7.2.1). Thanks to the semiconductor industry, one can obtain silicon crystals of amazingly high quality at quite reasonable cost. Silicon is a relatively light element, which leads to a reasonably favorable ratio of f1 to f2 (scattering depends on both f1 and f2 , as shown in Eq. 3.44, while absorption depends on f2 only, as shown in Eq. 3.75; see Fig. 3.16). For diffraction from the 111 planes, the d spacing is 0.31356 nm, so for a 10 keV x-ray beam Bragg’s law gives θ = 11.4◦ . Now the 1/e intensity absorption length μ−1 (see Eq. 3.75) of 10 keV X rays in Si is μ−1 = 135 μm, so the number n p of Bragg planes that end up being illuminated along the into-and-out set of ray paths is given by np 

134 μm μ−1 =  2.2 × 105 . d 2 · 0.314 × 10−3 μm

(4.41)

As a result, one would expect to be able to maintain Bragg diffraction over an angular range dθ of about 1/(2.2×105 ) radians on either side of the incident beam angle, or a full width of about dθ = 9.3 μradians. The real story [Batterman 1964, James 1982, AlsNielsen 2011] is given by the Darwin width ωD for crystal diffraction, which for Si 111 is ωD = 26.6 μrad (see Fig. 4.12). Once one has determined the angular spread dθ of the beam, differentiation of Bragg’s law (Eq. 4.33) can be used to show that E2 dθ (4.42) hc so that the Darwin width ωB for Si 111 at 10 keV gives an energy resolution of 1.41 eV. dE = 2d cos θ

4.2.4

Synthetic multilayer mirrors Perfect crystals represent nature’s method of generating beautiful periodic partially reflecting planes. However, as one goes to longer wavelengths, it becomes harder and harder to find crystals with appropriate d spacing; one of the largest values known is for YB66 which has d = 2.344 nm [Wong 1990]. How might one achieve this without crystals? Starting back in 1935, DuMond and Youtz had the idea that one might use vacuum

Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

84

Imaging physics

Ƨ (degrees) 11.402

11.404

11.406

11.408

1.0

Reflectivity

0.8

0.6

Ƨ%UDJJ 0.4

0.2 0.0 -40

-20

0

20

40

60

80

100

GƧ ѥUDG Figure 4.12 The rocking curve, or reflectivity versus incidence angle, for diffraction of 10 keV X rays from Si 111 . The Bragg angle calculated using Eq. 4.34 is 11.4042◦ . The Darwin width of ωD = 28.4 μrad (the angular width of the rocking curve) is found from a calculation with dynamical diffraction effects taken into account. The rocking curve shown here was calculated using Sergey Stepanov’s X0h program (http://sergey.gmca.aps.anl.gov/x0h.html).

evaporation of successive Au and Cu thin films to create synthetic multilayers as a longer-period diffractive structure to measure the wavelength of X rays [DuMond 1935], and in a later paper [DuMond 1940] they referred to even earlier attempts on this goal by Deubner and by Koeppe. A later study using alternating layers of Pb and Mg (giving a greater difference in the x-ray refractive index between the two thin-film materials) was made by Dinklage and Frerichs [Dinklage 1963] but the multilayers had their x-ray reflectivity fade in days due to interlayer diffusion. Multilayers for x-ray spectroscopy [Fischer 1966, Henke 1966, Mattson 1966] produced using self-assembly of organic films (known Langmuir–Blodgett films [Blodgett 1935, Blodgett 1937]) suffered from radiation damage. Subsequent work by Dinklage using Fe or Au and Mg yielded somewhat better results [Dinklage 1967]. The breakthrough came in the 1970s, when Eberhard Spiller at IBM Research Labs came up with a key conceptual understanding, and began to produce multilayers that eventually led to high-performance x-ray mirrors [Spiller 1972, Spiller 1974]. His idea was this: since one can think of the interference between the incident and reflected waves as a standing wave pattern with nodes and anti-nodes, one can enforce this pattern by placing high-density, absorptive layers where the nodes must be located. Soon Spiller and collaborators were producing multilayers of Re/W and C with d = 9.2 nm and with near-normal incidence reflectivities approaching 10 percent at about 65 eV [Haelbich 1979], and using them as the basis for designs for x-ray microscopes with normal incidence mirrors [Lovas 1982]. As noted in Chapter 2, Underwood and Barbee Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

4.2 Gratings and diffraction

85

at Stanford and CalTech soon produced very successful multilayers using sputtering [Underwood 1979, Underwood 1981a, Underwood 1981b]. Synthetic multilayers obey Bragg’s law with the refractive correction included, except that for two layers with respective thicknesses d1 and d2 one uses an overall periodicity d = d1 + d2

(4.43)

and a thickness-weighted [Spiller 1994, Eq. 7.8] phase-shifting part of the refractive index δ¯ of d1 δ1 + d2 δ2 , (4.44) δ¯ = d1 + d2 in which case one can use δ¯ in Bragg’s law with refraction included (Eq. 4.34). Detailed expressions for finding the optimum values of d1 and d2 for a given material pair are available [Yamamoto 1992]. Another way to understand multilayer mirrors is to go back to the expression for nor√ mal incidence reflectivity in Eq. 3.118. You get an amplitude contribution R⊥ from each interface, and with properly spaced interfaces each weak reflected amplitude adds up coherently over N interfaces to yield a net intensity reflectivity of N 2 R⊥ . As with a true crystal, absorption limits the number of layers that can contribute to the reflectivity. In addition, the portion of the wave that is reflected by the upper layers does not contribute to the reflectivity by the deeper-lying layers. Still, in the extended ultraviolet (EUV) and soft X ray region, incident waves will penetrate tens to hundreds of layers with the proper choice of material combinations, so that near-normal incidence reflectivities of up to 69.5 percent have been achieved at 92 eV [Yulin 2006], 71 percent at 98 eV [Bajt 2002], 20 percent at 395 eV with d = 1.59 nm [Eriksson 2008], and 2.5 percent at 511 eV with d = 1.22 nm [Eriksson 2008]. As the photon energy is increased and the wavelength shortens, random phase errors introduced by interface roughness at a lengthscale approaching the x-ray wavelength λ become increasingly detrimental, much in the way that grazing incidence mirror reflectivity is reduced due to surface roughness (Eq. 3.123). Larger d spacings can be used for shorter wavelengths λ at grazing incidence angles; for example, d = 1.97 nm WSi2 /Si multilayers have delivered reflectivities of 54 percent at 10 keV and 66 percent at 25 keV [Liu 2004]. If one adjusts the layer spacing across the length of a curved mirror (known as a graded multilayer), one can also use multilayer reflective coatings on nanofocusing mirrors, as will be discussed in Section 5.2.4. The number of layers, N, determines both the wavelength range (bandwidth) that is reflected, and the angular acceptance. The fractional bandwidth Δλ/λ and the angular range Δθ are both approximately equal to 1/N. Procedures for detailed calculations on the performance of multilayer mirrors are described elsewhere [Vinogradov 1977], and websites such as one provided by the Center for X-ray Optics (use an internet search for “henke.lbl.gov multilayer reflectivity”) provide interfaces for numerical calculations of multilayer mirror performance. Modern multilayer mirrors can have a wide range of d spacing – down to the thickness of just a few atoms – as long as the material properties are appropriate. To avoid strain buildup, it is important to use materials with good matching of atomic lattice Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

86

Imaging physics

Box 4.2 Momentum transfer and 2π There is, unfortunately, some discrepancy in the literature on the vector form of Bragg’s law of q = k0 − k (Eq. 4.49). In order to be consistent with the wave number k0 of Eq. 3.32, we use here a convention for the magnitude of the incident wave vector based on 2π , (4.45) |k0 | = λ which is also in use in much of the literature [Williams 2003, Als-Nielsen 2011, Attwood 2017]. However, another convention (see for example [Ewald 1969, Chapman 2006b]) uses the definition 1 . (4.46) λ To some extent, physicists use Eq. 4.46 while materials scientists use Eq. 4.45 – but to borrow a saying attributed to Mark Twain, “all generalizations are false, including this one,” and one book [Cowley 1981, Cowley 1995] has even switched conventions between editions! The choice of convention obviously affects relationships involving wave vectors k and the crystal momentum transfer vector q, such as Eq. 4.57 of q  2πu. We also use the term “Fourier space” to refer to spatial frequencies u and “reciprocal space” to refer to momentum transfer q, and refer to the Ewald sphere in reciprocal space. |k0 | =

spacing, and low diffusion of atoms across the interface; popular choices today include alternating layers of W and B4 C, Mo and Si [Barbee Jr 1985], or WSi2 and Si [Liu 2005, Liu 2006]. Multilayers can be deposited on curved surfaces using vacuum evaporation or (preferably) ion-induced sputtering. While the Bragg condition needs to be satisfied, the range of angles (or the range of wavelengths) accepted can be quite sizable. Basically, it is the number of layers contributing to the constructive interference that determines the range of acceptance, as discussed above.

4.2.5

Momentum transfer and the Ewald sphere In Fig. 4.10, we showed the difference between diffraction from plane and volume gratings, and in Section 4.2.1 we described how the periodicity d of a plane grating can be shown (within the Fraunhofer approximation; Section 4.3.2) to diffract light to a spatial frequency u = 1/d as given by Eq. 4.32. With volume gratings, the equivalent construction is the Ewald sphere [Ewald 1913]. This construction, illustrated in Fig. 4.13, involves vectors in a 3D far-field diffraction space called reciprocal space (for planar gratings, the equivalent is Fourier space; see also Box 4.57). In reciprocal space, an incident wave of wavelength λ and wave number k0 = 2π/λ is incident on the crystal as a vector k0 , and the Bragg-diffracted ray has a direction k with identical wavelength (see Box 4.2 for a note on the factor of 2π in k, and Box 4.3 for a note on why |k0 | = |k| in practice). The diffraction is done by a crystal

Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

4.2 Gratings and diffraction

(a)

(b)

qz

k0

q

ƧB

ƧB

Ƨ

Ewald sphere

ƧB ƧB

q/2

qy

(c) -k0

k q

87

q

k

ƧB

ƧB

ky kz

k0

d

g

atin

e gr

m Volu

Figure 4.13 Illustration of the Bragg diffraction condition and the Ewald sphere (shown here in two dimensions for the azimuthal angle ϕ = π/2 and thus q x = 0, as can be seen from Fig. 3.11).  which In (a), a volume grating with vector periodicity d has a momentum transfer q = 2π/d, causes an incoming wave k0 to undergo Bragg diffraction to a direction k. The Bragg angles are represented by θB , and the total scattering angle is θ. A graphical representation of q = k − k0 (Eq. 4.49) is shown in (b). For a fixed incoming beam direction k0 , as one rotates the volume grating through angles θB one samples a set of accessible momentum transfer vectors q that trace out positions on the Ewald sphere (c; shown here in 2D with {ky , kz } rather than 3D with {k x , ky , kz }). At higher Bragg angles, one can reach larger values of q corresponding to smaller  unit cell periods d.

of periodicity d which has a momentum transfer q of q =

2π d

(4.47)

so that the Fourier transform (Section 4.3.3) into reciprocal space is written as [Cowley 1981, Eq. 1.21]  ∞ C a(r)e−iq·r dr. (4.48) A(q) = 2π −∞ (Here C is a weighting constant for the interaction strength.) In reciprocal space, the relationship between the incident k0 and scattered k waves, and the crystal periodicity q, is given by q = k − k0

(4.49)

as shown graphically in Fig. 4.13B. The momentum transfer q of the volume grating has a length that can be found from Fig. 4.13B as |q| = |k0 | sin(θB ). 2

(4.50)

The vector components of the momentum transfer q can be found from either Eq. 4.50 or Eq. 4.47. We also include the azimuthal angle ϕ as shown in Fig. 3.11 (with ϕ = π/2 representing the case of the crystal’s momentum transfer q being aligned to the yˆ axis as shown in Fig. 4.13). This gives the following relationships between the scattering angles and the vector components of the corresponding momentum transfer vector within the Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

88

Imaging physics

crystal: 2π cos(θB ) cos(ϕ) d 2π 4π sin(θB ) cos(θB ) cos(ϕ) = sin(2θB ) cos(ϕ) = λ λ 2π qy = |q| cos(θB ) sin(ϕ) = cos(θB ) sin(ϕ) d 2π 4π sin(θB ) cos(θB ) sin(ϕ) = sin(2θB ) sin(ϕ) = λ λ 2π qz = −|q| sin(θB ) = − sin(θB ) d 4π 2 = − sin (θB ) λ

q x = |q| cos(θB ) cos(ϕ) =

(4.51)

(4.52)

(4.53)

so that q2 = q2x + q2y + q2z .

(4.54)

The relationship between the Bragg angle as θB and the total scattering angle θ is θ = 2θB ,

(4.55)

allowing us to use θ in the usual sense for optics. When the scattering angle θ is small (and with ϕ = π/2), we can approximate qy as qy =

θ 2π θ2 2π 2π cos( )  (1 − )  , d 2 d 4 d

(4.56)

where in the last expression we have assumed that we can neglect θ2 /4 relative to 1. We can therefore use the relationship u = 1/d of Eq. 4.32 to relate the spatial frequency uy of a 2D grating (Eq. 4.32) with the yˆ momentum transfer of a crystal qy (Eq. 4.52) as qy  2πuy

(4.57)

in the notation convention described in Box 4.2. The reason that q is referred to as a momentum transfer is shown in Fig. 4.13A: it leads to a change in the vector momentum of a photon (in reality, the magnitude of the momentum removed from the photon and transferred to the crystal is very small; see Box 4.3). For crystals, Bragg diffraction spots arise when there is the happy intersection of the Ewald sphere (defined by the direction and wavelength of the illuminating and scattered beams k0 and k which in turn give the momentum transfer q according to Eq. 4.49), and the reciprocal lattice points of the regular array of atom positions. This is shown in Fig. 4.14. It is for this reason that only a few Bragg diffraction spots show up from a crystal in a given given illuminating beam direction, so that crystals are often rocked during crystallographic data collection to allow more reciprocal lattice points to coincide with the Ewald sphere. But let’s return from crystals to talk about microscopy! Whereas the structural periodicity in crystals really does concentrate the scattering signal into a few reciprocal lattice points as shown in Fig. 4.14, the 3D Fourier decomposition of a more general object (introduced in 2D in Section 4.4.7) is composed of a more continuous distribution Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

4.2 Gratings and diffraction

89

Box 4.3 Momentum transfer in x-ray scattering The momentum transfer q of Eq. 4.49 that is imparted to a crystal during Bragg scattering is quite small; after all, it’s an elastic rather than inelastic process (see Section 3.2). Consider the case of a 10 keV x-ray photon that is fully backscattered, so that the Bragg angle is θ = 90◦ yielding 2θ = 180◦ ). The change in photon momentum Δpλ = 2h/λ gives rise to a change in the crystal momentum Δpy = m Δv of Δpλ = Δpy = m Δv 2h = mc Δβ λ 2 · hc , (4.58) Δβ = λ mc2 where we have used the classical result for crystal momentum p = mv, the normalized velocity β = v/c, and the relativistic expression for photon momentum p = E/c. Even if our crystal were as light as a pair of carbon atoms each of 12 atomic mass units, the change in velocity of the two-atom “crystal” from backscattering of a 10 keV photon would be Δβ =

2 · (1240 eV · nm) = 8.9 × 10−7 (0.124 nm) · (24 · 931.5 × 106 eV)

where we have used Eq. 3.7. For this two-atom “crystal” initially at rest, the kinetic energy imparted would be 1 2 1 mc (Δβ)2 = · 24 · (931.5 × 106 eV) · (1.8 × 10−6 )2 = 0.009 eV 2 2 so indeed the change in the energy of the 10 keV incident photon would be difficult to measure – and of course the mass of a real crystal rather than a pair of atoms would lead to an even smaller energy change. Finally, one can turn things around and use a momentum transfer argument to arrive at Bragg’s law [Duane 1923].

of Bragg lattices. Therefore the 3D diffraction pattern is also more continuous, with the characteristics of a speckle pattern, as will be described in Section 10.3.1. We have shown in Eq. 4.57 that there is a direct connection between diffraction by a plane grating, and diffraction by a volume grating. In fact, it’s worth looking a bit more at the Ewald sphere representation of several circumstances relevant to microscopy, as shown in Fig. 4.15. The details depend on the imaging case discussed: • For 2D imaging, in Section 4.4.7 we will use an optical transfer function, or OTF(u x , uy ), to describe how certain plane grating spatial frequencies u in Fourier space are preserved by the imaging system. • For 3D imaging with conventional tomography, in Chapter 8 we will make the assumption of obtaining pure projection images with no diffraction effects involved (that is, that the object fits within the depth of field as discussed in Section 4.4.9). Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

90

Imaging physics

Ewald sphere Periodicity “hits”

ky Periodicity “misses”

kz

Figure 4.14 Crystal “hits” and “misses” on the Ewald sphere. It is only when reciprocal lattice points of a crystalline material intersect with the Ewald sphere that the Bragg condition is met and a diffraction spot appears. For non-crystalline objects, one does not have the same concentration of scattering into reciprocal lattice points, so the 3D diffraction pattern is more continuous (with the characteristics of a speckle pattern as will be discussed in Section 10.3.1).

In that case, each projection image produces in Fourier space a flat plane with information in the {u x , uy } spatial frequency directions, but with no extent in the uz direction which is along that projection’s viewing direction. This is shown in E and F in Fig. 4.15. • For 3D imaging with coherent, monochromatic beams as in holography and coherent diffraction imaging (Chapter 10), data acquired from one illumination direction will fill in information on the surface of an Ewald sphere in reciprocal space (case B in Fig. 4.13). While this in principle means that there is some zˆ information in a coherent image obtained from one viewing direction, the information is far too little to use to reconstruct anything but the simplest, highly constrained 3D objects, as will be discussed in Section 10.2.3. Instead, one needs to fill in more of 3D reciprocal space with information, as shown in Fig. 4.13 and also in Fig. 10.10. A more in-depth2 discussion of coherent imaging of thick objects and the limitations of the pure projection approximation is provided in Section 10.5. In fact, there are connections between these different pictures, as we shall now see. Let us first consider the question of when the Ewald sphere “lifts off” in the zˆ direction from the { xˆ, yˆ } plane, as shown in Fig. 4.16. For a specimen enclosed within a depth s, the smallest momentum transfer in the zˆ direction where one will start to see differences from the specimen’s depth extent compared to a pure projection image is q s = 2π/s. If we collect scattering information out to an angle θ, the maximum extent of the Ewald sphere in the zˆ direction will be given by qz , which we find from Eqs. 4.53 and 4.55 as qz = (4π/λ) sin2 (θ/2). Thus the Ewald sphere will “lift off” of the object when qz = 2

Yeah, I couldn’t resist. . .

Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

4.2 Gratings and diffraction

Illumination direction

91

kx,y kz

(a) Full Ewald sphere, (b) Collimated monochromatic, illumination, single image monochromatic, single image

(c) Collimated illumination, polychromatic, single image

(d) Non-collimated illumination, monochromatic, single image

(e) Collimated (f) Pure projection, (g) Pure projection, several illumination illumination, single image directions monochromatic, several illumination directions Figure 4.15 The Ewald sphere describes the region in reciprocal space over which one obtains information during scattering using a single wavelength λ. With collimated, monochromatic illumination, one obtains information in reciprocal space along the surface of the Ewald sphere (a, b, and e). Limiting the angular extent over which scattering is obtained (thus limiting the effective numerical aperture (N.A.) of the experiment) limits the extent of the Ewald sphere, as shown in (b)–(e). With polychromatic illumination, one obtains information in the volume between the two spheres corresponding to the two bounding wavelengths (c). With non-collimated illumination (such as one obtains with a condenser lens bringing convergent illumination onto the specimen; see Section 4.4.7) one obtains information between Ewald spheres corresponding to the limits of the illumination angles (d). If instead one obtains pure projection images (the usual assumption in standard tomography; see Chapter 8), one has the situation in which no volume diffraction effects are observed (e and f). Finally, information obtained over several illumination directions delivers either a set of Ewald spheres (e; see also Fig. 10.10), or in the case of pure projections without volume diffraction, a set of tomographic projections (g; see also Fig. 8.2).

q2 /2, which we can rewrite using Eq. 4.53 and q s = 2π/s as s=

λ . θ2

(4.59)

If we represent the maximum detected scattering angle θ with the numerical aperture, or N.A., and the largest permitted object depth extent s with the depth of field, or DOF, Eq. 4.59 becomes λ , (4.60) DOFEwald = N.A.2 which matches the depth resolution δz of a lens (Eq. 4.213 with cz = 1), and which Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

92

Imaging physics

qz

ky qy

kz qs

Ewald sphere

Specimen depth extent

Figure 4.16 The limit of a pure projection image from the perspective of the Ewald sphere occurs when the Ewald sphere “lifts off” of the specimen depth extent in reciprocal space, due to the values of qy and qz at its outer edge. This leads to an Ewald-sphere-based depth of field DOFEwald given by Eq. 4.60 (see also Section 4.4.9).

is half of the depth of field (Eq. 4.214) of a lens (see Box 4.7 for a short discussion of depth resolution versus depth of field). If we use the Ewald sphere construction for transverse resolution, we have from Eqs. 4.52 and 4.55 the result θ 2π 4π sin( ) = , λ 2 d

(4.61)

which leads to N.A. =

λ , 2 Δy

(4.62)

where we have again used θ = N.A. and furthermore used the grating half-period Δy = d/2 as an expression for the minimally resolved feature size. Substituting Eq. 4.62 into Eq. 4.60 gives a relationship between depth of field and transverse resolution of DOFEwald = 4

Δ2 (Δy)2 =4 r, λ λ

(4.63)

where in the second case we use the symbol Δr used to represent the real-space pixel size throughout this book. The expression of Eq. 4.63 is close to the lens-based estimate of 5.4cz (δr )2 /λ of Eq. 4.215, as will be discussed in Section 4.4.9. Other estimates in the coherent imaging literature include DOFEwald  2(Δr )2 /λ with N.A.  λ/Δr [Rodenburg 1992, Eq. 15], DOFEwald < λ/(2N.A.2 ) [Chapman 2006b, Eq. 21], and DOFEwald ≤ 5.2(Δr )2 /λ in numerical studies of multislice ptychography [Tsai 2016]. An extended discussion3 on the coherent imaging of thick specimens and the limitations of the pure projection approximation is given in Section 10.5.

4.3

Wavefield propagation As electromagnetic waves encounter apertures, refractive media, and the like, a proper handling of their propagation to downstream planes would include discussions of bound3

Again with the puns. . .

Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

4.3 Wavefield propagation

93

Figure 4.17 Coordinate system used for wave propagation.

ary conditions, the Helmholtz equation, and more, and there are many excellent treatments of this problem (see for example [Goodman 2017, Chap. 3] or [Schmidt 2010]). The treatment we present below is conceptually simple, and sufficient for solving most problems.

4.3.1

The Huygens construction Our discussion of the Huygens construction began with the idea of constructing wavefronts from a set of points, each of which emits a spherical wave (Fig. 4.1). This provides a good framework for tackling the problem of wavefield propagation, using the coordinate system of Fig. 4.17. Consider a plane wave ψ = ψ0 e−i(kz−ωt) that is incident on a plane (x0 , y0 , 0) where it is modulated by a complex function g˜ (x0 , y0 ). This modulation could include an aperture outside of which g˜ (x0 , y0 ) = 0, or it could include a biological cell which modulates the magnitude and phase of the incident wave. We’ll assume that the net effect of the modulation g˜ (x0 , y0 ) is to produce a modified wavefield ψ0 (x0 , y0 ) immediately after the object. To calculate the light amplitude at a downstream point (x, y, z), we simply add up the contributions of the modulated Huygens point sources as shown in Fig. 4.1 to find λ ψz (x, y) = A



∞ −∞

ψ0 (x0 , y0 )

e−ikr cos θ dx0 dy0 , r

(4.64)

where we have dropped the eiωt time-dependent phase since it applies to both ψz (x, y) and ψ0 (x0 , y0 ). The prefactor λ/A provides a scaling to cancel both the dimensionality of 1/r via the λ term, and to cancel the dx0 dy0 area via the term 1/A term. The cos θ term accounts for the obliquity of waves so as to correctly reduce their energy per area when reaching a downstream plane at non-normal angles of incidence. The radius r from the point (x0 , y0 , 0) to (x, y, z) is given by  r = z2 + (x − x0 )2 + (y − y0 )2 = z

1+

(x − x0 )2 (y − y0 )2 + . z2 z2

(4.65)

Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

94

Imaging physics

Box 4.4 The Fresnel approximation in x-ray microscopy Equation 4.68 gave the Fresnel approximation as 2πz ρ4 π  λ 8z4 2 with ρ as the transverse distance. In Section 4.4.3 we will see that the Rayleigh resolution for a lens is δr = 0.61λ/N.A.. The numerical aperture of a lens is N.A. ≡ n sin(θ) (Eq. 4.172), with n = 1 − δ  1 for the case of X ray focusing. Therefore we can write N.A. = 0.61λ/δr , and since the resolution for x-ray microscopes is not comparable to the wavelength, we can assume N.A. = ρ/z. This lets us write the Fresnel approximation condition of Eq. 4.68 as 2πz 0.614 λ4 π  2 λ 8δ4r 0.614 3 λz δ4r  2  3 1/4 λz . δr  0.61 2

(4.69)

For propagation distances of z = 1 μm at 500 eV, this gives a resolution limit to the Fresnel approximation of 6 nm, while at 10 keV it gives a resolution limit of 0.6 nm. For distances of z = 1 mm, the limits are 32 nm at 500 eV and 3 nm at 10 keV. That is, the Fresnel approximation is very well satisfied in hard x-ray microscopes, and usually satisfied in soft x-ray microscopes (especially for short propagation distances such as are used in multislice methods as discussed in Section 4.3.9).

In the limit of [(x − x0 )2 + (y − y0 )2 ]  z2 , we can expand this as   (x − x0 )2 + (y − y0 )2 (x − x0 )4 + (y − y0 )4 r =z 1+ − + . . . 2z2 8z4   (x − x0 )2 + (y − y0 )2 z 1+ 2z2

(4.66) (4.67)

where the truncated series version of Eq. 4.67 involves the Fresnel approximation, terms (for more on Augustin-Jean Fresnel, where we ignore the x4 /z4 and higher-order see the beginning of Section 5.3). Let ρ = (x − x0 )2 + (y − y0 )2 represent transverse distances; the Fresnel approximation assumes π 2πz ρ4  , λ 8z4 2

(4.68)

where the phase error is the maximum allowed by the Rayleigh quarter wave criterion (Section 4.1.2). This condition is satisfied in most cases (see Box 4.4). If we apply the Fresnel approximation expansion of Eq. 4.67 to exp[−ikr] but simply use r  z for 1/r, and assume cos(θ)  1, we can write Eq. 4.64 as [Goodman 2017, Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

4.3 Wavefield propagation

95

 (x − x0 )2 + (y − y0 )2 ψ0 (x0 , y0 ) exp −iπ dx0 dy0 , λz −∞

(4.70)

cf. Eq. (4-14)]  ψz (x, y) = B





with B≡

λ 2πz exp[−i ]. Az λ

(4.71)

The exp[−i2πz/λ] phase factor simply incorporates the fact that any plane wave has a phase that oscillates per period λ, while the λ/(Az) term is there to provide unit cancelation for the area of integration dx0 dy0 as well as to incorporate the 1/r drop-off of the amplitude from spherical waves – but since we are considering plane waves ψ0 (x0 , y0 ) which have been modulated by a complex object transmittance g˜ (x0 , y0 ), we have no 1/z dropoff in the amplitude of a perfect plane wave. Thus we shall drop the factor B here, and in what follows. If we expand the (x − x0 )2 and (y − y0 )2 terms, we can also write this as [Goodman 2017, cf. Eq. (4-17)]   ∞ x2 + y2 ψ0 (x0 , y0 ) ψz (x, y) = exp −iπ λz −∞ ⎡ ⎤ # x02 + y20 ⎥⎥⎥ ⎢⎢⎢ xx + yy0 $ ⎥⎦ exp i 2π 0 exp ⎢⎣−iπ (4.72) dx0 dy0 . λz λz That is, since the term exp[−iπ(x2 + y2 )/(λz)] does not depend on the integration variables (x0 , y0 ), it can be pulled outside the integral. These two equivalent expressions of Eqs. 4.70 and 4.72 are known as the Fresnel diffraction integral.

4.3.2

Fraunhofer approximation The expression of Eq. 4.72 has already made use of the Fresnel approximation of Eq. 4.68. Before proceeding further, let’s consider an additional approximation of π

x02 + y20 π  λz 2

(4.73)

x02 + y20 . λ

(4.74)

or z4

This is the well-known Fraunhofer far-field approximation. For a 10 μm diameter aperture, Eq. 4.74 requires z  0.16 m with 500 eV soft X rays or z  3.2 m with 10 keV hard X rays, so the Fraunhofer approximation is considerably more restrictive. However, the payoff it provides is to considerably simplify the Fresnel diffraction integral of Eq. 4.72 to the Fraunhofer diffraction integral of  ∞  % xx0 yy0 & + ψ0 (x0 , y0 ) exp i2π (4.75) ψz (x, y)  dx0 dy0 λz λz −∞ ∞  ψ0 (x0 , y0 ) exp[i 2π (u x x + uy y)] dx0 dy0 , (4.76) −∞

Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

96

Imaging physics

where we have also dropped the factor outside the integral of exp[−iπ(x2 + y2 )/(λz)] which we will not notice if we are looking only at intensities I = ψ† · ψ. In the second expression, we have made use of spatial frequencies u x and uy , which we introduced in Eq. 4.32 as wavelength-normalized diffraction angles.

4.3.3

Fourier transforms: analytical and discrete The Fraunhofer approximation has led us to Eq. 4.76, and most readers will recognize that it shows that the far-field wavefield is simply a Fourier transform of the input wavefield ψ0 (x, y). Therefore it is appropriate to take a short detour to discuss Fourier transforms. The Fourier transform of a function a(t) in the time domain leads to a function A( f ) in the frequency domain of  ∞ a(t) ei 2π f t dt. (4.77) A( f ) = −∞

For example, the sound from a musical instrument captured on a microphone will lead to a voltage as a function of time or a(t), while the Fourier transform will show the frequency representation of that signal A( f ). The equivalent for functions a(x) in real space going to their Fourier space representation A(u) in spatial frequencies is obviously  ∞ a(x) ei 2πux dx (4.78) A(u) = −∞

and the Fourier transform is invertible as  ∞ A(u) e−i 2πux d f. a(x) =

(4.79)

−∞

Now there are entire books written on Fourier transforms and their properties (see for example [Bracewell 1986]), but the requirements for validity include that the function a(t) must be integrable and without infinite discontinuities. Fourier transforms are so integral to imaging (forgive the pun) that we will represent them by a more compact notation of A(u) = F {a(x)}

(4.80)

a(x) = F −1 {A(u)}.

(4.81)

and

That is, we’ll use a lower-case letter for the function in real space a(x), and an uppercase letter for the function in Fourier space A(u). We’ll also make use of the convolution theorem of Fourier transforms, which states  ∞ a(s) b(x − s) ds (4.82) a(x) ∗ b(x) = −∞

= F −1 {A(u) · B(u)}.

(4.83)

The convolution between two functions of Eq. 4.82 involves a sequence of shift (x − s), multiply, and add (integrate) operations; for example, the convolution of a broad square Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

4.3 Wavefield propagation

1.0 0.8

97

1.0

a(x)

0.8

0.6

0.6

0.4

0.4

0.2

b(x)

0.2

0.0

0.0 -0.4

-0.2

0.0

0.2

x

0.4

-0.4

-0.2

0.0

x

0.2

0.4

1.0 0.8 0.6

a(x)*b(x)

0.4 0.2 0.0 -0.4

-0.2

0.0

x

0.2

0.4

Figure 4.18 Illustration of the convolution of two functions a(x) and b(x), as defined in Eq. 4.82.

function b(x) with a narrow but rounded function a(x) will lead to a broad function with rounded edges, as shown in Fig. 4.18. We need to consider two other important properties of Fourier transforms, proofs of which we leave to other sources [Bracewell 1986]. One is the Dirac delta function δ(x − x0 ) or impulse, which is a function with the properties ⎧ ⎪ ⎪ ⎨+∞ x = x0 δ(x − x0 ) = ⎪ ⎪ ⎩0 x  x0  ∞ δ(x − x0 )dx = 1 (4.84) −∞

In other words, δ(x − x0 ) is nonzero only at x = x0 and zero elsewhere, and with an integral equal to 1. The Fourier transform of this function is simple: F {δ(x − x0 )} = 1,

(4.85)

or a flat function at all spatial frequencies. One can think of this in musical terms as a sharp strike on a cymbal, which produces a sound at a wide range of frequencies. Another important property of Fourier transforms involves the shift theorem of F {a(x − x0 )} = A(u)e−i 2πux0 .

(4.86)

In optics terms, as one translates an aperture sideways by a distance x0 , the transmitted wave receives a linear phase ramp at a particular plane, which can be thought of as a plane wave coming in at an angle θ = λ/x0 . For numerical calculations, the analytical Fourier transform expression of Eq. 4.78 is replaced by the discrete Fourier transform (DFT) using N sampling points over an even spacing or pixel size of Δr of A(um ) =

N−1 

a(n · Δr ) ei 2π (n·Δr ) um Δr

(4.87)

n=0

The Nyquist sampling theorem states that a sampling interval of Δr is sufficient for Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

Imaging physics

(a) Image

(b) log10(Reciprocal space intensity)

(c) Vignetted image

(d) log10(Reciprocal space intensity)

Reciprocal space intensity

(streaks)

98

10-2 (e) Power spectrum 10-4 10-6 10-8 10-10 10-12 0.001

Corners 0.01

0.1

0.5 1.0

Radial spatial frequency ur

Figure 4.19 Example of fast Fourier transform or FFT-based image processing. Panels (a) and (c) show images of Wilhelm Conrad R¨ontgen, who discovered X rays; in panel (c), the image has been digitally vignetted as described below Eq. 4.95. Panels (b) and (d) show the power spectrum images from images (a) and (c); the power spectrum image is the square of the discrete Fourier transform of the image (effectively the intensity of the diffraction pattern of the image) shown here on a logarithmic intensity scale. Note how the lack of digital vignetting in image A leads to streaks on the horizontal and vertical axes in the power spectrum (c) due to the periodicity of the discrete Fourier transform (see Eq. 4.95). Panel (e) at right shows the azimuthally averaged image power from panel (d) as a function of radial spatial frequency ur . The pixel size of the image is 1 unit, so the maximum spatial frequency at the left–right and bottom–top edges of the Fourier transform is given by the Nyquist limit (Eq. 4.88) as u x,y = 1/(2 · 1) = 0.5 inverse pixel √ units while the diagonal lines to the corners lead to spatial frequencies up to 2 higher, or ur (max) = 0.71 inverse pixel units. It is very common to find that image power declines with spatial frequency ur as approximately u−a r ; in this image, a = 2.95 (this is discussed further in Section 4.9.1). Image of R¨ontgen from the public domain via Wikipedia.

representing a function A(u) only if it is bandwidth-limited up to a maximum frequency umax of umax =

1 . 2Δr

(4.88)

Looking back at the expression of Eq. 4.32 that gave a spatial frequency of a grating with period d of u = 1/d, one can see that the Nyquist sampling theorem corresponds to a cycle of one open bar and one closed bar on a grating with a period of 2Δr . The discrete Fourier transform is discussed in more detail elsewhere (see for example [Press 2007, Chap. 12] or [Bracewell 1986]), but it is worthwhile making a few brief comments. The well-known fast Fourier transform (FFT) algorithm [Cooley 1965] uses symmetries in the sine and cosine representation of the Fourier transform to carry out the Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

4.3 Wavefield propagation

99

computation using only about FFT computational steps  N log(N)

(4.89)

computational operations. The discrete Fourier transform of Eq. 4.87 would otherwise require a sum over N spatial positions at each of N spatial frequencies, or N 2 computational operations. FFTs work best with N being some power of 2 (so-called “radix-2” FFTs), like N = 256 or N = 1024, though some FFT routines also work well with other integers > 2 factored into N. One can always “zero-pad” a smaller array into a radix-2 array, such as by embedding a 240 × 240 image in a 256 × 256 array, with no effect on image information but much faster FFT processing times. In fast Fourier transforms, the number of sampling points N is preserved between real and Fourier space. The discrete positions cover a range of {xn } = {0, 1, . . . , (N − 1)} · Δr , while the discrete frequencies cover a range of + , N/2 − 1 1 (N/2 − 1) {um } = −1, − , . . . , 0, . . . , . N/2 N/2 2Δr

(4.90)

(4.91)

The size of a pixel Δu in the Fourier array is therefore found from N Δu = umax , 2 for which we can use Eq. 4.88 to obtain Δu =

1 . N Δr

(4.92)

For a detector with physical pixel size Δdet located a distance Zdet away in the far field of a real space object, one can also write Δu =

Δdet λZdet

(4.93)

as explained in Eq. 10.16. Most FFT routines deliver their output in the order of zero to maximum spatial frequencies, followed by most negative to least negative frequencies, or + , (N/2 − 1) 1 1 N/2 − 1 , −1, − ,...,− {um } = 0, 1, . . . , (4.94) N/2 N/2 N/2 2Δr where 1/(2Δr ) represents the Nyquist limit of Eq. 4.88. Thus one must do a shift of the array by half of its width to rearrange the output to go in the order of Eq. 4.91. The FFT assumes that the input sequence a(xn ) is periodic; that is, it assumes that a(xn=N ) = a(xn=0 ).

(4.95)

In other words, the array is assumed to be repeated like tiles in a floor. This can lead to edge-ringing artifacts if there is a discontinuity from the left edge to the right edge of the array, or bottom to top; this will produce streaks along the horizontal and vertical axes, respectively, in the power spectrum image as shown in Fig. 4.19. As a result, it Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

100

Imaging physics

is often useful to digitally vignette the image prior to carrying out FFT-based analysis. One way to do this is as follows: 1. Subtract the average or “D.C.” value from the image (D.C. is used here to mean “direct current” as opposed to “alternating current” as a carryover from electrical engineering terminology). 2. Roll off the edges to zero with a smooth function such as a Gaussian with a standard deviation of 4–6 pixels (other functions like the Hamming or Chebyshev windows can be used, but they vignette more of the image). 3. Add the D.C. value back. An example is shown in Fig. 4.19C. Multidimensional FFTs are done as a series of 1D FFTs. That is, in 2D, one first takes a set of row-by-row FFTs followed by a set of column-by-column FFTs. Also, most FFT algorithms work “in-place,” meaning the FFT output is overwritten onto the same array memory as supplied the FFT input. The speed of FFTs on modern computers must be appreciated in the context of some history: back in 1987, it took the author about five minutes to do a 5122 complex FFT on a MicroVAX II computer which cost something like US$25,000 and occupied about the same space as a two-drawer file cabinet. As of 2015, many mobile phones could perform the same calculation in about 10 μsec! The discrete Fourier transform can also be thought of as a matrix operation F between an input vector a = ain and an output vector A = aout , so that one can write aout = Fain .

(4.96)

This concept can be helpful when using numerical optimization methods to solve inverse problems [Gilles 2018], as discussed in Section 10.3.6.

4.3.4

Power spectra of images With the FFT now being part of our toolkit, it is often very informative to look at the power spectrum of an image (again, this terminology is a carryover from the electrical engineering literature). By digitally vignetting an image, taking its discrete Fourier transform using an FFT, and squaring the result to look at Fourier space intensities, one can see how information in the image is distributed across spatial frequencies (see Fig. 4.19; it is nearly always better to view the logarithm of the Fourier space intensities since on a linear scale one will see little beyond the power at the very lowest spatial frequencies). It is also useful to carry out an azimuthal average, and examine the power spectrum which is the Fourier space intensity as a function of radial spatial frequency ur [usually viewed as log10 (Power) versus log10 (ur )]. In our experience with images across a broad range of length scales and imaging modalities, it is very common to find that the power spectrum declines as a constant power law factor a, or I(ur ) ∝ u−a r .

(4.97)

Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

4.3 Wavefield propagation

101

In Fig. 4.19 we found a = 2.95, while in the x-ray fluorescence images shown in Fig. 4.49 we have a ranging from 2.8 to 3.0. (The scaling of image signal with spatial frequency will be discussed further in Section 4.9.1). If limited photon statistics are involved, the power spectrum will often roll off at the high-frequency end to a flat noise floor as discussed below Eq. 4.207; this is seen in Fig. 4.49. Indeed, spatial frequency position of the “knee” between the u−a r image signal decline and the flat noise floor provides a rapid estimate of image resolution, as given by Eq. 4.251. If one has multiple images of the same object, an even better resolution estimate is provided by Fourier shell correlation as given by Eq. 4.255.

4.3.5

Fraunhofer diffraction With the compact notation of Fourier transforms in our toolkit, we see that the Fraunhofer diffraction integral of Eq. 4.76 can be written in a rather simple way: ψz (x, y) = F {ψ0 (x0 , y0 )}.

(4.98)

As stated immediately after Eq. 4.76, the far-field diffraction intensity I = ψ† · ψ from an input wavefield ψ0 (x0 , y0 ) is just the square of the Fourier transform of ψ0 (x0 , y0 ). Let’s consider the simple example of 1D diffraction from a slit of width b as shown in Fig. 4.7. The wavefield modulation g˜ (x0 ) is 1 inside the range −b/2 to b/2. This is often written as a rectangle or “rect” function rect(x0 /b) = 1 for −b/2 ≤ x0 ≤ b/2 and 0 otherwise. The far-field wavefield (ignoring the outside-the-integral exp[−iπx2 /(λz)] phase term shown in Eq. 4.72) using the Fraunhofer diffraction integral of Eq. 4.76 becomes  ψ(u) = ψ0 = ψ0

b/2

1 ei 2π ux0 dx0

−b/2  b/2 −b/2

 cos(2πux0 ) dx0 + i

b/2

−b/2

 sin(2πux0 ) dx0 .

(4.99)

Now because sin(−θ) = − sin(θ), the sine integral from −b/2 to 0 cancels out the integral from 0 to b/2, and because cos(−θ) = cos(θ) we can just double the cosine integral from 0 to b/2. As a result, the integral can be simplified as 

b/2

ψ(u) = 2ψ0

cos(2πux0 ) dx0 .

(4.100)

0

If we make the substitution θ ≡ 2πux0

(4.101)

we have dθ = 2πu dx0 or dx0 = dθ/(2πu), and the upper integration limit becomes Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

102

Imaging physics

θmax = πub, so the integral becomes  θmax 2 1 ψ(u) = ψ0 ψ0 [sin(θmax ) − sin(0)] cos(θ) dθ = 2πu πu 0 sin(πub) 1 ψ0 sin(πub) = b ψ0 = πu πub sin(ν) = b ψ0 ν with ν ≡ πub and the intensity I = ψ† · ψ is

 I=

ψ20

sin(ν) ν

(4.102)

(4.103) 2 (4.104)

where in Eq. 4.104 we have left out the multiplying factor b2 (just as we left the simple multiplying factors of Eq. 4.71 out). The astute reader4 will notice that we have already arrived at this result when adding up phases over a restricted range to arrive at an expression for ηu in Eq. 4.13, with |ηu | plotted in Fig. 4.5; that integration is indeed what we are doing when calculating the Fourier transform of a slit. The expression sin(ν)/ν appears so often that it is given a special name as a “sinc” function sin(ν) (4.105) ν which has a value of sinc(0) = 1 as can be found from L’Hˆopital’s rule. The rect and sinc functions are paired via the Fourier transform as x0 F {rect( )} = sinc(πub). (4.106) b Returning to the solution of Eq. 4.102, the sinc function has its first minimum at ν = π, which translates into a first minimum at the spatial frequency u of sinc(ν) =

θ π = πu b = π b. λ Therefore the angle θ of the first minimum in the diffraction pattern is λ , b which is exactly what we anticipated in Eq. 4.29. θ =

4.3.6

Fresnel propagation by integration, and by convolution Now that we have written Fraunhofer diffraction using the compact notation of ψz (x, y) = F {ψ0 (x0 , y0 )}, let us return to the Fresnel diffraction integral. The first form of Eq. 4.70 was    ∞ (x − x0 )2 + (y − y0 )2 dx0 dy0 . ψ0 (x0 , y0 ) exp −iπ ψz (x, y) = λz −∞ 4

And you’re astute, right?

Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

4.3 Wavefield propagation

103

Box 4.5 Propagators: proper, and improper The derivation of the Fresnel diffraction integral shown here is based on a simple picture of adding up Huygens point sources. A more proper approach is given by the Rayleigh–Sommerfeld theory involving Green’s functions for electromagnetic wave solutions within aperture boundaries. This can be shown [Sherman 1967] to give a different form of the Fourier space propagator (Eq. 4.108) of [Goodman 2017, Eq. 3.78]    2πz 2 2 1 − (λu x ) − (λuy ) (4.110) H(u x , uy ) = exp −i λ while for the real space propagator (Eq. 4.107) a more accurate expression is    2π 2 2 2 h(x, y) = exp −i x +y +z . (4.111) λ However, the Fresnel approximation effectively reduces Eq. 4.110 to the form shown in Eq. 4.108, which is sufficient for most calculations of interest in x-ray microscopy.

Now look at this expression while also considering the convolution theorem of Fourier transforms of Eqs. 4.82 and 4.83:  a(x) ∗ b(x) =



−∞

a(s) b(x − s) ds = F −1 {A( f ) · B( f )}.

We therefore see that the first form of the Fresnel diffraction integral simply involves a convolution of the input plane wavefield ψ0 (x, y) with a Fresnel propagator h(x, y) of $ # π h(x, y) = exp −i (x2 + y2 ) λz

(4.107)

with the feature that the Fourier transform of a propagator has much the same form [Goodman 2017, Eq. (4-20)] of H(u x , uy ) = exp[−iπ λz (u2x + u2y )].

(4.108)

(More exact expressions for the propagator functions are shown in Box 4.5.) As a result, we can rewrite the first form of the Fresnel diffraction integral of Eq. 4.70 as . ψz (x, y) = ψ0 (x0 , y0 ) ∗ h(x0 , y0 ) = F −1 F {ψ0 (x0 , y0 )} · H(u x , uy ) .

(4.109)

This convolution approach to the Fresnel diffraction integral involves taking the Fourier transform of the input plane wavefield ψ0 (x0 , y0 ), multiplying it by a propagator in Fourier space H(u x , uy ), and taking the inverse Fourier transform of the result. The second form in which we wrote the Fresnel diffraction integral was shown in Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

104

Imaging physics

Eq. 4.72 as

 ∞  x2 + y2 ψz (x, y) = exp −iπ ψ0 (x0 , y0 ) λz −∞ ⎡ ⎤ $ # x02 + y20 ⎥⎥⎥ ⎢⎢⎢ y x ⎥⎦ exp i2π( x0 + y0 ) dx0 dy0 . exp ⎣⎢−iπ λz λz λz

We see that this includes a Fourier transform of the product of ψ0 (x0 , y0 ) and an input plane propagator h(x0 , y0 ), which is then multiplied by an output plane propagator h(x, y) to yield ψz (x, y) = h(x, y) F {ψ0 (x0 , y0 ) h(x0 , y0 )}

(4.112)

for an equivalent expression of the Fresnel diffraction integral.

4.3.7

Fresnel propagation, distances, and sampling We have two equivalent approaches for propagating wavefields: the convolution approach of Eq. 4.109, and the single Fourier transform approach of Eq. 4.112. What are the conditions for using one approach versus another? The difference between the two lies in the version of the propagator function that the input wavefield is multiplied with: in the convolution approach of Eq. 4.109, one multiplies the Fourier transform of the input wavefield F {ψ0 (x0 , y0 )} with the Fourier space propagator H(u x , uy ), whereas in the single Fourier transform approach of Eq. 4.112 the input wavefield ψ0 (x0 , y0 ) is multiplied by the real space propagator h(x0 , y0 ). To see why one approach might be favored over another [Voelz 2009, Li 2015a], consider Fig. 4.20 which shows h(x0 , y0 ) and H(u x , uy ) for two different example distances z for a given wavelength λ. At short propagation distances, the Fourier space propagator H(u x , uy ) varies more slowly, while the real space propagator h(x0 , y0 ) undergoes rapid oscillations. Especially in the case of numerical wavefield propagation, these rapid oscillations can require a very fine spacing of sampling points, and thus very large array sizes. Let’s consider the case [Li 2015a] of a wavefield propagation calculation where we want to know the output wavefield ψz (x, y) over the same maximum radius R in which we know the input wavefield ψ0 (x0 , y0 ), and let’s use N sampling points in both cases with a spacing Δr = R/N. The total number Nr of π phase half-cycles for the real space propagator (Eq. 4.107) of exp[−iπr2 /(λz)] is Nr =

N 2 Δ2r R2 = . λz λz

(4.113)

In Fourier space, Nyquist sampling (Eq. 4.88) gives a corresponding maximum radial spatial frequency of ρmax = 1/(2Δr ), so the number Nρ of π phase half-cycles for the Fourier space propagator (Eq. 4.108) is Nρ = λzρ2max =

λz 4 Δ2r

(4.114)

The matching distance z0 at which we arrive at an identical number of π phase halfcycles in real and reciprocal space, or N = Nr = Nρ , is found from Eqs. 4.113 and 4.114 Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

4.3 Wavefield propagation

For ƪ QPZLWKz0 PP Near: z PP Far: z PP

1.0

U\SDUW ,PDJLQD art Real p

0.5 Real Space

105

0.0

6PRRWKVDPSOLQJ ZLWKVLQJOH)7 PHWKRG

ï0.5

Reciprocal Space

ï1.0 1.0 0.5

0.0

6PRRWKVDPSOLQJ ZLWKGRXEOH)7 PHWKRG

ï0.5

ï1.0

0

1

2

3

4

0

1

r ѥP

2

3

4

5

r ѥP

Figure 4.20 Real space h(x0 , y0 ) and Fourier space H(u x , uy ) propagators for two example

distances z for a given wavelength λ, plotted in terms of a radius r = x2 + y2 which corresponds to radial spatial frequencies ρ = r/(λz). In each case, the real part is shown in blue and the imaginary part in red. The Fourier space propagator is more slowly varying at short propagation distances, while the real space propagator is more slowly varying at longer distances. The propagator functions are defined in Eqs. 4.107 and 4.108. Figure adapted from one made by Kenan Li [Li 2015a].

to be z0 =

2RΔr 2R2 = . λ λN

(4.115)

This leads to the conclusions of Box 4.6 for the approach that should be used for wavefield propagation as a function of distance z [Li 2015a]. Finally, we note that the classical Fresnel number F0 for propagation from an aperture of radius a over a distance L is given by F0 =

a2 , λL

(4.116)

which we will later see matches the radius of the central zone of a Fresnel zone plate (Eq. 5.20). If we solve Eq. 4.115 for N, we obtain N=2

R2 = 2F0 . λz0

(4.117)

Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

106

Imaging physics

Box 4.6 Wavefield propagation methods and distances The matching distance z0 of 2RΔr 2R2 = λ λN of Eq. 4.115 lets us decide between two methods for Fresnel wavefield propagation based on the pixel size Δr , calculation grid radius R and number of grid points N, and wavelength λ. z0 =

• If the propagation distance is z < z0 , it is preferable to use the convolution-based approach (Eq. 4.109) of . ψz (x, y) = ψ0 (x0 , y0 ) ∗ h(x0 , y0 ) = F −1 F {ψ0 (x0 , y0 )} · H(u x , uy ) . with H(u x , uy )  exp[−iπ λz (u2x + u2y )] because at shorter distances H(u x , uy ) varies more slowly. • If the propagation distance is z > z0 , it is preferable to use the double-Fourier-transform approach (Eq. 4.112) of ψz (x, y) = h(x, y) F {ψ0 (x0 , y0 ) h(x0 , y0 )} π (x2 + y2 )] because at longer distances h(x0 , y0 ) varies with h(x, y) = exp[−i λz more slowly.

In both cases, the more exact propagator functions h(x, y) and H(u x , uy ) are given in Box 4.5.

Therefore we see that the number of sampling points N required at the matching distance z0 is equal to twice the Fresnel number F0 if the aperture a spans the whole space R. Wavefield propagation can also be computed using other methods such as the finitedifference method, and these methods can offer faster computation speed and better accuracy for larger values of the index of refraction n [Van Roey 1981, Scarmozzino 1991]. While these methods have been used in a few cases for simulating x-ray wavefield propagation [Fuhse 2006, Melchior 2017], we have emphasized the Fourier transform based approach here both for conceptual simplicity and because thus far it has been used by most x-ray microscopy researchers.

4.3.8

Propagation and diffraction in circular coordinates Lenses are often round; in such cases, it is best to work in circular coordinates with x = r cos(θ) y = r sin(θ) r = x2 + y2  (4.118) u x = ρ cos(θ∗ ) uy = ρ sin(θ∗ ) ρ = u2x + u2y so that r is the radius in real space and ρ in reciprocal space. Extension of the Fresnel diffraction integrals of Eqs. 4.70 and 4.72, and the Fraunhofer diffraction integral of



Figure 4.21 The Bessel functions J0 (r) and J1 (r). These Bessel functions of the first kind are

used in calculating the Hankel transform of Eq. 4.123 as well as the diffraction pattern from a pinhole as given by Eq. 4.134.

Eq. 4.76, to circular coordinates is straightforward in terms of dealing with propagator functions of
$$h(r) = \exp\big[-i\pi r^2/(\lambda z)\big] \qquad (4.119)$$
in real space and
$$H(\rho) = \exp\big[-i\pi\,\lambda z\, \rho^2\big] \qquad (4.120)$$

in reciprocal space. What is less straightforward is the equivalent of the Fourier transform, which we now consider. The 1D Fourier transform expression of Eq. 4.78 of
$$A(f) = \int_{-\infty}^{\infty} a(x)\, e^{i 2\pi f x}\, dx$$
becomes
$$A(\rho, \theta^*) = \int_{r=0}^{\infty}\int_{\theta=0}^{2\pi} a(r)\, \exp\big[i 2\pi \big(\rho\cos(\theta^*)\, r\cos(\theta) + \rho\sin(\theta^*)\, r\sin(\theta)\big)\big]\, r\, dr\, d\theta = \int_{r=0}^{\infty} a(r)\, r\, dr \int_{\theta=0}^{2\pi} \exp\big[i 2\pi \rho r \cos(\theta^* - \theta)\big]\, d\theta \qquad (4.121)$$
in circular coordinates. The integral over θ is known as a Bessel function of the first kind with zero order J₀ of
$$J_0(w) = \frac{1}{2\pi}\int_0^{2\pi} \exp[i w \cos(\theta)]\, d\theta, \qquad (4.122)$$
which is a pure real function (see Fig. 4.21) apart from an arbitrary uniform phase set by the choice of θ*. If we choose θ* = 0 and use the Bessel function result of Eq. 4.122



Figure 4.22 The Airy function with no central stop (b = 0) shown both as the amplitude 2J1 (ν)/ν of Eq. 4.132, and the intensity [2J1 (ν)/ν]2 of Eq. 4.134. The Airy function describes diffraction by a circular aperture, or the focal spot of a lens. Note that the sign of the amplitude flips in successive side lobes.

in Eq. 4.121, we have
$$A(\rho) = 2\pi \int_0^{\infty} a(r)\, J_0(2\pi\rho r)\, r\, dr = \mathcal{H}\{a(r)\}, \qquad (4.123)$$
which is known as the Fourier–Bessel or zeroth-order Hankel H{} transform of the function a(r). Like the Fourier transform, the Hankel transform is invertible, so
$$a(r) = \mathcal{H}^{-1}\{A(\rho)\}. \qquad (4.124)$$

We have arrived at circular coordinates equivalents of the Fresnel diffraction integrals. The convolution form of Eq. 4.109 can be written as
$$\psi_z(r) = \mathcal{H}^{-1}\big\{\mathcal{H}\{\psi_0(r_0)\} \cdot H(\rho)\big\}, \qquad (4.125)$$
while the single transform approach of Eq. 4.112 can be written as
$$\psi_z(r) = h(r)\, \mathcal{H}\{\psi_0(r_0)\, h(r_0)\}. \qquad (4.126)$$
The Fraunhofer diffraction integral of Eq. 4.98 becomes
$$\psi_z(\rho) = \mathcal{H}\{\psi_0(r_0)\}. \qquad (4.127)$$
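To make the Hankel transform concrete, here is a small quadrature sketch in Python (our own construction; the function name and grids are illustrative, and real implementations use the far more efficient algorithms cited in the next paragraph). It checks the transform of a circ function against the analytic pinhole result that will be derived in Eqs. 4.128–4.132:

```python
import numpy as np
from scipy.special import j0, j1

def hankel0(a_of_r, r, rho):
    """Zeroth-order Hankel transform of Eq. 4.123 by direct quadrature:
    A(rho) = 2*pi * Int a(r) J0(2*pi*rho*r) r dr. Illustrative only."""
    return np.array([2.0 * np.pi * np.trapz(a_of_r * j0(2.0 * np.pi * p * r) * r, r)
                     for p in rho])

# Check against the analytic far field of a pinhole of radius a = 1
# (a circ function), which per Eq. 4.132 is pi*a^2 * 2*J1(nu)/nu:
a = 1.0
r = np.linspace(0.0, a, 4000)
rho = np.linspace(0.05, 2.0, 40)
nu = 2.0 * np.pi * rho * a
exact = np.pi * a**2 * 2.0 * j1(nu) / nu
print(np.allclose(hankel0(np.ones_like(r), r, rho), exact, atol=1e-4))
```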

Several papers discuss numerical implementation of wavefield propagation using Hankel transforms [Yu 1998, Guizar-Sicairos 2004, Norfolk 2010, Li 2015a]. In Section 4.3.5, we calculated the Fraunhofer diffraction pattern of a slit of width b, describing it as a rectangle function or rect(x/b). Let us consider here the Fraunhofer diffraction pattern of a pinhole of radius a, which we'll describe in terms of a circle function or circ(r/a). From Eq. 4.123, we see that the far-field or Fraunhofer diffraction



Figure 4.23 The sinc²(ν) = [sin(ν)/ν]² function describing the intensity of light diffracted by a slit of width b, with ν = πub (Eq. 4.104), and the Airy intensity function [2J₁(ν)/ν]² describing the intensity of light diffracted from a circular aperture of radius a, with ν = 2πρa (Eq. 4.134). The first minimum of the slit diffraction pattern is at ν_first min = π, while for the pinhole it is at ν_first min = 1.22π. The Airy intensity function also describes the focus of a lens, as will be discussed in Section 4.4.3.

amplitude of such a pinhole is
$$\psi_z(\rho) = 2\pi \int_0^{\infty} \psi_0\, \mathrm{circ}\Big(\frac{r}{a}\Big)\, J_0(2\pi\rho r)\, r\, dr = \psi_0\, 2\pi \int_0^{a} J_0(2\pi\rho r)\, r\, dr. \qquad (4.128)$$

If we make the substitution r′ = 2πρr, we obtain
$$\psi_z(\rho) = \psi_0\, \frac{2\pi}{(2\pi\rho)^2} \int_0^{2\pi\rho a} J_0(r')\, r'\, dr'. \qquad (4.129)$$

Now a recurrence relationship of Bessel functions of the first kind is
$$\frac{d}{dx}\big[x^{n+1} J_{n+1}(x)\big] = x^{n+1} J_n(x), \qquad (4.130)$$
which leads to
$$\int r'\, J_0(r')\, dr' = r'\, J_1(r'), \qquad (4.131)$$
which when evaluated at the integration limits of 0 to 2πρa allows one to write the solution


of Eq. 4.129 as
$$\psi_z(\rho) = \psi_0\, \frac{2\pi}{(2\pi\rho)^2}\, 2\pi\rho a\, J_1(2\pi\rho a) = \psi_0\,(\pi a^2)\,\frac{2J_1(2\pi\rho a)}{2\pi\rho a} = \psi_0\,(\pi a^2)\,\frac{2J_1(\nu)}{\nu} \qquad (4.132)$$
with
$$\nu \equiv 2\pi\rho a. \qquad (4.133)$$

Thus one has the area of the pinhole πa² as a scaling factor, and an intensity distribution I = ψ†·ψ of
$$I(\rho) = \psi_0^2\,(\pi a^2)^2\left[\frac{2J_1(\nu)}{\nu}\right]^2. \qquad (4.134)$$
This expression for the far-field diffraction pattern of a pinhole makes use of the Airy function 2J₁(ν)/ν (named after the British mathematician George Biddell Airy), which is shown in both amplitude and intensity form in Fig. 4.22. The first minimum of the intensity function is at
$$\nu_{\rm first\ min} = 1.220\,\pi, \qquad (4.135)$$
whereas the diffraction pattern of a rectangular aperture of [sinc(ν)]² = [sin(ν)/ν]² has a first minimum at ν_first min = π. From Eqs. 4.133 and 4.135, the divergence semi-angle θ_first min = λρ_first min from the optical axis to the first minimum of the Airy pattern is given by
$$\theta_{\rm first\ min} = 0.61\,\frac{\lambda}{a}. \qquad (4.136)$$

The Airy (circular aperture) and sinc (square aperture, or slit) function intensities are shown together in Fig. 4.23.
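A short numerical check of Eqs. 4.135–4.136 (our own sketch; the pinhole radius and wavelength are illustrative numbers chosen for this example):

```python
import numpy as np
from scipy.special import j1
from scipy.optimize import brentq

# The Airy amplitude 2*J1(nu)/nu first vanishes at the first positive
# root of J1, which lands at nu = 1.2197*pi, i.e. Eq. 4.135's 1.220*pi.
nu_min = brentq(j1, 2.0, 5.0)
print(nu_min / np.pi)                    # -> approx. 1.2197

# Divergence semi-angle of Eq. 4.136 for a 10 um diameter pinhole in
# 550 nm light (illustrative numbers):
wavelength, a = 550e-9, 5e-6             # pinhole radius a = 5 um
theta = nu_min * wavelength / (2.0 * np.pi * a)   # theta = lambda * rho
print(theta, 0.61 * wavelength / a)      # both about 0.067 rad
```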

4.3.9 Multislice propagation

So far we have discussed the propagation of waves over some distance through free space. If "empty" samples were all we could simulate for x-ray microscopy, we would be in a sorry state! In the case of sufficiently thin specimens with thickness t, we found in Eq. 3.71 that a wavefield is modulated by exp[kt(iδ(x, y) − β(x, y))] to yield an exit wave within the pure projection approximation. For thicker specimens, the multislice method [Cowley 1957, Ishizuka 1977] (also called the beam propagation method [Van Roey 1981]) provides a way to simulate wavefield propagation through real objects with a refractive index distribution n(x, y, z) = 1 − δ(x, y, z) − iβ(x, y, z) (Eq. 3.67), as shown in Fig. 4.24. The sample is considered as if it were sliced into a set of thin slabs of material along the beam propagation direction. For each slab, two separate steps are carried out:
1. Within the slab of thickness Δz, the incoming wavefield ψ_j is modulated by the net


Figure 4.24 Schematic representation of the method of multislice propagation, used to simulate a wavefield propagating through a non-homogeneous refractive medium [Cowley 1957]. Along the beam direction, the object is represented by a series of slices of thickness Δz. At the entrance of a slice, the incident wavefield ψ_j is first modulated by the refractive effects of the slab of material, leading to a modified wavefield at the same plane of ψ′_j. This wavefield is then propagated to the next slab entrance, yielding the next slice's wavefield of ψ_{j+1}.

optical effect of the slab integrated along the beam direction, giving a modulated wavefield ψ′_j of
$$\psi'_j(x, y) = \psi_j(x, y)\, \exp\Big[-ik\int_0^{\Delta z}\big(n(x, y, z) - 1\big)\, dz\Big] = \psi_j(x, y)\, \exp\Big[\int_0^{\Delta z}\frac{2\pi}{\lambda}\big(i\,\delta(x, y, z) - \beta(x, y, z)\big)\, dz\Big] \simeq \psi_j(x, y)\, \exp\Big[\frac{2\pi\Delta z}{\lambda}\big(i\,\delta(x, y, z_j) - \beta(x, y, z_j)\big)\Big], \qquad (4.137)$$

where the latter expression is appropriate for a sample that is defined on a regular grid of longitudinal positions z_j.
2. This material-modulated wave is then brought to the next plane ψ_{j+1} by free-space propagation using the convolution propagation method of Eq. 4.109. (As noted at the end of Section 4.3.7, one can instead use finite-difference wavefield propagation methods with potentially faster computation speed and higher accuracy.) Further details on its formulation for problems including spherical wave propagation are available [Munro 2019].
This approach allows one to simulate the exit wave (see Box 10.2) that would emerge from the illuminated object without invoking either the Born or Rytov approximations (Section 3.3.4). Once one has exited the object and is in free space, this exit wave can be propagated some further distance, including into the far-field condition for obtaining the diffraction pattern of the extended-in-depth object [Thibault 2006].
The multislice method is easy to implement, and it applies to a surprisingly wide range of x-ray optical phenomena (including grazing incidence reflection, and diffraction by thick, high aspect ratio optics where coupled wave equation methods have been used previously [Li 2017a]). How many slices must one use? Since the transverse distance from an edge to the first Fresnel fringe in propagation scales as √(λz) (see Eq. 4.217), we wish to have the transverse pixel size Δ_r be a small fraction ε₁ of this distance, or
$$\Delta_r = \epsilon_1\sqrt{\lambda z}. \qquad (4.138)$$
Nyquist sampling would suggest ε₁ ≤ 0.5 and the Rayleigh quarter wave criterion would suggest ε₁ ≤ 0.25. The transverse sampling condition implies that the longitudinal sampling be some small fraction ε₂ of z = Δ_r²/(ε₁²λ), giving
$$\Delta z = \epsilon_2\,\frac{\Delta_r^2}{\epsilon_1^2\,\lambda}. \qquad (4.139)$$

Values of ε₁ = 0.1 and ε₂ = 0.1 give good agreement with a test of the convergence of the multislice method as the slice thickness Δz is decreased [Li 2017a], though the way to be sure is to decrease ε₁ and ε₂ and see that one approaches an asymptotic limit for small step size. Another way to answer the question is to consider the transition from planar to volume diffraction gratings for a grating of a thickness Δz and period d as given by the Klein–Cook parameter Q_{K–C} of [Klein 1967]
$$Q_{K\text{–}C} = \frac{2\pi}{n}\,\frac{\lambda\,(\Delta z)}{d^2}, \qquad (4.140)$$
where n is the mean refractive index (which of course is n ≃ 1 for X rays based on Eq. 3.67). Values of Q_{K–C} ≪ 1 are adequately described using plane grating diffraction, while the condition Q_{K–C} ≫ 1 means that volume diffraction effects begin to come into play (Section 4.2.2). If the grating half-period is bΔ_r, one can rearrange Eq. 4.140 in terms of the slice thickness to find
$$\Delta z_{K\text{–}C} = \frac{2 n b^2 Q_{K\text{–}C}}{\pi}\,\frac{(\Delta_r)^2}{\lambda}. \qquad (4.141)$$
The condition of having ε₁ = 0.1 and ε₂ = 0.1 corresponds to Q_{K–C} = 5π/(nb²). As an example, if the pixel size is one-fifth the finest zone width dr_N in a Fresnel zone plate, one has b = 5 and Q_{K–C} = π/5, which indeed satisfies Q_{K–C} ≪ 1.
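To make the two-step recipe of this section concrete, here is a minimal multislice sketch in Python (our own illustrative construction, not the book's code; the function name and the (n_slices, ny, nx) array layout for δ and β are assumptions, and Δz should be chosen per Eqs. 4.139–4.141):

```python
import numpy as np

def multislice(psi, delta, beta, wavelength, dx, dz):
    """Multislice (beam propagation) sketch for Section 4.3.9: alternate
    per-slab refractive modulation (Eq. 4.137) with convolution-based
    free-space propagation over dz (Eq. 4.109)."""
    k = 2.0 * np.pi / wavelength
    ny, nx = psi.shape
    ux, uy = np.meshgrid(np.fft.fftfreq(nx, d=dx), np.fft.fftfreq(ny, d=dx))
    H = np.exp(-1j * np.pi * wavelength * dz * (ux**2 + uy**2))
    for j in range(delta.shape[0]):
        psi = psi * np.exp(k * dz * (1j * delta[j] - beta[j]))   # Eq. 4.137
        psi = np.fft.ifft2(np.fft.fft2(psi) * H)                 # Eq. 4.109
    return psi   # exit wave emerging from the final slice
```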

4.4 Imaging systems

Now that we have the basics in hand for how wavefields propagate, we can go on to discuss imaging systems. We will connect wavefield propagation with lens imaging in Section 4.4.2, but we begin with a short review of the basics of lens-based imaging. A lens with focal length f serves to image an object from a longitudinal position s to an image position s′ according to
$$\frac{1}{s} + \frac{1}{s'} = \frac{1}{f}. \qquad (4.142)$$


Figure 4.25 Thin lens imaging. An object at a distance s is imaged to a point s′ by a lens of focal length f according to Eq. 4.142. The magnification M is given by M = h′/h, which is negative and with a magnitude greater than 1 in the case shown here. Parallel rays are directed through the focal points f of the lens, and rays that go through the focal point emerge parallel to the optical axis.

The lateral magnification M of the image is given by
$$M = \frac{h'}{h} = -\frac{s'}{s}, \qquad (4.143)$$

where negative magnifications indicate an inverted image (the case shown in Fig. 4.25). Given the fact that the size times angular divergence of a beam is a constant at any focus (Eq. 4.190), one can also relate the object divergence θ to the image convergence θ′ via
$$\theta' = -\theta/M. \qquad (4.144)$$

For large magnification, s is near a focal length f so we can write s = f + Δz with Δz ≪ f, giving
$$\frac{1}{s} + \frac{1}{s'} = \frac{1}{f + \Delta z} + \frac{1}{s'} = \frac{1}{f}\,\frac{1}{1 + \Delta z/f} + \frac{1}{s'} \simeq \frac{1}{f}\Big(1 - \frac{\Delta z}{f}\Big) + \frac{1}{s'} = \frac{1}{f} - \frac{\Delta z}{f^2} + \frac{1}{s'},$$
which when substituted into Eq. 4.142 yields
$$\frac{1}{s'} = \frac{\Delta z}{f^2}. \qquad (4.145)$$

The lateral image magnification can then be expressed as
$$M = \frac{-s'}{s} = \frac{-f^2/(\Delta z)}{f + \Delta z} \simeq \frac{-f}{\Delta z}, \qquad (4.146)$$

which means we can write approximate expressions for imaging distances of
$$s = f + \Delta z = f\Big(1 + \frac{\Delta z}{f}\Big) \simeq f - \frac{f}{M} \qquad (4.147)$$
$$s' = \frac{f^2}{\Delta z} \simeq -M f. \qquad (4.148)$$


That is, if we have a lens with a focal length of f = 1 mm and image with a magnification of M = −1000, the object will be at a distance Δz = −(1 mm)/(−1000) = 1 μm more than a focal length f away from the lens, and the image will be at a distance of s′ = −(−1000) · (1 mm) = 1 m. For large demagnification of an x-ray source to a small focus (for example, with M = −1/100) we have the equivalent approximate expressions of
$$s \simeq -\frac{f}{M} \qquad (4.149)$$
$$s' \simeq f - M f, \qquad (4.150)$$
which would place the source at 100 f away and the image at f + (f/100) away from the lens in this example.
Now let us consider the longitudinal magnification (the magnification in the ẑ direction). If we move the object a distance Δs farther away from the lens, the image position will change by a corresponding distance Δs′ as well. That is, we have
$$\frac{1}{s + \Delta s} + \frac{1}{s' + \Delta s'} = \frac{1}{f} \qquad (4.151)$$

so we wish to approximate the left-hand side
$$\frac{1}{s + \Delta s} + \frac{1}{s' + \Delta s'} = \frac{1}{s(1 + \Delta s/s)} + \frac{1}{s'(1 + \Delta s'/s')} \simeq \frac{1}{s}\Big(1 - \frac{\Delta s}{s}\Big) + \frac{1}{s'}\Big(1 - \frac{\Delta s'}{s'}\Big) = \frac{1}{s} + \frac{1}{s'} - \frac{\Delta s}{s^2} - \frac{\Delta s'}{s'^2} = \frac{1}{f}, \qquad (4.152)$$
which then allows us to subtract 1/s + 1/s′ from the left-hand side of Eq. 4.152, and 1/f from the right-hand side by using Eq. 4.142. This leaves us with the condition
$$\Delta s' = -\Delta s\,\frac{s'^2}{s^2} = -\Delta s\, M^2 \qquad (4.153)$$
or the statement that the longitudinal or z magnification M_z is related to the lateral magnification M by
$$M_z = M^2. \qquad (4.154)$$

Let us consider a practical example of an x-ray microscope with a depth of field (Section 4.4.9) of DOF = 5 μm, and an image magnification of M = −200 with an optic with a focal length of f = 2 mm. The image-recording detector will be at a distance of s′ = −(−200)(2 mm) = 0.4 meters from the lens, while the DOF will be magnified by M_z = 200² to (5 × 10⁻⁶) · (200²) = 0.2 meters. That is, the entire DOF region will show up in the image at the same magnification (the detector is much thinner than 0.2 meters) so one can treat image features within the DOF as being presented as a pure projection to the detector, with no depth-coupled lateral magnification changes.
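A quick numerical check of the approximations of Eqs. 4.147–4.148 against the exact thin-lens equation, using the f = 2 mm, M = −200 numbers from the example above (our own sketch):

```python
# All lengths in mm; the approximate object distance from Eq. 4.147 is
# checked against the exact image distance from Eq. 4.142.
f, M = 2.0, -200.0
s = f - f / M                          # Eq. 4.147: object 10 um beyond f
s_image = 1.0 / (1.0 / f - 1.0 / s)    # exact thin-lens image distance
print(s, s_image, -s_image / s)        # 2.01, 402.0 (close to -M*f), -200.0
```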

4.4.1 Field of view

In thin-lens imaging theory, rays that go through the center of a lens are undeviated no matter what their incident angle is (the lens surfaces are always flat right on the optical


Figure 4.26 Schematic used for calculating the field of view of an imaging system. An object at distance s and height h is imaged at a distance s′ at a height h′ = Mh if a lens were able to truly image a plane to a plane; in fact, imaging happens between the spherical surfaces shown as dashed lines, so that the longitudinal offsets a and b should be less than the depth of field DOF for imaging.

axis). When the source is moved off-axis by a distance h, the image is therefore moved off-axis by a distance h′ as given by the magnification M of Eq. 4.143. However, in setting up the geometrical optics conditions of either a thin refractive lens or a Fresnel zone plate, the distances s and s′ turn out to be distances from the object to the lens center, rather than offset distances from the lens along the optical axis; that is, the thin-lens imaging equation of Eq. 4.142 is for imaging a spherical surface to a spherical surface (the dashed lines shown in Fig. 4.26). When imaging a planar object to a planar detector, we must consider the departures from the conditions of spherical object and detector planes. Considering the geometry of Fig. 4.26, we see that the modified object distance is s + a, and the modified image distance is s′ + b. Projecting this to a distance along the optical axis gives s = (s + a) cos(θ) and s′ = (s′ + b) cos(θ), from which we find
$$a = s\,\frac{(1 - \cos\theta)}{\cos\theta} \qquad (4.155)$$
$$b = s'\,\frac{(1 - \cos\theta)}{\cos\theta}. \qquad (4.156)$$

Now because the object radius from the lens is increased from s to s + a, the image radius from the lens is decreased to (s′ − c) where c is a positive number, leading to a thin-lens imaging condition along the imaging axis of
$$\frac{1}{s + a} + \frac{1}{s' - c} = \frac{1}{f}. \qquad (4.157)$$


By taking steps like those used in Eq. 4.152, we obtain the result
$$c = a\,\frac{s^2}{s'^2} = \frac{a}{M^2} = \frac{s\,(1 - \cos\theta)}{M^2\cos\theta}, \qquad (4.158)$$
which shows the M² dependence on changes in the object's longitudinal position, as expected from Eq. 4.154. Given that we now know the image point to be shifted by a distance c inside the normal image radius on the imaging axis, we can project the net shift Δz along the optical axis as
$$\Delta z = (c + b)\cos\theta = \Big[\frac{s\,(1 - \cos\theta)}{M^2\cos\theta} + s'\,\frac{(1 - \cos\theta)}{\cos\theta}\Big]\cos\theta = \Big(\frac{s}{M^2} + s'\Big)(1 - \cos\theta) \simeq \Big(\frac{s}{M^2} + s'\Big)\frac{h^2}{2s^2} \simeq -M\,\frac{h^2}{2s} \simeq -M\,\frac{h^2}{2f}, \qquad (4.159)$$
where we have used 1 − cos θ ≃ θ²/2, θ ≃ h/s, s ≃ f as given by Eq. 4.147, and M = −s′/s as per Fig. 4.26. In order for this position to be imaged at full resolution, it should be within the microscope's DOF as will be discussed in Section 4.4.9. That is, we require Δz ≤ DOF, or (using Eq. 4.215)
$$-M\,\frac{h^2}{s} \leq \frac{2 c_z\,\delta_r^2}{0.61^2\,\lambda}.$$
Renaming the height h to be h_DOF, the radial field of view as set by DOF limits is
$$h_{\rm DOF} \leq \frac{\delta_r}{0.61}\sqrt{\frac{2 c_z f}{|M|\,\lambda}}. \qquad (4.160)$$
Consider the case of δ_r = 40 nm resolution imaging with a Fresnel zone plate with a focal length of f = 0.84 mm at λ = 2.42 nm, used in an imaging system with a magnification of M = −500 (and assume c_z = 1 as will be discussed in Section 4.4.9). In this case we have h_DOF = 6.0 μm, or a full-diameter field of view of 2h_DOF = 12.0 μm with full image sharpness.

4.4.2 Optical system via propagators

A lens acts as a Fourier transform device, since a light source placed at the focal point will generate a plane wave emerging from the lens, and an incident plane wave will produce a point focus. The point source or focus is a Dirac delta function δ(x − x₀) centered at the transverse position x₀, and the Fourier transform of a delta function is 1 at all frequencies (Eq. 4.85). Moving the point source sideways to some other transverse position leads to a plane wave at a different angle relative to the optical axis, as shown in Fig. 4.27. This is the shift theorem of Fourier transforms (Eq. 4.86) in action!
With this understanding, let's briefly consider a Fourier optics approach to an imaging system [Goodman 2017, Chap. 5]. Let's start with a plano-convex lens with curvature


Figure 4.27 Lenses act as Fourier transform devices. Point sources of light placed a focal length away from the lens lead to plane waves emerging from the lens, and an offset of the point source leads to rays at an angle (red) as expected from the shift theorem of Fourier transforms (Eq. 4.86).

R₂ as shown in Fig. 4.28. The thickness t(r) of the lens as a function of radius r can be found from
$$t(r) = \sqrt{|R_2|^2 - r^2} - (|R_2| - t_{\rm max}) = |R_2|\cos(\theta) - (|R_2| - t_{\rm max}) = t_{\rm max} - |R_2|\,[1 - \cos(\theta)] \simeq t_{\rm max} - |R_2|\,\frac{\theta^2}{2} = t_{\rm max} - |R_2|\,\frac{1}{2}\frac{r^2}{|R_2|^2} = t_{\rm max} - \frac{r^2}{2|R_2|} \qquad (4.161)$$
where we have used the approximation cos(θ) ≃ 1 − θ²/2 for a thin lens with small aperture. For visible light where (n − 1) > 0, the phase retardation produced by this radially dependent glass thickness is
$$\varphi(r) = -k\,(n - 1)\, t(r), \qquad (4.162)$$
where k = 2π/λ as usual. Ignoring the constant [−nkt_max] term, and using the fact that the sign convention for lens radii has R₂ = −|R₂|, the phase shift exp[iφ(r)] relative to there being no lens at all is given by
$$\varphi(r) = k\,(n - 1)\,\frac{r^2}{2|R_2|}. \qquad (4.163)$$

If we repeat the same calculation for a convex-plano lens with |R₁| = +R₁, and add the results together to represent a double-convex lens, we have
$$\varphi(r) = (n - 1)\,\frac{\pi r^2}{\lambda}\left(\frac{1}{R_1} - \frac{1}{R_2}\right) = \frac{\pi r^2}{\lambda f}, \qquad (4.164)$$
where in the last step we have used
$$\frac{1}{f} = (n - 1)\left(\frac{1}{R_1} - \frac{1}{R_2}\right), \qquad (4.165)$$

118

Imaging physics

t(r) R2 Ƨ r2

r tmax

Figure 4.28 Geometry for calculating the glass thickness of a plano-convex lens, leading to

Eq. 4.161.

If we use the convolution approach to propagation of Eq. 4.109, the wavefield ψ1 (x, y) entering the lens is given by . ψ1 (x, y) = ψ0 F −1 F {δ(x − 0, y − 0)} · H(u x , uy ) . = ψ0 F −1 1 · H(u x , uy )   x 2 + y2 = ψ0 exp −i π (4.166) λs where we have used the fact that the Fourier transform of a Dirac delta function is 1 everywhere (Eq. 4.85), and where we have written out the real space propagator h(x, y) – which is, after all, the Fourier transform of H(u x , uy ) – explicitly for the propagation distance s. We then multiply by the phase function of the lens eiϕ(r) (using Eq. 4.164). The wavefield ψ1 (x, y) exiting the lens is then given by  2  x + y2  ψ1 = ψ1 exp iπ λf   x2 + y2 , (4.167) = ψ0 exp −iπ λZ  where 1 1 1 ≡ − .  Z s f

(4.168)

If f = s, then 1/Z  = 0 and the entire quadratic phase term becomes exp[−i · 0] = 1. In that case, the lens has taken a point source and turned it into a plane wave, as shown in Fig. 4.27. If we now propagate ψ1 by a distance s to obtain a wavefield ψ2 , we have / 0. ψ2 = F −1 F {ψ1 } exp −iπ λs (u2x + u2y ) - / 0 / 0. = ψ0 F −1 exp −iπ λZ  (u2x + u2y ) exp −iπ λs (u2x + u2y ) . (4.169) Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

4.4 Imaging systems

119

1.0

Sum Spot 1

0.8

Spot 2

Airy intensity

0.735

0.6

0.4

0.2

Ƭ ›

0.0 -5

0

Ƭ

5

10

Figure 4.29 The Rayleigh resolution criterion is met when the Airy intensity pattern of one diffraction-limited focal spot is centered at the first minimum of the Airy intensity pattern of a second focal spot, or at a separation of ν = 1.22π. This leads to the Rayleigh resolution of δr = 0.61λ/N.A. as given in Eq. 4.173. The summed intensity in between the two images (the “dip”) drops to 73.5 percent of the single-source intensity maximum. When the lens has a half-diameter central stop or b = 0.5 in Eq. 4.176, the “dip” drops to 52.2 percent of the maximum. Other resolution criteria include the Sparrow criterion shown in Fig. 4.35.

Consider the case when s = −Z  : the quadratic phase factors then cancel each other out, so we are left with ψ2 = ψ0 F −1 {1} = ψ0 δ(x − 0, y − 0).

(4.170)

That is, we have imaged from a point to a point! Going back to Eq. 4.168, the condition of s = −Z  can be written as  1 1 1 1 1 1 1 − + = (4.171) = − = − or s Z s f s s f which simply reproduces Eq. 4.142. We therefore see that a lens works to counteract the Fresnel propagation of a wavefield from the object point to the image point.

4.4.3

Diffraction and lens resolution In the above analysis, we have neglected to include an important factor: lenses have a limiting aperture, which we will refer to as a pupil function p(a) where a is the radius of the lens. Since we have seen that a plane wave incident on a lens is imaged to a point, or F {1} = δ(x − 0, y − 0), the focus of a finite-aperture lens will involve a Fourier transform of the pupil function p(a). We have already solved this problem when we calculated the far-field diffraction pattern of a pinhole: the amplitude is given by Eq. 4.132 as the Airy function [2J1 (ν)/ν]. In the present case we wish to calculate the light amplitude

Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

120

Imaging physics

as a function of radius r at the focal plane using ν = 2πρr. Now the spatial frequency ρ = θ/λ is a wavelength-normalized diffraction angle, which in this case is determined by the limiting angle of a lens known as its numerical aperture of a (4.172) N.A. ≡ n sin(θ)  . f The exact expression allows for there to be a refractive medium n between the lens and the focus, while the approximate form assumes n = 1 (entirely appropriate for X rays) and small angles. We then have ν = 2π(N.A./λ)r in the Airy function, and in Eq. 4.135 we found that the first zero in the Airy amplitude (and first minimum in the Airy intensity) is at νfirst min = 1.22π from which we find a spatial resolution δr of δr = rfirstmin = 0.61

λ . N.A.

(4.173)

This is the celebrated Rayleigh resolution for a lens, with an intensity profile as shown in Fig. 4.23. This intensity profile, which represents the image of an infinitely small illumination source as produced by an aberration-free circular lens, is known as the intensity point spread function psf(ν) of the lens of 2  2J1 (ν) (4.174) psf(ν) = ν with r ν = 1.22π . δr

(4.175)

This is also called the diffraction-limited focus of the lens. Departures from this result due to finite illumination sources are discussed in Section 4.4.6. Rayleigh arrived at the expression of Eq. 4.173 by considering the image of closely separated incoherent sources (the light from stars as imaged in a telescope, in his case); he declared them to be resolvable when the center of the image of one star was located at the position of the first Airy minimum of the image of the second star, as shown in Fig. 4.29. In Fig. 4.30, we show 2D images of the sum of two Airy intensity spots as a function of their separation. The analysis above was for a normal optic that is continuous from the outer aperture to the center. Fresnel zone plates used in scanning x-ray microscopes usually have a central stop which is used in conjunction with an order-sorting aperture as shown in Fig. 5.17 (zone plate monochromators also require central stops, as shown in Fig. 6.4). We therefore consider the properties of a circularly symmetric optic with a central stop of fractional diameter b (the case of non-circularly symmetric optics, such as Kirkpatrick–Baez mirrors and multilayer Laue lenses, is considered in Section 4.4.5). With a central stop fraction b, the intensity point spread function is modified to an obstructed Airy function [Linfoot 1953, Tschunko 1974] of 2 2   2J1 (ν) 1 2 2J1 (b ν) − b (4.176) psf(ν, b) = ν bν (1 − b2 ) with ν = 1.22π r/δr as in Eq. 4.175. In Fig. 4.31, we show the modifications to the Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

4.4 Imaging systems

0.8

1.0

1.2

0.2

0.4

0.6

121

Figure 4.30 The Rayleigh resolution criterion is illustrated by showing the sum of two Airy intensity spots separated by the indicated fraction of the Rayleigh resolution criterion of δr = 0.61λ/N.A. as given in Eq. 4.173. The Sparrow resolution criterion [Sparrow 1916], where the “dip” between the two spots just disappears, corresponds to a fraction 0.77 of the Rayleigh resolution (see Eq. 4.177) for the case of an unobstructed circular optic.

Airy function that this causes. While increasing values of b lead to a slight narrowing of the central Airy disk that can be interpreted as an improvement in spatial resolution [Rarback 1981] and as a narrowing of the full-width at half-maximum (FWHM) probe diameter (Fig. 4.32), the fraction of energy that goes into the central Airy disk decreases as b is increased (Figs. 4.33 and 4.34) and we will see below that the optical transfer function is also affected (Fig. 4.48). It is not easy to remove these side lobes without reducing the resolution of the focal spot; for example, if one were to try to modify the optic to produce a circ-function-like cutoff of intensities beyond the first Airy minimum, the Fourier transform of such a function would be an Airy amplitude pattern with which one would have to modulate the optic, thereby losing flux while also requiring higher numerical aperture without a corresponding improvement in spatial resolution [Lu 2006, Sec. 2.7]. The Rayleigh resolution criterion is of course somewhat arbitrary; it is based on the specific properties of an unobstructed circular lens and the position of its first diffraction minimum. Another criterion that is sometimes used is the Sparrow resolution criterion [Sparrow 1916], which is when the “dip” between the two point source images just disappears (that is, the second derivative of the intensity profile becomes zero at the midpoint between the two point sources; see Fig. 4.35). In Sparrow’s original paper, he pointed out that the human visual system’s propensity for edge detection means that an observer will detect two lines in a spectrum (rather than one) when this condition is met, though this is perhaps optimistic for imaging of more “crowded” specimens. Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

122

Imaging physics

1.0

Intensity

0.8

0.6 Full aperture 0.4 Half-diameter central stop

0.2

0.0 -6

-4

-2

0

2

4

6

Ƭ Figure 4.31 Focused intensity profile for a full optic (with an Airy intensity profile as given by Eq. 4.134), and for one with a half-diameter central stop (or b = 0.5 in Eq. 4.176). The optic with a central stop would produce a lower peak intensity by a factor of 1 − b2 , which is not shown in this plot. While the central spot is narrower (the first minimum is at ν = 1.001π for b = 0.5, versus ν = 1.22π for b = 0), the first Airy ring carries a larger fraction of the focused energy, as is shown in Fig. 4.33, and the optical transfer function is also significantly affected, as is shown in Fig. 4.48.

For an optic with no central stop or b = 0, the Sparrow resolution criterion is met when ν = 0.941π, while with a half-diameter central stop or b = 0.5 it is met when ν = 0.862π, giving spatial resolution values of λ δr [Sparrow, b = 0] = 0.47 N.A.

λ δr [Sparrow, b = 0.5] = 0.43 N.A.

I[Sparrow] = 1.126

(4.177)

I[Sparrow] = 1.083,

(4.178)

which can be compared with the standard Rayleigh result of Eq. 4.173. For an unobstructed lens, the Sparrow criterion gives a resolution that is (0.941/1.22) = 0.77 the value of the Rayleigh resolution, which one can see in Fig. 4.30.

4.4.4

Beating the diffraction limit in light microscopy Returning to the Rayleigh resolution criterion given in Eq. 4.173, it does exactly what it was asked to do: it gives us a simple estimate for how easily one can distinguish two nearby incoherent sources. In the following section, we will see that image transfer functions provide some nuance to judging resolution beyond a single hard-and-fast number. In visible light microscopy, there are a wide range of approaches that have enabled imaging well beyond the Rayleigh resolution limit, including the following: • Structured illumination involves imaging a grating onto the illuminated field of view

Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

4.4 Imaging systems

123

0.85 0.843 at b=0 Fraction of Rayleigh resolution

0.80


νfirst min = 1.22π corresponding to the Rayleigh resolution for an optic with no central stop (Eq. 4.135). This integral result, referred to in Fig. 4.34 as IR , is listed below for circular optics with various fractional center stop diameters b, along with the actual radius νdark ring of the first minimum or “dark ring” of the modified Airy function (Eq. 4.176). The sinc2 distribution for crossed cylindrical optics is addressed further in Section 4.4.5. Optic type sinc2 circular circular circular circular circular circular circular circular circular circular

b 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

IR 0.826 0.843 0.824 0.770 0.690 0.596 0.499 0.405 0.314 0.221 0.120

νdark ring 3.832 3.786 3.664 3.502 3.322 3.144 2.974 2.814 2.666 2.530

of an object, essentially allowing for the doubling of the maximum spatial frequency from which structural information is collected and thus a twofold improvement in spatial resolution [Gustafsson 2000]. Nonlinear fluorescence response can improve this further [Gustafsson 2005]. • Near-field scanning optical microscopes (NSOMs) use a sub-wavelength aperture Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

124

Imaging physics

8

Energy in central Airy disk

0.8

6

Ƭ of dark ring

Fraction of total energy

1.0

0.6

0.4

0.2

0.351 at b=0.5

0.072 at b=0

0.0 0.0

0.2

0.4

Energy in first Airy ring

0.6

Central stop fraction b

Ƭ=7.01 at b=0

Ƭ=7.19 at b=0.5

Sec

ond

dark

ring

4

First dark ring 2

0.8

1.0

0 0.0

0.2

0.4

0.6

0.8

1.0

Central stop fraction b

Figure 4.33 Properties of an optic as a function of increasing the fractional diameter b of a central stop. The left-hand figure shows the energy in the central Airy disk and the first Airy ring as a function of b, while the right-hand figure shows the value of ν of the location of the first and second Airy dark rings. Values of the fraction of energy for the central Airy disk, and the positions νdark ring of the first dark ring, are shown in Table 4.1, while values for the energy in the first Airy ring and the position of the second dark ring are indicated on the plot. Central stops are often a required feature of x-ray optics (see Figs. 5.17 and 6.4), and while the central Airy disk is narrowed as b is increased (Fig. 4.31), a greater fraction of energy goes into subsidiary Airy rings (Fig. 4.34) and the optical transfer function is also affected (Fig. 4.48).

to generate a small spot of light. The fraction of light that makes it through this aperture is small, and the beam quickly diverges over a distance of about 100 nm from the tip, but images with a resolution of tens of nanometers can be obtained [Betzig 1991]. • Stimulated emission depletion (STED) microscopy works by using a phase spiral to produce a “hollow” focus spot of one wavelength which suppresses the excitation of visible-light fluorescence transitions in a molecule, and an overlapping focal spot of an excitation beam. The net effect is that visible-light fluorescence is limited only to the center of the “hollow” beam so that a spatial resolution of better than 50 nm can be obtained, as proposed and first demonstrated by Hell et al. [Hell 1994, Klar 2000]. • One can measure the position of widely separated objects with a precision far greater than the resolution. While two objects are hard to distinguish at separations finer than the Rayleigh resolution, as shown in Figs. 4.29 and 4.30, one can fit a function such as a Gaussian to the focal spot of an isolated object such as the point spread functions shown in Fig. 4.31. Even if the point spread function is not exactly Gaussian in profile, and even if it is “noisy” due to a limited number of photons being collected in the measurement, one can still find the position of the center of the Gaussian to a precision much finer than the width of the Gaussian. In visible-light microscopy, Eric Betzig proposed such a trick if one could somehow control the turning on and off of fluorescence from nearby emitters [Betzig 1995]. It was soon observed by Dickson, Moerner, and collaborators that some individual fluorophores will spontaneously switch into a long-lasting dark state yet they Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

4.4 Imaging systems

125

1.0 b=0.4

b=0.0

Integrated intensity

0.8

b=0.1

b=0.5

b=0.2 2

0.6

sinc

b=0.6

b=0.3 b=0.7

0.4 b=0.8

First dark ring

0.2

b=0.9

1.22› 0.0 0

2

4

6

8

10

Ƭ Figure 4.34 Fraction of energy versus ν, where νfirst min = 1.22π = 3.83 is the probe radius

corresponding to the Rayleigh resolution (Eqs. 4.135 and 4.175). This is shown for circular optics with central stops with various fractions b of the optic diameter (b = 0.5 is typical for order sorting in scanning x-ray microscopes, as shown in Figs. 5.17 and 6.4). The position of the first dark ring in the Airy distribution is also shown (this is also shown at right in Fig. 4.33). Finally, this figure also shows the radial integral of the sinc2 intensity distribution that applies to orthogonal pairs of cylindrical optics, as will be discussed in Section 4.4.5. The numerical values of IR , the integrated intensity at the Rayleigh resolution, are given in Table 4.1, along with the radii νdark ring of the first dark ring in the Airy distribution. This figure is inspired by [Michette 1986, Fig. 8.17].

can be subsequently triggered with light into going back to a fluorescing state [Dickson 1997]. This eventually led to the development of the superresolution techniques of PALM (for photoactivated localization microscopy) [Betzig 2006], STORM (stochastic optical reconstruction microscopy) [Rust 2006], FPALM (fluorescence PALM) [Hess 2006], and their variants.

These and other related advances are summarized in recent review papers [Hell 2007, Yamanaka 2014]. Eric Betzig, Stefan Hell, and William Moerner received the 2014 Nobel Prize in Chemistry for the development of superresolution methods in light microscopy. They have changed the emphasis on high spatial resolution alone in x-ray microscopy of biological specimens; instead, x-ray microscopy provides important complementary capabilities such as the ability to obtain isotropic resolution 3D images of the full native, unlabeled content of thicker specimens (Chapter 8 and ptychographic tomography in Chapter 10), and intrinsic chemical and elemental information separate from the use of specific visible-light fluorophores (Chapter 9). Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

126

Imaging physics

1.2

1.126

Sum

Ƭ ›

Airy intensity I

1.0

Spot 1

0.8

Spot 2

0.6

0.4

0.2

0.0 -5

0

Ƭ

5

10

Figure 4.35 In the Sparrow resolution criterion, two nearby incoherent sources are considered to be resolved when the net intensity profile flattens out in between the sources (its second derivative is zero). For two Airy intensity patterns from unobstructed (b = 0) circular optics, this condition occurs at ν = 0.941π rather than the ν = 1.22π separation corresponding the Rayleigh resolution criterion shown in Fig. 4.29. The corresponding Sparrow resolution formula for an unobstructed lens is given in Eq. 4.177, with Eq. 4.178 giving the equivalent for a lens with a half-diameter central stop (b = 0.5).

4.4.5

Cylindrical (1D by 1D) optics With visible-light optics, and when using Fresnel zone plates (Section 5.3) and some compound refractive lenses (Section 5.1.1) for x-ray focusing, circularly symmetric lenses are employed so the discussions given above are directly applicable. With circular optics, the focus amplitude is the Fourier transform of a circ(r/a) pupil function, which yields an Airy2 (r/a) intensity pattern as shown in Fig. 4.22. However, not all x-ray optics are circularly symmetric! Kirkpatrick–Baez mirrors (Section 5.2.2), multilayer Laue lenses (Section 5.3.6), and some compound refractive lenses (Section 5.1.1) use separate optics to focus in each of the two directions orthogonal to the beam propagation direction, as shown in Fig. 4.36. The equivalent in visible light optics is the use of cylindrical (rather than spherical) lenses that produce line foci, so that 2D focusing is achieved with two cylindrical lenses arranged orthogonal to each other and to the beam direction as with Kirkpatrick–Baez optics (Fig. 2.1). With such a pair of line-focusing optics, the focus amplitude is the Fourier transform of a rect(x/a x ) · rect(y/ay ) pupil function which yields a sinc2 (x/a x ) sinc2 (y/ay ) intensity pattern (a comparison between full-aperture Airy and sinc functions was shown in 1D in Fig. 4.23). If a square central stop with fractional width b is employed, the focus spot

Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

4.4 Imaging systems

d bd

Vertical lenses

Vertical lens

d

bd Circular lens with optional central stop

Cylindrical lenses with optional central stop

Horizontal lens

Horizontal lenses

d

127

bd Kirkpatrick—Baez, MLL optics

Figure 4.36 The pupil functions of different optics along with the intensity distribution in focus. In circular optics (left), one obtains an obstructed Airy2 focus corresponding to an optic with diameter d and a central stop with fractional diameter b (Section 4.4.3). When using orthogonally crossed one-dimensional focusing optics such as grazing incidence optics or multilayer Laue lenses, the direct equivalent is to have separate optics (each with diameters d as indicated, and central stops of fractional width b) for focusing in the horizontal and vertical directions, respectively as shown in the middle. For this case (which is almost never encountered with x-ray optics), the focus intensity distribution is given by Eq. 4.179. With crossed orthogonal compound refractive lenses (CRLs; see Section 5.1.1), the case at middle is what applies except the fractional central stop diameter is usually b = 0, so the intensity profile is as shown at right. Another common arrangement (right) is to have single off-axis optics for each focusing direction; this represents the case for Kirkpatrick–Baez mirror optics (Fig. 2.1, and Section 5.2.2) as well as for most multilayer Laue lenses (MLLs; Section 5.3.6). In this latter case, because there is no central stop, one does not have the interference effect between two focused beams converging from different directions (the case of b > 0 in the middle figure) so there are no enhanced sidelobes off of the central focus. Instead, the lenses in each direction (horizontal, and vertical) act like full-aperture 1D optics with a full diameter of d as indicated at right. One must still pay careful attention to the alignment of the two 1D off-axis optics [Yan 2017].

instead becomes [Yan 2012] I(X, Y) =

 sin(X) sin(Y) sin(bX) sin(bY) 2 I0 − 2 2 X Y X Y (1 − b )

(4.179)

with X = 2πRx/(λ f ) and Y = 2πRy/(λ f ), where R is the distance from the optical axis to the outer aperture of the optic and f is the focal length. (This result is essentially the sinc()2 version of the obstructed Airy function of Eq. 4.176.) Crossed cylindrical optics that are symmetric about the optical axis and have central stops therefore show strong side lobes surrounding the central focus, which one can think of as arising from interference between the two light beams converging on the optical axis from opposite directions (Fig. 4.36). However, in x-ray optics, this situation almost never arises: compound refractive lenses (CRLs) are symmetric about the optical axis but do not need central stops, while Kirkpatrick–Baez mirrors (KB mirrors) and multilayer Laue lenses (MLLs) are almost always located only on one side of the optical axis, as shown at right in Fig. 4.36. With circular optics, the Rayleigh resolution δr = 0.61λ/N.A. (Eq. 4.173) is defined based on the radial position of the first minimum in the diffraction pattern of an optic Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

128

Imaging physics

with no central stop (b = 0). As was noted in connection with comparisons of diffraction from slits and circular apertures (Fig. 4.23), the position of the first diffraction minimum for a cylindrical optic is a bit narrower, giving a value for the spatial resolution of λ . (4.180) N.A. However, because the focus profile is not circularly symmetric, Eq. 4.180 cannot be interpreted as providing a precise measure of the spatial resolution in all directions. An alternative therefore is to consider the fraction of focused light intensity contained within various radii. This is shown in Fig. 4.34 and Table 4.1, where one can see that 83 percent of the intensity is contained within a radius corresponding to the central Airy disk for an unobstructed (b = 0) circular optic. It is worthwhile to explicitly review the situation that applies to each of several types of x-ray optics: δr,cyl  0.5

• Fresnel zone plates are circular optics as shown at left in Fig. 4.36. They have circular optic properties as described in Section 4.4.3, with further detail provided in Section 5.3.1. • Kirkpatrick–Baez and Montel nanofocusing systems (Section 5.2.2) are usually cylindrical half-optics, and furthermore, they usually do not reach to the center of the optical axis. That means they are represented by the case shown at right in Fig. 4.36. We showed in Section 4.4.2 that Fresnel propagation of the wavefield transmitted through a lens to the focal plane leads to a focal spot characterized by the Fourier transform of the pupil function of the lens. Therefore, with an off-axis half-optic, the pupil function in each direction is just a simple shifted rect() function so the resulting focal spot is simply a sinc()2 function with no modification to account for a central stop and thus no accentuation of the side lobes. However, the N.A. of the optic is now set not by the maximum reflection angle (see for example Eq. 5.10), but by the total aperture d of the optic, or (referring to Fig. 5.7) the meridional angle αm from the optic to the image plane. Therefore N.A. = αm /2 or N.A. x,y = d x,y /(2 f x,y ) where f x,y is the focal length in each direction, and the diffraction limit to spatial resolution is approximately given by using that N.A. in Eq. 4.180. • Multilayer Laue lenses (Section 5.3.6) are also usually half-optics located off of the optical axis, with the same properties of producing a sinc()2 intensity distribution in each direction. The numerical aperture of the optic is again given by N.A. x,y = d x,y /(2 f x,y ) rather than by N.A. = λ/(2drN ) as would have been the case for a Fresnel zone plate (Eq. 5.27). This means the diffraction-limited spatial resolution for an MLL will not be given by δr = 1.22drN as one would have expected from Eq. 5.28; instead, one again has a spatial resolution more like that given by using the optic’s N.A. in Eq. 4.180, which is reduced from what one would have expected based on the width of the thinnest layer in the multilayer Laue lens. • Compound refractive lenses (Section 5.1.1) can be fabricated either as circularly symmetric optics [Lengeler 1999b] in which case the usual considerations of circular optics apply, or as orthogonal pairs of cylindrical optics made using eiDownloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

4.4 Imaging systems

Coherence width

129

Plane wave

Coherence length

Figure 4.37 Coherence lengths and widths refer to the distances over which there is good phase correlation between wavefronts. Shown here are the directions of coherence lengths and widths, respectively, for a quasi-monochromatic plane wave.

ther macroscopic [Snigirev 1996] or nanofabrication [Aristov 2000b] techniques. There is no reason to use central stops with compound refractive lenses, so their focal spot profile is the sinc(X)2 sinc(Y)2 distribution that one obtains from Eq. 4.179 with b = 0. However, absorption limits the transmission at the outer ends of the aperture, so it is difficult to directly apply the results of either Eq. 4.173 (for circularly symmetric optics) or Eq. 4.180 (for orthogonal 1D or cylindrically symmetric optics) to estimate the spatial resolution of compound refractive lenses. Thus we see that it is a bit difficult to apply a single, precise number of the spatial resolution of many x-ray optics. This will be seen further in Section 4.4.7 and Fig. 4.49. Finally, one must use care in aligning two 1D optics relative to each other [Yan 2017].

4.4.6

Coherence, phase space, and focal spots In our discussion of the diffraction-limited point spread function of an optic, we made assumptions of point source wave emitters, and perfect plane waves. We now consider the consequences of imperfection in these assumptions, leading us into a discussion of coherence. We do this in a simplified treatment, rather than follow the more sophisticated approaches of the Gaussian–Schell model [Coisson 1997] or the Wigner distribution approach [Kim 1986] as applied to synchrotron radiation experiments. Coherence refers to the degree of phase correlation that exists across a wavefield. For a plane wave, we can consider two separate aspects of coherence, as shown in Fig. 4.37: • Coherence length c refers to wavefront positions separated in time, or separated longitudinal positions, over which there is good phase correlation. This is related to the degree of monochromaticity of a wave, so that the coherence length c of an illuminating beam is given by the number λ/Δλ of nicely overlapping waves times the wavelength λ, or E λ2 =λ (4.181) c = Δλ ΔE where Δλ represents the spread of wavelengths in a quasi-monochromatic beam

Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

130

Imaging physics

p2 2r

Ƨ

p1

Figure 4.38 The van Cittert–Zernike theorem can be used to calculate the degree of mutual coherence between two points when illuminated by an incoherent quasi-monochromatic source illuminating an aperture. Shown here is the example of Eq. 4.183 of calculating the degree of mutual coherence μ1,2 between points 1 and 2 separated by an angle θ in the far field, when the aperture is a pinhole with diameter 2r.

(see also Eq. 7.4). The coherence length is sometimes referred to as longitudinal coherence. • Coherence width wc refers to wavefront positions separated transversely to the wave propagation direction over which there is good phase correlation. The coherence width is sometimes referred to as transverse coherence. Of course the details of calculating coherence lengths and widths depends both on the degree of mutual coherence that one wishes to achieve, as well as the statistical distribution describing the departures from monochromatic plane waves. We already gained some feel for these issues in Section 4.1.1. One approach for considering spatial coherence is given by the van Cittert–Zernike Theorem [Zernike 1938, van Cittert 1939, van Cittert 1958], which involves integrating wavefront sources over an arbitrary aperture to gauge the degree of partial coherence between two points downstream, as shown in Fig. 4.38. It allows one to calculate the degree of mutual coherence μ1,2 that exists between wavefronts at location 1 versus location 2, and this degree of mutual coherence is equal to the fringe visibility V of V=

Imax − Imin Imax + Imin

(4.182)

in a simple interferometry experiment. If a pinhole aperture of diameter 2r is illuminated with a spatially incoherent but monochromatic source as shown in Fig. 4.38, the van Cittert–Zernike theorem gives a degree of mutual coherence μ1,2 between two points separated by an angle θ of [Born 1999, Eq. 10.4.28] μ1,2 =

2J1 (νμ ) , νμ

(4.183)

2πrθ . λ

(4.184)

with νμ =

The alert reader should sense something familiar here.5 The result of Eq. 4.183 is just 5

To quote the legendary New York Yankees baseball catcher Yogi Berra, “It’s like d´ej`a vu all over again.”

Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005

4.4 Imaging systems

131

the same Airy function result that we saw in Eq. 4.132 and Section 4.4.3, and we plotted [2J1 (ν)/ν]2 in Fig. 4.23. To put the van Cittert–Zernike expression for mutual coherence into perspective, let’s consider first its relationship to the Heisenberg uncertainty principle in quantum mechanics. This was originally stated [Heisenberg 1927] in terms of the commutation of operators for position xˆ and momentum pˆ as [ xˆ, p] ˆ = i where  = h/(2π) (Eq. 3.1). If it is instead stated as the product of the standard deviation in position σ x and in momentum σ p , one arrives at [Griffiths 2004, Eq. 3.63] σx σ p ≥

 2

(4.185)

or, as more commonly written,  . (4.186) 2 We saw the energy–time version of Eq. 4.185 in Eq. 3.24. Now let’s consider a nonrelativistic particle with momentum pz = mv, so that it has a de Broglie wavelength from Eq. 3.5 of λ = h/pz . If this particle encounters a slit of width of 2Δx, its position distribution can be measured at a downstream plane, as shown in Fig. 4.39. We would expect from the single slit diffraction result of Eq. 4.29 that the first zero in its probability distribution is at an angle of θ = λ/(2Δx). Taking this as a measure of uncertainty in momentum in the xˆ direction Δp x , we have (Δx) · (Δp) ≥

Δp x = pz sin(θ) h h λ = = λ 2Δx 2Δx

(4.187)

or h . (4.188) 2 In fact in Eq. 4.104 we found that the intensity distribution for single slit diffraction is given by I(ν) = sinc2 (ν) with a first minimum at νfirst min = π. Thus if (Δx) · (Δp x ) = h/2 corresponds to ν = π, then the Heisenberg uncertainty relationship of (Δx) · (Δp) ≥ /2 corresponds to ν ≤ 1/2. Therefore, if one were to treat the Heisenberg uncertainty relationship as providing a degree of mutual coherence described by the van Cittert– Zernike theorem but for a 1D slit described by sin(ν)/ν = sinc(ν) rather than a 2D pinhole described by 2J1 (ν)/ν, one would have (Δx) · (Δp x ) =

μ12 = sinc(ν = 1/2) =

sin(1/2) = 0.96, 1/2

which is a very strong phase correlation. Of course it is not quite right to treat a slit halfwidth of Δx as being equivalent to σx, which is the standard deviation of a Gaussian, nor is it quite right to treat the position of the first minimum sinc(ν_first min) as being equivalent to σp (and one can almost hear Wolfgang Pauli’s ghost saying “it is not even wrong!” [Peierls 1960]). But hopefully it is informative.
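Both numbers are easy to verify numerically. The following minimal sketch (not from the text; it assumes NumPy and SciPy are available) evaluates the slit-based sinc value above and the pinhole coherence function of Eqs. 4.183–4.184; the evaluation at νμ = π/2 anticipates the result of Eq. 4.197 below.

```python
import numpy as np
from scipy.special import j1

def mu12_pinhole(r, theta, wavelength):
    """|mu_12| between two points separated by angle theta in the far
    field of an incoherently illuminated pinhole of radius r
    (Eqs. 4.183-4.184)."""
    nu = 2.0 * np.pi * r * theta / wavelength
    return 1.0 if np.isclose(nu, 0.0) else abs(2.0 * j1(nu) / nu)

# 1D-slit analog of the Heisenberg argument: mu_12 = sinc(nu = 1/2)
mu_slit = np.sin(0.5) / 0.5                 # prints as 0.96
# 2D pinhole evaluated at nu = pi/2; here r, theta, and lambda are
# arbitrary values chosen so that nu = 2*pi*r*theta/lambda = pi/2
mu_pinhole = mu12_pinhole(r=1.0, theta=0.25, wavelength=1.0)
print(f"slit: {mu_slit:.2f}, pinhole at nu = pi/2: {mu_pinhole:.2f}")
```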


Figure 4.39 The Heisenberg uncertainty principle can be thought of as a single slit diffraction experiment, where a particle with momentum pz is restricted to a position range 2Δx with a corresponding intensity distribution in angles Δpx/pz.

This Heisenberg-inspired connection to particle mechanics brings to mind Liouville’s theorem in classical mechanics, which states [Goldstein 2002, Sec. 9.9] that a system with a constant Hamiltonian H = T + V (where T is the kinetic energy and V is the potential energy) has a constant volume in phase space, or

(\Delta p)\cdot(\Delta q) = \mathrm{constant}.   (4.189)

If we relate Δp to trajectory angles as we have done in Eq. 4.187, then Liouville’s theorem can be thought of as saying that the size times the divergence of a beam is a constant at any focus, or

s\theta = s'\theta'   (4.190)

in the notation of simple imaging systems discussed in connection with Eq. 4.142. Therefore one can use optics (which involve no change in the Hamiltonian) to image a beam of particles to a smaller width with larger divergence, or a larger width with smaller divergence. So let’s consider the focus of a light beam in light of the Rayleigh resolution result of Eq. 4.173 of δr = 0.61λ/N.A., and work not with the radius but the diameter contained within the central Airy probe, and not the half-angle N.A. but the full opening angle 2N.A. of the lens. This leads to a full-width, full-angle phase space area product p0 of

p_0 = \left(2\cdot 0.61\frac{\lambda}{\mathrm{N.A.}}\right)\cdot(2\,\mathrm{N.A.}) = 2.44\lambda   (4.191)

for the diffraction-limited focus of an aberration-free lens. We started out this section by wondering about departures due to illumination sources that were not point-like, and how they would affect the focus of a lens. The situation with very small and very large sources is obvious, as illustrated in Fig. 4.40.



Figure 4.40 Small versus large illumination sources and their effect on the focus of a lens. When the geometric image of the source is small compared to the diffraction-limited focus, the focus is almost unaffected, whereas with a large source the lens focus will strongly resemble the geometric image of the source.

But what about the in-between cases? In scanning microscopes using absorption or fluorescence contrast, the net intensity distribution of the focus spot is the relevant characteristic, and this is given by a convolution (Fig. 4.18) of the diffraction-limited focus with the geometrical image of the source. Referring to the geometry of Fig. 4.41, we wish to consider the illumination source size–angle product p of

p = h(2\theta) = h'(2\theta')   (4.192)

and how it affects the focus of a lens. Now p0 = 2.44λ describes a geometrical image h with a radius extending out to the position of the first Airy minimum, or ν = 1.22π for a circular optic with no central stop. Therefore,

p = 1\lambda \quad \text{corresponds to} \quad \nu = \pi/2,   (4.193)

so convolution of the point spread function of a lens with a disk extending to a radius of ν = π/2 will produce the p = 1λ result. The results of such a calculation are shown in Figs. 4.42 and 4.47, where one can see that the focus is nearly unaffected [Jacobsen 1992b, Winn 2000] for values of the phase space parameter p less than about 1λ. Another way to describe spatial coherence from a radiation source is to think of the case of multimode lasers. While there are details of optical cavities that we shall gloss over here, some lasers emit pure, single-coherence-mode beams (such as a TEM00 mode) while others will emit into multiple modes. In the latter case, one can use a spatial filter to “clean up the beam” by removing most of the contribution from other modes. These spatial filters often consist of a microscope objective used to image the incoming laser beam to a small spot with a convergence semi-angle θ, and then a pinhole is placed at that spot to limit the beam height h and thus control the phase space area of the beam that passes beyond the spatial filter. (In synchrotron beamlines, beamline optics and slits can perform the same function; see Section 7.2.2.) The number Msource of incoming spatially coherent modes can be characterized by

M_{source} = \frac{p}{\lambda} = \frac{h(2\theta)}{\lambda}   (4.194)

in each transverse direction, so that Msource = 1 corresponds to p = λ.

Figure 4.41 Geometry for convolution of an illumination source with the diffraction-limited focal spot of a lens with diameter 2δr. The source size h is imaged down to a smaller spot h′, with the beam divergence/convergence angles changing in opposite proportion; that is, hθ = h′θ′.

One can then adjust the size h of the pinhole in the spatial filter to control the number of modes that are transmitted, with a flux-versus-coherence trade-off as shown in Figs. 4.42 and 4.47. We now bring this p = 1λ phase space criterion back into the framework of the van Cittert–Zernike theorem result for a pinhole of radius r (Eq. 4.183), which was a function of νμ = 2πrθ/λ with θ as the angular separation between two measurement points. In our case, the pinhole radius is r = h/2, so the value of νμ corresponding to one edge of the optic versus its center (with the angular separation given by the convergence half-angle θ of Fig. 4.41) is given by

\nu_\mu = 2\pi\frac{r\theta}{\lambda} = 2\pi\frac{(h/2)\theta}{\lambda} = \pi\frac{h\theta}{\lambda}.   (4.195)

Substituting p = h(2θ) in the above gives

\nu_\mu = \frac{\pi p}{2\lambda}   (4.196)

as the argument to give the degree of mutual coherence between the center and edge of the illuminated lens. Therefore, for p = 1λ (for which the focus of a lens maintains near-diffraction-limited performance as shown in Fig. 4.42), the degree of mutual coherence between the lens center and edge is

\mu_{center,edge} = \frac{2J_1(\nu_\mu)}{\nu_\mu} = \frac{2J_1(\pi/2)}{\pi/2} = 0.72,   (4.197)

which happens to be similar in magnitude to the value of ηn = 0.73 that we found for the Rayleigh quarter wave criterion with Gaussian or normally distributed phases in Section 4.1.2.


Figure 4.42 Intensity profile of the focus of a lens as a function of illumination source modes Msource as given in Eq. 4.194. The calculation at left is normalized to constant total energy, thus emphasizing the preservation of the sharpness of the focus at Msource ≲ 1, or p ≲ 1λ (Eq. 4.192). The calculation at right is normalized to constant areal photon density at the source, showing how the total focused flux Φ increases as one opens up an optic’s illumination aperture h. A circular lens with a half-diameter central stop was used, with a diffraction-limited focal profile I(ν) as given by Eq. 4.176 and shown in Fig. 4.31. An earlier version of the figure at left was shown by Winn et al. [Winn 2000]; see also Fig. 4.47.

We summarize our discussion of coherence, phase space, and lens focal spots with the following comments:

• To reach near-diffraction-limited focusing with a lens in a scanning microscope, one should restrict the illumination phase space to a product of p = h(2θ) ≲ 1λ as shown in Fig. 4.41 (a numerical sketch of this criterion follows this list). This must be done in each orthogonal direction (that is, both in x̂ and ŷ for an illumination beam propagating in the ẑ direction). This is equivalent to saying

M_{source} \lesssim 1   (4.198)

in each transverse direction.
• If one has a source with a phase space area significantly larger than p ≈ 1λ, one good strategy is to form an intermediate image of the source, and place an aperture at that location to limit the source size h imaged by the scanning microscope’s objective lens. This is what a spatial filter does.
• As will be discussed in Section 7.1.1, x-ray sources are often characterized in terms of their spectral brightness Bs (Eq. 7.3), which describes the flux per source size per solid angle per spectral bandwidth. The spatially coherent flux Φc within a given spectral bandwidth is given by

\Phi_c = B_s \cdot \lambda^2   (4.199)

based on p = 1λ in each transverse direction, which to our knowledge was first pointed out independently by Green [Green 1976] and by Kondratenko and Skrinsky [Kondratenko 1977]. The “per spectral bandwidth” characteristics of source brightness describe the coherence length ℓc of Eq. 4.181.


Figure 4.43 One-stage and two-stage source demagnification beamline design schemes. These represent two alternative beamline choices for selecting a single coherent mode for nanofocusing experiments. One-stage schemes offer minimal flux loss due to imperfect beamline optics, while two-stage schemes offer shorter beamlines and greater control over flux versus resolution trade-offs. Figure adapted from [de Jonge 2014].

• In the latest synchrotron light sources (Section 7.1.4), the electron beam emittance (the size·angle product at an electron beam focus) is now decreasing toward or even dipping below the intrinsic photon wavelength λ (these two quantities are combined to yield the net source emittance, as will be shown in Eq. 7.12). These facilities are then referred to as diffraction-limited storage rings [Eriksson 2014]. In fact, many synchrotron light sources have already approached or exceeded diffraction-limited status with their vertical emittance. However, the horizontal emittance is typically about a hundred times larger than the vertical due to dispersion of the beam in its near-circular orbit, and it is only now that the horizontal emittance is being reduced to give Msource ≈ 1 in the latest machines.
• Another way to refer to a source’s extent in phase space is through the classical optics term of étendue (or étendue géométrique).
• Full-field imaging systems do not require single-mode illumination. Instead, they can accept about as many spatially coherent modes in each direction as the illumination field divided by twice the spatial resolution, as discussed in Section 4.5. They still benefit from source spectral brightness Bs, since it determines the photon flux per resolution element in the image.
• The requirement of Msource ≲ 1 for high-resolution scanning microscopy is similar to the requirement for coherent diffraction imaging and ptychography, as will be discussed in Section 10.3.2.
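As an illustration of the first bullet above, the following sketch (not from the text; it assumes NumPy and SciPy, and models the geometric source image as a uniform disk whose radius follows from Eq. 4.193) convolves the Airy intensity point spread function with the source image for several values of the phase space parameter p, in the spirit of Fig. 4.42. The crude full-width diagnostic at the end is a hypothetical convenience, not the book's measure.

```python
import numpy as np
from scipy.special import j1
from scipy.signal import fftconvolve

# Airy intensity PSF on a grid of the normalized coordinate nu.
n, span = 256, 12.0
nu = np.linspace(-span, span, n)
X, Y = np.meshgrid(nu, nu)
R = np.hypot(X, Y)
airy = np.where(R > 1e-9, 2.0 * j1(R) / np.where(R > 1e-9, R, 1.0), 1.0) ** 2

def focus_profile(p_over_lambda):
    """Central profile of the Airy PSF convolved with the geometric source
    image: a uniform disk of radius nu_s = (pi/2)*(p/lambda), per Eq. 4.193."""
    nu_s = 0.5 * np.pi * p_over_lambda
    if nu_s <= 0:
        img = airy.copy()
    else:
        disk = (R <= nu_s).astype(float)
        img = fftconvolve(airy, disk / disk.sum(), mode="same")
    return img[n // 2] / img[n // 2].max()

for p in (0.0, 1.0, 4.0):          # source phase space in units of lambda
    prof = focus_profile(p)
    above = nu[prof >= 0.5]        # crude full width at half maximum
    print(f"p = {p:.0f} lambda: focus FWHM = {above[-1] - above[0]:.1f} "
          "(diffraction-limited Airy FWHM is about 3.2 in nu units)")
```

Consistent with Fig. 4.42, the width barely changes between p = 0 and p = 1λ, but grows strongly by p = 4λ.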

4.4 Imaging systems

137

One can of course bring much more powerful methods into play to discuss the effects of partial coherence [Goodman 2015, Kim 1986, Coisson 1997, Vartanyants 2016], but the above summary is often sufficient for practical experiments. When the radiation source has many spatial modes Msource that must be thrown away in order to achieve a diffraction-limited focus, one can be very flexible with the optical design of a beamline; for example, one does not have to be concerned with the tilting of phase space ellipses with propagation distance (as discussed in Section 7.2.2, and shown in Fig. 7.10), which can otherwise lead to a loss of useful flux and an undesired correlation of angle with position in the illumination arriving at apertures or beamline optics. With a diffraction-limited storage ring, one must be much more careful to preserve the acceptance and coherence of the selected mode; this involves both increased precision of the beamline optics, and also proper optical design. In Fig. 4.43, we show two alternative schemes [de Jonge 2014] for selecting a single coherent mode from a nearly diffraction-limited source:

• One option (Fig. 4.43A) is to place the nanofocusing optic at some distance from the source, so that it accepts only a single coherent mode and demagnifies the source directly. Because no beamline focusing optics or apertures are used between the source and the nanofocusing optic, one does not lose flux or degrade the coherence of the central mode due to imperfections in beamline optics. However, one must then choose the diameter of the nanofocusing optic based on properties of the source; this sets conditions on the optic that may not be optimal due to other considerations, such as minimum focal length due to working distance constraints, or maximum diameter due to thickness limits in multilayer Laue lenses (Section 5.3.6) or field diameter limits in electron beam lithography of Fresnel zone plates (Section 5.3). One may also require different desired optic diameters in the horizontal and vertical directions, complicating the use of circular zone plates or refractive lenses. This approach tends to lead to long beamlines, with attendant conventional construction costs. For example, to demagnify a 40 μm source (representative of the horizontal source size expected for the next generation of high-brightness storage rings) to a 20 nm spot while using a nanofocusing optic with a convenient focal length of 0.1 meters, one would need to have the distance L1 in Fig. 4.43A be about 200 meters. Finally, oscillations in source position will lead to oscillations in the probe position or, equivalently, pixel position errors in scanning microscopy; oscillations in the source angle will lead to oscillations in focused beam intensity only to the degree that the source has one or a few modes.
• Another option (Fig. 4.43B) is to use beamline optics to image the source onto a secondary source aperture, which can be adjusted to pass one or several coherent modes (see Section 7.2.2). This allows the experimenter to make a flux-versus-resolution trade-off if the source has multiple coherence modes, and it also allows one to reject beamline-optics-caused degradations of the coherence of a single mode (spatial filters in visible-light laser laboratories work on the same principle by placing a pinhole at a lens focus) at the cost of a further loss of flux. Oscillations in either source position or angle would lead to oscillations in the focused beam intensity, which can in principle be corrected using a mostly transparent beam flux monitor (such as a thin diamond film) after the secondary source aperture. One can also adjust the secondary source aperture’s diameter, and its distance L3 to the nanofocusing optic (Fig. 4.43B), so as to accommodate a desired nanofocusing optic diameter. By using two-stage demagnification, one can design a shorter beamline with lower conventional construction costs; for example, if one chose L1 = 30 m due to the distance at which a first optic can be placed after an accelerator shield wall and L2 = 3 m, one can demagnify a 40 μm source to 4 μm at the point of the secondary source aperture, and then if L3 = 20 m to work


with a nanofocusing optic focal length of L4 = 0.1 m, one will have demagnified a 40 μm source to a 20 nm focus in a total beamline length of only 53.1 m. One of the engineering challenges in this approach is to make high-quality, controllable apertures of just a few micrometers in width and with the ability to handle high power densities (Section 7.2.3), while another is to not add “phase noise” via imperfections in the refocusing optic. The high conventional construction costs associated with long beamlines can be balanced against the cost of optics and slits for the two-stage scheme. As will be seen in Section 7.1.6, undulators in synchrotron light sources produce radiation with a spectral bandwidth of ΔE/E ≈ 1/Nu, where Nu is the number of magnetic periods in the undulator (Nu ≈ 100 in most cases). This bandwidth can be broadened by angular divergence of the source, so that diffraction-limited storage rings will provide improved spectral purity which can also be exploited to yield further gains in the flux of a nanofocused beam. Rather than doing spectral filtering with a crystal monochromator with a bandpass of order 0.1–1 eV, one could use a multilayer-coated nanofocusing mirror (Section 5.2.4), or a multilayer-coated deflecting mirror (Section 4.2.4), or a double multilayer monochromator (Section 7.2.1) to select the entire approximately 1 percent spectral bandwidth of the undulator’s harmonic output. Smaller-diameter Fresnel zone plates have only 100–200 zones, and again they could use the entire spectral output of the source, as could a conventional non-multilayer-coated Kirkpatrick–Baez reflective nanofocusing system (Section 5.2.2).
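The demagnification arithmetic quoted above is simple enough to verify directly; here is a minimal sketch (not from the text) using the distances given for the two schemes of Fig. 4.43.

```python
source = 40e-6   # 40 um horizontal source size (m)

# One-stage scheme (Fig. 4.43A): demagnification is f / L1
f1, L1 = 0.1, 200.0
print(f"one-stage focus: {source * f1 / L1 * 1e9:.0f} nm")      # 20 nm

# Two-stage scheme (Fig. 4.43B): 30 m -> 3 m, then 20 m -> 0.1 m
L1, L2, L3, L4 = 30.0, 3.0, 20.0, 0.1
secondary = source * L2 / L1          # size at the secondary source aperture
focus = secondary * L4 / L3           # final focused spot size
print(f"secondary source: {secondary * 1e6:.0f} um, "
      f"focus: {focus * 1e9:.0f} nm, "
      f"total beamline length: {L1 + L2 + L3 + L4:.1f} m")       # 53.1 m
```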

4.4.7 Transfer functions

In Fig. 4.19 we introduced the idea that one can take the Fourier transform of an image and learn about its distribution of information in reciprocal space. Now consider the fact that the Fourier transform takes a signal and represents it as some linear combination of sine waves with different frequencies, complex amplitudes, and phase offsets. With this picture in mind, we can represent the object which is imaged by its Fourier decomposition, where the object is represented by a set of diffraction gratings of various periodicities, orientations, and strengths. We can then consider how an imaging system transfers information from these gratings of different spatial frequency. This is precisely the approach put forward by the great German optical physicist Ernst Abbe (1840–1905). More detailed treatments are provided elsewhere (see for example [Goodman 2017]); we provide here a more conceptual summary.

Let us start by considering a coherent plane wave incident on a grating of period d, which is then imaged by an objective lens, as shown in Fig. 4.44. As noted in the discussion of Eqs. 4.149 and 4.150, x-ray microscopes are often used for large demagnification, so we will consider this grating to be located at a small distance Δz from the focal length of the lens, so that the semi-angle subtended by the lens is very nearly the numerical aperture N.A. As one decreases the grating period d, the +1 order diffraction angle θ ≈ λ/d increases until it reaches the numerical aperture N.A. of the lens, at which point the diffracted ray is no longer collected by the lens and imaged (the same happens with the −1 diffraction order).


Figure 4.44 Optical transfer function for coherent imaging as a function of spatial frequency u normalized to ud of Eq. 4.200. If a purely parallel wave is incident on a grating with period d, diffracted rays are captured by the lens and imaged as long as the grating spatial frequency does not exceed uN = N.A./λ (Eq. 4.200) and the grating half-period is larger than or equal to Δxmin,coherent = λ/(2N.A.) (Eq. 4.201).

Thus the maximum spatial frequency ud from the grating captured by the coherent imaging system is

u_d = \frac{1}{d_{min}} = \frac{\mathrm{N.A.}}{\lambda}   (4.200)

and the finest half-period feature that can be resolved has a width of

\Delta x_{min,coherent} = \frac{d_{min}}{2} = \frac{\lambda}{2\,\mathrm{N.A.}}.   (4.201)

All spatial frequencies within the range −ud to +ud are collected with unit efficiency, so we speak of this coherent imaging system as having an optical transfer function (OTF) of 1 within the range −uN to +uN, as shown in Fig. 4.44. When considering an overall imaging system with imperfections in detectors or other components, one can also speak of an overall system contrast transfer function (CTF), or of the unsigned version (for reasons that will become clear when we discuss phase contrast in Section 4.7) as the modulation transfer function (MTF).

Now let us consider an imaging system as shown in Fig. 4.45, where a condenser lens is used to image a source of illumination onto a grating, after which an objective lens again collects light to deliver an image (see also Fig. 1.1). Consider a ray from the upper edge of the illuminating lens that travels at a negative angle (and thus negative spatial frequency u = θ/λ). It can be diffracted by a grating of period d′ to the maximum positive spatial frequency that can be collected by the objective lens. From Eq. 4.30, which becomes 2d′θ = λ with θ = N.A., one can see that the highest spatial frequency u′d that can be transferred from the object through to the image is

u'_d = \frac{1}{d'} = \frac{2\,\mathrm{N.A.}}{\lambda} = 2u_d.   (4.202)

Figure 4.45 Optical transfer function for incoherent imaging as a function of spatial frequency u normalized to ud of Eq. 4.200. The finest grating period d′ that can be detected is one where a ray comes from one extreme angle from the condenser or illuminating lens, and is diffracted to the opposite extreme angle and just collected by the objective or imaging lens. As shown at right, the range of collection angles in the orthogonal direction is reduced in this case, and the optical transfer function (OTF; Eq. 4.203) is determined by the degree of overlap between these apertures in reciprocal space. With an incoherent imaging system, there is some (decreasing) transfer for spatial frequencies twice as high as the limit ud (Eq. 4.200) for coherent imaging systems.

That is, one can see features of half the size with incoherent brightfield imaging with critical illumination as compared to coherent illumination. However, when collecting the most extreme rays in one direction, the lens apertures become very narrow in the orthogonal direction; the degree of overlap between the illuminating and collecting apertures (the condenser and the objective lens, respectively) becomes very small. As a result, the OTF for the finest detectable periodicities becomes small. One can in fact calculate the OTF for incoherent brightfield imaging from the degree of overlap of these apertures, leading to a result [Goodman 2017, Eq. 7-33] of

\mathrm{OTF}_{incoherent}(u) = \frac{2}{\pi}\left[\arccos\left(\frac{u}{2u_d}\right) - \frac{u}{2u_d}\sqrt{1 - \left(\frac{u}{2u_d}\right)^2}\right],   (4.203)

which is shown in Fig. 4.46.

Figure 4.46 Optical transfer function for coherent and incoherent imaging with optics with no central stop, and also the incoherent transfer function for an optic with a half-diameter (b = 1/2) central stop.

One can arrive at the OTF of an imaging system from another approach: by calculating the normalized autocorrelation function of the amplitude transfer function [Goodman 2017, Eq. 7-29]. For brightfield incoherent imaging, which is the case that describes absorption contrast in a full-field microscope with critical illumination, a scanning microscope with a large-area transmitted flux detector, or a scanning microscope with fluorescence detection, this means that the OTF can be calculated from a Fourier transform of the intensity point spread function (Eq. 4.174), or

\mathrm{OTF}_{incoherent}(u_x, u_y) = \mathcal{F}\{\mathrm{psf}(x, y)\}.   (4.204)

One might not think this so useful since Eq. 4.203 is easy to calculate and plot, but the real virtue of this approach comes when one considers modifications to a standard point spread function, such as those due to partial coherence of the illumination (Fig. 4.47), optics with central stops (Fig. 4.48), or defocus effects, as will be discussed in the next section. Our discussion here is aimed at providing just a taste of OTFs. The full meal includes servings of bilinear transfer functions [Saleh 1979, Courjon 1987], partial coherence [Hopkins 1951, Hopkins 1957], and treatments using the Wigner distribution function [Bastiaans 1986]. In Section 4.5, we will see how the N.A. of the condenser lens affects the OTF.
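As a concrete illustration, the following sketch (not from the text; it assumes NumPy and SciPy) evaluates Eq. 4.203 directly, and also follows the Eq. 4.204 route of Fourier transforming a sampled Airy intensity PSF; the two agree up to discretization and windowing error. Transverse lengths are expressed in units of λ/N.A., so that ud = 1.

```python
import numpy as np
from scipy.special import j1

def otf_incoherent(u_over_ud):
    """Eq. 4.203: the overlap of two circular pupils shifted by u."""
    s = np.clip(np.abs(u_over_ud) / 2.0, 0.0, 1.0)
    return (2.0 / np.pi) * (np.arccos(s) - s * np.sqrt(1.0 - s * s))

# Eq. 4.204 route: sample the Airy intensity PSF and Fourier transform it.
n, dx = 1024, 0.05                       # grid pitch in units of lambda/N.A.
x = (np.arange(n) - n // 2) * dx
X, Y = np.meshgrid(x, x)
nu = 2.0 * np.pi * np.hypot(X, Y)        # normalized Airy coordinate
psf = np.where(nu > 1e-9, 2 * j1(nu) / np.where(nu > 1e-9, nu, 1.0), 1.0) ** 2
otf_fft = np.abs(np.fft.fft2(psf))
otf_row = otf_fft[0] / otf_fft[0, 0]     # u_y = 0 row, normalized at u = 0
freq = np.fft.fftfreq(n, d=dx)           # in units of ud = N.A./lambda

for u in (0.5, 1.0, 1.5):
    i = np.argmin(np.abs(freq - u))
    print(f"u/ud = {u}: FFT {otf_row[i]:.3f}, analytic {otf_incoherent(u):.3f}")
```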

4.4.8 Deconvolution: correcting for the transfer function

For incoherent brightfield imaging, the image delivered by a microscope can be described by a convolution of the microscope’s point spread function psf(x, y) with the transmittance o(x, y) of the object being imaged, yielding an image i(x, y) of

i(x, y) = o(x, y) \ast \mathrm{psf}(x, y) = \mathcal{F}^{-1}\{O(u_x, u_y)\cdot \mathrm{OTF}(u_x, u_y)\},   (4.205)

where we have made use of the properties of convolution given in Eq. 4.83 and the result of Eq. 4.204 that the OTF is a Fourier transform of the point spread function. The expression of Eq. 4.205 can be rearranged to give

o(x, y) = \mathcal{F}^{-1}\left\{\frac{\mathcal{F}\{i(x, y)\}}{\mathrm{OTF}(u_x, u_y)}\right\},   (4.206)

which makes it seem deceptively easy to recover the true object without the blurring effects of the lens. One would indeed be deceived to apply Eq. 4.206 directly!


Figure 4.47 Optical transfer function (OTF) for incoherent brightfield imaging as a function of illumination source modes Msource (Eq. 4.194), or illumination phase space p as given in Eq. 4.192, and spatial frequency u normalized to ud of Eq. 4.200. The OTF is shown for a full-diameter optic (left), and one with a half-diameter central stop (right). In this combination of gray-scale image and contour lines, the OTF for a fully coherent source (Msource = 0) and full-diameter optic as plotted in Fig. 4.46 is shown here as a height out of the paper along the abscissa. These contour images are the OTF representation of the same information shown in Fig. 4.42.


Figure 4.48 Optical transfer function of a lens versus the fractional diameter b of the central stop. With large central stops, the mid-range spatial frequencies u are increasingly lost due to the large side lobes in the point spread function. See also Figs. 4.31 and 4.33.

To understand why, consider two results shown earlier: the fact that image signals tend to drop off with spatial frequency as ∝ u^{−a}, as shown in Fig. 4.19, and that OTFs decrease towards very small values at high spatial frequencies, as shown for example in Fig. 4.46.


Figure 4.49 The most basic way to deconvolve the blurring due to an optical system’s point spread function from an image is to apply a Fourier-plane filter which is the inverse of the optical transfer function (OTF), as shown in Eq. 4.206. However, because the OTF reaches low values at high spatial frequencies u, this could lead to a magnification of noise. In this example, power spectra (a) (Section 4.3.4) were obtained from x-ray fluorescence images (Section 9.2) with different signal levels due to intrinsic concentrations of the elements sulfur, potassium, and calcium within a frozen hydrated alga (see Fig. 12.2 for a related image). Since each element’s power spectrum had a separate signal dependence S ∝ u_r^{−a} and “noise floor” N, one can find for each element a value of the spatial frequency u_{r,S=N} where the signal trend reaches the noise floor (at a value of u_{r,S=N} = 5.2 μm⁻¹ for Ca in this example). This gives a reasonably good estimate of the spatial resolution δ_{r,S=N} of that element’s image by using Eq. 4.251, giving δ_{r,S=N} = 96 nm in this example. One can also use each element’s signal trend and noise floor to calculate a separate Wiener filter W(u_r) (Eq. 4.207) for each element Z, which can be combined with the modulation transfer function (MTF) to calculate an element-specific deconvolution filter D(u_x, u_y) (Eq. 4.209), of which azimuthal averages D(u_r) are shown in the upper-right plot (b). To obtain the MTF, the intensity probe function psf(x, y) shown in the lower-left image (c) was reconstructed along with a transmission image using ptychography (Section 10.4). The lower-center image (d) shows a calcium x-ray fluorescence image from the alga, while the lower-right image (e) shows this image with the calcium deconvolution filter D(u_x, u_y) applied. Figure modified from [Deng 2017b], which shows deconvolved images for other elements as well.
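The caption’s resolution estimate can be reproduced in a few lines. Here is a sketch (not from the text) that intersects an assumed power-law signal trend with a flat noise floor and converts the crossing frequency to a half-period resolution, assuming that Eq. 4.251 amounts to δr = 1/(2·u_{r,S=N}); the trend amplitude s0 and exponent a below are hypothetical values chosen to reproduce the calcium numbers quoted above.

```python
def resolution_from_noise_floor(s0, a, noise_floor):
    """Solve s0 * ur**(-a) = noise_floor for the crossing frequency ur
    (in 1/um), then return (ur, half-period resolution in um)."""
    ur_sn = (s0 / noise_floor) ** (1.0 / a)
    return ur_sn, 1.0 / (2.0 * ur_sn)

ur_sn, delta_r = resolution_from_noise_floor(s0=140.0, a=3.0, noise_floor=1.0)
print(f"u_r,S=N = {ur_sn:.1f} /um -> delta_r,S=N = {delta_r * 1e3:.0f} nm")
```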


As a result, the unthinking application of Eq. 4.206 to object deconvolution in the presence of any noise in the measured image would lead to a rather unsatisfactory situation: the weak, high-frequency parts of the image signal – which would likely include significant noise – would be divided by a very small number, thus multiplying the noise relative to the “good” signal at lower spatial frequencies. A good way to avoid this problem is to incorporate a Wiener filter [Wiener 1949] W(u), which is given by

W(u) = \frac{|S(u)|^2}{|S(u)|^2 + |N(u)|^2},   (4.207)

based on an estimate of the trend of signal |S(u)|² and noise |N(u)|² powers as a function of spatial frequency u, as discussed in Section 4.3.4 [Press 2007, Eq. 13.3.6]. In images where the main source of noise is due to photon statistics (as discussed in Section 4.8), the noise will be uncorrelated from one pixel to the next, so that noise will have the characteristics of a Dirac delta function, which has a Fourier transform that is uniform across all spatial frequencies (Eq. 4.85). As a result, one can usually construct a Wiener filter W(u_r) rather easily from the trend of |S(u_r)|² ∝ u_r^{−a} and a flat “noise floor,” as shown in Fig. 4.49. This leads to a modification of Eq. 4.206 for recovering the object o(x, y) to

o(x, y) = \mathcal{F}^{-1}\left\{\mathcal{F}\{i(x, y)\}\cdot D(u_x, u_y)\right\},   (4.208)

where D(u_x, u_y) is a Wiener deconvolution filter function of

D(u_x, u_y) = \frac{1}{\mathrm{OTF}(u_x, u_y)}\cdot\frac{|S(u_r)|^2}{|S(u_r)|^2 + |N(u_r)|^2},   (4.209)

where of course

u_r = \sqrt{u_x^2 + u_y^2}.   (4.210)

Since x-ray microscope images are often photon-statistics-limited, image signal considerations limit the resolution gain that deconvolution can provide [Jacobsen 1991, Deng 2015c], as shown in the example of Fig. 4.49. Alternative approaches to deconvolution seek to recover the “true” object by incorporating constraints known, or assumed, to apply to its characteristics. These approaches include computer optimization methods as discussed in Section 8.2.1, including those that explicitly account for Poisson noise models, such as the maximum-likelihood expectation maximization (MLEM) algorithm discussed in Section 8.2.2.
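A minimal sketch of this Wiener-filtered deconvolution (not from the text; it assumes NumPy, a stand-in Gaussian OTF rather than Eq. 4.203, and an assumed power-law signal trend with a flat noise floor in the spirit of Fig. 4.49) might look like the following.

```python
import numpy as np

def wiener_deconvolve(image, otf, s2, n2):
    """Eqs. 4.207-4.209: inverse filtering by the OTF, rolled off by the
    Wiener filter W = s2/(s2 + n2); `otf`, `s2`, and `n2` are arrays on
    the same (u_x, u_y) grid as np.fft.fft2(image)."""
    W = s2 / (s2 + n2)                                    # Eq. 4.207
    D = np.where(np.abs(otf) > 1e-12, W / otf, 0.0)       # Eq. 4.209
    return np.fft.ifft2(np.fft.fft2(image) * D).real      # Eq. 4.208

rng = np.random.default_rng(0)
n = 256
u = np.fft.fftfreq(n)
ux, uy = np.meshgrid(u, u)
ur = np.hypot(ux, uy)                     # radial frequency, Eq. 4.210
otf = np.exp(-(ur / 0.15) ** 2)           # stand-in OTF (assumed Gaussian)

obj = np.zeros((n, n)); obj[96:160, 96:160] = 100.0      # toy object
blurred = np.fft.ifft2(np.fft.fft2(obj) * otf).real
noisy = rng.poisson(np.clip(blurred, 0.0, None)).astype(float)

s2 = np.maximum(ur, 1.0 / n) ** -3.0      # assumed |S|^2 trend, a = 3
n2 = np.full_like(s2, 1.0e3)              # assumed flat noise floor
restored = wiener_deconvolve(noisy, otf, s2, n2)
```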

4.4.9 Depth resolution and depth of field

With the exception of the brief discussion of multislice propagation in Section 4.3.9 and the Ewald sphere sampling conditions discussed in Section 4.2.5, up until now we have restricted our discussion to two-dimensional objects imaged in-focus. For a circular lens, the intensity distribution I(z) along the focal distance goes like [Linfoot 1953, Eq. 9]

I(z) \propto \left[\frac{\sin\left(u(1 - b^2)\right)}{u}\right]^2   (4.211)

with

u \equiv \frac{\pi}{2}\,\frac{\mathrm{N.A.}^2}{\lambda}\,z   (4.212)

and with b as the central stop fraction like in Eq. 4.176.


Figure 4.50 Point spread function (PSF; top) and modulation transfer function (MTF; bottom) versus defocus from a circular lens with no central stop (b = 0; left) and with a half-diameter central stop (b = 0.5; right). These figures show how the depth of field for absorption or fluorescence contrast x-ray imaging is 2δz . For the point spread function, the transverse positions are shown in the normalized coordinate ν of Eq. 4.175 where the Rayleigh resolution is at ν = 1.22π = 3.83, while the defocus distances Δz are shown as a fraction of the longitudinal resolution δz of Eq. 4.213, with cz = 1. The MTF is shown as a gray-scale image with contour lines superimposed, and the transverse spatial frequency coordinate is shown as a fraction of the coherent imaging cutoff frequency ud of Eq. 4.200. The side lobes of the Airy pattern lead to defocus-based contrast reversals at certain mid-range spatial frequencies, as described in Section 4.4.9. These simulations recreate earlier published results [Wang 2000].

The Rayleigh criterion (Section 4.4.3) used the position of the first minimum of the Airy intensity [2J1(ν)/ν]² as the measure of the transverse resolution. The first minimum of the longitudinal intensity distribution of Eq. 4.211 occurs when u = π, giving a suggested longitudinal resolution of 2λ/N.A.². In fact, a more realistic criterion is to define the depth resolution δz as half


this value, or

\delta_z = c_z \frac{\lambda}{\mathrm{N.A.}^2},   (4.213)

with cz = 1. This choice is illustrated first in Fig. 4.50, which shows at top the intensity point spread function for a circular lens as a function of the radial parameter ν and the longitudinal or depth parameter δz as defined in Eq. 4.213. These point spread functions show a first minimum in the axial intensity distribution at 2λ/N.A.² as expected (and furthermore they show that the longitudinal resolution is decreased as one introduces a central stop to the optic, or b ≠ 0). At the bottom is shown the modulus of the OTF (the MTF) for the various defocus distances, as calculated using the following procedure (a numerical sketch of these steps follows below):

1. The Airy amplitude of the lens focus was calculated using the non-squared version of Eq. 4.174.
2. This amplitude was propagated by the specified defocus distance using the near-distance approach of Eq. 4.109.
3. The defocused probe was multiplied by its complex conjugate to obtain the intensity I(x, y, Δz), since that is the relevant probe function for absorption or fluorescence contrast.
4. The optical transfer function OTF(u_x, u_y) was obtained by Fourier transform of this defocused probe intensity, as described in Section 4.4.7.
5. The center row of this 2D pattern was extracted to fill in this defocus distance row in the 2D array OTF(u_x, Δz). The spatial frequencies were then scaled according to the coherent transfer function cutoff frequency ud of Eq. 4.200 with a maximum incoherent frequency of u/ud = 2, as was shown in Eq. 4.202, and the defocus distances were scaled according to the longitudinal resolution δz of Eq. 4.213 with cz = 1.

With the PSF and MTF functions thus visualized, it is clear that the criterion of δz = czλ/N.A.² of Eq. 4.213 with cz ≈ 1 indeed describes the loss of image contrast versus defocus distance better than 2λ/N.A.² does. The depth of field is twice the depth resolution, as discussed in Box 4.7. Therefore the longitudinal dependence of the point spread function of Eq. 4.211 leads to a DOF of

\mathrm{DOF} = 2\delta_z = 2 c_z \frac{\lambda}{\mathrm{N.A.}^2},   (4.214)

with cz ≈ 1. It has been suggested that optics with a central stop fraction b would have an extended depth of field [Welford 1960b]. However, the optical transfer function with b = 0.5 does not show an increase in depth of field in Fig. 4.50, because of the defocus effects on the side lobes of the point spread function. As the defocus is increased to a point where the side lobes of the Airy pattern approach the central lobe in integrated intensity, one can have a reversal of image contrast for spatial frequencies that somewhat match the inverse length scale between these side lobes (recall that the side lobes of the Airy intensity pattern have alternating positive and negative amplitudes; see Fig. 4.22). This leads to contrast reversals at mid-range spatial frequencies for certain values of defocus; these regions are indicated in Fig. 4.50.
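The following is a minimal numerical sketch of steps 1–4 above (not from the text; it assumes NumPy and SciPy). Transverse lengths are in units of λ/N.A. and defocus dz is in units of δz = λ/N.A.² of Eq. 4.213, in which case the Fresnel propagator of step 2 reduces to a phase factor exp(−iπ·dz·f²) that is independent of λ and N.A.; this standard angular-spectrum factor is used here as a stand-in for the near-distance propagator of Eq. 4.109.

```python
import numpy as np
from scipy.special import j1

n, dx = 512, 0.1                              # grid pitch in lambda/N.A. units
x = (np.arange(n) - n // 2) * dx
X, Y = np.meshgrid(x, x)
nu = 2.0 * np.pi * np.hypot(X, Y)
amp = np.where(nu > 1e-9, 2 * j1(nu) / np.where(nu > 1e-9, nu, 1.0), 1.0)  # step 1

f = np.fft.fftfreq(n, d=dx)                   # frequencies in ud = N.A./lambda
F2 = np.add.outer(f ** 2, f ** 2)             # f_x^2 + f_y^2

def defocused_otf_row(dz):
    kernel = np.exp(-1j * np.pi * dz * F2)    # step 2: Fresnel propagation
    probe = np.fft.ifft2(np.fft.fft2(amp) * kernel)
    psf = np.abs(probe) ** 2                  # step 3: intensity probe
    otf = np.abs(np.fft.fft2(psf))            # step 4: MTF = |FT of PSF|
    return otf[0] / otf[0, 0]                 # u_y = 0 row, unit value at u = 0

i = np.argmin(np.abs(f - 1.0))                # look at u = ud, half the cutoff
for dz in (0.0, 1.0, 2.0):                    # defocus in units of delta_z
    print(f"dz = {dz:.0f} delta_z: MTF(u = ud) = {defocused_otf_row(dz)[i]:.2f}")
```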


Box 4.7 Depth resolution and depth of field

We use depth resolution δz in the spirit of the Rayleigh criterion for transverse resolution δr, as shown in Fig. 4.29: the minimum displacement of object B relative to an object A at which one can see that there are two objects, not one. With the Rayleigh resolution in the transverse direction, one could also set object C to be on the other side of object A; if one pulls objects B and C in just closer than the Rayleigh resolution δr, one has a depth of field 2δr in which one cannot (by strict application of the Rayleigh resolution criterion) distinguish the image as being of one object versus three. It is for this reason that we define the depth of field as twice the depth resolution, or DOF = 2δz as in Eq. 4.214. In our thinking, the depth of focus and depth of field are two phrases for the same thing, except the term “depth of field” is more applicable to lensless imaging (Chapter 10, and in particular Section 10.5).

The effect is perhaps easier to visualize from an image rather than from a transfer function, as shown in Fig. 4.51, for which the simulations were done in the following manner:

1. The Airy amplitude of the lens focus was calculated as described above.
2. This amplitude was propagated by the specified defocus distance as described above.
3. The defocused intensity point spread function was calculated for absorption or fluorescence contrast imaging as described above.
4. This defocused intensity point spread function was convolved with an object consisting of several absorptive bar structures shown at top in Fig. 4.51 to yield a defocused 2D image I(x, y, Δz).
5. The center row of this defocused image was extracted to fill in this defocus distance row in a sort of “longitudinal image” I(x, Δz), with distances x scaled to the Rayleigh resolution δr of Eq. 4.173, and defocus distances Δz scaled to δz of Eq. 4.213, with cz = 1.

Figure 4.51 reinforces the idea that the DOF is 2δz = 2czλ/N.A.² with cz = 1, as given in Eq. 4.214. The value of DOF = 2δz, as given in Eq. 4.214, is an important consideration when carrying out nanotomography in x-ray microscopes (Chapter 8). Conventional tomography assumes that each image taken from a particular viewing angle of an object represents a pure projection through the object, leading to the Radon transform of Eq. 8.2. If an object extends beyond the DOF, one violates this assumption, as some parts of the object will no longer be viewed in-focus in a particular projection image. From the definitions of transverse resolution of Eq. 4.173 and DOF of Eq. 4.214, one can re-write the depth of field as

\mathrm{DOF} = 2\delta_z = \frac{2 c_z}{0.61^2}\,\frac{\delta_r^2}{\lambda} \simeq 5.4\, c_z\, \delta_r\,\frac{\delta_r}{\lambda},   (4.215)

where we suggest that cz = 1 (this suggestion is in good agreement with experiments [Wang 2000]). That is, as the transverse resolution approaches the x-ray wavelength, the DOF approaches the transverse resolution, as can be seen in Fig. 4.52.


Figure 4.51 Images of various feature sizes versus defocus. An object (top) consisting of absorptive bars of width of 0.5, 1.0, and 2.0 times the Rayleigh resolution of Eq. 4.173 was convolved with a defocused point spread function, and the center row of the image was extracted to form this image I(x, Δz). The transverse positions across the image are scaled in terms of the Rayleigh resolution δr , and the defocus positions are scaled according to δz of Eq. 4.213. These simulations recreate earlier results [Wang 2000].

This is the reason that some soft x-ray tomography experiments use zone plates not with the highest possible spatial resolution, but with a slightly reduced value (or Fresnel zone plates with larger zone width, like δrN = 45 nm, even if δrN = 20 nm zone plates might be otherwise available). Otherwise, if Eq. 4.215 indicates that the maximum allowable sample thickness is about 1 μm or less, one might consider the relative merits of soft x-ray tomography relative to tomography in the electron microscope, as will be discussed in Section 4.10. If one is able to take a series of images through the depth extent of the specimen, there are computational methods to combine the sharpest information from each image to yield a projection with depth-of-field effects reduced [Jochum 1988, Liu 2012b, Späth 2014, Selin 2015, Otón 2016].

The discussion above applies to lens-based imaging. What about imaging methods where it is not the lens, but the maximum recorded coherent scattering angle from the specimen, that sets the resolution δr? This was illustrated in Fig. 4.16 in Section 4.2.5, and we arrived at the expression DOF_Ewald = λ/N.A.² in Eq. 4.60, as well as DOF_Ewald = 4Δr²/λ in Eq. 4.63. We also noted other expressions in the literature that differed by a factor of 2 in either direction. More detailed numerical simulations carried out in the context of multislice ptychography [Tsai 2016] have arrived at an empirical result of 5.2(δr)²/λ, in very close agreement with Eq. 4.215 with cz = 1. Coherent imaging of samples that are thicker than the DOF (thus violating the pure projection approximation) is discussed further in Section 10.5.
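Equation 4.215 makes the trade-off easy to quantify; the sketch below (not from the text) evaluates it for two representative transverse resolutions at λ = 2.4 nm, an assumed water-window wavelength.

```python
def depth_of_field(delta_r, wavelength, cz=1.0):
    """Eq. 4.215: DOF = 2*cz*delta_r**2 / (0.61**2 * wavelength)."""
    return 2.0 * cz * delta_r ** 2 / (0.61 ** 2 * wavelength)

wavelength = 2.4e-9                     # assumed soft x-ray wavelength (m)
for delta_r in (45e-9, 20e-9):          # representative transverse resolutions
    dof = depth_of_field(delta_r, wavelength)
    print(f"delta_r = {delta_r * 1e9:.0f} nm: DOF = {dof * 1e6:.1f} um")
```

The second case returns a DOF below 1 μm, consistent with the sample thickness limit noted above.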



Figure 4.52 Depth of field (DOF) versus transverse resolution for different x-ray energies. The DOF of Eq. 4.215 represents a thickness limit for simple 3D imaging using an x-ray lens to deliver a pure projection image onto a lower-resolution pixelated detector, and this limit is more challenging to work within at lower x-ray energies. The energy of 0.29 keV corresponds to the carbon K absorption edge, which is popular for spectromicroscopy studies of speciation in organic materials (Section 9.1), while the 0.54 keV x-ray energy is at the high energy end of the “water window” for high-contrast imaging of organic materials in water, between the carbon and oxygen K absorption edges.

4.5 Full-field imaging

The discussion of illumination phase space and partial coherence of Section 4.4.6 was appropriate for scanning microscopes, where an optic is used to produce a small focus spot. With full-field imaging microscopes, the situation is somewhat different:

• Every resolved region in the object involves a coherent phase space of the Rayleigh resolution (the resolved region radius, rather than diameter) times the full opening angle of the lens, or 1.22λ rather than 2.44λ as given by Eq. 4.191.
• A Rayleigh resolution region only needs to have phase correlation with its immediate n neighbors, where there is overlap of the Airy pattern (usually n = 2–3). However, there does not need to be any phase correlation with regions several multiples of n away. Thus each grouping of n pixels can make use of a different coherent mode, so that a transmission x-ray microscope (TXM) can accept a number of coherence modes M_TXM of the image field of view w divided by n times the Rayleigh resolution δr, or

M_{TXM} = \frac{w}{n\delta_r}   (4.216)

coherent modes in each transverse direction {x̂, ŷ} (a numerical example of Eq. 4.216 follows this discussion).


Figure 4.53 Image resolution versus the ratio m = N.A.cond /N.A.obj of condenser to objective numerical apertures. What is shown at left is the factor k corresponding to the usual 0.61 in the Rayleigh resolution of δr = 0.61λ/N.A. of Eq. 4.173. The factor k is plotted as a function of m for condensers with various central stop fractions b (the same parameter was used to characterize objective lens central stops in Eq. 4.176). The dots in the figure at left are at the best value of k, and the condenser ratio m at which it was achieved, for each condenser stop size b; those values are plotted at right as a function of b. As the condenser aperture is increased, the objective aperture can accept diffraction from slightly finer grating periods (see Fig. 4.45). However, partial coherence considerations lead to a maximum improvement at a ratio of m of about 1.5. Calculation based on an approach described by Hopkins and Barham [Hopkins 1950] as extended for condensers with central stops by McKechnie [McKechnie 1972].

This is easier to achieve with laboratory sources, which generally have large areas and emit into a solid angle of 2π (thus filling a large number M of modes). With synchrotron light sources, most beamline optics (Section 7.2) do not deliver such a large source phase space or étendue to an endstation, and one must often use a phase diffuser [Uesugi 2006] (such as a rotating piece of paper with thickness t variations to produce random phase variations ϕ = kδt, as given by Eq. 3.69), or a wobbling condenser lens [Rudati 2011], to spread the illumination out into a sufficiently large phase space area. This is especially true with undulator sources (Section 7.1.6), which are intrinsically small phase space sources at modern synchrotron light sources.
• Full-field imaging systems thus depend both on the total flux delivered by the source into a large phase space area (large field of view), and also on the spectral brightness of the source (short exposure time per pixel).

It is for the above reasons that some full-field imaging microscopes at synchrotron light sources are operated at bending magnets for ease in obtaining illumination over large fields of view, while others use undulator sources for the fastest possible exposure time in smaller fields of view.
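As promised above, here is a minimal numerical example of Eq. 4.216 (not from the text; the field of view and resolution are assumed values) showing how many mutually incoherent modes a TXM can usefully accept:

```python
def txm_modes(field_of_view, delta_r, n_overlap=3):
    """Eq. 4.216: coherence modes accepted per transverse direction."""
    return field_of_view / (n_overlap * delta_r)

# e.g. a 15 um field of view imaged at 30 nm Rayleigh resolution:
print(f"M_TXM ~ {txm_modes(15e-6, 30e-9):.0f} modes per direction")
```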


u/ud Figure 4.54 Optical transfer function as function of the ratio m = N.A.cond /N.A.obj of the condenser-to-objective numerical aperture. Shown here is the “apparent” transfer function for a sine wave [Becherer 1967, Eqs. 58, 59, 61]. Because the response of a partially coherent imaging system is not linear, the assumption of carrying out a simple Fourier decomposition of the object and applying a transfer function to the various spatial frequency components is not fully valid.

As Fig. 4.45 indicated, the numerical aperture of the condenser lens N.A.cond can play a role in the finest periodicity that one can see in a full-field imaging system. This would suggest that one can image even higher spatial frequencies than the cutoff of u′d = 2ud of Eq. 4.202 by using a condenser with a higher N.A.cond, as more extreme illumination angles are diffracted into the acceptance aperture N.A.obj of the objective lens. What that simple picture neglects is that there can start to be offsetting contributions of bright-field and dark-field (Section 4.6) signals in this situation, leading to a reduction of contrast and nonlinear response. In Fig. 4.53, we show one measure of this effect as calculated by Hopkins and Barham [Hopkins 1950] and as extended for condensers with central stops by McKechnie [McKechnie 1972] (this effect is also shown via numerical calculations in a later paper by Jochum and Meyer-Ilse [Jochum 1995]). This calculation shows the equivalent k of the 0.61 factor in the Rayleigh resolution formula of δr = 0.61λ/N.A. of Eq. 4.173, as one changes both the condenser-to-objective aperture ratio m = N.A.cond/N.A.obj, and as one changes the condenser central stop fraction b. To understand the calculation, first recall that Fig. 4.29 shows that the intensity at the image position between two point objects drops to 73.5 percent of its maximum when the objects are separated by a distance equal to the Rayleigh resolution. One can calculate the intensity distribution between two point objects with different values of b and m [McKechnie 1972], and find the separation where the “dip” in the center drops to 73.5 percent of its maximum; this gives the Rayleigh-like resolution factor k which is plotted in Fig. 4.53. As can be seen, it is advantageous to use a condenser numerical aperture that is about 1.5 times larger than the objective numerical aperture, or m = N.A.cond/N.A.obj ≈ 1.5.

Another way to think of the resolution of a full-field microscope as a function of the condenser-to-objective aperture ratio m = N.A.cond/N.A.obj is in terms of the OTF. However, the non-linearity of image response noted above means that the basic assumption of applying a transfer function to the Fourier decomposition of an object is not directly applicable. Even so, one can calculate an “apparent transfer function” T(f) for a sine-wave object [Becherer 1967], as is shown in Fig. 4.54, and from it one can see that the response at low frequencies is enhanced when one uses a smaller condenser numerical aperture.


Figure 4.55 Critical and Köhler illumination in full-field imaging. When imaging the source directly onto the object in the critical illumination method, any imperfections in the uniformity of light output from the source are imaged directly onto the specimen. In Köhler illumination [Köhler 1893, Köhler 1894] the source is imaged onto the back focal plane of the condenser; positions at this plane become plane wave illumination directions at the object, so that the source is not directly imaged onto the object. This comes, however, at the cost of a more complicated optical system. The relative numerical apertures of the condenser lens N.A.cond and objective lens N.A.obj are also shown.

This has been exploited in modern zone plate x-ray microscopes [Schneider 1998a, Schneider 2010]. Besides considering just the numerical aperture of the condenser, one can also make choices in how the illumination source is transferred to the object, as is shown in Fig. 4.55. With critical illumination, the illumination source is imaged directly onto the object. With Köhler illumination, positions on the source are transferred to incidence angles on the object [Köhler 1893, Köhler 1894]; this adds to the complexity of the optical system, but it has the advantage that any spatial structure in the source is not imaged directly onto the object. Finally, it should be noted that aberrations on the condenser lens have no influence on the resolution of the objective lens and thus the imaging system [Zernike 1938] (see also [Born 1999, Sec. 10.5.2]).

4.5.1 TXM condensers, STXM detectors, and reciprocity

In Fig. 4.45, we considered the transfer function for incoherent imaging with equal condenser and objective lens N.A. In that simple description, there is in fact no difference



Figure 4.56 Reciprocity between full-field and scanning microscopy. For small illumination sources, the imaging characteristics produced by a given condenser aperture in full-field imaging are the same as those for a given detector aperture in scanning microscopy [Welford 1960a, Zeitler 1970b, Zeitler 1970a, Barnett 1973].

in the transfer function whether the light “rays” are going from left to right as shown, or from right to left. And what if they were going from right to left? Then the objective lens would be producing the illumination of the object, and the condenser lens would be collecting the light and delivering it to a detector. Now consider the case of a scanning microscope: the illumination comes from a source with small phase space extent at some distance to the right, so only a Rayleigh-resolution-sized region is illuminated on the object. As a result, there is no need to image the object onto the detector, but the angular extent of the detector will still affect what spatial frequencies one can collect information from. This leads to an important concept for comparing the optical performance of full-field versus scanning microscopes: the principle of reciprocity as shown in Fig. 4.56. This was first put forward by Welford in 1960, who stated

We can show in fact that if the scanning objective A has NA α and the collector B has NA β then the scanning imagery is, within the approximation of scalar diffraction theory, exactly equivalent to conventional microscopy with an objective having NA α [and] condenser of NA β . . . [Welford 1960a].

Welford's original paper did not in fact show how this comes about, nor did a later paper outlining the same principle [Crewe 1970], but a more detailed discussion has been provided by Zeitler and Thomson [Zeitler 1970b, Zeitler 1970a] and by Barnett [Barnett 1973]. The notion holds true for dark-field imaging [Engel 1974], as will be described in the following section, and it holds for the Zernike method for phase contrast imaging as discussed in Section 4.7.3. Just as Fig. 4.53 shows that there is an optimum condenser aperture for full-field imaging of about 1.5 times the objective aperture, for scanning microscopy there is an optimum detector collection angle from the optical axis which is about 1.5 times the objective N.A.; this becomes especially important in dark-field imaging, as discussed in the next section and shown in Fig. 4.59.


Figure 4.57 Schematic of optical configurations for dark-field imaging in full-field and scanning x-ray microscopes. In the full-field case, an annular aperture is placed at the back focal plane (about a focal length behind the condenser), and the direct illumination is blocked by a complementary ring-shaped stop in the back focal plane of the objective. In the scanning case, a circular stop is used to block the direct illumination. Figure adapted from [Vogt 2001].

4.6 Dark-field imaging

Consider a coherent wavefield incident upon a circular aperture of radius a in an opaque screen. We know that the far-field diffraction pattern will have an amplitude distribution given by Eq. 4.132 and an intensity distribution given by Eq. 4.134. Babinet's principle [Babinet 1837] (see also [Born 1999, Sections 8.3.2 and 11.3]) says that the optical amplitude downstream of a disk of radius a will be the exact complement of that from the pinhole; that is, when adding the two diffraction amplitudes together one will simply have the original incident wavefield. What does this mean for imaging? Consider the example optical layouts shown in Fig. 4.57. In both the full-field and scanning microscope examples, a dark-field stop is used to block the direct illumination, while light scattered by the sample is deflected out of the beam path and into the detector (one can also separate the bright- and dark-field signals by using a pixelated detector [Thibault 2009b, Menzel 2010]). The resulting effect on an image is shown in the simulation of Fig. 4.58. As can be seen, the dark-field image is a complement to the bright-field image (as expected from Babinet's principle), so that small features and edges are enhanced in the dark-field image. However, dark-field images of periodic objects must be examined with a knowing eye, as fine periodic features can appear to be "doubled" [Morrison 1992a], as shown at lower left in the simulated dark-field image of Fig. 4.58. Dark-field imaging is especially useful for highlighting small, dense features in a larger object.


Figure 4.58 Simulation example of bright-field and dark-field imaging, demonstrating how the dark-field image is a complement to the bright-field image as expected from Babinet's principle. The object (shown at left) consists of an opaque 400 nm diameter circle, opaque bars with a width starting from 10 nm at left with a width and spacing that increase with the square root of the bar number, and two 25 nm diameter gold spheres with the complex transmittance expected for 520 eV X rays. The image at middle is a simulation using the bright-field transfer function OTFincoherent of Eq. 4.203 for a Fresnel zone plate with drN = 30 nm, or a cutoff in its spatial frequency response at 2ud = 1/drN = 33.3 μm⁻¹. The image at right is a simulation using the dark-field transfer function, which is the complement of OTFincoherent with a rolloff at 1.5 times 2ud, in accordance with the ideal condenser-to-objective aperture ratio m = N.A.cond/N.A.obj shown in Fig. 4.53 (or, in scanning microscopes, the detector aperture as expected from reciprocity and as shown in Fig. 4.59). Notice the apparent doubling of the finest bars in the dark-field image, due to the image's edge-enhanced nature. The dark-field image is shown on an intensity scaling of I^0.1 to highlight the lower-intensity features, since in dark field these require detection of only a few photons above a low or negligible background.

One example in which this is useful is the identification of gold labels that can be attached to specific proteins in cells using immunolabeling techniques [Chapman 1996c, Meyer-Ilse 2001]. While there is no intrinsic photon exposure advantage to using dark field rather than bright field for imaging immunogold labels (as will be discussed in Section 4.8.4), in practice it is often more convenient to see such labels clearly isolated in dark-field images [Chapman 1996d, Chapman 1996c], as shown in Fig. 4.60.


Figure 4.59 Effect of varying the detector aperture on dark-field image contrast in scanning microscopy. For the left and middle figures, two cylinders of protein (each 20 nm in diameter) were placed with a center-to-center distance indicated by "Object separation." A 2D dark-field image was then calculated for an objective zone plate with outermost zone width of drN = 30 nm (and a Rayleigh resolution of δr = 1.22 drN = 36.6 nm) at 517 eV photon energy, and a line was extracted from each 2D image and shown here as the row at the specified "Object separation" value. This was done for a detector-to-objective aperture ratio m = N.A.det/N.A.obj of m → ∞ (left) and the approximate optimum value of m = 1.5 (middle), showing better distinguishability at smaller separations for m = 1.5. At right is shown a simulation with a fixed 24 nm center-to-center separation for the two cylinders, with a row extracted per 2D image as the value of m was changed between each simulation. Because of reciprocity (Section 4.5.1) between scanning and full-field imaging systems, it is not surprising to find that m = N.A.det/N.A.obj ≈ 1.5 is preferred for scanning microscopy, just as m = N.A.cond/N.A.obj ≈ 1.5 is preferred for full-field imaging as shown in Fig. 4.53. Figure adapted from Figs. 5 and 6 of [Vogt 2001].

4.7 Phase contrast

Imaging in absorption contrast mode is conceptually straightforward. However, when we discussed the x-ray refractive index in Section 3.3.2 and obtained the expression n = 1 − αλ²(f1 + if2) = 1 − δ − iβ (Eq. 3.65), we saw in Fig. 3.16 that the phase-shifting part of the refractive index f1 or δ becomes significantly larger than the absorptive part f2 or β at x-ray energies above about 1 keV. In hindsight it is obvious that one should exploit the phase-shifting part δ of the x-ray refractive index for high-contrast imaging, but the first clear statement of this came somewhat late in the history of the field, via a conference presentation in August 1986 by Schmahl and Rudolph [Schmahl 1987], who discussed soft x-ray microscopy but also pointed towards the potential for using higher-energy X rays. (An earlier paper by Bonse and Hart on an x-ray crystal interferometer [Bonse 1965] mentioned the possibility of phase contrast x-ray imaging, with further brief comments appearing in subsequent reviews [Hart 1970b, Hart 1975], but Schmahl and Rudolph were the first to directly point out the potential for reduced radiation dose.) In fact, it is truly remarkable that absorption contrast x-ray radiography has been used for over a century in medical imaging with nobody thinking of the potential of using phase contrast for lower radiation exposure – even though Einstein speculated on how n = 1 − δ might produce grazing incidence reflection effects in medical imaging (Section 2.2) way back in 1918. Contemplate for a moment the collective blindness of so many



Figure 4.60 Combined bright-field and dark-field imaging of immunogold labels. This image shows a fibroblast with silver-enhanced 10 nm gold nanoparticles immunolabeled for actin, imaged after air-drying. The gray-scale image is a bright-field image showing absorption within the cell, while the red image overlay is a dark-field image that highlights the small silver-enhanced gold particles. Both images were acquired using a scanning x-ray microscope with a drN = 45 nm outermost zone width zone plate operated at 496 eV [Chapman 1996c].

x-ray scientists (myself included, in spite of work in x-ray holography around this time [Howells 1987]) for so long! As we now know, phase contrast is immensely important for transmission imaging in hard x-ray microscopy. As an example, consider the absorption and differential phase contrast images of a diatom taken at 10 keV, as shown in Fig. 4.61: the diatom is essentially invisible in absorption contrast, and easily identifiable in phase contrast. Since lighter elements also have very poor x-ray fluorescence yield (Figs. 3.5 and 3.7), phase contrast imaging provides the best way to study light material samples in hard x-ray microscopes. When studying biological specimens using x-ray fluorescence excited by multi-keV X rays (Section 9.2), phase contrast lets one see overall cellular structure and even measure the local mass, so that one can make maps not just of elemental content but of concentration [Hornberger 2008, Holzner 2010, Kosior 2012a, Gramaccioni 2018]. Once one gets to frequencies above the microwave range, one cannot measure phase directly. Instead, the phase of a wavefield is measured by mixing it with some other


Figure 4.61 Phase contrast provides the best way to image lighter materials when using multi-keV x-ray microscopes (Fig. 3.16 explains why). The image at left shows the x-ray absorption signal of a diatom (phytoplankton cell) obtained from the sum of all segments of a transmitted flux detector used for scanning x-ray microscopy at 10 keV; the light elements that dominate the diatom's composition produce so little absorption that the diatom is almost invisible. The image at right shows a differential phase contrast image (Section 4.7.4) of the same diatom, using the same segmented detector signals, but this time looking at the signal difference between the indicated segments divided by the sum. Figure adapted from [Hornberger 2008].

known wavefield so that the phase is transferred into a measurable intensity, using constructive or destructive interference. One of the simplest ways to achieve this mixing is to use a propagation distance z so that nearby Huygens point sources provide the known or reference wave. As shown in Eq. 4.70, those nearby positions contribute to a measurement point (x, y) while being modulated by a propagator phase term; any differences in magnitude or phase of these nearby points can contribute to the net optical amplitude at the measurement point. One example well known in classical optics is diffraction from a half-slit. If an illumination source is placed at a very large distance away compared to the downstream propagation distance z, the intensity distribution versus the transverse distance x is found using the Cornu spiral [Jenkins 1976, Eq. 18k] to give Iedge(w), as shown in Fig. 4.62, where

w = x √(2/(λz))   (4.217)

is a dimensionless parameter. From the values of wfbf = 1.217 for the first bright fringe (fbf), and wfdf = 1.872 for the first dark fringe (fdf), we can write the transverse positions of the first bright and dark fringes, respectively, as

xfbf = 1.217 √(λz/2) ≈ 0.861 √(λz)   (4.218)
xfdf = 1.872 √(λz/2) ≈ 1.324 √(λz)   (4.219)

Consider another example as shown in Fig. 4.63. In this case, Fresnel propagation was carried out using the convolution approach of Eq. 4.109 as appropriate for near-field distances. As one can see, while a phase object is invisible in terms of an intensity distribution at a plane immediately downstream of the object (or in the focus of a microscope), it becomes visible when one defocuses the microscope to produce Fresnel fringes at a transverse position of about √(λz) and beyond.
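As a numerical check on Eqs. 4.217–4.219, the edge diffraction profile Iedge(w) can be computed directly from the Fresnel integrals. Here is a brief sketch in Python (SciPy assumed; the normalization sets the unobstructed intensity to 1, and w < 0 is the geometrical shadow):

```python
import numpy as np
from scipy.special import fresnel

def edge_intensity(w):
    """Fresnel diffraction from a half-plane edge versus the dimensionless
    coordinate w = x*sqrt(2/(lambda*z)), normalized to the unobstructed intensity."""
    S, C = fresnel(w)   # SciPy convention: integrals of sin, cos of (pi/2)t^2
    return 0.5 * ((C + 0.5)**2 + (S + 0.5)**2)

w = np.linspace(-1, 5, 6001)
I = edge_intensity(w)
print(w[np.argmax(I)], I.max())   # ≈ 1.217 and ≈ 1.37: the first bright fringe
```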


This effect is commonly used in transmission electron microscopy of light materials, which show mainly phase contrast; the proper focus position can be determined from when the object "disappears" (when the image contrast approaches zero), after which a known defocus position (usually the Scherzer defocus [Reimer 1993, Eq. 3.69]) is chosen to provide phase contrast.

Phase contrast imaging with sub-100 nm spatial resolution was first demonstrated in soft x-ray microscopy [Schmahl 1994], but even more activity has taken place in micrometer-scale spatial resolution imaging using hard X rays [Davis 1995] (for recent reviews, see [Momose 2005, Wilkins 2014]). For coarser spatial resolution with hard X rays, a number of approaches have been used for phase contrast imaging:

• One can place an object within a Bonse–Hart interferometer constructed using Bragg diffraction from two crystals cut within a single crystalline silicon block for stability [Bonse 1965, Momose 1996], or use a second analyzing crystal [Davis 1995].

• One can also use just the analyzing crystal in an approach called diffraction-enhanced imaging [Chapman 1996a, Chapman 1997a].

• Another popular approach is to use one grating (which can be a phase grating) to produce self-interference fringes via the Talbot effect [Talbot 1836, Rayleigh 1881] at a downstream plane, and a second absorptive grating to measure deviations in the interference pattern produced by an object placed between the two gratings [Weitkamp 2004, Weitkamp 2005]. This approach works even with sources of low coherence when a third grating is used [Pfeiffer 2006]. The grating method can also be used for dark-field imaging [Pfeiffer 2008], and it is growing in importance for phase contrast in medical imaging, with much activity.

• As a wavefield propagates downstream from a phase object, Fresnel fringes begin to appear at the object's boundaries, from which one can obtain a phase contrast image, as shown in Fig. 4.63. Approaches of this type are discussed in Section 4.7.2.

These methods all work at the micrometer-scale spatial resolution of an x-ray image detector system, which typically consists of a scintillator, microscope objective, and visible-light camera (Section 7.4.7). Together these lower-resolution approaches are seeing considerable activity and research impact, though they lie beyond the scope of this book on submicrometer x-ray microscopy.

4.7.1 Phase contrast in coherent imaging methods

One does not require a fully spatially coherent beam in order to obtain phase contrast images; this will be made clear in the discussions of propagation-based phase contrast (Section 4.7.2) and Zernike phase contrast (Section 4.7.3). However, when one does have a nearly fully coherent beam, one can use methods like x-ray holography (Section 10.2) or especially x-ray ptychography (Section 10.4) to obtain high-quality phase contrast images. These methods are discussed in Chapter 10.




Figure 4.62 Fresnel diffraction from a half-plane aperture. Shown here is the intensity distribution Iedge(w), with w = x √(2/(λz)) being a dimensionless parameter. The position of the first bright fringe is at wfbf = 1.217, where one has an intensity of 1.37 I0, while the position of the first dark fringe is at wfdf = 1.872, where the intensity is 0.78 I0 (subsequent extrema include I(2.344) = 1.20 and I(2.739) = 0.84). The intensity is calculated using the Cornu spiral [Jenkins 1976, Eq. 18k].

4.7.2 Propagation-based phase contrast

Wave propagation over short distances (or small Fresnel numbers; see Eq. 4.116) involves just a few Fresnel fringes around the edges of objects, as shown in Fig. 4.63. In this case the standard in-line holographic treatment for image reconstruction that will be discussed in Section 10.2 is not terribly successful: the twin image is at a very close-by longitudinal or z distance from the desired image, leading to severe problems in image interpretation. However, the presence of fringes is still revealing of phase information about the specimen, so it is not surprising that there can be other approaches for reconstructing phase contrast images using these near-field fringes. We outline two of these approaches here.

Because the object is convolved with a propagator function appropriate for a particular distance, one can use a deconvolution process as described in Section 4.4.8, except that one now wishes to use the known, complex optical transfer function OTF(ux, uy), with the square root of the intensity I(x, y, z) at the downstream plane z providing |i(x, y)| = √I(x, y, z) in the expression (Eq. 4.206)

o(x, y) = F⁻¹{ F{i(x, y)} / OTF(ux, uy) }.

The transfer function OTF(ux, uy) is simply a propagator function H(ux, uy) as shown in Fig. 4.20. Unfortunately, the propagator function has zero-crossing points which present difficulties when dividing by OTF(ux, uy), and the absence of phase in |i(x, y, z)| also leads to errors in the recovered object. One approach is to record near-field intensity distributions (which are effectively holograms) at multiple, carefully selected [Zabler 2005] values of the propagation distance z, back-propagate each wavefield to the object plane, and use them in a combined fit to reconstruct the complex object [Cloetens 1999a]. When combined with specimen rotations, this provides for a very



Figure 4.63 Illustration of propagation phase contrast. A wavefield of 520 eV soft X rays descends from above to illuminate a 2a = 500 nm wide opaque bar at left, and a 2a = 500 nm wide phase-shifting bar (which advances the phase by 90°) at right. The first dark fringe appears as shown in Fig. 4.62, at a transverse position of xfdf = 1.872 √(λz/2) from the edge (Eq. 4.219). The Fresnel number F0 = a²/(λz) (Eq. 4.116) changes with the size of the object, while the first fringe width does not. Because the phase object advances the phase of the wavefield, it leads to an energy flow out to the sides as the propagation distance increases. The presence of fringes in the downstream intensity distribution can be used to calculate the phase object that produced them, as discussed in Section 4.7.2. Note that the periodicity of discrete Fourier transforms (Eq. 4.95) means that one sees fringes that seem to appear from other bars to the left and right of this array at the larger propagation distances.

successful approach for phase contrast tomography called "holotomography" [Cloetens 1999b].

An alternative way to represent the intensity distribution recorded downstream of an object is to use the transport of intensity (TOI), or equivalently the transport of intensity equation (TIE). This equation considers the intensity I(x, y, z) and phase ϕ(x, y, z) distributions at one plane, and describes the near-field evolution of these distributions as

∇x,y · [I(x, y, z) ∇x,y ϕ(x, y, z)] = (2π/λ) ∂I(x, y, z)/∂z.   (4.220)

The object is assumed to be composed of a single material with a net thickness projected onto a plane of t(x, y), so that I(x, y, z = 0) = I0 exp[−μt(x, y)] and ϕ(x, y, z = 0) = −(2π/λ)δ t(x, y), where μ = (4π/λ)β and δ + iβ are as in Eq. 3.67. With these assumptions,


one can reconstruct [Paganin 2002] the projected thickness t(x, y) of the object using

t(x, y) = −(1/μ) ln( F⁻¹{ μ F{M² I(Mx, My, zd)/I0} / [zd δ 4π²(ux² + uy²) + μ] } ),   (4.221)

with M = (zs + zd)/zs being the geometrical magnification (Eq. 6.1) produced by a point source at a distance zs upstream of the specimen and a detector recording the intensity I(Mx, My, zd) at a distance zd downstream of the specimen (see Fig. 6.1). This expression uses forward F and inverse F⁻¹ Fourier transforms, and spatial frequencies ux and uy (Eq. 4.32). This approach does not require a very high degree of spatial coherence, since it makes use of only the first few Fresnel fringes from the edge of an object (that is, it requires good mutual coherence over a width of about √(λzd), as given by Eq. 4.217), so it finds widespread use.

One can also consider a hybrid of these two approaches, where one uses the transport-of-intensity reconstruction of Eq. 4.221 to provide a first guess of the complex object. Since free-space propagation of a wavefield from the object's exit field to any downstream plane can be calculated in a straightforward manner (Section 4.3.7), one can then calculate the complex amplitude at any downstream plane. One can then develop a procedure where the magnitudes |i(x, y, zi)| at several downstream planes zi indexed by i are squared and compared with the recorded intensities, iteratively "nudging" these magnitudes towards the recorded values I(x, y, zi) using iterative algorithms as discussed in Chapter 10. This can yield an estimate of the complex image i(x, y, 0) with very high fidelity [Mayo 2003, Krenkel 2013], provided one has a high degree of mutual coherence over the lateral distance corresponding to the number of Fresnel fringes recorded at the largest of the distances zi.

The approaches outlined above are among the ones most commonly employed for propagation-based phase contrast imaging, but additional approaches exist, as described in a recent comparison [Burvall 2011].
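As an illustration of Eq. 4.221, here is a minimal sketch of single-distance phase retrieval in Python (NumPy assumed). The function name and arguments are illustrative; the recorded image is taken to be already mapped to object-plane coordinates, and no noise handling beyond the δ/μ low-pass filter itself is included:

```python
import numpy as np

def paganin_thickness(I, I0, pixel_size, z_d, delta, beta, wavelength, M=1.0):
    """Projected thickness t(x, y) of a single-material object from one
    defocused image, following Eq. 4.221 [Paganin 2002]."""
    mu = 4.0 * np.pi * beta / wavelength          # linear absorption coefficient
    ny, nx = I.shape
    ux = np.fft.fftfreq(nx, d=pixel_size)         # object-plane spatial frequencies
    uy = np.fft.fftfreq(ny, d=pixel_size)
    UX, UY = np.meshgrid(ux, uy)
    numerator = mu * np.fft.fft2(M**2 * I / I0)
    denominator = z_d * delta * 4.0 * np.pi**2 * (UX**2 + UY**2) + mu
    filtered = np.real(np.fft.ifft2(numerator / denominator))
    return -np.log(np.clip(filtered, 1e-12, None)) / mu   # invert the Beer–Lambert law
```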

4.7.3 Zernike phase contrast imaging

When an optical system is used to image a weak phase object, a simpler and more direct method can be used, as developed by Fritz Zernike in 1935 [Zernike 1935, Zernike 1942a, Zernike 1942b], for which he received the 1953 Nobel Prize in Physics. In order to best understand the Zernike method, let's first consider the overlap of two waves in terms of the addition of two phasors in the complex plane. If the phasor from a weak phase object has a small phase shift, the length of the result of this phasor added with a "reference" phasor changes very little, as shown in Fig. 4.64. If, however, the "reference" phasor is at a relative phase of 90°, then small phase changes from the object produce larger changes in the length of the sum of the two phasors; that is, small phase changes lead to larger intensity variations.

The next step is to create the conditions in a full-field microscope for interfering light scattered by the object with a 90° phase-shifted "reference" wave. The usual implementation of Zernike phase contrast involves matching an annular aperture in the illumination optics with an annular phase-shifting ring in the imaging optics, as shown



Figure 4.64 Addition of phasors in the Zernike phase contrast method. Consider the case of trying to measure small phase shifts in one wave (labeled "shifted") via intensity differences when it is made to interfere with a reference wave. When the reference wave is parallel to the "shifted" wave, as shown at left, the length of the resulting vector shows very little difference from the "unshifted" case, so the intensity variation is small. When the reference wave is at 90° relative to the "shifted" wave, as shown at right, the small change from "unshifted" to "shifted" phase from the object leads to larger intensity variations when the reference phasor is added.

in Fig. 4.65. The annular aperture is placed in the back focal plane of the condenser; each point on the annular aperture then illuminates the entire field of view of the microscope with a plane wave from one direction, giving a net effect of an azimuthal ring of illuminating plane waves all at the same radial spatial frequency ρ. A single angle of illumination into the objective lens gets focused to a single point in its back focal plane, so the annular aperture produces a ring of light in the objective's back focal plane. In other words, the light from the condenser's annular aperture is imaged to a ring in the objective's back focal plane, where a phase ring is located to impose a phase shift of 90° (or 270° or 450° . . .) on this "reference" illumination wave. In the meantime, the illuminated sample scatters light into a wide range of angles which are collected by the objective lens, and this wide range of angles translates to a wide range of positions in the objective's back focal plane. In this way nearly all of the light scattered by phase gradients in the sample escapes being modified by the phase ring, leading to the desired interference condition shown at right in Fig. 4.64. A calculation of the image intensities produced by a feature of material f embedded in a matrix of a background material b (with both the feature and the matrix having a thickness t) has been carried out by Rudolph and Schmahl [Rudolph 1990, Eqs. 4.12 and 4.13]. Their calculation uses the x-ray refractive index n = 1 − δ − iβ (Eq. 3.67) and μ = 2kβ (Eq. 3.82) to write the linear absorption coefficients within the feature and background materials as μf = 4πβf/λ and μb = 4πβb/λ, respectively. The per-wavelength phase advances in the two materials are ηf = 2πδf/λ and ηb = 2πδb/λ. The phase ring has similar attenuation and phase change coefficients μp and ηp for a phase ring thickness tp. The image intensities with a feature present (If) and absent (the



Figure 4.65 Schematic of the Zernike method for achieving phase contrast in a full-field microscope. An annular aperture is placed in the back focal plane of the condenser lens so that its ring of positions is transferred to a ring of illumination input angles onto the specimen (as indicated by the green lines); this ring of angles is in turn imaged onto a phase ring in the back focal plane of the objective lens, where it receives a 90° phase shift. The fraction of the wavefield scattered by a specimen feature at the object plane (indicated by red lines) passes mostly around the phase ring to reach its imaging point on the detector, ensuring that most of the object wavefield is not phase-shifted, thus fulfilling the condition shown in Fig. 4.64. The Bertrand lens (named after a French mineralogist) can be inserted to image the phase ring onto the detector so that it can be properly aligned relative to the image of the annular aperture.

background intensity Ib) are given by

If,zpc = I0 { e^(−μb t) (e^(−μp tp) + 1) + e^(−μf t)
         + 2 e^(−(μf/2)t) e^(−(μb/2)t) e^(−(μp/2)tp) cos[(ηf − ηb)t − ηp tp]
         − 2 e^(−(μf/2)t) e^(−(μb/2)t) cos[(ηf − ηb)t]
         − 2 e^(−μb t) e^(−(μp/2)tp) cos(ηp tp) }   (4.222)

Ib,zpc = I0 e^(−μb t) e^(−μp tp).   (4.223)

Of course in most cases one wishes to have cos(ηp tp) near zero, as the phase ring provides a phase shift of 90° or 270° or increments thereof. Absorption in the phase ring can help increase contrast by making the length of the "reference" and "shifted" phasors closer to each other. We can simplify the above expressions by ignoring absorptive effects, using cos(θ − 90°) = sin(θ) ≈ θ, and using cos(θ) ≈ 1. These weak-specimen approximations lead to a lowest-order simplification of Eqs. 4.222 and 4.223 of

If,zpc ≈ I0 [1 + 4π(δf − δb) t/λ]   (4.224)

and

Ib,zpc ≈ I0   (4.225)

as approximate expressions for Zernike phase contrast of thin objects, with a difference of

|If,zpc − Ib,zpc| = 4π (t/λ) |δf − δb| I0,   (4.226)

which is an expression that will be used in Eq. 4.270.
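These expressions are straightforward to evaluate numerically; here is a small sketch in Python that translates Eqs. 4.222 and 4.223 directly (the function name is illustrative; checking it against the weak-specimen limit of Eq. 4.224, with all absorption coefficients set to zero and ηp tp = π/2, is a useful test):

```python
import numpy as np

def zernike_intensities(mu_f, mu_b, mu_p, eta_f, eta_b, eta_p, t, t_p, I0=1.0):
    """Feature and background intensities in Zernike phase contrast,
    following Eqs. 4.222 and 4.223 [Rudolph 1990]."""
    I_f = I0 * (np.exp(-mu_b * t) * (np.exp(-mu_p * t_p) + 1) + np.exp(-mu_f * t)
                + 2 * np.exp(-(mu_f / 2) * t) * np.exp(-(mu_b / 2) * t)
                    * np.exp(-(mu_p / 2) * t_p) * np.cos((eta_f - eta_b) * t - eta_p * t_p)
                - 2 * np.exp(-(mu_f / 2) * t) * np.exp(-(mu_b / 2) * t)
                    * np.cos((eta_f - eta_b) * t)
                - 2 * np.exp(-mu_b * t) * np.exp(-(mu_p / 2) * t_p) * np.cos(eta_p * t_p))
    I_b = I0 * np.exp(-mu_b * t) * np.exp(-mu_p * t_p)
    return I_f, I_b
```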


There are several additional factors to consider with the Zernike phase contrast method as implemented in x-ray microscopes:

• The width of the annular condenser aperture and corresponding phase ring must be optimized based on two competing considerations. As the annulus width is increased, more light is transmitted from the source, so that exposure times are decreased. However, with a wider phase ring, a greater range of low spatial frequencies from the object also pass through the phase ring, so that the desired 90° phase shift is not realized between these object spatial frequencies and the illumination. The resulting effect is that of a high-pass filter, producing a halo-like appearance on images of larger objects [Yamamoto 1983]. One way to utilize more of the source's illumination while also minimizing the halo effect is to use a large number of small source apertures in the condenser back focal plane, and provide a matching number of small phase-shifting "dots" in the objective's back focal plane [Stampanoni 2010].

• Because of the reciprocity of condenser and objective in full-field microscopy, and detector and objective in scanning microscopy (Section 4.5.1), one can also place a phase ring in the back focal plane of the objective lens in a scanning microscope and use an annular detector region to achieve Zernike phase contrast (this was suggested by Wilson and Sheppard [Wilson 1984] and by Siegel et al. [Siegel 1990] prior to its first demonstration [Holzner 2010]). While further improvements can be achieved using refinements of the approach [Vartiainen 2015], if one uses a pixelated detector for the transmitted signal it may be better to use the method of ptychography as discussed in Section 10.4, since small focal spots in scanning microscopes require a high degree of coherence in the illumination (Section 4.4.6), meeting the criteria for ptychography.

• With compound refractive lenses (Section 5.1.1), the extended length of the lens system means that one has additional options for the implementation of the Zernike method [Falch 2018].

4.7.4 Differential phase contrast

While the Zernike method can be implemented in scanning x-ray microscopes as noted above, a simpler method is to use a segmented detector to record a differential phase contrast signal. Consider a sample with thickness t composed of material b, which has a feature of material f (Fig. 4.66). If an x-ray beam of width Δr is directed to the edge between feature and background, the beam will be refracted by an angle θr given by the fractional advance of the wavefront (Δϕ/2π)λ divided by the distance over which the phase undergoes that change, or

θr = (Δϕ/2π)λ/Δr = (k|δf − δb|t/2π)(λ/Δr) = |δf − δb| t/Δr,   (4.227)

where δf and δb are the phase-shifting parts of the refractive index n = 1 − δ − iβ (Eq. 3.67) for the two materials. While this refraction angle is small, so is the numerical aperture N.A. of typical x-ray focusing optics, so that this refractive shift is enough to lead to a




Figure 4.66 Refractive prism model for differential phase contrast in scanning x-ray microscopy. The phase advance of the beam in a feature f relative to a matrix of a background material b with thickness t over the width Δr of the beam leads to a refraction angle given by Eq. 4.227.

noticeable signal difference in a segmented transmission detector, as shown in Figs. 4.67 and 4.61. While a variety of detector configurations have been used [Morrison 1992b, Kaulich 2002a, Feser 2006, Hornberger 2008], the simplest case is a detector split into two segments along the direction of the phase gradient. When a feature is present, the intensity recorded will be

If,dpc = Ileft − Iright = I0 [(1/2 + θr/N.A.) − (1/2 − θr/N.A.)] = I0 (2θr/N.A.)   (4.228)

because each of the two segments accepts light from the semi-angle given by N.A. Now the width Δr over which the phase gradient takes place is the Rayleigh resolution of the focal spot, or Δr = δr = 0.61λ/N.A., as given by Eq. 4.173. We thus have N.A. = 0.61λ/Δr, which along with Eq. 4.227 lets us write Eq. 4.228 as

If,dpc = I0 (4/1.22) (t/λ) |δf − δb|.   (4.229)

The differential phase contrast signal in the case of no feature present is zero, or Ib,dpc = 0.

Several other approaches can be used to realize differential phase contrast in x-ray microscopes. One can use optics upstream of the objective lens [Polack 1995, Joyeux 1998], or pairs of offset Fresnel zone plates [Wilhein 2001a, Wilhein 2001b, Kaulich 2002b], or single specially designed zone plates [Chang 2006] to produce an effect like having two slightly separated, phase-shifted focal spots. A phase gradient across these two focal spots will produce a shift of interference fringes within the objective lens or within the detector acceptance angle, again leading to changes in the intensity of the detected image. With segmented detectors, one can use one of two approaches to recover the full phase contrast image from the simple differential version:

• One can calculate the optical transfer function (Section 4.4.7) for the image recorded by each detector segment [McCallum 1995, McCallum 1996], and use this as a Fourier filter for image deconvolution [Hornberger 2007b] (see Section 4.4.8).

Figure 4.67 Differential phase contrast in a scanning x-ray microscope with a segmented detector. Phase gradients in the specimen lead to a deflection of the transmitted beam (Eq. 4.227), and this can lead to differences in the signal recorded in different segments of the transmitted beam detector. Figure adapted from [Hornberger 2008].

This approach delivers the full contrast of finer features in the specimen, though it requires careful calibration of the detector response and alignment.

• Alternatively, the horizontal and vertical differential phase contrast images can be integrated using a Fourier derivative method developed for light microscopy [Arnison 2004], which has subsequently been applied to both grating-based x-ray phase contrast imaging [Kottler 2007] and to scanning x-ray microscopy [de Jonge 2008]. In this approach, differential phase contrast images Δϕx and Δϕy in orthogonal directions are combined to yield a phase contrast image ϕ using

ϕ(x, y) = F⁻¹{ F{Δϕx + iΔϕy} / [2πi(ux + iuy)] },   (4.230)

as implemented in the sketch following this list. This approach works especially well for low spatial frequency structures (that is, the resulting images do not suffer from the "halo" effect seen in Zernike phase contrast), though the method by itself does not correct for reduced response at higher spatial frequencies.

Even greater flexibility in differential phase contrast imaging can be achieved by using a pixelated detector in a scanning microscope [Thibault 2009b, Menzel 2010], but (as noted above) in this case it can be even better to proceed to using ptychography as discussed in Section 10.4.
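Here is a minimal sketch of the Fourier integration of Eq. 4.230 in Python (the function name is illustrative; the zero-frequency element, where the denominator vanishes, is set to zero since the constant phase offset is undetermined by the measurement):

```python
import numpy as np

def integrate_dpc(dphi_x, dphi_y, pixel_size):
    """Combine orthogonal differential phase images into a phase image
    using the Fourier derivative method of Eq. 4.230."""
    ny, nx = dphi_x.shape
    ux = np.fft.fftfreq(nx, d=pixel_size)
    uy = np.fft.fftfreq(ny, d=pixel_size)
    UX, UY = np.meshgrid(ux, uy)
    denom = 2j * np.pi * (UX + 1j * UY)
    denom[0, 0] = 1.0                       # placeholder to avoid division by zero
    spectrum = np.fft.fft2(dphi_x + 1j * dphi_y) / denom
    spectrum[0, 0] = 0.0                    # the constant phase offset is undetermined
    return np.real(np.fft.ifft2(spectrum))
```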

4.7.5 Grazing incidence imaging

Another way to obtain phase contrast in an image is via topography in the grazing incidence image geometry [Fenter 2006], as shown in Fig. 4.68. Recall that Bragg's law of 2d sin(θ) = mλ (Eq. 4.33) is based on there being a mλ optical path length difference (or a 2mπ phase shift) between partial reflections from layers separated by a distance d, as shown in Fig. 4.9. Therefore, if one inclines the illumination system



Figure 4.68 Grazing incidence x-ray imaging of topographic features. If a specimen is illuminated at a grazing angle θ and the imaging system is inclined at the same angle θ, then one can image surface topography with high sensitivity, including the ability to image single atomic steps [Fenter 2006]. The image becomes elongated along the reflection direction, so that distances y in the image become distances y′ = y/sin(θ) along the surface.

(source plus condenser optic) by a grazing angle of incidence θ while also inclining the imaging system (objective optic plus image detector) by the same angle, surface steps will produce no image contrast if one satisfies Bragg's law with whole integer values of m. If, however, the grazing angle θ and surface step height d are set such that m = 1/2, one will have a π phase shift across the position of the edge in the imaging field, and even a slight defocus will allow this phase jump to be viewed as an image intensity variation, as can be seen by considering the example of Fig. 4.63. Thus the condition for obtaining good contrast for a surface step height of dstep is

dstep = λ/[4 sin(θ)],   (4.231)

so one prefers larger grazing angles in order to see small steps. Consider the case of imaging at a grazing angle beyond the critical angle θc of Eq. 3.115. The reflectivity (which can be calculated using Eqs. 3.119–3.121) may be quite low at angles beyond the critical angle θc, but it is not zero (see Fig. 3.26). The payoff is that increasing θ decreases dstep. In the first demonstration of this approach, the (100) surface of orthoclase (KAlSi3O8, density 2.59 g/cm³) was imaged, since it naturally forms clean steps that are one unit cell or d = 0.6464 nm high. At 10 keV, this material has a phase-shifting decrement of the refractive index of δ = 5.4 × 10⁻⁶, so the critical angle for reflectivity is θc = 0.19°. When imaged at a grazing angle of θ = 2.7°, where the reflectivity calculated using Eq. 3.122 is R ≈ 1.5 × 10⁻⁴, one has dstep = 0.66 nm from Eq. 4.231, so that single atomic steps should be observable with high image contrast. This has indeed been observed [Fenter 2006], and this demonstration has led to the imaging of the growth of ferroelectric epitaxial thin films [Laanait 2014] and x-ray reaction front dynamics at the calcite–water interface [Laanait 2015]. One can also use grazing incidence phase contrast due to topography as a contrast mechanism for coherent diffraction imaging [Sun 2012] or CDI (Chapter 10).
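The numbers in this demonstration are easy to check against Eq. 4.231 (Python; the 1.2398 nm·keV conversion between photon energy and wavelength is standard):

```python
import numpy as np

wavelength = 1.2398 / 10.0                 # nm, for 10 keV photons
theta = np.radians(2.7)                    # grazing angle used in the demonstration
d_step = wavelength / (4 * np.sin(theta))  # Eq. 4.231
print(d_step)                              # ≈ 0.66 nm, close to the 0.6464 nm unit cell
```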


Figure 4.69 A comparison of the Gaussian and Poisson distribution functions for small integer event counts n given an expected value of n̄. As can be seen, the Gaussian distribution provides a good approximation to the correct Poisson distribution even at very small values of n̄ of 5–10.

4.8 Image statistics, exposure, and dose

X-ray microscopes naturally involve ionizing radiation, so it is important in many cases to use the minimum exposure possible for recording an image. Having discussed image contrast mechanisms above, and x-ray absorption in Section 3.3.3, we are ready to consider the question of photon statistics in imaging and the consequences of irradiation of the object. As will be discussed in Section 11.2, the best metric for comparing irradiation levels in different materials and with different exposure levels and photon energies is the absorbed dose, which is a measure of ionizing energy absorbed per mass (usually in units of the gray, which is 1 joule absorbed per kilogram). Our discussion below is for the case of 2D imaging. For 3D imaging, dose fractionation means that much the same conclusions apply, as discussed in Section 8.6.

4.8.1 Photon statistics and the contrast parameter Θ

We first consider the question of the statistics of individual discrete events. The mathematical foundations⁶ were laid down by the French mathematician Siméon Denis Poisson; though he initially disagreed with Fresnel's theory of diffraction, he made so many other contributions to mathematics and physics that we can forgive him. Poisson considered the case of discrete events that produce an integer result (such as rolling a die) averaging to n̄, and found that the probability of having a result of n in one particular measurement is given by

P(n, n̄) = (n̄ⁿ/n!) exp(−n̄),   (4.232)

where the factorial is n! = n · (n − 1) · (n − 2) . . . 1. This expression is difficult to calculate directly for all but the smallest values of n and n̄, but fortunately it is well approximated

⁶ An amusing guide is available [Gonick 1993].



Figure 4.70 Image appearance versus the mean number of photons per pixel n̄. These images were simulated by using the image pixel value as the value for n̄, and then replacing that pixel with a noisy version n based on the Poisson distribution (Eq. 4.232). The "cameraman" image is commonly used in the image-processing literature, and can be found through a web search.

by a Gaussian distribution of

P(n, n̄) = [1/√(2πn̄)] exp[−(n − n̄)²/(2n̄)].   (4.233)

The Gaussian result should be truncated to P = 0 for n < 0 so as to exclude the incorrect non-zero values of P that would otherwise result for negative (nonsensical) values of n due to the long tails of the Gaussian distribution. While we showed a zero-mean Gaussian distribution in Fig. 4.4, in Fig. 4.69 we show a comparison for integer events with different expected values of n̄ using both the Poisson and Gaussian distributions; as can be seen, the Gaussian distribution closely approximates the Poisson distribution even for very small values of n̄. Comparing Eq. 4.233 with Eq. 4.14, we make the association that the standard deviation σ is given by

σ = √n̄.   (4.234)

As noted in Fig. 4.4, about two-thirds of the events fall within ±σ of the mean of n̄. Finally, a set of images with differing values of n̄ are shown in Fig. 4.70.
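The comparison shown in Fig. 4.69 is easy to reproduce numerically; a brief sketch (Python with SciPy assumed):

```python
import numpy as np
from scipy.stats import norm, poisson

nbar = 5.0
n = np.arange(0, 16)
p_poisson = poisson.pmf(n, nbar)                       # Eq. 4.232
p_gauss = norm.pdf(n, loc=nbar, scale=np.sqrt(nbar))   # Eq. 4.233
print(np.max(np.abs(p_poisson - p_gauss)))             # small even for nbar = 5
```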


Let us now consider the case of looking for a certain feature f within a background material b in a sample. If we have some imaging system which delivers a unit-normalized image intensity If at a pixel where the feature is present, and an image intensity Ib otherwise, the (unsigned) signal we will get when comparing a feature-present pixel against a feature-absent pixel when using a mean illumination of n̄ photons per pixel will be

S = |n̄If − n̄Ib| = n̄|If − Ib|.   (4.235)

This signal must be detected against a background of fluctuations due to particular values of n detected in one measurement of the intensity at a pixel versus another measurement. Using the Gaussian approximation to the Poisson distribution, we can characterize these fluctuations by their standard deviation √n̄ times the respective unit-normalized intensities I. Now if we are comparing the measurements from a feature-containing pixel versus a background-containing pixel, the fluctuations in the two measurements will be uncorrelated, in which case we can add the fluctuations together in root-sum-of-squares fashion to arrive at a noise estimate of

N = √[(√(n̄If))² + (√(n̄Ib))²] = √(n̄) √(If + Ib).   (4.236)

We therefore see that the signal-to-noise ratio (SNR) in the measurement is given by

SNR = n̄|If − Ib| / [√(n̄) √(If + Ib)] = √(n̄) |If − Ib|/√(If + Ib) = √(n̄) Θ,   (4.237)

where we have defined Θ as a contrast parameter [Glaeser 1971, Sayre 1977b] of

Θ ≡ |If − Ib| / √(If + Ib).   (4.238)

The contrast parameter Θ differs from the usual definition of contrast, or fringe visibility V given in Eq. 4.182, by use of the square root in the denominator. The SNR relationship given by Eq. 4.237 will be used to estimate required image exposures as detailed below. It can also be related to detective quantum efficiency (DQE), as will be shown in Eq. 7.34. Note that some authors in electron microscopy use a different definition of SNR = S²/N², which scales linearly with the illumination rather than as the square root [van Heel 2000], so one must exercise some care in comparing SNR results across research communities.

If we want an image to have a certain SNR, how many photons do we need? Solving Eq. 4.237 for the mean number of incident photons per pixel n̄, we find

n̄ = SNR²/Θ².   (4.239)

And what value of SNR is acceptable? The usual criterion comes from the work of Albert Rose, who looked carefully into the response of the eye, of photographic film, and of electronic imaging systems in the early days of television [Rose 1946]; he concluded


Figure 4.71 Binary hypothesis testing and error rates. Given a certain detection limit LD for features being present (blue curve) versus the background (red curve), binary decisions based on a critical level LC will erroneously count a fraction α of the background events as false positives, and a fraction β of the feature events as false negatives. In this example, the false positive rate α was set to 7 percent such that kα = 1.476, while the false negative rate β was set to 3 percent such that kβ = 1.881.

that human observers were satisfied with a light intensity that produced an SNR of 5 when considering the "pixels" of the retina (the rod and cone cells), or

SNR_Rose = 5.   (4.240)

As a result, for an object with a contrast parameter Θ = 1 and an imaging system with no light loss, we need to illuminate the object with at least n̄ = 5² or 25 photons per pixel. In some cases even lower exposures are used. For example, in single-particle electron microscopy the SNR on images of individual molecules is much lower than 5 due to radiation damage limitations; even so, the SNR on the final 3D molecular reconstruction can be quite high, because it combines the signal from low-dose images of 10³–10⁵ individual molecules [Frank 2002]. Strictly speaking, the fluence F is defined in terms of an area rather than a pixel, which means we can calculate the required fluence from the required per-pixel exposure n̄ as

F = n̄/(Δr)²,   (4.241)

where Δr is the physical size of a pixel.
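As a quick worked example of Eqs. 4.239–4.241 in Python (the contrast parameter and pixel size below are illustrative values, not from a specific instrument):

```python
snr_rose = 5.0                  # Rose criterion, Eq. 4.240
theta = 0.5                     # illustrative contrast parameter, Eq. 4.238
pixel_size = 30e-9              # illustrative 30 nm pixel size, in meters
nbar = snr_rose**2 / theta**2   # Eq. 4.239: required mean photons per pixel (100 here)
fluence = nbar / pixel_size**2  # Eq. 4.241: required fluence, photons per m^2
print(nbar, fluence)            # 100.0, ≈ 1.1e17 photons/m^2
```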

4.8.2 Minimum detection limits

Another way to think about image statistics is to consider minimum detection limits. Consider two Gaussian distributions as shown in Fig. 4.71, with respective mean



Figure 4.72 Hypothesis testing error rate α as a function of kα, as given by Eq. 4.243. At an intensity of n̄Ib + kα σb, the fraction of false positive values above the critical level LC is given by α. An identical relationship exists between β and kβ.

values of n̄Ib for the background regions and n̄If for the feature-containing regions, with standard deviations σb = √(n̄Ib) and σf = √(n̄If), respectively. Given a particular measurement intensity at one pixel, one might want to carry out binary hypothesis testing: is this pixel part of the background, or is it showing a feature? If one reduces the feature-present intensity n̄If down to a detection limit LD such that its distribution begins to overlap with the background intensity n̄Ib, there will be a critical level LC between the two distributions such that one will have two errors present in a binary hypothesis test against the critical level LC:

• False positives are events with a value above the critical level LC which are part of the background distribution but which are falsely counted as being features. The fraction of total background events which give rise to false positives is designated by α, and the critical level LC is located at a position of kα σb above the background mean n̄Ib.

• False negatives are events with a value below the critical level LC which are part of the feature distribution but which are falsely counted as being background. The fraction of the total feature events which give rise to false negatives is designated by β, and the critical level LC is located at a position of kβ σf below the feature mean n̄If.

The relationship between α and kα involves the error function erf(x) of

erf(x) = (2/√π) ∫₀ˣ exp(−t²) dt,   (4.242)



Table 4.2 Minimum detection limit examples. If the background signal is low (0.05 versus 0.2 in the background intensity row of the table), one can decide if a pixel represents a background region or a feature-present region (based on a threshold at the critical level LC) even with quite low per-pixel incident photon count n̄.

Normalized feature intensity If                     1.0      1.0
Normalized background intensity Ib                  0.2      0.05
Incident number of photons n̄                        10.0     10.0
Signal separation scaling kα = kβ                   1.75     2.45
False negative, positive rate α = β (Eq. 4.243)     0.040    0.007
Critical level LC in photons (Eq. 4.244)            4.5      2.2
Detection limit LD in photons (Eq. 4.245)           10.0     10.0
Contrast parameter Θ (Eq. 4.238)                    0.73     0.93
Signal to noise ratio SNR (Eq. 4.237)               2.31     2.93

which is frequently provided in numerical analysis subroutine libraries; one then has

$$ \alpha = \frac{1 - \mathrm{erf}(k_\alpha/\sqrt{2})}{2}, \tag{4.243} $$

which is plotted in Fig. 4.72; the same relationship applies to β and k_β. The critical level L_C for hypothesis testing is defined [Currie 1968] as

$$ L_C = \bar{n} I_b + k_\alpha \sigma_b = \bar{n} I_b + k_\alpha \sqrt{\bar{n} I_b} \tag{4.244} $$

while the detection limit L_D is defined as

$$ L_D = L_C + k_\beta \sigma_f = \bar{n} I_b + k_\alpha \sqrt{\bar{n} I_b} + k_\beta \sqrt{\bar{n} I_f}. \tag{4.245} $$

If the standard deviations in the feature-present and background regions are the same (that is, if σ_f = σ_b), and if one accepts equal false positive α and false negative β rates, then the critical level lies halfway between the background and feature distribution peaks. In Table 4.2, we consider some numerical examples that might pertain to elemental detection using x-ray fluorescence, as will be discussed in Section 9.2.3. These examples show that if the background levels can be kept low, one can reliably detect the presence of features even in quite noisy, low-photon-illumination (small n̄) images.
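This arithmetic is easy to evaluate numerically. The following minimal sketch (standard library only) reproduces the first column of Table 4.2; it assumes SNR = √n̄·Θ, which is consistent with n̄ = SNR²/Θ² (Eq. 4.265).

```python
# A minimal sketch of the detection-limit arithmetic of Eqs. 4.243-4.245,
# reproducing the first column of Table 4.2.
import math

def error_rate(k):
    """False positive (or negative) fraction for a k-sigma threshold (Eq. 4.243)."""
    return (1.0 - math.erf(k / math.sqrt(2.0))) / 2.0

I_f, I_b, nbar, k = 1.0, 0.2, 10.0, 1.75

sigma_b = math.sqrt(nbar * I_b)                   # Poisson standard deviations
sigma_f = math.sqrt(nbar * I_f)
L_C = nbar * I_b + k * sigma_b                    # critical level (Eq. 4.244)
L_D = L_C + k * sigma_f                           # detection limit (Eq. 4.245)
theta = abs(I_f - I_b) / math.sqrt(I_f + I_b)     # contrast parameter (Eq. 4.238)

print(f"alpha = beta = {error_rate(k):.3f}")                      # 0.040
print(f"L_C = {L_C:.1f}, L_D = {L_D:.1f} photons")                # 4.5, 10.0
print(f"Theta = {theta:.2f}, SNR = {math.sqrt(nbar)*theta:.2f}")  # 0.73, 2.31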

4.8.3 Signal to noise and resolution from experimental images

We have described above an approach for predicting the photon exposure needed for imaging an object in the case where we can predict the image intensities measured with a feature present or absent, so that the contrast parameter Θ can be calculated using Eq. 4.238. We now consider a different perspective on signal to noise in imaging: the case of determining the SNR and spatial resolution of an acquired image of an unknown object. We begin with an overall measure of image signal-to-noise. One approach is to acquire two images I₁ and I₂ of the object using identical experimental conditions, and then to compare those images. Features actually present in the object should be highly



correlated, while noise should not. We make the comparison using a method developed for waveform signal processing [Bershad 1974] which was later applied to electron microscopy [Frank 1975b]; our treatment here draws on an analysis that is consistent with the definition of SNR of Eq. 4.237 [Huang 2009a]. The signal S should be correlated between the two images, while the two noise patterns N₁ and N₂ should not, giving

$$ I_1 = S + N_1 \quad\text{and}\quad I_2 = S + N_2. \tag{4.246} $$

The signal S has a mean value over all 2D pixels of ⟨S⟩, while the noise patterns should have equal fluctuations above and below the signal level (assuming photon per pixel values n̄ high enough that the Gaussian noise distribution of Eq. 4.233 well-approximates the Poisson distribution of Eq. 4.232, as shown in Fig. 4.69). As a result, the noise means are ⟨N₁⟩ = ⟨N₂⟩ = 0, so that the image means become ⟨I₁⟩ = ⟨I₂⟩ = ⟨S⟩. We then wish to consider the total signal and total noise for the two images using their variances:

$$ \langle S^2 \rangle = \langle (S - \langle S\rangle)(S - \langle S\rangle)^* \rangle = \langle S^2 \rangle - \langle S\rangle^2, \tag{4.247} $$

$$ \langle N^2 \rangle = \langle (N_{1,2} - \langle N_{1,2}\rangle)(N_{1,2} - \langle N_{1,2}\rangle)^* \rangle = \langle N_{1,2}^2 \rangle, \tag{4.248} $$

where ⟨N_{1,2}⟩ = 0 has been used in the final equality of Eq. 4.248. Again, the average is done over all pixel indices of one 2D image, which differs from a variance calculation for a particular pixel in a set of separately measured images. This can be shown [Huang 2009a] to lead to a correlation coefficient r between the two images of

$$ r = \frac{\langle (I_1 - \langle I_1\rangle)(I_2 - \langle I_2\rangle)^* \rangle}{\sqrt{\langle (I_1 - \langle I_1\rangle)^2 \rangle \, \langle (I_2 - \langle I_2\rangle)^2 \rangle}} \tag{4.249} $$

from which one can calculate a SNR of

$$ \mathrm{SNR} = \sqrt{\frac{\langle S^2\rangle}{\langle N^2\rangle}} = \sqrt{\frac{r}{1-r}}. \tag{4.250} $$

This expression is the square root of the expression α = r/(1 − r) used in some electron microscopy papers [Frank 1975b], but as noted before, some authors in electron microscopy prefer to define SNR = ⟨S²⟩/⟨N²⟩, which scales linearly with the illumination rather than as the square root [van Heel 2000]. Our definition in Eq. 4.250 scales as the square root of illumination [Huang 2009a], as expected from Eq. 4.237.

One can also use the spatial frequency dependence of the SNR in images to estimate the spatial resolution. With a single image, one can often see a power law rolloff of signal versus spatial frequency as given by Eq. 4.97, while uncorrelated pixel-to-pixel noise fluctuations give rise to a "flat" power spectrum as discussed in Section 4.3.4. Thus one can estimate the spatial frequency u_{r,S=N} at which the signal trend reaches the noise floor, such as is shown in Fig. 4.49, and obtain a simple but useful measure of the half-period spatial resolution of

$$ \delta_{r,S=N} \simeq \frac{1}{2 u_{r,S=N}}. \tag{4.251} $$

One can obtain an even better measure if, like with Eq. 4.249, one has two independently measured images of the same object. The information about the object imaged


[Figure 4.73 shows a scanned image (500 nm scale bar) and power spectra of intensity (a.u., log scale) versus spatial frequency (μm⁻¹) for per-pixel dwell times of 20, 50, 100, and 200 ms, with half-period resolution marks at 250, 100, 50, 25, and 10 nm.]

Figure 4.73 Image resolution versus exposure time, or fluence F on the specimen. In this case, four separate phase contrast images were acquired of a microfabricated test pattern using 5.2 keV X rays, with continuous scanning and per-pixel dwell times ranging from 20 ms to 200 ms as indicated. The larger, low-spatial-frequency features in the object have almost identical appearance in the four images, but finer, high-spatial-frequency features become more apparent as one increases the photon fluence F, as evidenced by increasing reconstructed feature power at spatial frequencies above 20 μm⁻¹. These images were obtained using x-ray ptychography (Section 10.4), in which the usual noise "floor" in the power spectrum (such as is shown in Figs. 4.19 and 4.49) is removed by the reconstruction process. Figure adapted from [Deng 2015a].

should be highly correlated, while the noise should not. In this case, one can measure the correlation of the phase of the Fourier transforms of the images as a function of spatial frequency in a method called Fourier ring correlation (FRC) or, in 3D, Fourier shell correlation (FSC) [Saxton 1982]. When considering a specific range u_{r,i} (or shell, or ring) of spatial frequencies, the Fourier shell correlation FSC₁₂ between images 1 and 2 is given by [van Heel 2005]

$$ \mathrm{FSC}_{12}(u_{r,i}) = \frac{\sum_{u_r \in u_{r,i}} F_1(\vec{u}_r) \cdot F_2^\dagger(\vec{u}_r)}{\sqrt{\sum_{u_r \in u_{r,i}} |F_1(\vec{u}_r)|^2 \cdot \sum_{u_r \in u_{r,i}} |F_2(\vec{u}_r)|^2}}, \tag{4.252} $$

where at the same time one can calculate the number of pixels n(u_{r,i}) contained in the Fourier shell u_{r,i}. As was shown in the discussion of image power spectra in Section 4.3.4, at low spatial frequencies one can expect a strong correlation due to the same object information being present in both Fourier transforms, while at high spatial frequencies one will see poor correlation between two instances of uncorrelated noise. A commonly used criterion for estimating the spatial resolution in a way that is consistent with acceptable correlations in crystallography datasets is to use a so-called 1/2 bit criterion, or a threshold value T_{1/2 bit} of [van Heel 2005]

$$ T_{1/2\,\mathrm{bit}}(u_{r,i}) = \frac{0.2071 + 1.9102/\sqrt{n(u_{r,i})}}{1.2071 + 0.9102/\sqrt{n(u_{r,i})}}. \tag{4.253} $$


[Figure 4.74 is a schematic: the x-ray beam passes through an overlying material before reaching a feature embedded in an underlying slab of background material.]

Figure 4.74 Imaging of a feature of material f in a slab of background material b, with a thickness t_{b,o} of overlying material. The feature has a width of Δ_r, and the feature/background slab has a thickness t. This simple model is used to estimate the exposure required for imaging with a given SNR.

The center u_{r,i,1/2 bit} of the spatial frequency shell at which

$$ \mathrm{FSC}_{12}(u_{r,i,1/2\,\mathrm{bit}}) = T_{1/2\,\mathrm{bit}}(u_{r,i,1/2\,\mathrm{bit}}) \tag{4.254} $$

then gives a half-period spatial resolution δ_{r,1/2 bit} of

$$ \delta_{r,1/2\,\mathrm{bit}} = \frac{1}{2 u_{r,i,1/2\,\mathrm{bit}}}. \tag{4.255} $$

The FRC (2D) and FSC (3D) methods have sometimes been used to evaluate the resolution of images obtained using iterative reconstruction methods of the type discussed in Chapter 10. If one compares two reconstructions from separate, statistically independent datasets, then the FSC/FRC method applies directly, as illustrated in Fig. 11.12. If instead one compares different random iterative reconstruction starts from one dataset, the FSC/FRC interpretation is not directly applicable, though one can still gain insights on the reliability of iterative phasing using the phase retrieval transfer function (PRTF) of Eq. 10.34.
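The following is a minimal sketch of the two-image SNR of Eqs. 4.249 and 4.250 and the Fourier ring correlation of Eqs. 4.252 and 4.253. It assumes two equal-sized, square, real-valued images of the same object; the function names and the ring binning scheme are illustrative choices, not prescriptions from the text.

```python
import numpy as np

def snr_from_image_pair(img1, img2):
    """Global SNR estimate from the correlation coefficient r (Eq. 4.250)."""
    r = np.corrcoef(img1.ravel(), img2.ravel())[0, 1]
    return np.sqrt(r / (1.0 - r))

def frc_with_halfbit(img1, img2, n_bins=50):
    """Ring frequencies (cycles/pixel), FRC values, and half-bit threshold."""
    F1 = np.fft.fftshift(np.fft.fft2(img1))
    F2 = np.fft.fftshift(np.fft.fft2(img2))
    n = img1.shape[0]
    y, x = np.indices(img1.shape)
    radius = np.hypot(x - n // 2, y - n // 2)     # frequency radius in pixels
    edges = np.linspace(0.0, n // 2, n_bins + 1)
    ring = np.digitize(radius.ravel(), edges) - 1
    cross = (F1 * np.conj(F2)).ravel()
    p1, p2 = (np.abs(F1)**2).ravel(), (np.abs(F2)**2).ravel()
    frc = np.zeros(n_bins)
    t_half = np.zeros(n_bins)
    for i in range(n_bins):
        sel = ring == i
        n_pix = max(sel.sum(), 1)                 # n(u_{r,i}), pixels in this ring
        frc[i] = abs(cross[sel].sum()) / np.sqrt(p1[sel].sum() * p2[sel].sum() + 1e-30)
        t_half[i] = ((0.2071 + 1.9102 / np.sqrt(n_pix)) /
                     (1.2071 + 0.9102 / np.sqrt(n_pix)))
    u = 0.5 * (edges[:-1] + edges[1:]) / n        # ring centers, cycles/pixel
    return u, frc, t_half
```

Per Eqs. 4.254 and 4.255, the half-period resolution estimate is then 1/(2u) (in pixel units) at the first ring where the FRC curve falls below the half-bit threshold.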

4.8.4 Estimating the required photon exposure

From the definition of the contrast parameter Θ given in Eq. 4.238, we can produce estimates of the number of photons n̄ required to image a feature based on image intensities produced with, and without, the feature present. We present here a simplified discussion of this topic, which is explored in greater detail elsewhere [Villanueva-Perez 2016, Du 2018]. Consider a slab of a background material b with thickness t. Within this slab, we want to see if a given pixel of width Δ_r is composed of the feature material f or not (Fig. 4.74). Let us also consider an overlying thickness t_{b,o} of the background material. Using absorption contrast imaging with a 100 percent efficient optical system with 100 percent contrast transfer at the desired resolution scale, the image intensity with the



Box 4.8 Modeling protein for biological imaging estimates

For estimates of contrast, exposure, and dose when imaging biological systems, one needs to have an estimate of the linear absorption coefficient μ = 2kβ and the phase shift per wavelength δ of organic material in a cell (Eqs. 3.67 and 3.82). While absorption coefficients for protein were given in a pioneering early estimate of image contrast and dose in x-ray microscopy [Sayre 1977b, Sayre 1977a], the assumed composition was not detailed. Soon after, when there was speculation about developing x-ray lasers pumped by nuclear weapons [Broad 1986], Johndale Solem of Los Alamos National Laboratory wrote a fascinating technical report considering the possibilities of using such a laser for x-ray holography of living cells [Solem 1982a], which led to several follow-on publications [Solem 1982b, Solem 1984, Solem 1986]. Solem's work involved signal estimates from x-ray scattering from protein, but the assumed composition of proteins was still not made explicit. With the advent of laser-driven x-ray lasers [Matthews 1985, Suckewer 1985], researchers at Lawrence Livermore National Laboratory, including Jim Trebes, Louis DaSilva, and Richard London, studied their potential use for biological x-ray microscopy [London 1989, DaSilva 1992]. As part of this work, London made the reasonable assumption that the stoichiometric composition of a representative protein can be described from the average of all 20 amino acids, leading to a stoichiometric composition of H48.6 C32.9 N8.9 O8.9 S0.6 with a density when dry of 1.35 g/cm³ [London 1989]. The protein content of cells varies by type, but 25 percent protein in water is typical [Fulton 1982, Luby-Phelps 2000].

feature present is given from the Lambert–Beer law (Eq. 3.76) by

$$ I_{f,\mathrm{abs}} = I_0 \exp[-\mu_f t] \exp[-\mu_b t_{b,o}], \tag{4.256} $$

while the intensity in the background regions is

$$ I_{b,\mathrm{abs}} = I_0 \exp[-\mu_b t] \exp[-\mu_b t_{b,o}] \tag{4.257} $$

with thin-sample and no-overlying-background approximate expressions of

$$ I_{f,\mathrm{abs}} \simeq I_0 \left(1 - 4\pi\beta_f \frac{t}{\lambda}\right) \tag{4.258} $$

and

$$ I_{b,\mathrm{abs}} \simeq I_0 \left(1 - 4\pi\beta_b \frac{t}{\lambda}\right) \tag{4.259} $$

giving a thin-sample expression of

$$ |I_{f,\mathrm{abs}} - I_{b,\mathrm{abs}}| = 4\pi \frac{t}{\lambda} |\beta_f - \beta_b|. \tag{4.260} $$

Using a unit-normalized incident flux I₀ = 1 to be scaled with mean incident photon


number n̄, the contrast parameter Θ of Eq. 4.238 becomes

$$ \Theta_{\mathrm{abs}} = \frac{|I_f - I_b|}{\sqrt{I_f + I_b}} = \exp[-\mu_b t_{b,o}/2]\, \frac{|\exp[-\mu_f t] - \exp[-\mu_b t]|}{\sqrt{\exp[-\mu_f t] + \exp[-\mu_b t]}} \tag{4.261} $$

$$ \simeq 2\pi\sqrt{2}\, \frac{t}{\lambda}\, |\beta_f - \beta_b| \tag{4.262} $$

where in the last expression we have used the limit of a thin specimen (so e⁻ˣ ≈ 1 − x) and no overlying background material [Hornberger 2006]. If we instead use the Zernike phase contrast signal intensities given by Eqs. 4.224 and 4.225, we arrive at

$$ \Theta_{\mathrm{zpc}} \simeq 2\pi\sqrt{2}\, \frac{t}{\lambda}\, |\delta_f - \delta_b| \tag{4.263} $$

while if we use the differential phase contrast signal intensity of Eq. 4.229 we arrive at

$$ \Theta_{\mathrm{dpc}} = \frac{4}{1.22}\, \frac{t}{\lambda}\, |\delta_f - \delta_b| \tag{4.264} $$

because in the differential phase contrast case the denominator term of √(I_f + I_b) is 1, since I_{b,dpc} = 0. (Other approaches for Θ_dpc have yielded a prefactor of √2 instead of 4/1.22 [Hornberger 2006].) As can be seen, all three contrast parameters are quite similar, except for the use of the absorptive β or phase-shifting δ parts of the refractive index of n = 1 − δ − iβ. If we require SNR = 5 to meet the Rose criterion of Eq. 4.240, the required number of incident photons per pixel n̄ is

$$ \bar{n} = \frac{\mathrm{SNR}_{\mathrm{Rose}}^2}{\Theta^2}. \tag{4.265} $$

We can therefore calculate the number of photons required for detecting the presence or absence of a feature using the thin-sample approximate expressions for each of the three contrast methods considered here. For absorption contrast, we have

$$ \bar{n}_{\mathrm{abs}} \simeq \frac{25}{8\pi^2}\, \frac{\lambda^2}{t^2}\, \frac{1}{|\beta_f - \beta_b|^2} = \frac{25}{8\pi^2}\, \frac{1}{\lambda^2 t^2}\, \frac{1}{|\alpha_f f_{2,f} - \alpha_b f_{2,b}|^2}, \tag{4.266} $$

for Zernike phase contrast we have

$$ \bar{n}_{\mathrm{zpc}} \simeq \frac{25}{8\pi^2}\, \frac{\lambda^2}{t^2}\, \frac{1}{|\delta_f - \delta_b|^2} = \frac{25}{8\pi^2}\, \frac{1}{\lambda^2 t^2}\, \frac{1}{|\alpha_f f_{1,f} - \alpha_b f_{1,b}|^2}, \tag{4.267} $$

and for differential phase contrast we have

$$ \bar{n}_{\mathrm{dpc}} \simeq \frac{25 \cdot 1.22^2}{16}\, \frac{\lambda^2}{t^2}\, \frac{1}{|\delta_f - \delta_b|^2} = \frac{25 \cdot 1.22^2}{16}\, \frac{1}{\lambda^2 t^2}\, \frac{1}{|\alpha_f f_{1,f} - \alpha_b f_{1,b}|^2}. \tag{4.268} $$

In each case, the latter expression uses δ + iβ = αλ²(f₁ + if₂) as given by Eqs. 3.65 and 3.67, along with α ≡ r_e n_a/(2π) as given in Eq. 3.66. The latter expressions make the scaling with wavelength more explicit. The similar forms of these three expressions allow us to make a few statements about exposure requirements in x-ray microscopes:


• Absorption contrast relies on differences in the absorptive part of the refractive index β, or of the complex number of oscillation modes f₂, while both phase contrast methods rely on the phase-shifting terms δ or f₁. As shown in Fig. 3.16, the phase-shifting terms f₁ maintain large numerical values at higher x-ray energies E, while f₂ declines approximately as 1/E². This means phase contrast is increasingly favored over absorption contrast at higher photon energies (though one must eventually consider Compton scattering at high photon energies).

• These calculations assume 100 percent detective quantum efficiency for the entire x-ray imaging system. In practice the efficiency is much lower. For example, when using zone plate optics downstream of the specimen in a full-field imaging system, one must account for the focusing efficiency of the zone plate, which might be in the 5–15 percent range in practice (theoretical values are shown in Fig. 5.15). Hard x-ray imaging detectors based on scintillators and visible light lenses also show low efficiency, as discussed in Section 7.4.7, and there are additional detector statistical considerations as outlined in Section 7.4.1.

• These calculations also assume 100 percent optical transfer of information at all spatial frequencies. This is not usually the case; for example, Fig. 4.45 shows how the OTF decreases in incoherent brightfield imaging, leading to a higher exposure for feature sizes approaching the Rayleigh resolution limit.

• As the expressions of Eqs. 4.266, 4.267, and 4.268 make clear, the photon exposure per pixel scales inversely with the square of the feature thickness t. For an isometric sample, the feature thickness is usually of the same scale as its lateral dimensions Δ_r. If we assume Δ_r = t, the radiation exposure scales as

$$ \bar{n} \propto t^{-4}. \tag{4.269} $$

This apparent fourth-power-law scaling of the required number of incident photons per pixel n̄ also applies to radiation dose (Eq. 4.285), and is discussed further in Section 4.9.1. This scaling presents significant challenges for high-resolution imaging, as will be discussed in Chapter 11.

• The pre-factor term of 1.22²/16 used in Eq. 4.268 is about seven times larger than the pre-factor terms of 1/(8π²) in Eqs. 4.266 and 4.267. However, the differential phase contrast estimate is for a feature with a size equal to the Rayleigh resolution limit, while the calculations for absorption and Zernike phase contrast do not account for any of the signal transfer losses that would otherwise appear in the OTF for imaging, as noted above. This is likely to bring the pre-factor terms closer to each other when imaging small features near the resolution limit of an x-ray microscope. These expressions also ignore noise sources other than those due to photon statistics.

In spite of the simplistic nature of the exposure estimates provided in the above results, they provide a very helpful guide for understanding exposure requirements in x-ray microscopy (more detailed views are presented elsewhere [Schropp 2010c, Villanueva-Perez 2016, Du 2018]). In Fig. 4.75, we show a calculation of n̄ = 25/Θ²


[Figure 4.75 has two panels, "30 nm resolution for Cu in Si" and "30 nm resolution for protein in ice", each plotting incident photons per pixel (log scale, roughly 10² to 10¹²) versus photon energy (0.2–20 keV), with absorption contrast and Zernike phase contrast curves for several background thicknesses; the "water window" region is marked on the protein-in-ice panel.]

Figure 4.75 Photon exposure n̄ = 25/Θ² for imaging 30 nm thick features of copper in silicon, or protein in ice. These calculations used the exact expressions for image intensities of Eqs. 4.256 and 4.257 for absorption contrast, and Eqs. 4.224 and 4.225 for Zernike phase contrast with a non-absorbing phase plate (slight improvements can be obtained if the phase plate is absorbing). The feature was assumed to be embedded in a background layer of silicon (for the materials science example) or ice (for the biological example) of 3, 10, 30, and 100 μm thickness, as indicated by the various color curves. The "water window" [Wolter 1952] spectral region between the carbon and oxygen K edges at 290 and 540 eV shows especially good contrast for hydrated organic materials. These calculations assume 100 percent efficiency for the x-ray imaging system, and no noise sources beyond photon statistics.

using the exact rather than thin-sample approximate expressions for Θ for two example situations:

1. The imaging of 30 nm copper features in a background of overlying silicon with transmission exp[−μ_b t_{b,o}] (see Eqs. 4.256 and 4.261). This is an example of imaging materials science specimens.

2. The imaging of 30 nm thick protein features (see Box 4.8) in a background of ice. This is an example of imaging biological specimens.

This calculation is done for a series of overlying thickness layers of the background material, to show the effects of absorption on signal loss. It is done for absorption and Zernike phase contrast only, since the differential phase contrast expression is identical to the Zernike phase contrast expression except for the different numerical factor.
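As a rough illustration of how such curves are generated, here is a minimal sketch of the absorption contrast case, using the exact intensities of Eqs. 4.256 and 4.257, the contrast parameter of Eq. 4.261, and the Rose criterion exposure of Eq. 4.265. The β values in the example call are placeholder numbers, not tabulated optical constants.

```python
import numpy as np

def nbar_absorption(beta_f, beta_b, t, t_over, wavelength, snr=5.0):
    """Incident photons per pixel to see a feature with the Rose criterion SNR."""
    mu_f = 4.0 * np.pi * beta_f / wavelength          # mu = 4*pi*beta/lambda
    mu_b = 4.0 * np.pi * beta_b / wavelength
    I_f = np.exp(-mu_f * t) * np.exp(-mu_b * t_over)  # Eq. 4.256
    I_b = np.exp(-mu_b * t) * np.exp(-mu_b * t_over)  # Eq. 4.257
    theta = abs(I_f - I_b) / np.sqrt(I_f + I_b)       # Eq. 4.261
    return snr**2 / theta**2                          # Eq. 4.265

# Placeholder example: 30 nm feature under 10 um of background, lambda = 0.25 nm
print(f"{nbar_absorption(1e-6, 1e-8, 30e-9, 10e-6, 0.25e-9):.3g} photons per pixel")
```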

4.8.5 Imaging modes and diffraction

The similarity of the expressions of required exposure for absorption (Eq. 4.266) and Zernike phase contrast (Eq. 4.267) points to several further important conclusions about the exposure required in x-ray microscopes. As discussed in Section 4.6, Babinet's principle states that the optical wavefield downstream from an illuminated object is the complement of the wavefield produced by the complement of the object; that is, a pinhole and an absorptive disk produce complementary wavefields which add up to the incident beam. This means that both absorptive



and phase-shifting objects lead to scattering of the beam, so that one can expect that a scattering experiment will have a unit-normalized image intensity that can be calculated from the combination of Eqs. 4.226 and 4.260 as

$$ I_{f,\mathrm{scat}} = 4\pi\, \frac{t}{\lambda} \left( |\delta_f - \delta_b| + |\beta_f - \beta_b| \right), \tag{4.270} $$

which is effectively the combination of absorption and phase contrast. The relative strengths of absorption and differential phase contrast in scanning x-ray microscopy have also been considered using a slightly different approach [Thibault 2009b].

A related conclusion concerns the relative merits of collecting images with a 100% efficient lens, versus collecting diffraction patterns which must then be iteratively phased to yield an image, as will be discussed in Chapter 10 (these iterative phasing procedures seem not to add any extra noise to the reconstructed image [Fienup 1978, Williams 2007b, Huang 2009a, Godard 2012]). The question of the dose efficiency of imaging versus diffraction was considered by Richard Henderson [Henderson 1995] in the context of electron microscopy. Henderson stated:

    It can be shown that the intensity of a sharp diffraction spot containing a certain number N of diffracted quanta will be measured with the same accuracy (√N) as would the amplitude (squared) of the corresponding Fourier component in the bright field phase contrast image that would result from interference of this scattered beam with the unscattered beam [Henderson 1992]. The diffraction pattern, if recorded at high enough spatial resolution, would therefore contain all the intensity information on Fourier components present in the image. It would lack only the information concerning the phases of the Fourier components of the image which are of course lost. Thus, for the same exposure, holography should be equal to normal phase contrast in performance, and diffraction methods inferior because of the loss of the information on the phases of the Fourier components of the image.

Let us consider Henderson's conclusion in the context of assuming that a signal with amplitude b at an object pixel is scattered to reach a certain detector pixel, with no other signal present. We can then use Eq. 4.238 to calculate a contrast parameter Θ for this diffraction experiment of

$$ \Theta_{\mathrm{diffraction}} = \frac{|I_f - I_b|}{\sqrt{I_f + I_b}} = \frac{|b^2 - 0|}{\sqrt{b^2 + 0}} = b. \tag{4.271} $$

The number of photons n̄ with which we must then illuminate the object pixel is found from Eq. 4.237 to be

$$ \bar{n}_{\mathrm{diffraction}} = \frac{\mathrm{SNR}^2}{\Theta^2} = \frac{\mathrm{SNR}^2}{b^2}. \tag{4.272} $$

Now let’s consider the case of imaging where this scattering amplitude is mixed with a strong illumination wave a at a detector pixel (this happens naturally in imaging systems; see for example Fig. 4.65). We obtain the maximum signal difference when the phasors for a and b are parallel (we’re looking for the presence of the amplitude b, rather than its phase shift which was what we sought in Zernike phase contrast as illustrated in Fig. 4.64). Therefore we’ll assume that amplitudes a and b are both real and positive. Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005


We then have an image intensity when the scatterer is present of

$$ I_f = (a + b)(a + b)^\dagger = a^2 + b^2 + 2ab, \tag{4.273} $$

and we have I_b = a² when the scatterer is absent. The contrast parameter Θ is then

$$ \Theta_{\mathrm{imaging}} = \frac{|I_f - I_b|}{\sqrt{I_f + I_b}} = \frac{|(a^2 + b^2 + 2ab) - a^2|}{\sqrt{(a^2 + b^2 + 2ab) + a^2}} = \frac{b^2 + 2ab}{\sqrt{2a^2 + b^2 + 2ab}}. \tag{4.274} $$

Assuming the scattering wave to be weak compared to the illumination, we have b ≪ a and b² ≪ ab, so that we can approximate Eq. 4.274 as

$$ \Theta_{\mathrm{imaging}} \simeq \frac{2ab}{\sqrt{2a^2 + 2ab}} = \sqrt{2}\, \frac{b}{\sqrt{1 + b/a}} \simeq \sqrt{2}\, \frac{b}{1 + b/(2a)}, \tag{4.275} $$

where we have used the binomial approximation of Eq. 4.25 in the last step. From this we calculate the required exposure as

$$ \bar{n}_{\mathrm{imaging}} = \frac{\mathrm{SNR}^2}{\Theta^2} \simeq \frac{\mathrm{SNR}^2}{2b^2 (1 - b/a)}. \tag{4.276} $$

We therefore find a ratio of

$$ \frac{\bar{n}_{\mathrm{imaging}}}{\bar{n}_{\mathrm{diffraction}}} = \frac{b^2}{2b^2 (1 - b/a)} \simeq \frac{1}{2}\left(1 + \frac{b}{a}\right) \tag{4.277} $$

so that the mixing of a strong reference signal in with the diffracted signal (which happens in imaging) offers a reduction in the required exposure n̄ of only a factor of about 2. When the actual focusing efficiency of x-ray objective lenses is considered, diffraction in fact becomes more favorable. This conclusion follows through for fluence F, and for radiation dose. From these perspectives, one can say that scattering and coherent imaging experiments involve a required exposure that is well approximated by the lesser of the absorption and phase contrast exposures shown in Fig. 4.75. The attainable spatial resolution is limited by the fluence F (photons per area) on the specimen, and the contrast of features in the specimen. Detailed comparisons between different x-ray microscopes then involve the efficiency of whatever optics and detectors lie downstream of the specimen in the illumination path.
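A quick numeric check of this factor-of-two conclusion, using the exact contrast parameters of Eqs. 4.271 and 4.274 (the amplitudes a and b are arbitrary illustrative values):

```python
import math

a, b, snr = 1.0, 0.05, 5.0
theta_diffraction = b                                      # Eq. 4.271
I_f, I_b = (a + b)**2, a**2
theta_imaging = (I_f - I_b) / math.sqrt(I_f + I_b)         # Eq. 4.274, exact
n_diffraction = snr**2 / theta_diffraction**2              # Eq. 4.272
n_imaging = snr**2 / theta_imaging**2
print(f"exact ratio: {n_imaging / n_diffraction:.3f}")     # ~0.50
print(f"Eq. 4.277 estimate: {0.5 * (1 + b / a):.3f}")      # 0.525
```

Both the exact ratio and the weak-scattering estimate of Eq. 4.277 land near 1/2, consistent with the "factor of about 2" statement above.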

4.9 From exposure to radiation dose

X-ray photons have energies far in excess of the 1.5–11 eV energy of chemical bonds (Box 3.2), so that x-ray exposure can cause radiation damage in the specimen being imaged. While the mechanisms of radiation damage are discussed in more detail in Section 11.2, we will concern ourselves here with the quantity that is used to calculate the magnitude of damage effects: the radiation dose. Dose is given by the energy absorbed per mass, leading to units of

$$ 1\ \mathrm{gray} \equiv 1\ \frac{\mathrm{joule}}{\mathrm{kilogram}} \quad\text{and}\quad 1\ \mathrm{rad} \equiv 100\ \frac{\mathrm{erg}}{\mathrm{gram}} \tag{4.278} $$



[Figure 4.76 has two panels, "30 nm resolution for Cu in Si" and "30 nm resolution for protein in ice", each plotting dose in Gy (log scale, roughly 10⁴ to 10¹⁴) versus photon energy (0.2–20 keV), with absorption contrast and Zernike phase contrast curves for several background thicknesses; the "water window" is marked on the protein-in-ice panel.]

Figure 4.76 Radiation dose in Gy for imaging 30 nm thick features of copper in silicon, or protein in ice. These calculations are based on the photon exposures shown in Fig. 4.75; the dose to the feature (Cu at left, or protein at right) is calculated assuming that the feature is in the middle of the overlying background material of thickness as specified. These calculations assume 100 percent efficiency for the x-ray imaging system, and no noise sources beyond photon statistics.

so that 1 gray = 100 rad, with gray being the preferred unit in the modern literature (the unit, abbreviated as Gy, is named after the British radiobiologist Louis Harold Gray). When it comes to effects in living systems, one must apply a slight correction called the relative biological effectiveness (RBE; also called a radiation weighting factor W_R), which is 1 for X rays and electrons, 2 for protons, and 20 for alpha particles and heavy ions [Valentin 2007, Table 2]. This leads to a second unit called the "dose equivalent" or "exposure" H, which is given by

$$ H = D \cdot \mathrm{RBE}, \tag{4.279} $$

which is in sieverts in SI units (the unit, abbreviated Sv, is named after the Swedish medical physicist Rolf Sievert). An earlier term for exposure is the REM (which stands for the rather awkward phrase "Röntgen Equivalent Man"), where 1 Sv = 100 REM. As will be discussed in Section 11.2.3, the LD50 measure for human exposure to radiation is about 4 Sv; that is, if a set of people receive a radiation exposure of 4 Sv, about half of them will die even if given basic medical care.

For an object exposed to an x-ray fluence F (Section 7.1.1) of n̄ incident photons per area Δ_r², the "skin dose" (the dose imparted to the upstream, beam-facing surface of the material) can be found from considering the energy deposition per thickness dE/dx. Since the intensity declines according to the Lambert–Beer law (Eq. 3.76) as I = I₀ exp[−μx], the decrease in intensity per thickness x is given by

$$ \frac{dI}{dx} = -\mu. \tag{4.280} $$

With n¯ photons each of energy E, the skin dose D (energy deposited per mass) is then Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005


[Figure 4.77 plots ice thickness (μm) versus photon energy (keV) with labeled dose contours; contour N means a dose of 10ᴺ Gy for x-ray imaging of 10 nm protein in amorphous ice, and the water window region is marked.]

Figure 4.77 Contour plot and image of the radiation dose estimated for 10 nm resolution imaging of protein features in ice, as a function of x-ray energy and of ice thickness. The contour lines are labeled as the power of 10 of radiation dose in Gy; that is, a contour line of 9 means 10⁹ Gy. These contour lines lie above a gray-scale representation of dose, with lower dose as dark gray and higher dose as light gray. This figure shows that the "water window" spectral region between the carbon and oxygen K absorption edges offers the lowest dose for samples up to a few micrometers thick, and that for thicker specimens phase contrast at several keV hard x-ray photon energies begins to offer radiation dose advantages [Wang 2013, Du 2018]. Figure adapted from [Du 2018].

given by the fraction of photons lost from the transmitted beam due to absorption, or

$$ D = \bar{n}\, \frac{dI}{dx}\, \frac{E}{\rho \Delta_r^2} = \bar{n}\, \frac{E \mu}{\rho \Delta_r^2} \tag{4.281} $$

where ρ is the density of the absorbing material. One can also make use of the expression for μ from Eq. 3.45, the atom number density n_a from Eq. 3.21 (with Section 3.3.5 describing the calculation for molecules and mixtures), and α as defined in Eq. 3.66 to write the skin dose as

$$ D = \bar{n}\, 4\pi\, hc\, \frac{\alpha}{\rho \Delta_r^2}\, f_2 \tag{4.282} $$

where hc ≈ 1240 eV·nm as given in Eq. 3.7. The expression of Eq. 4.282 gives one the impression that radiation dose decreases at higher x-ray energies E, due to the fact that f₂ declines as about E⁻² (see Fig. 3.16). However, one must also account for the number of photons n̄ required to see a small structure. If we use the expression of Eq. 4.266 for the required number of photons n̄_abs for absorption contrast microscopy, we can write the necessarily delivered skin dose D


absorbed by the feature (thus using α_f, ρ_f, and f_{2,f} in Eq. 4.282) as

$$ D_{\mathrm{abs}} = \frac{25}{2\pi}\, \frac{E^2}{hc}\, \frac{\alpha_f f_{2,f}}{\rho_f \Delta_r^2}\, \frac{1}{t^2 |\alpha_f f_{2,f} - \alpha_b f_{2,b}|^2}, \tag{4.283} $$

while from Eq. 4.267 the dose for Zernike phase contrast imaging is given by

$$ D_{\mathrm{zpc}} = \frac{25}{2\pi}\, \frac{E^2}{hc}\, \frac{\alpha_f f_{2,f}}{\rho_f \Delta_r^2}\, \frac{1}{t^2 |\alpha_f f_{1,f} - \alpha_b f_{1,b}|^2}. \tag{4.284} $$

Let’s look in particular at the expression for phase contrast imaging at multi-keV photon energies. At these energies, f2, f ∝ E −2 , while f1, f  Z f and f1,b  Zb , as was shown in Fig. 3.16. Thus we see that the E 2 term and the E −2 scaling of f2, f largely cancel each other out, so at higher energies the radiation dose for Zernike phase contrast imaging tends toward a constant value. This can be seen in a more detailed numerical calculation for imaging copper in silicon, and protein in ice, in Fig. 4.76; it shows that absorption contrast for thin features tends to be minimized at lower photon energies, while the dose required for phase contrast imaging is less strongly biased toward low energies. Another view of the same type of calculations is shown in Fig. 4.77. Finally, more detailed calculations are presented elsewhere [Du 2018] which include effects such as inelastic scattering, which become of increasing importance at higher energies.

4.9.1 Dose versus resolution

The dose expressions of Eqs. 4.283 and 4.284 reveal something else very important about x-ray microscopy. In both cases, the dose scales as the inverse of pixel size squared and thickness squared, or

$$ D \propto \frac{1}{\Delta_r^2 t^2} \tag{4.285} $$

(see also Eq. 4.269). Real-life features often have a width Δ_r that matches their thickness t, so we see that the radiation dose (which, of necessity, must be imparted to image features of a certain size) tends to scale with the fourth power of decreases in their size. That is, to double the spatial resolution, one must increase the x-ray fluence F and radiation dose by a factor of 2⁴ = 16. This leads to challenges in radiation dose imparted to the specimen, as shown in Fig. 11.7 and discussed further in Section 11.3.4.

In fact the story may not be quite so simple. In examinations of power spectra of images such as Figs. 4.19 and 4.49, or coherent diffraction intensity recordings such as will be discussed in Chapter 10, we have always seen that there is a power law decline in Fourier plane power of I(u_r) ∝ u_r⁻ᵃ, as indicated in Eq. 4.97. However, the power law parameter a varies from example to example:

• In Fig. 4.19, we found a = 2.95. Of course, this is from a visible-light photograph, rather than an x-ray micrograph.

• In x-ray fluorescence images of trace element distributions in frozen hydrated cells [Deng 2015c] as shown in Fig. 4.49, values of a have varied from 2.78 for sulfur, to 2.81 for phosphorus, to 2.94 for potassium.



The power spectrum of an image is calculated by taking the Fourier transform of the image's transmission signal and then squaring it (Section 4.3.4). This should be identical to the far-field or Fraunhofer diffraction pattern of the object (Section 4.3.5). For far-field diffraction patterns, again a power law relationship seems to hold, with power law parameters a as follows:

• In 0.75 keV soft x-ray coherent diffraction imaging experiments with plunge-frozen, freeze-dried yeast cells, a value of a ≈ 2.6 over spatial frequencies of 1–20 μm⁻¹ and a ≈ 4.2 over spatial frequencies of 20–50 μm⁻¹ has been observed [Shapiro 2005]. In 7.9 keV x-ray nanodiffraction experiments, a value of a = 3.19 was observed from initially living hydrated fibroblasts, while a value of a = 3.89 was observed from chemically fixed, hydrated cells in the spatial frequency range 200–500 μm⁻¹ [Weinhausen 2014].

• Calculations based on scattering from a complex electron density for cubic 3D objects lead to a value of a = 4 [Howells 2009, Villanueva-Perez 2016]. An earlier paper obtains values of a = 3 for constant sample volume, and a = 4 for the case of spherical 3D objects [Shen 2004].

• In simulations of coherent diffraction patterns resulting from various objects, values of a range from 3.3 for random-thickness protein distributions and simulated cells [Huang 2009a], to 3.83 for 2D and 3.95 for 3D gold particles that are approximately spherical [Schropp 2010c].

• In small-angle x-ray scattering (SAXS), a cosine expansion of a spherical function gives dominant terms with a = 2 and a = 4. Departures from a pure spherical shape enhance the a = 4 term. While of course there are differences between the SAXS patterns of different characteristic object shapes such as spheres and rods (see for example [Svergun 2003, Fig. 5]), it is generally observed that the SAXS intensity follows an I(q) ∝ q⁻⁴ dependence, with the relationship between momentum transfer q and spatial frequency u as given in Eq. 4.57. This q⁻⁴ dependence is sometimes referred to as Porod's law [Porod 1982].

So what is the real answer? The safest one is to say there may be some dependence on the specimen, and the measurement method, but that the scaling is usually between the inverse third and fourth power (that is, a is between 3 and 4 in Eq. 4.97). One way to find the scaling for a particular specimen and imaging mode is to acquire images with several different exposure times, and measure the dose-dependence of the achieved spatial resolution as measured using power spectrum estimates (Eq. 4.97 and Fig. 4.73) or the Fourier ring correlation (FRC) approach given in Eqs. 4.252 and 4.253.
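A minimal sketch of estimating the exponent a from a single square image is given below; the fit range in cycles/pixel is an arbitrary illustrative choice, and the fit is done over all Fourier pixels in the chosen annulus rather than over a radially averaged curve.

```python
# Fit the power law exponent a in I(u_r) ~ u_r^(-a) (Eq. 4.97) from an image,
# by a straight-line fit to the power spectrum in log-log space.
import numpy as np

def power_law_exponent(img, u_min=0.05, u_max=0.3):
    """Return the fitted exponent a for a square, real-valued 2D image."""
    n = img.shape[0]
    power = np.abs(np.fft.fftshift(np.fft.fft2(img)))**2
    y, x = np.indices(img.shape)
    u = np.hypot(x - n // 2, y - n // 2) / n   # spatial frequency in cycles/pixel
    sel = (u > u_min) & (u < u_max)
    # least-squares fit: log10(power) = -a * log10(u) + const
    slope, _ = np.polyfit(np.log10(u[sel]), np.log10(power[sel] + 1e-30), 1)
    return -slope
```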

4.10 Comparison with electron microscopy and microanalysis

This is a book about x-ray microscopy, so it is quite natural that we have not said much about microscopies using other radiation. Nevertheless, it is important to understand x-ray microscopy in the context of other techniques, so that one can use the right microscope type for the imaging task at hand.



4.10.1 Elemental mapping

In Section 4.4.3, we included brief comments on superresolution light microscopy which, along with conventional fluorescence light microscopy, allows one to image specific fluorophores in cells. These fluorophores are detected by using shorter wavelength light to excite certain electronic transitions in atoms and molecules, with longer wavelength light used to detect the eventual response. Fluorescence light microscopy can be used to track the motion of single molecules within living cells, whereas in Chapter 11 we will see that x-ray microscopes are not able to take more than one image of a living cell due to radiation damage limitations. However, microscopies that can image total elemental content based on core-level electron transitions offer important complementary capabilities. Molecular fluorophores require atoms to be in a specific chemical configuration in order to be labeled and excited, while core-level electrons are largely impervious to chemical bonds (Section 3.1.3); as a result, one can use core-level electrons to measure the total content of a particular element independently of binding affinities. In materials science, visible-light fluorescence plays a lesser role, and again core-level electron transitions are often used to find both major and minor constituents of a material.

Core-level electron transition energies can be approximated using the Bohr energy of Eq. 3.3, and the actual energies are shown in Fig. 3.2 and are well tabulated [Bearden 1967, Zschornack 2007]. One can remove electrons from these states using a probe with an energy above these binding energies, after which either x-ray fluorescence or Auger electron emission results, as described in Section 3.1.1. Important differences arise when using energetic probes of different types, however.

Electrons are the lightest charged particle, and they can be focused to exquisitely small, sub-0.1 nm focus spots in modern aberration-corrected electron microscopes (and sub-5 nm spots even in inexpensive scanning electron microscopes). However, when electrons enter a material they can sometimes swing around the strong point charge of an atom's nucleus and produce a broad spectrum of continuum or Bremsstrahlung radiation, as will be shown in Fig. 7.4. This produces a background signal within which one must detect electron-excited x-ray fluorescence emission. In addition, in thicker materials an electron beam begins to undergo significant side-scattering (Fig. 11.4), so the volume from which x-ray fluorescence is emitted begins to become larger than the lateral beam dimension, thus affecting the spatial resolution. Nevertheless, scanning electron microscopes equipped with energy-resolving detectors (Section 7.4.10), known sometimes as electron microprobes as used for electron probe microanalysis (EPMA), serve as powerful, low-cost, compact laboratory instruments for imaging the distribution of various elements in a material [Jercinovic 2012, Rinaldi 2015].

Electron microscopes can be used for imaging core-level electron transitions in another way. As an electron beam enters a thin specimen, some fraction of the electrons can undergo inelastic scattering where they transfer energy to bound electrons in the material. An electron spectrometer can then be used to measure the energy of these inelastically scattered electrons in a technique known as electron energy-loss spectroscopy (EELS). Part of an EELS spectrum for 100 kV electrons in amorphous ice was shown in Fig. 3.15; the EELS spectra for both amorphous ice and Epon (a plastic embedding



[Figure 4.78 plots fraction/(0.1 eV) (log scale, 10⁻⁷ to 10⁻¹) versus electron energy loss ΔE (0–700 eV), with plasmon peaks near 20.7 eV (amorphous ice) and 22.4 eV (Epon), and both as-measured and single-scatter curves.]

Figure 4.78 Electron energy loss spectroscopy (EELS) measurement in amorphous ice and in Epon, a plastic embedding medium. This figure shows the zero-loss peak for amorphous ice, the plasmon resonances (shown in greater detail for amorphous ice in Fig. 3.15), and the carbon K (290 eV) and oxygen K (540 eV) edges in the energy loss spectra. For amorphous ice, the as-measured spectrum shown in dark blue includes plural inelastic scattering effects, while the light blue spectrum shows the single-scatter spectrum σ_inel(ΔE) calculated using a Fourier-log deconvolution approach [Johnson 1974, Wang 2009a]. Amorphous ice data courtesy of Richard Leapman, National Institutes of Health (similar to results shown in [Leapman 1995]), and Epon data courtesy of Ming Du, Qiaoling Jin, and Kai He, Northwestern University.

medium) are shown over a broader range in Fig. 4.78, where one can see a step-like rise in inelastic cross sections at 290 eV (carbon K edge in Epon) and 540 eV (oxygen K edge in amorphous ice). One can combine EELS spectroscopy with imaging systems either in scanning or in full-field (transmission) electron microscopes, in a technique that electron microscopists refer to as spectrum imaging [Jeanguillaume 1989, Hunt 1991b] and which x-ray microscopists refer to as spectromicroscopy (Section 9.1). EELS works best when the sample is no thicker than about one inelastic scattering mean free path Λ, and this distance will be shown in Fig. 4.80. For sufficiently thin specimens, EELS can offer exquisite sensitivity for trace element imaging of light elements [Aronova 2011].

Protons can also be used to excite core-level x-ray fluorescence in proton microprobes, in a technique sometimes called proton-induced x-ray emission or PIXE. Here, the sensitivity for trace element imaging can be much higher, since protons have about 2000 times the mass of electrons and thus produce dramatically lower levels of continuum radiation or Bremsstrahlung. However, the higher mass of protons means they also can transfer much more energy to the atom, including so-called "knock-on" damage in which they go beyond ionizing (and thus disrupting) an atom's electronic state and displace entire atomic nuclei, as in a microscopic game of billiards. Still, the high sensitivity of proton microprobes (due to lower levels of continuum radiation) means that they play an important role in trace element analysis [Ryan 2000, Mulware 2015].

Finally, one can use X rays to remove inner-shell electrons by absorption and thus


[Figure 4.79 plots mass·dose in joules (log scale, 10⁻¹³ to 10⁻⁸) versus atomic number Z (10–90), with curves labeled pf Kα, pf Lα, ef Kα, ef Lα, Xf Kα, Xf Lα, Xa K, Xa L, Xa M, Xa N, eels K, and eels L.]

Figure 4.79 Sensitivity–radiation dose product for the detection of a given mass of trace elements. The plot shows methods using protons, electrons, and X rays (pf, ef, and Xf) to generate x-ray fluorescence at various characteristic lines Kα and Lα. X-ray differential absorption contrast Xa is also shown at various absorption edges K, L, M, and N. Finally, electron energy loss spectroscopy (EELS) is shown as eels for K and L edges. These estimates assume that the mass m of the element to be detected is present as a trace quantity in a "matrix" or majority material that is organic, and the dose D is as delivered to this organic matrix. The differential x-ray absorption curves Xa were calculated for an areal mass density of m/Δ_r² = 10⁻⁷ grams/cm², with a scaling as (m/Δ_r²)⁻¹ for other areal densities with a pixel size of Δ_r. Figure re-drawn from [Kirz 1980b].

excite the production of x-ray fluorescence in scanning x-ray fluorescence microscopes or x-ray microprobes. While this will be described in more detail in Chapter 9, for the moment we note that because x-ray photons have no electrical charge, they do not swing around nuclei, so no continuum radiation is produced. As a result, x-ray-excited x-ray fluorescence spectra show a very high peak-to-background ratio, so that very high sensitivity is obtained for studies of trace elements in materials (Section 9.2.3). In addition, the x-ray beam itself does not undergo blurring due to side-scattering, since the cross section for absorption of X rays is so much higher than the elastic or inelastic scattering cross section at most energies, as shown in Fig. 3.10.

The relative merits of using protons, electrons, or X rays to stimulate x-ray fluorescence for elemental analysis are discussed in greater detail in a number of older papers [Birks 1964, Birks 1965, Cooper 1973, Goulding 1977, Horowitz 1978, Kirz 1978, Kirz 1980a, Grodzins 1983a] and a more recent review [Janssens 2000a]. Of particular note was the first demonstration of x-ray fluorescence analysis using synchrotron radiation at the Cambridge Electron Accelerator [Horowitz 1972].


A more detailed introduction to the topic was provided by Sparks [Sparks 1980], and several other papers introduced the gains in elemental detection sensitivity available using synchrotron radiation [Sparks Jr. 1979, Gordon 1982]. Accurate predictions of sensitivity, required exposure, and associated radiation dose depend strongly on the details of the sample and the experimental setup (with further details for x-ray-induced x-ray fluorescence microscopy provided in Section 9.2), but one comparison [Kirz 1980b] which considers radiation dose is shown in Fig. 4.79. This latter comparison included estimates for electron energy-loss spectroscopy, as will be discussed in the following section, and differential x-ray absorption, as will be discussed in Section 9.1.1. As the figure shows, x-ray-induced x-ray fluorescence offers the best combination of sensitivity and minimal radiation dose for elements heavier than about Z = 20 (calcium), while EELS and x-ray differential absorption are better for lighter elements only if one can prepare sufficiently thin specimens. Other trace element mapping techniques such as laser-ablation mass spectrometry [Becker 2014] or nanoscale secondary ion mass spectrometry (SIMS) [Moore 2011] offer high sensitivity but are destructive to the specimen. Surveys of these and other trace element detection methods exist [Brown 2005].

4.10.2 Transmission electron microscopy

Electron microscopes have a fundamental advantage of small wavelength relative to x-ray microscopes, which in most cases leads to higher spatial resolution. For an electron with a kinetic energy E_k given by acceleration over a given voltage V (for example, V = 300 kV for a E_k = 300 keV electron), the Lorentz factor γ of special relativity is given by

$$ \gamma = 1 + \frac{E_k}{m_e c^2} = \frac{1}{\sqrt{1 - \beta^2}} \tag{4.286} $$

where Einstein's famous formula gives m_e c² = 511 keV as the energy associated with the rest mass of an electron, and

$$ \beta = \frac{v}{c} = \sqrt{1 - 1/\gamma^2} \tag{4.287} $$

is the electron's velocity v relative to the speed of light c. The relativistic momentum is given by p = γβm_e c, so that the electron's de Broglie wavelength (Eq. 3.5; see also Eq. 3.7 for the numerical value of hc) becomes

$$ \lambda = \frac{h}{p} = \frac{hc}{pc} = \frac{hc}{\gamma\beta m_e c^2} = \frac{1240\ \mathrm{eV\cdot nm}}{\gamma \sqrt{1 - 1/\gamma^2} \cdot (511 \times 10^3\ \mathrm{eV})}, \tag{4.288} $$

or 0.0037 nanometers for a 100 keV electron. Therefore the spatial resolution in electron microscopes is not limited by the electron wavelength, and electron lenses with low numerical aperture can still show very high resolution. Instead, the spatial resolution of electron microscopes is usually limited by spherical aberrations in electron optics to about 0.15 nm, though aberration correction systems [Hawkes 2015] are now making it possible to achieve a spatial resolution of 0.05 nm or better (with electron ptychography, discussed in Section 10.4, also delivering sub-0.1 nm resolution images [Jiang 2018]).
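A minimal numeric sketch of Eqs. 4.286–4.288, using only the Python standard library:

```python
import math

HC_EV_NM = 1239.84      # hc in eV*nm (Eq. 3.7)
ME_C2_EV = 510998.95    # electron rest energy m_e c^2 in eV

def electron_wavelength_nm(kinetic_energy_eV):
    """de Broglie wavelength in nm for a given kinetic energy (Eq. 4.288)."""
    gamma = 1.0 + kinetic_energy_eV / ME_C2_EV          # Eq. 4.286
    beta = math.sqrt(1.0 - 1.0 / gamma**2)              # Eq. 4.287
    return HC_EV_NM / (gamma * beta * ME_C2_EV)

for E in (100e3, 200e3, 300e3):
    print(f"{E/1e3:.0f} keV: {electron_wavelength_nm(E)*1e3:.2f} pm")
# 100 keV: 3.70 pm; 200 keV: 2.51 pm; 300 keV: 1.97 pm
```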


[Figure 4.80 plots mean free path Λ (0–600 nm) versus electron energy (80–400 keV), with Λ_el and Λ_inel curves for ice, protein, and Si.]

Figure 4.80 Electron mean free paths Λ_el and Λ_inel for elastic and inelastic scattering, respectively, in protein, ice, and silicon as calculated using approximate formulae [Langmore 1992]. The shorter mean free path for inelastic scattering in protein and ice means that this interaction is more probable than elastic scattering; each inelastic scattering event involves a mean energy deposition ⟨ΔE⟩ (Eq. 4.290) of about 40 eV in soft materials. Inelastic mean free paths for low-energy electrons in several materials are shown in Fig. 6.9.

The very high resolution of electron microscopy only applies to thin materials, and (like with x-ray microscopy) one must be concerned with radiation damage limitations, especially for studies of soft and biological materials, as discussed in Chapter 11. One can use the same considerations of image contrast and required quanta as discussed in Section 4.8.1 to understand exposure requirements in electron microscopy, and go on to estimate the relationship between image resolution, specimen thickness, and required radiation dose. However, electron interactions are somewhat different than x-ray interactions with materials:

• Whereas absorption dominates over elastic scattering in x-ray interactions as shown in Fig. 3.10, electrons are rarely absorbed but instead undergo plural elastic and inelastic scattering in materials. One can obtain estimates for atomic cross sections σ and mean free paths Λ (as we have done for X rays in Section 3.2) to about 15 percent accuracy over the electron kinetic energy range of interest (30–300 keV) using some simple expressions [Langmore 1992] that extend earlier work [Langmore 1973, Wall 1974]. The resulting mean free paths shown in Fig. 4.80 indicate that electrons are strongly interacting.


[Figure 4.81 has two panels, "300 kV electrons in Si" (0–2000 nm thickness) and "300 kV electrons in amorphous ice" (0–3000 nm thickness), each plotting the fraction of intensity (0.001–1, log scale) in the categories unscattered [noscat], single elastically scattered [1el], plural scattered [multel], inelastically scattered [inel], and scattered beyond the objective aperture [out].]

Figure 4.81 Electrons sorted into interaction categories for two materials: silicon, and amorphous ice (ρ = 0.92 g/cm³), the latter of which is the background material when imaging hydrated biological specimens. This plot was calculated as described elsewhere [Du 2018] for electrons with a kinetic energy of 300 keV, representing the energy at which most intermediate voltage electron microscopes (IVEMs) operate. The electrons scattered outside the acceptance of the objective aperture [out] lead to a general darkening of the image with increased sample thickness, while the inelastically scattered electrons [inel] have had their energy and thus de Broglie wavelength changed; as a result, they contribute an out-of-focus "haze" to the recorded image unless they are removed by an in-column energy filter (delivering a so-called "zero loss" image [Schröder 1990]). This usually sets the limit on overall sample thickness, though for radiation-hard specimens one can use high-angle darkfield (HADF) imaging for thicker specimens, though with a loss in spatial resolution [Ercius 2006]. Phase contrast is obtained by mixing the unscattered and singly elastically scattered electrons [noscat and 1el]. Figure adapted from [Du 2018].

One can use these expressions in a transport model to understand, as a function of specimen thickness, what fraction of electrons are unscattered, what fraction undergo single and plural elastic scattering, what fraction have their energy (and thus focusing properties with electron optics) changed due to inelastic scattering, and what fraction are scattered to large angles [Langmore 1992, Du 2018]. For molecular resolution imaging, one obtains phase contrast images from the interference of unscattered and single scattered electrons, and one can see from Fig. 4.81 that this interference declines rapidly for specimen thicknesses in the hundred nanometer range.

• At coarser resolution scales than for molecular imaging, one can estimate a refractive index for electron interactions in media based on the inner potential [Bethe 1928, Lenz 1954, Wyrwich 1958], whereby electrons are sped up in a medium as they "see" the strongly concentrated positive charges of the nuclei of atoms (the atoms' electrons appear more diffuse to high-energy electrons). This inner potential U_i leads to a refractive index [Reimer 1993, Eq. 3.20] of

$$ n = 1 - \frac{U_i}{E}\, \frac{E + m_e c^2}{E + 2 m_e c^2}, \tag{4.289} $$

which is about n = 1 + 1.6 × 10⁻⁵ for carbon at 300 keV, based on measurements of U_i ≈ −7.8 eV obtained using electron biprism interference experiments [Keller 1961].

194

Imaging physics

1961]. Thus one has phase contrast in this case with a strength that is somewhat comparable to that seen in x-ray microscopy where n = 1 − δ (Eq. 3.67). • Unlike the case with X rays, electrons are not simply absorbed with the deposition of all of their energy in one interaction. Instead, they undergo inelastic scattering with a loss per inelastic scattering event that can be calculated from an EELS measurement of the inelastic scattering energy spectrum with cross section, such as was shown in Fig. 4.78. From such a spectrum, one can use Eq. 4.10 to calculate the average energy deposited per inelastic electron scattering event ΔE as 1∞ (ΔE) σinel (ΔE) d(ΔE) , (4.290) ΔE = 0 1 ∞ σinel (ΔE) d(ΔE) 0 provided σinel has been corrected for plural inelastic scattering effects using an approach such as Fourier-log deconvolution [Johnson 1974, Wang 2009a]. Applying this to the single-inelastic scattering spectra shown in Fig. 4.78 yields ΔE = 39.3 eV for amorphous ice, and ΔE = 38.6 eV for Epon (similar to earlier measurements showing ΔE  37 eV in nucleic acids [Isaacson 1973, Isaacson 1975]). The energy ΔE is well above the energy of chemical bonds (Box 3.2), but it is also well below the energy deposited per x-ray absorption event; this has consequences for atomic resolution imaging of soft materials, as described in Box 4.9. • Once one has used the properties of electron interactions to calculate the required exposure n¯ TEM of electrons per pixel of width Δr , one can calculate the radiation dose DTEM based on the fraction of events where an electron undergoes inelastic scattering as n¯ TEM dE/dx n¯ TEM ΔE = . (4.291) DTEM = ρ Δ2r Δ2r Λinel ρ Using Λinel = σinel na and a parameterized approximation [Langmore 1992] for σinel for protein, Eq. 4.291 gives an estimate for the radiation dose associated with a 1 e− /nm2 exposure of 32, 21, and 17 kGy for 100, 200, and 300 keV electrons in protein, respectively. The damage caused to soft materials by these radiation doses will be discussed in Section 11.2.1. One can use these characteristics of electron interactions to estimate the required electron exposure and resulting radiation dose in electron microscopy.
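A minimal numerical sketch of Eqs. 4.289 and 4.291 follows (an illustration, not the book’s own code); the Λ_inel = 250 nm and 39 eV values are taken from Figs. 4.80 and 4.78 as quoted above, while the 1.35 g/cm³ protein density is an assumed, typical value.

```python
# Minimal check of Eq. 4.289 and Eq. 4.291 with illustrative numbers.
E = 300e3     # electron kinetic energy in eV
mc2 = 511e3   # electron rest energy in eV
Ui = -7.8     # inner potential of carbon in eV [Keller 1961]

# Eq. 4.289: electron refractive index from the inner potential
n_minus_1 = -(Ui / E) * (E + mc2) / (E + 2 * mc2)
print(f"n - 1 = {n_minus_1:.2e}")   # about +1.6e-5, as quoted above

# Eq. 4.291: dose from a 1 electron/nm^2 exposure of protein at 300 keV
exposure = 1e18             # electrons per m^2 (= 1 e-/nm^2)
delta_E = 39.0 * 1.602e-19  # J deposited per inelastic scattering event
mfp_inel = 250e-9           # m, inelastic mean free path (Fig. 4.80)
rho = 1350.0                # kg/m^3, protein density (assumed)
dose = exposure * delta_E / (mfp_inel * rho)
print(f"dose = {dose / 1e3:.1f} kGy")  # ~18.5 kGy, near the 17 kGy quoted
```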

4.10.3

A comparison of transmission imaging with electrons and with X rays

The calculations above allow one to compare electron and x-ray microscopy for transmission imaging of organic material in a background of ice, representative of a biological specimen imaged at cryogenic temperatures (see Section 11.3). Detailed calculations of this sort have been described for absorption contrast in x-ray microscopy in a landmark pair of papers [Sayre 1977b, Sayre 1977a], and the x-ray calculations were then extended for phase contrast x-ray microscopy [Gölz 1992, Jacobsen 1992a]. Early calculations for thick specimen imaging in electron microscopy are also available




Box 4.9 Electrons versus X rays for atomic resolution imaging
In a seminal paper published in 1970, Breedlove and Trammell considered the limitations that radiation damage sets for atomic-resolution imaging of molecules [Breedlove Jr. 1970] (other authors have since carried out a similar analysis [Mueller 1977, Henderson 1995]). Let us consider a simplified version of their analysis, for the case of X rays and electrons only.
With electrons, Fig. 4.80 shows that the mean free path for 300 keV electrons in protein is Λ_inel = 250 nm for inelastic scattering (which deposits ΔE = 39 eV) and Λ_el = 420 nm for elastic scattering. This means that for each elastically scattered electron, one deposits about (420/250) · 39 = 66 eV of energy. Since one must scatter at least 25 electrons to detect the presence of an atom at a position with SNR = 5, this means at least 25 · (420/250) · 39 ≈ 1,600 eV (electrons) of ionizing energy is deposited per atom, which is more than enough to break all of the atom’s bonds (the implications for the ultimate resolution in electron microscopy of organic materials are discussed in Box 4.9). With 10 keV X rays (where the wavelength is at atomic dimensions), the data plotted in Fig. 3.10 indicate that the cross section for absorption is about 13 times stronger than for elastic scattering, so by Babinet’s principle (Fig. 4.58) photon absorption will dominate the scattering process used to detect an atom. As a result one will have deposited about 25 · 13 · 10,000 ≈ 3,300,000 eV (X rays) of ionizing energy per atom in order to see it with 25 scattered photons. Thus radiation damage prevents both electrons and X rays from obtaining atomic-resolution images of soft materials, but the problem is 2,000 times worse for X rays.
Of course there are ways around this fundamental limitation; in electron microscopy, single particle methods [Frank 1975a, Frank 1981, Frank 2002] are used to divide the radiation dose among many images of identical molecules, and this approach is being used in x-ray free-electron laser experiments, as will be discussed in Section 10.6. Crystallography takes a similar approach by combining information from many identical molecules, with the added advantage that they are all at identical orientations and arranged in regular spacing in a crystal lattice. At larger length scales, the relative merits of x-ray and electron microscopy are changed from this atomic-resolution picture, so that x-ray microscopes become advantageous for specimen thicknesses beyond about 1 micrometer, as shown in Fig. 4.82.
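The Box 4.9 arithmetic is easy to reproduce; here is a small sketch (an illustration, not from the book) using the mean free paths and cross-section ratio quoted above.

```python
# Energy deposited per detected atom, electrons versus X rays (Box 4.9).
n_detect = 25                    # scattered quanta needed per atom (SNR = 5)

# Electrons: 300 keV in protein, mean free paths from Fig. 4.80
lam_el, lam_inel = 420.0, 250.0  # nm, elastic and inelastic mean free paths
e_per_event = 39.0               # eV deposited per inelastic event
dep_electrons = n_detect * (lam_el / lam_inel) * e_per_event  # ~1,640 eV

# X rays: 10 keV, absorption ~13x more likely than elastic scattering (Fig. 3.10)
dep_xrays = n_detect * 13 * 10e3                              # ~3.3e6 eV

print(f"electrons: {dep_electrons:.0f} eV/atom")
print(f"X rays:    {dep_xrays:.2e} eV/atom")
print(f"ratio:     {dep_xrays / dep_electrons:.0f}")          # ~2,000
```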

[Grimm 1998]. A more recent paper uses the same set of assumptions for both x-ray and electron microscopy to provide a more direct comparison for transmission imaging [Du 2018]. In Fig. 4.82, we show this paper’s result of using Eq. 4.291 for electron microscopy, and Eq. 4.281 for x-ray microscopy, for the imaging of 10 nm resolution protein features in varying thicknesses of ice. This figure encapsulates a number of features of what we see as the relative merits of x-ray and electron transmission imaging:



[Figure 4.82: log-scale plot of dose in Gy (10⁵–10¹²) versus ice thickness in μm (0.05–200), with curves for 10 keV “hard” X rays, 0.5 keV “soft” X rays, and 300 keV electrons with and without an energy filter, for 10 nm resolution phase contrast imaging of protein in ice.]

Figure 4.82 Radiation dose calculated for phase contrast imaging of 10 nm protein features in ice, as a function of ice thickness. This calculation includes 300 keV transmission electron microscopy with and without the use of an energy filter for “zero loss” imaging, 0.5 keV soft x-ray microscopy in the “water window” spectral region between the carbon and oxygen K absorption edges, and 10 keV hard x-ray microscopy. For ice thicknesses less than about 0.5 μm, electron microscopy imparts a lower radiation dose for high-resolution imaging for the reasons discussed in Box 4.9; these thicknesses are compatible with imaging macromolecules and viruses, and even archaebacteria [Grimm 1998] and peripheral regions of eukaryotic cells [Medalia 2002]. However, x-ray microscopy offers lower dose for imaging whole eukaryotic cells and tissues. A similar calculation result is shown elsewhere [Du 2018].

• For specimens with a thickness of a few hundred nm or less, electron microscopy offers lower radiation dose and higher spatial resolution, for the reasons described in Box 4.9. For biological applications, this applies to imaging macromolecules and even very large viruses [Xiao 2005, Xiao 2009]; archaebacteria [Grimm 1998] and peripheral regions of eukaryotic cells [Medalia 2002] are also accessible with electron microscopy. There may still be cases in which x-ray microscopy would be preferred for these smaller samples, such as if one wanted to include information on trace element distributions, as described in Section 4.10.1.
• For specimens with a thickness of a micrometer or more, electron microscopy quickly becomes too difficult due to the basic properties of electron interactions as shown in Figs. 4.81 and 4.82. Here, x-ray microscopy offers the ability to image whole eukaryotic cells, as will be highlighted in Section 12.1. In materials science, x-ray microscopy can be used to image circuit features even within unthinned silicon wafers, as will be described in Section 12.4.



• For spectroscopic imaging of chemical binding states and electronic configurations in materials, one can compare EELS, and its near-absorption-edge cousin of energy loss near-edge spectroscopy (ELNES), against x-ray absorption near-edge spectroscopy (XANES; see Section 9.1.2). Because the inelastic mean free path for electron scattering is often significantly shorter than the absorption length 1/μ for x-ray absorption (Eq. 3.75), EELS offers greater sensitivity when considering trace quantities in thin specimens (Fig. 4.79); furthermore, EELS allows one to study several chemical elements at one time, provided the electron spectrometer has an appropriate combination of spectral range and resolution. However, ELNES spectra appear on a background of plural inelastic scattering from the plasmon modes (see Fig. 4.78), whereas x-ray interactions are largely free from multiple-scattering complications (see Fig. 3.10). Energy deposition into the plasmon modes is dominant in EELS, and absent in x-ray absorption spectroscopy, so x-ray absorption spectroscopy offers lower dose for inner-shell electron spectromicroscopy studies of light materials [Isaacson 1978, Rightor 1997], while plasmon-mode EELS can be superior in some cases [Yakovlev 2008]. Our perspective is that electron and x-ray microscopy offer important complementary capabilities.

4.11

See the whole picture

In this chapter we have embraced the odd nature of light. We have used a wave description for imaging theories based on a Fourier grating decomposition of an object, and for how a wavefield evolves as it reaches downstream planes. We have used a discrete photon model for estimating the illumination required to see a certain object, and the radiation dose imparted during imaging. As we said at the start of Section 3.3, we treat a photon as a particle on Mondays and Wednesdays, and as a wave on Tuesdays and Thursdays (allowing for three-day weekends, which are so common in the life of scientists). It is worthwhile then to conclude this chapter by reminding ourselves of a view of how these different pictures work together.
The count degeneracy parameter δ_c defines the number of photons per phase space area per coherence time [Goodman 2015, Sec. 9.3]. Consider the example of the FLASH free-electron laser at Hamburg [Singer 2012], which can deliver about 7 × 10¹² photons per pulse with about 65 percent of the pulse being delivered into a single spatially coherent mode (the self-amplified spontaneous emission or SASE mechanism of free-electron lasers – discussed in Section 7.1.8 – means that one does not get pure single-mode emission). The pulse length is about 100 femtoseconds, whereas a measurement of the typical spectral bandwidth yields a coherence time of about 2 fs, so only about 2 percent of the pulse is delivered within a coherence time. Thus one has a degeneracy parameter δ_c of about

\delta_c = \left(7 \times 10^{12}\,\frac{\text{photons}}{\text{pulse}}\right) \cdot (0.65\ \text{spatial}) \cdot (0.02\ \text{temporal}) \simeq 9 \times 10^{10},   (4.292)

so there are many photons in a single spatially and temporally coherent mode. Now consider a synchrotron beamline with a flux of 10¹⁰ photons/second after spatial filtering




to get a single spatially coherent mode. As will be seen in Table 7.1, this might come in the form of pulses of 34 ps in duration and 153 ns spacing; thus in 1 s there are 6.5 × 10⁶ pulses, so there are 1.5 × 10³ photons per 34 ps pulse. If a silicon monochromator is used to spectrally filter the beam to a bandpass of Δλ/λ = 10⁻⁴, then the coherence time is found from 10⁴ waves with a time per wave of T = λ/c = 4 × 10⁻¹⁹ s, yielding a coherence time of 4 × 10⁻¹⁵ s. The temporally coherent fraction of a pulse is thus (4 × 10⁻¹⁵)/(34 × 10⁻¹²) = 1.2 × 10⁻⁴, so one arrives at a degeneracy parameter of about

\delta_c = \left(1.5 \times 10^{3}\,\frac{\text{photons}}{\text{pulse}}\right) \cdot (1\ \text{spatial}) \cdot (1.2 \times 10^{-4}\ \text{temporal}) \simeq 0.2   (4.293)

for this example synchrotron source. This means that separate photons do not have much overlap with each other in the optical system in today’s synchrotron light source experiments (most laboratory sources have even lower degeneracy parameters).
Consider now an x-ray microscope at a synchrotron light source. Because the count degeneracy parameter is small, we must treat each photon as an individual event; in other words, there is only one photon in the microscope at a time. That photon emerges from the accelerator, and it experiences the entire monochromator and exit slit so that its spectral properties are determined by its wave characteristics. The photon’s wavefield is then modulated by the condenser lens so that the wavefield becomes confined to the illumination region at the sample. The wavefield interacts with the entire sample in the ways we have described in terms of Fourier decomposition and OTF (Section 4.4.7), and it then propagates to the objective lens where again the wavefield is modulated as described in Section 4.3.6. Finally, the wavefield reaches the image detector, at which point something else happens: a photon is absorbed at some point on the detector according to a probability distribution given by the wavefield times its complex conjugate. And then the next photon comes, and the process is repeated! One can see images forming from the accumulation of photons [Hecht 2002, Fig. 1.1], or electrons [Tonomura 1993, p. 14]; these examples make it clear that each photon (or electron) must sample the entire optical system and specimen before arriving at some location with a wave-determined probability.
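Both degeneracy estimates are simple products of the pulse parameters quoted above; this minimal sketch (not from the book) reproduces Eqs. 4.292 and 4.293.

```python
# Count degeneracy: photons per pulse x spatial fraction x temporal fraction.
# FLASH-like free-electron laser (Eq. 4.292):
delta_c_fel = 7e12 * 0.65 * 0.02
print(f"FEL:         {delta_c_fel:.1e}")     # ~9e10 photons per coherent mode

# Synchrotron beamline (Eq. 4.293): 1e10 ph/s in 34 ps pulses, 153 ns apart
pulses_per_s = 1.0 / 153e-9                  # ~6.5e6 pulses per second
photons_per_pulse = 1e10 / pulses_per_s      # ~1.5e3 photons per pulse
coherence_time = 1e4 * 4e-19                 # 10^4 waves at 4e-19 s per wave
temporal_fraction = coherence_time / 34e-12  # ~1.2e-4 of each pulse
delta_c_sr = photons_per_pulse * 1.0 * temporal_fraction
print(f"synchrotron: {delta_c_sr:.1f}")      # ~0.2
```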

4.12

Concluding limerick

How does light know how to work such magic? It demands a limerick:

A microscope works with a wave
And ends with a photon so brave
One by one; not by swarm
Is how wavefields do form
Together an image to save!


5

X-ray focusing optics

Janos Kirz contributed to this chapter. In our everyday lives, we take lenses and mirrors for granted. They focus visible light and form images with ease. All this is made possible by the availability of materials that have an index of refraction n that is substantially different from 1, as described in Eq. 3.62 and Appendix B.1 online at www.cambridge.org/Jacobsen. In the case of lenses one normally uses glass with a real index of refraction of n = 1.3–1.5, while metal-coated visible-light mirrors can have near 100 percent reflectivity over a broad range of incidence angles. In the x-ray region of the electromagnetic spectrum, we have seen from Eq. 3.67 and Appendix B.2 that the refractive index n = 1 − δ − iβ is complex, and only slightly less than 1. Hence x-ray optics tends to be very different from optics for visible light, as the zoology of approaches shown in Fig. 5.1 makes clear. We discuss the principles behind these optics in this chapter.

5.1

Refractive optics

Simple refractive lenses have a focal length f as given by the lens-maker’s formula (Eq. 4.165) of

\frac{1}{f} = (n - 1)\left(\frac{1}{R_1} - \frac{1}{R_2}\right),   (5.1)

where R₁ and R₂ are the radii of curvature of the lens surfaces, and n is the refractive index. For a double-convex lens for visible light, R₁ is positive, while R₂ is negative (since the centers of curvature lie on opposite sides of the lens). For glass with a refractive index difference from vacuum of n − 1 ∼ 0.3–0.5, radii of order of a few centimeters lead to focal lengths of similar magnitude. For X rays with n − 1 = −δ, which is of order 10⁻⁵, the situation becomes very different: a double-convex lens has a negative focal length and causes a parallel incoming beam to diverge rather than converge. In addition, the focal length for centimeter-scale radii of curvature will not be centimeters, but rather distances approaching a kilometer! To make matters worse, X rays will be attenuated as they pass through the lens. Hopeless? Many people including Paul Kirkpatrick [Kirkpatrick 1948a, Kirkpatrick 1949a], Alan Michette [Michette 1991], and the author, believed that refractive optics would never be practical for X rays.
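To see the scale of the problem, here is a minimal sketch (with illustrative numbers, not from the book) applying Eq. 5.1 to a double-convex lens with R₁ = +R and R₂ = −R:

```python
# Focal length of a double-convex lens (R1 = +R, R2 = -R) from Eq. 5.1.
def biconvex_focal_length(n, R):
    """1/f = (n - 1)(1/R1 - 1/R2) with R1 = +R, R2 = -R."""
    return 1.0 / ((n - 1.0) * (2.0 / R))

R = 0.02  # 2 cm radius of curvature
print(biconvex_focal_length(1.5, R))       # glass, visible light: f = 0.02 m
print(biconvex_focal_length(1 - 1e-5, R))  # X rays, delta = 1e-5: f = -1000 m
```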




[Figure 5.1 diagram labels: compound refractive lens (half); Fresnel zone plate; multilayer Laue lens; grazing incidence mirror; normal incidence multilayer mirror; grazing incidence multilayer mirror; source; image.]

Figure 5.1 A zoology of several x-ray focusing optics types. These optics are shown superimposed upon a set of ellipses representing large-incidence-angle reflective optics that could transfer light from the source to the image. In the case of multilayer optics, only the high refractive index layers are shown; materials with a lower refractive index lie between these layers. This figure was inspired by one shown by Spiller [Spiller 1994, Fig. 4.1].

As is frequently the case, a superficial analysis is unjust. The story was set straight first by Bingxin Yang [Yang 1993]. Given that the ratio of phase shifting to absorptive parts of the x-ray refractive index δ/β = f1 / f2 increases with x-ray energy and also with lighter materials (see Fig. 3.16), he echoed Kirkpatrick’s suggestion that refractive lenses would work best using light materials such as beryllium to focus hard X rays. Since converging lenses would have to be concave, the thickness of the material near the optic axis could be kept to a minimum, thereby minimizing absorption. Yang also considered Fresnel lenses to further reduce absorption in the material. He pointed out, as Kirkpatrick had before [Kirkpatrick 1949a], that it may be easier to make cylindrical lenses producing line foci, and use two of these in a crossed configuration to generate a point focus, and furthermore he lauded “the benefit of manipulating beams with multiple lenses, especially in situations where one single element could not satisfy our need.”

5.1.1

Compound refractive lenses

The experimental breakthrough came just a few years later [Snigirev 1996] in a delightfully simple way: a series of holes were drilled in a metal block to make a 1D lens as shown in Fig. 5.2 (a USA patent application on such an approach was filed by Tomie the year before [Tomie 1997]). The two surface pairs between each hole act as a cylindrical lens with focal length f = R/(2δ). It is easy to show that a set of N lenses, each of focal length f, has a net focal length of f/N when placed in close proximity to each other (that is, when their spacing is small compared to f). As a result, one can make a compound refractive lens (CRL) with N lens surfaces that has a net focal length of

f = \frac{R}{2N\delta} = \frac{R}{2N\alpha\lambda^2 f_1},   (5.2)




Figure 5.2 Schematic of a compound refractive lens for x-ray focusing. The first demonstration [Snigirev 1996] was done by drilling a series of holes in a metal block, leading to double-concave lens elements as indicated. Today it is more common to either use lithography with directional etching for 1D lenses, or to use mandrels pressed into materials to produce 2D lens elements which can be combined as shown at right. In both cases parabolic profiles can be fabricated to reduce spherical aberration, though only spherical profiles are shown here.

where the second form (using Eq. 3.69) shows how the focal length scales as 1/λ² (more detailed calculations yield a slight modification of the focal length of compound refractive lenses [Simons 2017], and a more detailed discussion of aberrations has been presented [Osterhoff 2017]). By shrinking the radii of the holes to a fraction of a millimeter, and using the combined power of N = 30 lenses, Snigirev et al. produced a lens with a practical focal length of 1.8 m for 14 keV X rays. This demonstration generated a lot of interest, and a variety of approaches have been pursued since. Bruno Lengeler and collaborators developed the technique of shaping a series of paraboloidal indentations in beryllium with R as small as 50 μm to create 2D focusing with reduced spherical aberration [Lengeler 1999b] (as suggested by Yang), and optics of this type are commercially available from rxoptics.de. This has led to “transfocator” systems in which one can rapidly add or remove lenses to change N; they have become very useful optical elements in some synchrotron beamlines [Snigirev 2009, Vaughan 2010]. These CRL focusing systems are rugged, and can handle the high heat load present in these applications. Another approach has been to use lithographic patterning and highly directional etching techniques [Aristov 2000a] or deep x-ray lithography methods [Shabelnikov 2002, Nazmov 2004] to produce 1D lenses in materials such as silicon or polymers (Fig. 5.3), or 1D or 2D lenses in electroplated materials such as nickel using the LIGA (German: LIthographie, Galvanik und Abformung) process [Nazmov 2005, Nazmov 2007]. When used as orthogonal pairs of 1D optics, one obtains not an Airy² focus intensity but a sinc²(x/a_x) sinc²(y/a_y) focus intensity, as discussed in Section 4.4.5. These approaches are readily extensible to the production of kinoform lenses [Jordan 1970, Evans-Lutterodt 2004] as shown in Fig. 5.4. These greatly reduce x-ray absorption and thus increase efficiency.




Figure 5.3 Example of a 1D compound refractive lens made by nanopatterning on a silicon wafer followed by deep reactive ion etching. Shown here is a subset of a total of N = 61 double-concave parabolic lenses with a curvature of R = 9.4 μm (giving a focal length of 31 mm at 14 keV) and an aperture of 40 μm. Image courtesy of Lukas Grote of the University of Hamburg, and data courtesy of Frank Seiboth, DESY, Hamburg.

At the high-resolution end they can be thought of as blazed zone plates, as will be discussed in Section 5.3.1. However, they do impose 2π phase shears on the wavefield exiting the optic.
The spatial resolution that can be obtained using refractive optics is determined by several factors. At extreme radii, a ray incident on the “steep” curvature of a refractive lens will encounter a surface that is nearly parallel to the optical axis, and the ray will be reflected (rather than transmitted into, and refracted by, the surface); this sets a resolution limit for a single refractive lens that is equivalent to the case for a reflective optic discussed below (Eq. 5.10) [Evans-Lutterodt 2003]. In compound refractive lenses, the curvature of any one lens is weaker so this is less of a problem, and furthermore one can design an adiabatic adjustment into the focal length of each of the CRLs to reach, in principle, few-nanometer spatial resolution [Schroer 2005b], with < 20 nm resolution having been achieved [Patommel 2017]. For non-kinoform compound refractive lenses, absorption limits the effective aperture [Snigirev 1996]. If one limits the combined lens thickness to an attenuation length of μ⁻¹ = λ/(4πβ) (Eq. 3.75) so that transmission is 1/e ≈ 38 percent at the edge, the usable aperture diameter A can be written as

A = \sqrt{\frac{\lambda R}{\pi \beta N}} = \sqrt{\frac{\lambda R}{\pi N \alpha \lambda^2 f_2}},   (5.3)

where in the latter form (using Eq. 3.65) we can see more explicitly the scaling with x-ray wavelength λ and complex oscillator strengths (f₁ + i f₂) as shown in Fig. 3.16. For an example [Kurapova 2007] of silicon at 21 keV with N = 100 and R = 2 μm, Eq. 5.3 gives A = 10 μm, showing that nanofocusing CRLs are usually used in conjunction with prefocusing optics to bring the beam width down to match such small apertures. This limited effective aperture A of CRLs also sets a limit on spatial resolution.



[Figure 5.4 diagram labels: x-ray beam axis; parabolic refractive lens; long kinoform; short kinoform; thickness steps t_s.]

Figure 5.4 X-ray kinoform refractive optics. At left is shown a parabolic refractive optic. By removing material in defined thickness steps t_s, one can produce kinoform optics [Jordan 1970] with reduced x-ray beam absorption, either as long or short kinoforms [Evans-Lutterodt 2004]. The best performance is obtained when the thickness steps correspond to a phase shift of 2π, or t_s = λ/δ (Eq. 3.69), in which case the short kinoform becomes equivalent to a curved profile zone plate [Tatchyn 1982, Tatchyn 1984].

The numerical aperture is N.A. = A/(2f) in the small-angle limit, giving

{\rm N.A.} = \frac{1}{2}\sqrt{\frac{N \alpha \lambda^3 f_1^2}{\pi R f_2}}.   (5.4)

Since the Rayleigh resolution of a perfect lens is δ_r = 0.61λ/N.A. (Eq. 4.173), compound refractive lenses have a spatial resolution limit due to absorption of

\delta_r = 1.22\sqrt{\frac{\pi R f_2}{N \alpha \lambda f_1^2}}.   (5.5)

Using the same example of silicon at 21 keV with N = 100 and R = 2 μm, this gives a resolution limit of δ_r(CRL) = 66 nm, which is in fact representative of what is achieved in experiments (some papers use a numerical factor other than 0.61 in δ_r and thus quote smaller values for the achievable spatial resolution [Lengeler 1999a]; at the same time, adiabatic CRLs have been used for demonstrations of sub-20 nm resolution [Patommel 2017]). Considering that f₂ decreases as about λ² (Fig. 3.16) in addition to the 1/√λ term in Eq. 5.5, one can see the advantages of working at higher photon energies when using CRLs. The expression of Eq. 5.5 also emphasizes the advantages of using light elements like Be [Schroer 2002] or even Li [Dufresne 2001] with more favorable ratios of phase shift over absorption, or f₁/f₂ = δ/β, for obtaining the maximum resolution and lowest absorptive losses. At energies of around 50 keV and up, Compton scattering instead sets the limit on lens aperture [Elleaume 1998, Eq. 11]. Other factors (such as the existence of high-quality deep reactive ion etch processes for silicon [Aristov 2000a, Kurapova 2007], or high heat load requirements leading to the choice of diamond [Snigirev 2002, Ribbing 2003, Nöhammer 2003]) also come into play.


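The worked silicon example above is easy to reproduce in a short sketch (not the book’s code); the δ and β values below are rough, assumed optical constants for Si at 21 keV (for real work, use tabulated values such as the Henke tables), so the outputs only approximately match the A ≈ 10 μm and δ_r ≈ 66 nm quoted in the text. The last line anticipates the surface-roughness tolerance of Eq. 5.6, discussed next.

```python
import math

wavelength = 5.9e-11   # m, photon wavelength at 21 keV
delta = 1.1e-6         # assumed refractive index decrement of Si at 21 keV
beta = 3.9e-9          # assumed absorption index of Si at 21 keV
N = 100                # number of lens surfaces
R = 2e-6               # m, surface radius of curvature

f = R / (2 * N * delta)                                # Eq. 5.2
A = math.sqrt(wavelength * R / (math.pi * beta * N))   # Eq. 5.3
NA = A / (2 * f)                                       # numerical aperture
res = 0.61 * wavelength / NA                           # Rayleigh resolution
sigma_s = wavelength / (8 * math.sqrt(2 * N) * delta)  # Eq. 5.6 tolerance

print(f"f = {f * 1e3:.1f} mm, A = {A * 1e6:.1f} um")
print(f"N.A. = {NA:.1e}, Rayleigh resolution = {res * 1e9:.0f} nm")
print(f"surface tolerance sigma_s = {sigma_s * 1e9:.0f} nm")  # ~480 nm
```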

Gaussian fluctuations in individual surface profiles [Lengeler 1999a] with a standard deviation of σ_s (adding in an uncorrelated fashion over 2N surfaces) lead to a net phase fluctuation characterized by θ_s = 2πδ√(2N) σ_s/λ, with acceptable performance according to the Rayleigh quarter wave criterion (Section 4.1.2) when θ_s ≤ π/4, or

\sigma_s \le \frac{1}{8\sqrt{2N}}\,\frac{\lambda}{\delta}.   (5.6)

For our example of a 21 keV CRL made of silicon with N = 100, one requires σ_s ≤ 480 nm, which is easily achieved in nanolithography processes or even in presses used to “stamp” out optics. Finally, one can measure the phase of a CRL-focused wavefield using ptychography (Section 10.4), and then add a phase correction optic to improve it [Seiboth 2017].
It is useful to consider the energy tunability of CRLs. From Eq. 5.2 we see that the focal length scales with the square of changes in photon energy (assuming f₁ has little change with energy, which is the case away from absorption edges, as shown in Fig. 3.16). Taking the derivative of Eq. 5.2 gives

df = \frac{R}{2N\alpha}\, d\!\left[\left(\lambda^2 f_1\right)^{-1}\right] = -\frac{R}{2N\alpha\lambda^2 f_1}\left(\frac{2}{\lambda} + \frac{1}{f_1}\frac{df_1}{d\lambda}\right) d\lambda.   (5.7)

At the same time, we can calculate the absorptive-aperture-limited depth of field of a CRL from DOF = 2δ_z = 2λ/N.A.² (Eqs. 4.213 and 4.214) and Eq. 5.4, giving

{\rm DOF} = 2\delta_z = \frac{8\pi R f_2}{N\alpha\lambda^2 f_1^2}.   (5.8)

If we set the change in focal length due to wavelength changes to be a fraction χ of the depth resolution δ_z (thus keeping the focal spot from being blurred due to chromatic aberrations), we find that the spectral bandpass dλ/λ ≈ dE/E used to illuminate a CRL with an aperture reaching the absorption limit (Eq. 5.3) must be kept below

\frac{d\lambda}{\lambda} \le \chi\, 2\pi\, \frac{f_2}{f_1},   (5.9)

which for χ = 1 and silicon at 21 keV works out to be dE/E ≈ 0.3 percent.

Refractive optics went from something considered impractical to becoming very useful optics in both x-ray beamlines and in x-ray microscopy. This must be celebrated in a limerick! Christian Schroer of DESY has generated some beauties, and what follows is heavily inspired by Schroer:

We once thought that x-ray refraction
was too weak for focusing action.
But compounding the lens
adds up many bends.
We focus with great satisfaction!

5.2

Reflective optics

The normal incidence reflectivity of X rays from single interfaces goes like R_⊥ ≈ δ²/2 (Eq. 3.118), which is vanishingly small. However, as noted in Sections 2.2 and 3.5, x-rays can be reflected with high efficiency at grazing incidence angles below the critical angle.




[Figure: log-scale plot of achievable mirror surface roughness (nm, approx. 0.05–10) and slope error (μrad, approx. 0.02–5) versus year, with tick marks at 1990 and 2000.]

… 2.44 drN). Not only will the design wavelength pass through this pinhole unobstructed, but slightly longer and shorter wavelengths will as well. In order to estimate the full-width at half-maximum (FWHM) spectral bandwidth, we will consider the wavelength change Δλ for which the transmission is reduced to half of its value. This means we wish to know the wavelength change Δλ for which the geometric beam size at the pinhole plane is √2 d_p, so that the beam area is twice the pinhole area. Since this diameter divided by Δf gives the same tan(θ) as the condenser diameter d_z divided by f + Δf, we have

\frac{\sqrt{2}\, d_p}{\Delta f} = \frac{d_z}{f + \Delta f} \simeq \frac{d_z}{f},   (6.12)

\frac{\Delta\lambda}{\lambda} = \frac{\Delta f}{f} = \frac{\sqrt{2}\, d_p}{d_z},   (6.13)

where we have used Eq. 6.11 in Eq. 6.13. This is for the half-width at half-maximum, so to obtain the FWHM value we must consider the wavelength change in the longer wavelength direction as well, leading to a FWHM bandwidth of

\left.\frac{\Delta\lambda}{\lambda}\right|_{\rm FWHM} = \frac{2\sqrt{2}\, d_p}{d_z}.   (6.14)

That is, the monochromaticity λ/Δλ is given by the ratio of condenser diameter d_z to monochromator pinhole diameter d_p. This has driven the development of very large condenser zone plates, such as zone plates with diameters of d_z = 9 mm and outermost zone width of drN = 54 nm made either using UV holography [Schmahl 1993] or electron beam lithography [Anderson 2000].



[Figure 6.4 diagram labels: zone plate diameter d_z; central stop diameter fraction b; pinhole diameter d_p.]

Figure 6.4 Fresnel zone plates can be used as a combination condenser lens and linear monochromator. A zone plate with diameter d_z and central stop fraction b accepts a polychromatic beam, and a pinhole of diameter d_p is placed at the focal position for the design wavelength λ. While the blue dashed line shows a shorter wavelength that will still make it through the pinhole unobstructed, the solid blue line shows a wavelength λ − Δλ for which light is spread out over twice the area of the pinhole (or √2 larger diameter than the pinhole). This condition is then used to calculate the FWHM spectral bandwidth of Eq. 6.14.

In the case of electron beam lithography, it is very challenging to combine both large diameters and narrow zone widths, so that electron beam writing times of 48 hours have been used! These condenser zone plates also see higher beam powers than the specimen or the objective lens (because monochromatization removes most of the power from a polychromatic beam), so thermal management is important to ensure zone plate condenser survivability.
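Here is a minimal sketch of Eq. 6.14 (not from the book); the d_z = 9 mm matches the condenser zone plates described above, while the 10 μm pinhole is an assumed value chosen only for illustration.

```python
import math

def monochromaticity(d_z, d_p):
    """lambda / delta-lambda (FWHM) from Eq. 6.14: d_z / (2*sqrt(2)*d_p)."""
    return d_z / (2.0 * math.sqrt(2.0) * d_p)

# 9 mm condenser zone plate with an assumed 10 um monochromator pinhole
print(monochromaticity(d_z=9e-3, d_p=10e-6))   # ~318
```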

6.3.2

Capillary condensers

The latest microscopes that have evolved out of the Göttingen legacy have used standard x-ray beamline grating monochromators (Section 7.2.1) and a capillary optic (Section 5.2.3) as the condenser lens [Heim 2009]. The same has been true for commercial instruments [Zeng 2008]. Capillary condensers have several important advantages:
• One can project a monochromatic illumination field onto the object or specimen plane with no wavelength-selecting pinhole required. Especially when doing nanotomography in TXM systems with zone plate condensers, the millimeter-scale size of the mount for the monochromating pinhole significantly limits the ability to obtain large tilt angles when using planar specimen holders such as electron microscope grids or silicon nitride windows. Grating monochromators used with capillary condensers largely remove this limitation [Heim 2009, Schneider 2010].
• The reflection coefficient for single-bounce x-ray reflective optics can be well above 50 percent (Section 3.6), while zone plates often have diffraction efficiencies of 15 percent or lower (Section 5.3.2).
• The critical angle for grazing incidence reflectivity of θ_c = √(2δ) (Eq. 3.115) is often larger than the maximum beam convergence angle that can be produced by practical finest zone widths drN. For example, the critical angle for X rays reflecting




from fused silica is 37.5 mrad at 540 eV, so the net deflection angle 2θ_c can be 75 mrad, whereas to obtain the same diffraction angle (Eq. 4.30) one would need finest zones with a width of drN = λ/(2θ) = 15 nm in the condenser. Such a finest zone width is at the limit of what has been achieved in zone plates, and at that limit the efficiency is reduced because of the challenges of fabricating high aspect ratios (Section 5.3.4), so most condenser zone plates have a much larger finest zone width drN. Capillary condensers bypass these limitations (a numerical check of these angles appears below).
• By using a conventional grazing incidence monochromator, the “white beam” incident power from the x-ray source is spread out over a large surface area on a substrate that is easily water-cooled. Since only the monochromatic beam reaches the condenser, and again the grazing incidence condenser has the beam spread out over a large surface area, thermal engineering challenges on the condenser optic are largely removed.
• With a conventional grating monochromator, it is easy to obtain very high spectral resolution values (such as λ/Δλ ≈ 10,000 demonstrated in a TXM system [Guttmann 2011]) as needed for imaging across x-ray absorption near-edge resonances (see Section 9.1.2).
For these reasons, capillary condensers are growing in popularity in TXM systems.
Whatever the condenser optic used, the illumination spot size is given by some combination of the geometrical image of the source and any effects due to aberrations in the condenser. Following the discussion of Section 4.4.6, we point out that one spatially resolved pixel in a TXM has a flux that is determined by the source brightness delivered through the imaging system, while the number of pixels that can be illuminated simultaneously is given by the source size–angle or phase space product M_source produced by the source and accommodated by the condenser. That is, to a first approximation each spatially resolved pixel can be illuminated by a separate spatially coherent mode M_source, up to a limit of the number of spatially resolved pixels in the detector. That is, if a detector with 2048 × 2048 pixels is used and one seeks to have good sampling with two detector pixels Δ_det per spatially resolved pixel, one could accept up to 1024 × 1024 spatially coherent modes M_source, or a phase space full angle–full width product of 1024λ in each of the directions x̂ and ŷ. In practice, this is difficult to achieve and the delivered illumination phase spaces are often somewhat less in the horizontal, and much less in the vertical, in particular if synchrotron sources are used. To correct for this, a diffuser can be used in the specimen illumination path [Uesugi 2006], or the condenser can be “wobbled” or mechanically scanned during exposure so as to provide an even illumination field [Rudati 2011]. By these means, even illumination over fields of size 10–20 μm is routinely obtained, though at a cost of exposure time (the brightness that could be used for fast imaging of a smaller number of pixels is now shared among a larger number of pixels). One must also keep in mind the defocus aberration limit to the field of view, as discussed in Section 4.4.1.
Because of the advances in x-ray source brightness, and the fact that it is brightness that limits the per-pixel exposure time, modern TXMs can record images with exposure times well below one second.
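As a numerical check of the critical-angle comparison in the capillary bullet list above, here is a sketch assuming δ ≈ 7 × 10⁻⁴ for fused silica at 540 eV (an illustrative value consistent with the 37.5 mrad critical angle quoted there).

```python
import math

delta = 7.0e-4                        # assumed decrement, fused silica at 540 eV
wavelength = 1239.84e-9 / 540.0       # photon wavelength in m (E in eV)
theta_c = math.sqrt(2 * delta)        # critical angle, Eq. 3.115 -> ~37 mrad
deflection = 2 * theta_c              # net single-bounce deflection -> ~75 mrad
drN = wavelength / (2 * deflection)   # zone width for same angle -> ~15 nm
print(f"theta_c = {theta_c * 1e3:.1f} mrad, drN = {drN * 1e9:.1f} nm")
```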



To image larger fields of view, the specimen can be translated in a step-and-repeat fashion, after which the individual images can be assembled into a larger mosaic image [Loo 2000, Liu 2012a]. In early Göttingen TXM systems, photographic films [Rudolph 1984] or nuclear emulsions were used to record the image with detective quantum efficiencies of no more than about 9 percent [Bässler 1991]. However, charge-coupled device (CCD) cameras were soon being tested with phosphor coatings for conversion of the x-ray image into visible light [Germer 1986], and by 1993 backside-thinned CCDs were being used for direct x-ray detection [Meyer-Ilse 1993]. Today, backside-thinned CCDs deliver very high quantum efficiency, and remain the detector of choice for photon energies below the Si K absorption edge at 1.84 keV (radiation damage seems to limit detector lifetime at higher energies). At higher energies, scintillator–lens–visible light camera systems are usually used, as will be discussed in Section 7.4.7. With all of these detectors, the best practice is to use an optical magnification M sufficient to map the Rayleigh resolution δ_r onto the width of two detector pixels, or Mδ_r = 2Δ_det, in order to meet the requirements for Nyquist sampling (Eq. 4.88). As noted in Section 2.5, the Göttingen group’s work at the older BESSY synchrotron in Berlin inspired the development of similar microscopes on bending magnet beamlines at synchrotron light source facilities around the world, with microscopes on undulator beamlines following more recently (the relative advantages of these two source types are noted in Section 7.2.2). The energy range has been expanded to include TXM systems developed for studies at multi-keV x-ray energies, where one has greater penetration power for thicker specimens, and larger working distance for working with more elaborate specimen environmental chambers (Section 7.5). While synchrotron light sources are powerful and widely available, there is also a real need for TXMs in home laboratories. Early steps in this direction included demonstrations by the Göttingen group with plasma discharge sources [Niemann 1990]. Greater success was obtained by the group of Hans Hertz in Sweden by using lasers to excite plasma emission from in-vacuum liquid jets of ethanol [Rymell 1993] and, with improved emission, from liquid nitrogen [Berglund 1998]. They have gone on to build water-window soft x-ray microscopes using these sources and a normal-incidence multilayer mirror as the condenser optic [Berglund 2000], and have developed cryo nanotomography capabilities [Bertilson 2011]. A very important step has been the development of commercially available laboratory microscopes, so that x-ray microscopy in the home lab can spread to those who are not inclined (or do not have the expertise) to build their own systems. After experience with building synchrotron-based microscopes, Wenbing Yun founded the company Xradia (now Carl Zeiss X-ray Microscopy) in the late 1990s and began delivering TXMs using characteristic line emission from microfocus x-ray sources, capillary reflectors as condensers, and zone plate objectives for imaging and nanotomography at 5.4 keV [Wang 2002]. Exposures typically take several minutes. These laboratory microscopes have been commercially successful both at Zeiss and at Yun’s new company, Sigray, and their variants for use with synchrotron radiation have been installed and are operating at nearly a dozen light sources.
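The detector-matching condition Mδ_r = 2Δ_det given above is simple to evaluate; in this sketch, the 50 nm resolution and 13 μm camera pixel size are assumed, illustrative numbers.

```python
# Optical magnification needed so the Rayleigh resolution spans two pixels.
delta_r = 50e-9   # m, Rayleigh resolution of the objective (assumed)
pixel = 13e-6     # m, detector pixel width (assumed)
M = 2 * pixel / delta_r
print(f"required magnification M = {M:.0f}")   # 520 for these numbers
```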
We end our discussion of TXMs by noting one important consideration: the objective optic lies downstream of the specimen to be imaged.



[Figure 6.5 diagram labels: source; objective; OSA; object; transmission detector; detector(s) for other signals.]

Figure 6.5 Schematic view of a scanning x-ray microscope or SXM. In this case the objective lens used to produce the fine focal spot has a central stop, and an order-sorting aperture (or order-selecting aperture; OSA) is used to isolate the focused beam as would be required for a Fresnel zone plate objective (see Fig. 5.17). Scanning transmission x-ray microscopes (STXMs) as well as ptychography systems (Section 10.4) rely primarily on the transmission signal, while scanning fluorescence x-ray microscopes (SFXMs; Section 9.2) and scanning photoelectron emission microscopes (SPEMs) rely primarily on detection of x-ray induced x-ray fluorescence (SFXM) or electrons (SPEM). Additional modes are possible, including luminescence (SLXM; also called x-ray excited optical luminescence or XEOL). For x-ray fluorescence at synchrotron sources, an energy dispersive detector is usually oriented to the side rather than below the specimen; this is discussed in Section 9.2.

If one is using an objective lens with low efficiency, such as a Fresnel zone plate where the efficiency is in the 5–20 percent range (Section 5.3.2), this affects the radiation dose received by the specimen (compound refractive lenses can also be used as the objective lens [Lengeler 1999a], though they also have some absorption losses). In most cases one must record a certain number of photons per pixel in the image in order to see features of a certain size (Section 4.8), and if the objective lens has only 20 percent efficiency then one must illuminate the specimen with five times more exposure in order to obtain the required statistical significance in the image.

6.4

Scanning x-ray microscopes

The basic idea behind a scanning x-ray microscope (SXM) is to demagnify an x-ray source and form the smallest focal spot possible, and then to acquire an image pixel-by-pixel through a raster scan. In principle one could scan either the probe or the sample, but in most instruments it is the sample that is mechanically scanned (so that the objective lens remains in constant alignment to the illuminating beam, thus reducing the risk of image intensity variations). As shown in Fig. 6.5, SXM systems have great flexibility in the signal that is used to form the image:
• Scanning transmission x-ray microscopes (STXMs, pronounced as “sticks-ems” by cool geeks like us) use primarily the transmission signal. As noted in Section 4.5.1, the shape of the sensitive area of the transmitted beam detector can play as strong a role in image quality as the condenser lens does in TXM systems, so that in the ideal case the detector should subtend an angle corresponding to







• •


m = 1.5 times the N.A. of the STXM objective lens. The detector can also be segmented into quadrants or additional segments for differential phase contrast imaging, as discussed in Section 4.7.4, and full pixel array detectors can be used with the transmitted beam signal for ptychography, as discussed in Section 10.4. For simple area detectors, one can use a silicon photodiode to measure high flux rates as a current, or an avalanche photodiode to measure the photon count rate (including with high time sensitivity for pump-probe experiments [Stoll 2004, Puzic 2010]), with trade-offs as shown in Fig. 7.15. One can also use a phosphor or scintillator to convert the incoming x-ray beam signal into visible light and then use optical detectors either in current or pulse-counting mode. Gas-filled proportional counters have also served well as zero-dark-noise detectors for low flux rates [Feser 1998], though with extra complications of bulk and complexity.
• Scanning fluorescence x-ray microscopes (SFXMs, or “sphix-ems”; also called x-ray microprobes) use an energy-resolving photon detector to collect x-ray stimulated x-ray fluorescence signals. Most systems use energy-dispersive detectors (Section 7.4.12), where the energy of each detected photon is measured based on the number of charge–hole pair separations produced in a semiconductor (usually silicon, with an energy resolution of about 130 eV at 10 keV). SFXM is discussed in further detail in Section 9.2.
• Scanning photoelectron emission microscopes (SPEMs) use an energy resolving electron detector to collect Auger and photoelectron spectra. At x-ray energies below about 100 eV, multilayer-coated Schwarzschild objectives have been used [Ng 2006], while at higher energies both zone plate optics [Ade 1991, Ko 1995, Marsi 1997] and grazing incidence reflective optics [Voss 1992b] have been used. The characteristics of SPEMs are discussed with photoelectron emission microscopes (PEEMs) in Section 6.5; SPEMs are used primarily for studies in materials science [Kiskinova 1999]. Rather than looking at electrons that have been ejected, one can use the x-ray beam-induced current (XBIC) [Vyvenko 2002] signal for imaging.
• Scanning luminescence x-ray microscopes (SLXM, or “slicks-em”) involve collection of x-ray stimulated visible light emission signals generated by the same physical processes as are involved in scintillators (Section 7.4.7). In spite of early hopes of imaging visible light emission from organic materials [Jacobsen 1993], radiation damage limitations mean that luminescence is best used for studies of inorganic materials such as ceramics [Zhang 1995a, Moewes 1996]. In studies of semiconductors, the method of using core-shell electron excitation to generate luminescence [Bianconi 1978] is known as x-ray excited optical luminescence (XEOL). This approach is seeing considerable success in scanning x-ray microscopy [Martínez-Criado 2006, Martínez-Criado 2012].

Many instruments combine several of these functions; for example, nearly all SFXM systems include a STXM mode. A microscope developed at DESY in Hamburg included even more detection modes, such as desorbed ions [Voss 1997].



[Figure 6.6 panel labels: Nyquist sampling; Common sampling; Unidirectional; Bidirectional.]

Figure 6.6 Scanning schemes and sampling in scanning x-ray microscopes. For Nyquist sampling, one should choose the scan pixel spacing Δ_r to be half the Rayleigh resolution δ_r. Since the Rayleigh resolution involves the radius rather than the diameter of the probe function (Fig. 4.29), this produces significant probe overlap, and even illumination along a scan line. Unfortunately, many users of scanning microscopes choose a pixel spacing closer to the spatial resolution (here labeled “Common sampling”). At right is shown the difference in scan trajectories for unidirectional scans versus bidirectional scans. Since many scanning microscopes do not have shutters fast enough to close and then open between scan lines, some additional radiation dose is applied to the specimen in unidirectional scans, and there is also a cost of additional scan overhead time during the “flyback” phase of the probe’s motion relative to the specimen.

Scanning was first proposed in 1938 for electron microscopy [von Ardenne 1938], and in 1953 for x-ray microscopy [Pattee 1953]. In x-ray microscopy, the first demonstrations came a few years later [Cosslett 1956, Duncumb 1957]. As noted in Section 2.5, the scanning fluorescence x-ray microscope developed by Horowitz and Howell [Horowitz 1972] really began to show the possibilities for using synchrotron radiation, though it only used a pinhole to define the probe beam. It was some years later before the group of Janos Kirz at Stony Brook University began to demonstrate STXMs using Fresnel zone plates as high-resolution objective lenses [Rarback 1984], as noted in Section 2.5. When zone plate optics are used, fractional central stop diameters of b = 0.2–0.5 are used along with order-sorting apertures or OSAs (also called order-selecting apertures) to isolate the first-order focus as shown in Fig. 5.17, with the exact value of b chosen to provide adequate working distance between the OSA and the specimen.
Scanning microscopes are quite flexible, especially in the era of computer-controlled instruments. The scan parameters (the step size Δ_r from pixel to pixel, and the number of pixels N_{x,y}) can be adjusted over wide ranges; in effect, the image magnification and field of view are freely adjustable. There are even examples of synchrotron SFXM systems being used to scan works of art [Thurrowgood 2016] that are almost half a meter across! However, one should pay attention to proper Nyquist sampling of the scan, as illustrated in Fig. 6.6. In addition, many scanning microscopes use unidirectional scanning (also shown in Fig. 6.6) because it is easier to program in a control computer, but bidirectional scanning offers advantages in speed and reduced radiation exposure on the specimen. There has also been a trend from using a move–settle–measure or “step scan” approach to a continuous or “fly scan” approach¹ in STXMs [Jacobsen 1991, Kilcoyne 2003] and SFXMs [Kirkham 2010], as illustrated in Fig. 6.7.

¹ I am not aware of anyone yet taking continuous scan images of small insects; who will do the first fly fly scan?





Figure 6.7 Step scans, where a measure–move–settle sequence is used, versus “fly” or continuous scan motions. In step scans, one has the exposure time t_e plus a possible detector readout “dead” time t_d, followed by the time t_0 it takes to move to the next fixed scan position. In continuous or fly scans, one has only the exposure time t_e and the possibility of a detector “dead” time t_d. Continuous or fly scans involve a pixel spacing along a scan row of s, as set by the clocking of data collection during constant probe velocity; this can be larger or (ideally) smaller than the probe diameter d. Figure adapted from [Deng 2015a].

Fly scanning is faster, but one must account for the fact that the probe function is modified, and this plays a special role in coherent scanned imaging methods like ptychography, as shown first in simulations [Clark 2014], and then in experiments [Pelz 2014, Deng 2015a, Huang 2015]. Finally, in ptychography (Section 10.4) some researchers use spiral, non-rectilinear scan patterns to minimize reconstruction artifacts [Thibault 2009a], and other non-rectilinear scan approaches such as Lissajous scanning [Sullivan 2014] might also offer advantages.
Scanning microscopes require a spatially coherent beam in order to deliver a diffraction-limited focus at the objective lens’ resolution limit (Section 4.4.6), and with spatially incoherent (or multiple coherence mode M) sources such as most synchrotron light sources today, one must use some form of spatial filtering to coherently illuminate the objective lens, as shown in Fig. 4.43. This provides the opportunity to trade off somewhat lower spatial resolution for higher flux by opening up beamline apertures and thus increasing the source phase parameter p, as discussed in Section 4.4.6. The beam illuminating the object must also be monochromatized to match any dispersive properties of the objective lens, such as a zone plate or multilayer Laue lens (Eq. 5.33) or compound refractive lens system (Eq. 5.9), and this is usually done by using a crystal, grating, or multilayer monochromator (Section 7.2.1). The achromatic properties of single-layer-coated Kirkpatrick–Baez grazing incidence mirrors (Section 5.2.1) offer the great advantage of not requiring beam monochromatization. Because of the spatial coherence requirement, and serial rather than parallel acquisition of image pixels, scanning microscopes often have much slower imaging times than full-field or TXM systems. The highest-performance SXMs operate using undulators (Section 7.1.6) at synchrotron light sources, though there are many examples of successful STXMs at bending magnet beamlines (Section 7.1.5). Even with the brightest sources, it is rare to have per-pixel dwell times (step scans) or transit times (fly scans) as low as 100 μs, in which case a 1000 × 1000 pixel scan would take almost two minutes to acquire even if “fly scans” were used with infinite stage acceleration/deceleration and no other time losses due to data transfer, etc.
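The scan-time bookkeeping of Fig. 6.7 can be captured in a few lines; this sketch (not from the book) uses the 100 μs per-pixel time from the text, with an assumed 5 ms move-and-settle time for the step-scan case.

```python
def scan_time(n_pixels, t_e, t_d=0.0, t_o=0.0):
    """Total time from per-pixel exposure t_e, dead time t_d, move time t_o."""
    return n_pixels * (t_e + t_d + t_o)

n = 1000 * 1000  # 1000 x 1000 pixel image
print(f"fly scan:  {scan_time(n, 100e-6):.0f} s")            # 100 s
print(f"step scan: {scan_time(n, 100e-6, t_o=5e-3):.0f} s")  # 5100 s (~85 min)
```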



[Figure 6.8 diagram labels: PEEM — x-ray beam, electron imaging optics and area detector; SPEM — scanning, x-ray beam, electron energy analyzer.]

Figure 6.8 Photoelectron emission microscopes (PEEMs) and scanning photoelectron emission microscopes (SPEMs) both use the absorption of X rays to produce photo-electrons and Auger electrons from the surfaces of materials. In a PEEM, electron optics are used to image the emitted electrons onto an area detector, with the electron optics setting the spatial resolution (most PEEMs also include an electron monochromator for imaging emission at a selected electron kinetic energy). In a SPEM, x-ray optics are used to focus the x-ray beam to a small spot, and an electron energy analyzer is used to record the energy spectrum of electrons emitted from the surface.

An important characteristic for SXMs is that the objective lens is located upstream of the specimen rather than downstream. This means that inefficiencies in the objective lens increase total imaging time, but do not lead to higher radiation dose on the specimen. This is different from the case of a TXM, where a zone plate with 10 percent efficiency means that one must expose the specimen to a tenfold higher radiation dose in order to obtain an image with the same statistical degree of significance.
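A minimal numerical illustration of this dose argument, with assumed rather than measured values:

    photons_detected = 1.0e4   # photons per pixel needed for a target SNR (assumed)
    efficiency = 0.10          # assumed zone plate diffraction efficiency

    # TXM: the optic is downstream, so photons lost in the optic still dose the specimen
    txm_photons_through_specimen = photons_detected / efficiency   # 1.0e5: tenfold dose
    # SXM: the optic is upstream, so its losses only increase the imaging time
    sxm_photons_through_specimen = photons_detected                # 1.0e4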

6.5 Electron optical x-ray microscopes (PEEM and others)

As noted in the previous section, scanning photoelectron emission microscopes (SPEMs) work by detecting electrons emitted from a small scanned x-ray focal spot. An alternative approach is to illuminate a larger sample area and use electron optics to image the ejected electrons; this is what is done in a photoelectron emission microscope or PEEM system [Feng 2007a], as shown in Fig. 6.8. Photoelectron emission microscopes can image a wide spectral range of emitted electrons from a small region into an electron spectrometer for microspectroscopy, or they can produce high-resolution images at a selected electron energy when using an electron monochromator in their electron optical path. Because they can work with larger x-ray illumination spots (such as x-ray beams with many spatially coherent modes M present), PEEMs can be operated with laboratory x-ray sources, and indeed commercial instruments are readily available. When PEEMs are used to image electrons with kinetic energies below about 100 eV, the instruments are sometimes called low-energy electron microscopes (LEEMs) [Bauer 1994], and for studies of magnetic materials an offshoot instrument type is the SPLEEM, with spin-resolved electron detection. We saw in Fig. 3.12 that low-energy electrons travel relatively short distances in solids. In Section 3.1.1, we noted that Auger electron emission provides element-specific information, since the electron's energy is determined by a specific quantum transition.



Figure 6.9 Inelastic mean free path Λinel for low-energy electrons in polystyrene, silicon (Si), and gold (Au) [Ashley 1976, Ashley 1978]. As can be seen, photoelectrons and Auger electrons created by the absorption of soft x-ray photons (approx. 100–1000 eV) are able to escape only from regions close to the surface if they are to be detected without having their energy changed through inelastic scattering. Inelastic mean free paths for high-energy electrons were shown in Fig. 4.80.

One can therefore use PEEMs and SPEMs for spatially resolved x-ray photoelectron spectroscopy (XPS), or electron spectroscopy for chemical analysis (ESCA). However, in order to carry out these analysis approaches, the detected electrons should not undergo any energy-changing inelastic scattering events; in other words, it is only those electrons produced within a distance of less than an inelastic mean free path Λinel of a material's surface that will emerge with their energy unchanged. The distance Λinel is typically only a few nanometers when using soft x-rays (as shown for three solids in Fig. 6.9), so this means that both SPEM and PEEM are surface-sensitive imaging techniques. From deeper within a material, an electron may undergo one or many inelastic scattering events so it will emerge with some variable energy lower than that of the Auger peak, and this gives rise to low-energy "tails" on Auger peaks in the electron spectrum. As shown in Fig. 6.10, the directly detected photoelectron and Auger electrons give one information on specific atomic transitions [Carlson 1975]. However, the electrons that have undergone multiple inelastic scattering events are also useful since a significant part of the photoelectron spectrum consists of secondary electrons with low energies of 10–30 eV, where the inelastic scattering mean free path Λinel begins to increase to distances of many nanometers (Fig. 6.9). Low electron energy PEEM systems (or LEEM systems) exploit surface variations in electron emissivity to image surface topography and conductivity. Secondary electron emission from the surface is proportional to the number of X rays absorbed, so PEEMs are frequently used to image the x-ray absorptivity as the x-ray energy is tuned, providing one path to near-edge x-ray spectromicroscopy (Section 9.1).


Figure 6.10 Photoelectron spectrum of aluminosilicate grown on a Ru(0001) substrate, illustrating some characteristics of x-ray photoelectron spectroscopy (XPS). With an incident photon energy of 750 eV, the spectrum shows photoelectron peaks at electron binding energies corresponding to the electronic states indicated (e.g., Ru 3d5/2), as well as Auger peaks corresponding to O KLL (K core-shell electron ejected, an L shell electron filling the vacancy, and another L shell electron being emitted) and Ru MNM. The plasmon spectrum below 30 eV is modified compared to that seen from electron energy loss spectroscopy (EELS; see Fig. 3.15) by inelastic scattering of electrons before escape, due to the mean free path shown in Fig. 6.9. This spectrum was obtained as part of a study that looked at changes in the Ru layer following exposure to O2 and H2 gases in an ambient pressure XPS system [Zhong 2016]. Data courtesy of Anibal Boscoboinik, Brookhaven Lab.

If one adds sensitivity to the angle at which electrons are emitted from an illuminated surface, one can gain considerable information on electronic band structure in solids. This technique is called angle-resolved photoemission spectroscopy (ARPES). By combining ARPES with the high spatial resolution of a SPEM, one arrives at a technique that is often called nanoARPES [Rotenberg 2014].

6.6 Concluding limerick

Too many microscope systems to keep track of? Not sure of your geek pronunciation of STXM as "sticks-em" and so on? Let's try a limerick:

    I like to see wide fields with TXM
    and probe in small spots with my STXM
    With metals it seems
    one can see much with PEEMs
    Tough problems? These microscopes fix 'em!


7 X-ray microscope instrumentation

Janos Kirz and Michael Feser contributed to this chapter.

The first x-ray microscopes were one-of-a-kind instruments that were operated by their builders, in a tradition that continues to this day. Although there was a brief phase in the 1950s where commercial point-projection microscopes were available (see Section 6.2, and [Cosslett 1960, Plates IX.B and X]), up until the year 2000 essentially all microscopes were custom-built. These custom-built microscopes are now joined by commercial instruments offering a wide range of capabilities. No matter whether you are using a commercial instrument where you can pop a sample in and push a button to get an image, or a custom instrument, it is useful to understand what "makes it tick." Hence this chapter.

Section 7.1 discusses x-ray sources, while Section 7.2 discusses the optical transport systems and associated equipment needed to bring the x-ray beam to the imaging system. After some brief comments on nanopositioning systems in Section 7.3, the properties of several types of x-ray detectors are covered in Section 7.4. Finally, Section 7.5 provides a short introduction to specimen environments.

The degree of sophistication in modern x-ray microscopes is worth a moment's pause for thanks. It wasn't always so! What is available today makes the home-built system (Fig. 2.4) that the author first encountered look unbelievably crude, and at that point things had already made significant advances from earlier years [Kirz 1980c, Rarback 1980]. An amusing anecdote was presented by Arne Engström in 1980 [Engström 1980] as he looked back on four decades of work in x-ray microscopy:

    Another trend in x-ray microscopy and x-ray microanalysis, especially in the field of the biomedical sciences, is the increasing sophistication and complexity of systems and equipment for the collection and treatment of experimental absorption data. However, this trend is not unique to this field of research. In fact, over the last 20 to 30 years there has been such a fantastic development of commercially available instrumentation for research and development that, in retrospect, the immediate post-war conditions seem very primitive indeed. For example, I remember the presentation of an automatic recording optical microabsorptiometer applicable to cellular analysis at an AAAS meeting in Boston in 1951. The demonstrator said proudly to the audience, which consisted of researchers who were, as was usual then, working with essentially self-assembled equipment: "I can watch this machine working, while standing behind it with my hands in my pockets." From the back of the room there was a hoarse voice, "I bet you have your hands in the government's pocket."

So let’s also pause for a moment of thanks for the financial support that has been provided to develop the instrumentation for x-ray microscopy! Downloaded from https://www.cambridge.org/core. Stockholm University Library, on 29 Oct 2019 at 01:13:57, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.008


7.1 X-ray sources

The characteristics demanded of an x-ray source depend on the type of x-ray microscope being used, as was discussed in Chapter 6. Three of the most important characteristics are as follows:

• Spectral bandwidth, as will be discussed in Section 7.2.1. Most (but not all!) x-ray microscopes require some degree of monochromaticity, with E/(ΔE) ranging roughly from 100 to 1000 or more.
• Étendue or phase space area, as will be discussed in Section 7.2.2. Scanning microscopes require that the illumination be limited to a beam size–angle product, or phase space area, of about λ in each direction so that Msource ≈ 1 (Section 6.4), while full-field microscopes (Section 6.3) can work with phase space areas up to the number of pixels in the image. One can always use spatial filtering to restrict the étendue of a source at a cost in flux, or diffusers or wobbled optics to increase the étendue, as noted in Section 6.3.
• Flux, and time structure. One needs enough photons per pixel to obtain an image with sufficient signal-to-noise ratio (SNR; Section 4.8). Oftentimes the source has a regular time structure where photons are "on," or being delivered during a time to out of a cycle or repetition time of tr, as shown in Fig. 7.1. This might happen because of the time bunch structure of a synchrotron light source (Section 7.1.4), or of a pulsed laser source used to generate hot plasmas or high harmonic gain, or the recharge time for capacitors in a pinched-plasma source (Section 7.1.3). We define a temporal duty cycle dt of

dt = to/tr,  (7.1)

for which example values are given in Table 7.1. Sources with a time-averaged spectral brightness Bs,ave have a peak brightness Bs,peak given by

Bs,peak = Bs,ave/dt,  (7.2)

so obviously if you are trying to convince people of how great a low repetition rate source is you will emphasize peak rather than average brightness! If one is trying to study high-electric-field phenomena in atomic physics, then all of the photons should be delivered in a very short time to and one might average together signals from a large number of pulses. In "diffraction before destruction" experiments such as those discussed in Section 10.6, we again want many photons in a very short time to followed by a long enough time (tr − to) to read out the detector and deliver a fresh sample to the beam region. In both of these cases a source with high peak brightness Bs,peak is desired, even if the duty cycle is very low. Otherwise, in microscopy we must allow for steady-state heat dissipation in the specimen as discussed in Section 11.1, so that we favor high duty cycles and high time-averaged brightness Bs,ave. Photon-counting detectors require either (a) the dead time td of the detector to be short compared to the "on" time to, or (b) that one collect far fewer than one photon per "on" time on average. Photon integrating detectors are better able to handle many photons arriving in an "on" time to, as will be shown in Fig. 7.15.


Figure 7.1 Time characteristics of an x-ray source. In some cases the source might have an "on" time to after which there is no x-ray emission for the remainder of a repetition time tr. This applies to the bunch structure of synchrotron light sources, or sources driven by pulsed lasers or electromagnetic discharges, leading to duty cycles (Eq. 7.1) as indicated in Table 7.1.

Table 7.1 Duty cycles dt as given by Eq. 7.1, for various x-ray sources based on the "on" time to, and the repetition time tr shown in Fig. 7.1. In many cases the time structure of the source can be modified, and there are considerable variations between sources of a given type. Therefore the values shown here are representative rather than exact.

Source                                        to       tr       dt
Laboratory electron impact                    —        —        1
Plasma pinch [Partlow 2012]                   500 ns   0.5 ms   1.0 × 10⁻³
Synchrotron: APS @ Argonne                    33.5 ps  153 ns   2.2 × 10⁻⁴
Laser-produced plasma [Martz 2012]            600 ps   0.5 ms   1.2 × 10⁻⁶
High-harmonic gain (HHG) [Popmintchev 2018]   20 fs    1 ms     2.0 × 10⁻¹¹
XFEL: LCLS @ SLAC [Emma 2010]                 50 fs    8.33 ms  6.0 × 10⁻¹²
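The duty cycle and peak brightness relations (Eqs. 7.1 and 7.2) are simple enough to check numerically; this sketch uses the LCLS-like time structure from Table 7.1 together with an assumed, illustrative time-averaged brightness.

    def peak_brightness(b_ave, t_on, t_rep):
        """Peak brightness from time-averaged brightness (Eqs. 7.1 and 7.2)."""
        d_t = t_on / t_rep   # temporal duty cycle, Eq. 7.1
        return b_ave / d_t   # Eq. 7.2

    # 50 fs pulses every 8.33 ms (120 Hz), as in Table 7.1
    print(f"duty cycle: {50e-15 / 8.33e-3:.1e}")                  # 6.0e-12
    print(f"peak: {peak_brightness(1e22, 50e-15, 8.33e-3):.1e}")  # 1.7e+33 for assumed b_ave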

Having described these general characteristics of x-ray sources, we take a short detour into photometry before going on to discuss specific x-ray source types.

7.1.1 Photometric measures

The characteristics of a source can be described by several photometric measures. We use the simple term "intensity" or I to refer to the square of a wave's complex amplitude (Box 4.1). Definitions of photometric terms and their symbols vary; those listed below represent a mixture of the recommendations in the Gold Book of the International Union of Pure and Applied Chemistry (IUPAC), the holy writ of optics [Born 1999], and common usage among x-ray microscopists.

• Flux Φ is the number of photons per second (photons/s). Usually this is used in connection with a specified spectral bandwidth (typically 0.1 percent), giving the spectral flux, but the prefix "spectral" is often left out.
• Fluence F is the cumulative number of photons per area (photons/m²). Again, this is usually used in connection with a specified spectral bandwidth, but the prefix "spectral" is often left out.
• Irradiance IE is the power received per area (W/m²); see Eq. B.47 in Appendix B at www.cambridge.org/Jacobsen.
• Spectral intensity Is is the photon flux from a source per solid angle dΩ with a given bandwidth BW (photons/s/sr/BW).
• Spectral brightness Bs is the photon flux per area per solid angle per bandwidth (which in the synchrotron radiation community is usually expressed not in base SI units but in photons/s/mm²/mrad²/0.1% BW). For a Gaussian source, one can write the spectral brightness as

Bs = Φ / (Σx Σy Σ′x Σ′y (ΔE/E)),  (7.3)

where the sizes Σ and divergences Σ′ are given in Eq. 7.12 for an undulator with finite electron beam emittance effects included. The coherent flux within a given spectral bandwidth is Φc = Bs · λ², as was shown in Eq. 4.199. This photometric term is sometimes called "brilliance" in parts of the synchrotron radiation community influenced by the early planning documents of the European Synchrotron Radiation Facility.

Figure 7.2 History of the maximum available x-ray source brightness (in photons/s/mm²/mrad²/0.1% BW). This has traced a path from conventional electron impact sources, to early parasitic use of synchrotrons, to dedicated storage rings as light source facilities, to facilities with low emittance with undulators, to x-ray free-electron lasers (XFELs) and the first multibend achromat storage ring source (MAX-IV in Sweden). The increase in available source brightness has been greater than the well-known Moore's "law" in microelectronics [Moore 1965], which noted that the number of transistors that could be incorporated into a single integrated circuit doubled about every two years. Those who have been around computing for a while know how remarkable this trend has been; x-ray sources have seen even greater advances! Figure adapted from [Jacobsen 2016b] with the kind permission of the Società Italiana di Fisica.

The highest available x-ray source brightness by year is shown in Fig. 7.2. Note that from Eq. 3.7 one can show that

ΔE/E = −Δλ/λ = |Δλ/λ|,  (7.4)

and the spectral bandwidth leads to the coherence length as given in Eq. 4.181.
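Since the coherent flux relation Φc = Bs · λ² quoted above involves only a unit conversion in practice, a short sketch may help; the brightness value here is an illustrative assumption for an undulator source.

    def coherent_flux(b_s, photon_ev):
        """Coherent flux (photons/s per 0.1% BW) from spectral brightness b_s,
        where b_s is in photons/s/mm^2/mrad^2/0.1% BW (Phi_c = Bs * lambda^2)."""
        lam_m = 1.23984e-6 / photon_ev   # wavelength in meters, from lambda = hc/E
        lam_mm_mrad = lam_m * 1e6        # 1 m*rad = 1e3 mm * 1e3 mrad
        return b_s * lam_mm_mrad**2

    print(f"{coherent_flux(1e19, 10e3):.1e}")  # ~1.5e+11 photons/s at 10 keV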

7.1.2 Laboratory x-ray sources: electron impact

Most laboratory x-ray sources are based on the same physics used by Röntgen: electrons accelerated to strike a target inside a vacuum chamber. Metals are usually chosen because they have a combination of high melting point, high thermal conductivity, and high electrical conductivity. The electron beam is often produced by heating a filament to a high enough temperature that an increased fraction of electrons have an energy above the work function energy (Eq. 3.19); they can then be extracted by an accelerating voltage, focused to a spot, and made to strike a metal target. Only a small fraction (∼0.1 percent) of the electron beam power is converted to X rays; the rest ends up as heat.

The target emits both a broad spectrum of Bremsstrahlung radiation (German for "braking radiation," caused by electrons swinging around the dense positive charges of nuclei), and characteristic X rays produced by removing core-level electrons from the anode material. The characteristic X rays (x-ray fluorescence; Section 3.1.1) form narrow peaks in the emission spectrum with well-tabulated energies [Bambynek 1972] and spectral widths [Krause 1979b]; for example, aluminum Kα1 line emission has a full-width at half-maximum (FWHM) of 0.43 eV, while for copper the FWHM linewidth is 2.11 eV. The emitted X rays are unpolarized.

There are many details involved in optimizing an electron impact source for a specific application, as discussed in several books [Cosslett 1960, Dyson 1973] and journal articles [Green 1961, Green 1968]:

• One can change the target material to choose among various emission lines as shown in Fig. 7.4. One can then use absorption filters to significantly reduce the transmitted fraction of continuum X rays below a fluorescence line.
• One can change the energy of the electron beam to excite higher-energy x-ray fluorescence lines, or to enhance the continuum spectrum at lower energies, as shown in Fig. 7.5.
• A fundamental limitation is that the target must not be made to melt (otherwise one has an electron beam evaporation system, such as is used to deposit thin metal films on surfaces). The anode can be water-cooled, and furthermore one can direct the electron beam spot onto a region near the outer radius of a rotating disk, so that one region on the disk is heated in "flashes" and has time to be cooled by heat conduction into the non-irradiated regions while it spins around. Water-cooled rotating anode x-ray sources can involve kilowatts of electron beam power into the anode.
• One can embed a series of very small metal targets in diamond film, which offers very high heat conductivity and x-ray transparency. With a series of metal target regions in a plane illuminated by electrons coming from above, an edge-on view of the plane can have X rays from all of these small targets add up for increased flux [Yun 2016]. Sources of this type are commercially available from Sigray.
• An alternative to cooling the target material is to pre-heat it into a liquid jet, and direct a "hotter" electron beam onto that jet [Hemberg 2003], since there is no need to worry about melting. Sources of this type are commercially available from Excillum.
• For maximum x-ray flux, one can direct a significant electron current into a large-sized spot. If the electron beam is focused not to a round spot but to a line, one can take advantage of the higher cooling provided by the non-irradiated material along the sides of the line, yet obtain a roughly circular apparent source size by using a shallow take-off angle (Fig. 7.3; 25° is not uncommon, and 6° is sometimes used) along the direction of the line.
• One can instead go to a very small, micrometer-sized or smaller electron beam spot on a thin target in a microfocus source. Lateral beam spreading will be greatly reduced in a thin target (see Fig. 11.4; while that's for a polymer rather than a metal, you get the idea), leading to a smaller x-ray source. The total x-ray flux will be much lower, but due to the nature of heat conduction into the large non-irradiated area (see Eq. 11.4) the x-ray flux per area can be much higher. This makes for higher source brightness, and furthermore the ratio of line emission to continuum radiation is also improved. Microfocus sources have been used for point-projection x-ray microscopy (Section 6.2), and even for propagation-based phase contrast imaging (Section 4.7.2) using scanning electron microscopes to produce the small electron beam source [Mayo 2003].

Figure 7.3 Schematic representation of the effect of changing the take-off angle from an electron impact source. The electrons will penetrate some distance into a target, and spread out laterally (for simulation results on a polymer, see Fig. 11.4, though for high-density metal targets the electron scattering lengths will be shorter). At a small take-off angle relative to the surface, the apparent source size will be smaller but also more of the emitted X rays will be reabsorbed over their long path through the target material. At larger take-off angles, fewer X rays will be self-absorbed but the source will appear to be larger. Take-off angles of 25° are not uncommon in practice.

Figure 7.4 Laboratory x-ray source spectra for targets made of W (tungsten) and Cu (copper), calculated for an electron beam accelerating voltage of 40 kV. This calculation (courtesy of Michael Feser) is for a 25° take-off angle from the target, and includes absorption of a 250 μm thick beryllium window. Since the actual emission lines are typically 1–3 eV wide, the ratio of line to continuum or Bremsstrahlung emission will change on this plot as one changes the assumed bandwidth (BW).

Electron-impact laboratory x-ray sources are readily available from a number of commercial manufacturers, and with a variety of source characteristics. As one example, one can purchase a gallium liquid metal jet source with a Kα brightness of 6.5 × 10¹⁰ photons/s/mm²/mrad² total, with 58 percent going into the Kα1 line at 9224.8 eV (with linewidth 2.59 eV FWHM), and 29 percent going into the Kα2 line at 9251.7 eV (with 2.66 eV linewidth). If one could isolate the Kα1 line with 100 percent efficiency, this would yield a spectral brightness of 1.3 × 10¹¹ photons/s/mm²/mrad²/0.1% BW.

Figure 7.5 Laboratory x-ray source spectra for a W (tungsten) target calculated for three different electron acceleration voltages (40, 90, and 160 kV). As in Fig. 7.4, this calculation (courtesy of Michael Feser) is for a 25° take-off angle, and again includes absorption due to a 250 μm thick beryllium window.

7.1.3 Unconventional laboratory x-ray sources

An alternative way to generate X rays is to create a hot, high-density plasma (an extreme limit of this was discussed in Box 2.1). This can be done in the laboratory by focusing an intense laser pulse on a material, or by using pulsed electromagnetic "pinching." This plasma can produce X rays by two distinct mechanisms. One is simply blackbody radiation based on the temperature of the plasma. The Planck blackbody radiation distribution of photons versus photon energy has a maximum at

Epeak = 2.821 kB T,  (7.5)

where the Boltzmann constant is kB = 8.62 × 10⁻⁵ eV/K (Eq. 3.20). That is, to produce a blackbody peak at a soft x-ray energy of 300 eV, one requires a temperature of 1.2 × 10⁶ K, while 10 keV requires a temperature of 4.1 × 10⁷ K. Even at somewhat lower temperatures, one can create a plasma with fully or partially ionized atoms, and have sufficient thermal energy to excite electronic transitions to drive x-ray fluorescence. Because most of the atoms are at least partially ionized, the energies of various electron orbitals will be shifted, and the emission energies will be different than those listed in standard tabulations of neutral atoms.

Hot plasma sources are pulsed sources (unless you can arrange to have a long-lived hot, dense plasma, in which case a magnetic confinement fusion energy researcher would love to chat with you!). Therefore one needs to replace the "target" material after each pulse. In pulsed electromagnetic pinch sources, the medium can be a gas, in which case replenishment is straightforward; such sources have been used for soft x-ray microscopy [Niemann 1990] and tomography [Duke 2014]. If one instead uses a high-intensity laser as a means of generating a hot plasma, one early approach to producing a continuously regenerated target was to use the key components of a magnetic tape reel-to-reel system [Michette 1988, Michette 1994, Michette 1997] (one could then set an x-ray experiment to a soundtrack of music from the 1960s!). A more recent approach has been to use pulsed laser excitation on a narrow stream of liquid or liquid droplets of ethanol, ammonium hydroxide, or liquid nitrogen for emission at various lines in the 360–500 eV range [Rymell 1993, Rymell 1995, Berglund 1998, Martz 2012]. These latter sources can deliver a time-averaged spectral brightness of ∼1 × 10¹² photons/s/mm²/mrad²/0.1% BW [Martz 2012] at 500 eV, and are being used in a successful series of compact laboratory transmission x-ray microscopes [Berglund 2000, Takman 2007, Fogelqvist 2017]. With these pulsed plasma sources, the peak brightness Bs,peak can be a million times higher (Table 7.1 and Eq. 7.2), but the time-averaged brightness Bs,ave depends on the repetition rate of the laser, which is often limited by cost and laser cooling considerations.

Can one create X rays in a laboratory in a more finely controlled manner than with a hot plasma, and with greater source brightness than with an electron impact source? One very active research field involves the excitation of electrons in an atom by successive electric field cycles in high-intensity lasers in a method called high harmonic gain, or HHG [L'Huillier 1993], which has seen considerable development for the production of extreme ultraviolet (EUV) and soft x-ray beams [Rundquist 1998, Bartels 2002]. Because all of the electrons in the laser focus are driven synchronously by the laser's electric field, their emission is coherent, from a spot size of typically 50–200 μm and into an angle consistent with one spatially coherent mode, or Msource = 1. These sources provide considerable light output (∼5 × 10¹⁰ photons/s/1% BW) at EUV photon energies of 92 eV, so they have been used in impressive coherent diffraction imaging experiments with sub-wavelength spatial resolution [Gardner 2017]. The photon flux drops off at higher photon energies [Heyl 2017, Fig. 3], though in one recent example significant emission was obtained over a wide enough photon energy range that one could obtain carbon XANES (Section 9.1.3) spectra, and even EXAFS (Section 9.1.7) spectra [Popmintchev 2018] around the Sc L edges near 400 eV and the Fe L edges near 700 eV. At 300 eV or λ = 4.1 nm, the coherent flux with a 1 kHz driving laser is about 10⁹ photons/s/1% BW, which with Msource = 1 works out to a time-averaged brightness of about 6 × 10¹² photons/s/mm²/mrad²/0.1% BW. With a duty cycle dt (Table 7.1) of 2 × 10⁻¹¹, the peak brightness is 3 × 10²³ in the same units.
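A brief numerical sketch of Eq. 7.5, together with the peak-brightness arithmetic just quoted; everything here follows directly from the equations above.

    k_B = 8.62e-5   # Boltzmann constant in eV/K (Eq. 3.20)

    def blackbody_peak_temperature(e_peak_ev):
        """Plasma temperature (K) whose blackbody photon spectrum peaks
        at photon energy e_peak_ev, from Eq. 7.5."""
        return e_peak_ev / (2.821 * k_B)

    print(f"{blackbody_peak_temperature(300):.1e} K")   # ~1.2e+06 K for 300 eV
    print(f"{blackbody_peak_temperature(10e3):.1e} K")  # ~4.1e+07 K for 10 keV

    # HHG example from the text: Bs,ave ~ 6e12 with duty cycle 2e-11 (Eq. 7.2)
    print(f"{6e12 / 2e-11:.0e}")   # 3e+23, the quoted peak brightness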

7.1.4 Synchrotron light sources

Target melting is a limitation of electron impact sources, and target vaporization is inevitable in laser or pinch sources, or in HHG. So is there a way to remove the target and its limitations? It sounds a bit like the Zen koan about one hand clapping... In fact this can be realized by subjecting relativistic electron beams to strong magnetic fields. While all the ingredients to understand this were in place from Maxwell's theory of electromagnetism in the 1860s [Maxwell 1861] and Einstein's theory of special relativity [Einstein 1905], the discovery of synchrotron radiation was an accident that happened to those with prepared minds [Blewett 1998]. In an era where vacuum chambers were often fabricated by glass-blowers, researchers looking to see if there were electrical sparks from a balky synchrotron accelerator at General Electric noticed the steady emission of light from the electron beam. They soon came to understand its origin [Elder 1947], based on earlier theories outlined by Iwanenko and Pomeranchuk [Iwanenko 1944] and filled in more completely by Schwinger [Schwinger 1949]. There were then early experiments [Hartman 1988, Robinson 2015], first at Cornell in the 60–200 eV region [Tomboulian 1956], and then at NIST in Gaithersburg in the 25–70 eV region [Madden 1963, Codling 1965].

Even then, at many accelerators synchrotron radiation was more of a curiosity, or even a nuisance if one's goal was to accelerate charged particles to high energies for collisions with other particle beams or fixed targets. One such particle physics machine was the DESY synchrotron in Hamburg, where the beam was ramped up every 20 msec to 7.5 GeV before being "dumped" onto a fixed target. While Kenneth Holmes had considered synchrotron radiation from this machine for x-ray generation in the mid-1960s, it was in 1969 that an increase in current to 10 mA made it attractive for then-PhD-student Gerd Rosenbaum to work with Holmes to build a focusing x-ray monochromator at a VUV beamline. This gave them a 150-fold gain in intensity over an x-ray tube, so that with Jean Witz they were able to record x-ray diffraction from a muscle fiber. They then installed a separate access tunnel to the synchrotron within which to install their monochromator, and an experimental "bunker" (what we would now call a hard x-ray beamline and a hutch) for further studies in small angle diffraction [Rosenbaum 1971, Huxley 1997, Holmes 1998]. Already in their first paper [Rosenbaum 1971] they even predicted that the x-ray beam could, in the future, be focused down to as small as 200 μm!

These early efforts at parasitic use of synchrotron radiation from machines built for high-energy physics eventually led to the development of synchrotron storage ring light sources. It is worthwhile therefore to make the terminology clear, even though the word "synchrotron" is often used as a synonym of "storage ring":

• A synchrotron is an accelerator in which dipole magnets are used to establish a closed-loop orbit (usually approximately circular). In today's examples, a radio frequency (RF) cavity is used to add kinetic energy to charged particles. Synchrotrons can be used to ramp up the charged particle energy by simultaneously increasing the power (and thus voltage) in the RF cavity, and increasing the field in the dipole magnets. At the top of the energy ramp, one can deflect the charged particle beam into a fixed target, or make a collider by causing collisions with a counter-rotating charged particle beam (such as the counter-rotating proton beams at the Large Hadron Collider – the LHC – at CERN near Geneva).


• A storage ring is a synchrotron that is designed to be operated primarily at one single, fixed kinetic energy for the charged particle beam.
• A synchrotron light source is a storage ring optimized for the production of electromagnetic radiation. Thus synchrotron light sources are designed to maintain a constant beam energy and near-constant current over time, so as to be optimized for the production of steady XUV and x-ray radiation beams.

The storage ring mode was first used for parasitic x-ray production on the SPEAR ring at Stanford, and this in turn inspired the development of synchrotron light sources such as Aladdin in Wisconsin, NSLS at Brookhaven, BESSY in Berlin, and others. Today, synchrotron light sources with electron beam energies of 1–2 GeV excel at producing soft X rays up to about 1000 eV, machines with an energy of about 3 GeV work well in producing hard X rays up to about 10 keV, and machines in the 6–8 GeV range produce X rays with energies up to about 100 keV. Beam currents tend to be in the 100–500 mA range, and since the beam is organized into discrete electron bunches that "surf the wave" in the RF accelerating cavities, photons arrive in 1–100 picosecond pulses spaced 1–200 nanoseconds apart, as indicated in Table 7.1 (the details vary by facility). Along with the development of undulators (Section 7.1.6), this has enabled the tremendous increase in available x-ray source brightness shown in Fig. 7.2. This gives rise to a generational history of light sources, with the first generation representing parasitic use, the second generation representing machines designed from the start to produce synchrotron radiation (the 1980s), the third generation being machines with lower emittance and undulator x-ray sources (the 1990s), and the fourth generation being diffraction-limited storage rings (beginning with MAX IV in Sweden around 2017).

The GeV-range electron beam kinetic energies E listed above for various storage rings are, of course, highly relativistic, in that the Lorentz factor γ (Eq. 4.286) of

γ = 1 + E/(me c²)

is in the thousands, since the electron rest mass times the speed of light squared is me c² = 511 keV (Eq. 3.29). From Eqs. 4.286 and 4.287 we can also see that the velocity v of electrons in these storage rings is very close to the speed of light c, since

1 − β = (c − v)/c = 1 − √(1 − 1/γ²) ≈ 1/(2γ²).  (7.6)

This will become important when considering undulator sources. At the highly relativistic energies of electrons in synchrotron light sources, radiation from a moving charge such as a dipole oscillating transverse to its linear motion is compressed into an angle of about 1/(2γ), or less than a milliradian about the viewing direction. When combined with the high x-ray flux that can be achieved, this leads to spectacularly high intensity and brightness when compared to laboratory x-ray sources.
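A quick numerical check of the Lorentz factor and of Eq. 7.6, using the 7 GeV beam energy of the Advanced Photon Source as the example:

    m_e_c2_kev = 511.0   # electron rest energy, me c^2 (Eq. 3.29)

    def lorentz_gamma(e_kev):
        return 1.0 + e_kev / m_e_c2_kev

    gamma = lorentz_gamma(7.0e6)                     # 7 GeV expressed in keV
    print(f"gamma = {gamma:.0f}")                    # ~13,700
    print(f"1 - beta = {1 / (2 * gamma**2):.1e}")    # ~2.7e-09 (Eq. 7.6)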


Storage rings have a built-in economy factor to their operation: one only has to accelerate the electrons up to relativistic velocities once, upon beam injection. After that, electrons will lose energy due to spontaneous radiation emission as they travel in their orbit, but the RF cavities need only supply enough power to exactly compensate for that loss, which might be a few hundred keV per electron per orbit (compared to several GeV per electron to achieve beam injection). Even so, they are expensive facilities, costing roughly 10⁸–10⁹ US dollars to construct and about one-tenth of that cost for annual operations. Therefore they are usually developed as regional, national, or even international facilities, as catalogued by the web site www.lightsources.org, with dozens in operation worldwide and most including at least one x-ray microscopy beamline. While these are expensive facilities to construct and operate, they can usually host about 20–50 beamlines running simultaneously, so the per-simultaneous-experiment cost is not far from that of top-end electron microscopes. While some industrial users pay a fee to use these sources for proprietary work, in nearly all cases one obtains no-fee access via peer-reviewed scientific user proposals.

To understand the properties of synchrotron radiation sources in more detail, we must consider the phase space characteristics of the electron beam. The magnetic lattice of the storage ring causes the beam to undergo a series of focusing and defocusing operations, but as discussed in connection with Eq. 4.189 the product of the electron beam size times divergence at various beam foci is a constant. In storage rings, the product is described by the emittance ε of the electron beam, while β gives the ratio of the size over divergence at one particular local focus (both quantities can have separate values in the horizontal and vertical planes). The relationships between these quantities and the standard deviation source size of the electron beam are

σx = √(εx βx)  and  σy = √(εy βy),  (7.7)

while the divergences are characterized by

σ′x = √(εx/βx)  and  σ′y = √(εy/βy).  (7.8)

(The ratio of vertical to horizontal emittance εy/εx is known as the emittance coupling of the machine; values of 10 percent are not uncommon in today's machines.) The emission of radiation by a single electron has its own intrinsic emittance εr of

εr = λ/(2π),  (7.9)

where the result of Eq. 7.9 is only approximate, depending on the criterion chosen to characterize size and divergence [Elleaume 2003, Eq. 33]. (This is given by others [Kim 1986] as εr = λ/(4π), and both of these 1σ values differ from the FWHM value of εr = λ for Msource = 1 as discussed in Section 4.4.6.) When collecting radiation from an infinitesimal emittance source of length L along the electron beam's trajectory, the intrinsic beta function associated with radiation emission [Elleaume 2003, Eq. 33] is

βr = L/π,  (7.10)

and the intrinsic source size σr and divergence σ′r are

σr = √(2λL)/(2π)  and  σ′r = √(λ/(2L)).  (7.11)

In the Gaussian approximation, the net source size is given by a convolution of electron beam and intrinsic photon emittances, leading to net source sizes and divergences characterized by

Σx = √(σr² + σx²)  and  Σ′x = √(σ′r² + σ′x²),
Σy = √(σr² + σy²)  and  Σ′y = √(σ′r² + σ′y²).  (7.12)

In circa 2016 storage rings, the horizontal emittance εx is often about 100 times larger than the radiation wavelength, while the vertical emittance is a few times the wavelength, but fourth-generation storage rings are beginning to appear which use multibend achromat lattice designs with near-wavelength emittance in both directions [Eriksson 2014].
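The convolution of electron beam and intrinsic photon source parameters (Eqs. 7.7–7.12) is easy to evaluate numerically. In this sketch the emittance, beta function, wavelength, and source length are illustrative assumptions for a third-generation storage ring, not the parameters of any particular machine.

    import math

    def electron_beam(emittance, beta):
        """1-sigma electron beam size and divergence (Eqs. 7.7 and 7.8)."""
        return math.sqrt(emittance * beta), math.sqrt(emittance / beta)

    def intrinsic(lam, L):
        """Intrinsic radiation source size and divergence (Eq. 7.11)."""
        return math.sqrt(2 * lam * L) / (2 * math.pi), math.sqrt(lam / (2 * L))

    # assumed: eps_x = 2.5 nm-rad, beta_x = 10 m, lambda = 0.1 nm, L = 2.4 m
    sigma_x, sigma_xp = electron_beam(2.5e-9, 10.0)
    sigma_r, sigma_rp = intrinsic(0.1e-9, 2.4)

    Sigma_x = math.sqrt(sigma_r**2 + sigma_x**2)     # net size, Eq. 7.12
    Sigma_xp = math.sqrt(sigma_rp**2 + sigma_xp**2)  # net divergence, Eq. 7.12
    print(f"{Sigma_x * 1e6:.0f} um, {Sigma_xp * 1e6:.1f} urad")  # ~158 um, ~16.5 urad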

7.1.5 Bending magnet sources

Electrons are kept on a roughly circular orbit in the storage ring by a series of dipole or bending magnets distributed around the machine's magnetic lattice. Within a dipole magnet with magnetic field B, the electron's radius of curvature ρe is given by

ρe = βE/(ecB)  or  ρe (meters) ≈ 10 βE (GeV) / (2.998 B (T)),  (7.13)

where e is the electron charge, E the electron beam kinetic energy, and β is the relativistic velocity (nearly 1) as given in Eq. 7.6. That is, the few-Tesla magnetic field in a bending magnet gives rise to the centripetal acceleration needed to maintain the circular orbit. Because all acceleration of charged particles results in the emission of radiation, bending magnets are good x-ray sources. In the vertical, the emission is concentrated within a 1σ angle of 1/(2γ) as noted above, while in the horizontal one might collect radiation over an acceptance angle θx,a of the arc of radiation that depends on limiting apertures, with some bending magnet beamlines accepting light over several milliradians in the horizontal.

Dipole magnets at synchrotron light sources tend to have large βx,y values (large size σx,y relative to angular divergence σ′x,y) so that the beam size does not change much over the length of the dipole or bending magnet. (This also means that the equivalent intrinsic radiation source size of Eq. 7.11 is unimportant in bending magnet sources.) The effective source size σθx in the horizontal is even larger [Williams 2005, Eq. 20] as one "views" the arc of the source over the full-width acceptance angle θx, giving a 1σ equivalent source size of

σθx ≈ (ρe/16) θx².  (7.14)

For the Advanced Photon Source at Argonne, the source size is characterized by σx = 83 μm and σy = 35 μm, while if one accepts θx = 1.5 mrad from the source the effective horizontal source size becomes σθx = 5.5 μm (using Eq. 7.13 to obtain a bending radius of ρe = 38.9 m). This gives a net result of

Σx,BM = √(σx² + σθx²),  (7.15)

so in this particular case with σx = 83 μm, the extra source size factor of σθx = 5.5 μm (Eq. 7.14) is negligible.

The spectrum of bending magnet radiation (Fig. 7.6) is quite strong in the THz, infrared, and visible-light ranges [Williams 2005], and its output increases up to a critical energy Ec, after which it declines steeply, as shown in Fig. 7.6. This critical energy (the dividing line in terms of matching emitted power above and below Ec) is given by

Ec = 3ℏeBγ²/(2me),  (7.16)
Ec (keV) = 0.665 (E [GeV])² (B [T]),  (7.17)

where the second expression allows for simple calculations. For the Advanced Light Source in Berkeley with E = 1.9 GeV and B = 1.27 T, standard bending magnets have a critical energy of Ec = 3.0 keV, while for the Advanced Photon Source at Argonne with E = 7.0 GeV and B = 0.6 T the critical energy is Ec = 19.6 keV. Bending magnet radiation is linearly polarized in the horizontal plane, due to the direction of the electron's bend. Radiation slightly above or below the orbit is elliptically polarized. At lower energies, the output is strong enough that these light sources are also among the brightest sources for broad-spectrum infrared light, and several synchrotron light sources host infrared spectromicroscopy programs.
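Both practical formulas above reduce to one-line functions; as a check, here are the numerical examples quoted in the text.

    def bend_radius_m(e_gev, b_tesla):
        """Bending radius in meters (Eq. 7.13), taking beta ~ 1."""
        return 10.0 * e_gev / (2.998 * b_tesla)

    def critical_energy_kev(e_gev, b_tesla):
        """Bending magnet critical energy in keV (Eq. 7.17)."""
        return 0.665 * e_gev**2 * b_tesla

    print(f"{bend_radius_m(7.0, 0.6):.1f} m")           # 38.9 m (APS)
    print(f"{critical_energy_kev(1.9, 1.27):.1f} keV")  # 3.0 keV (ALS)
    print(f"{critical_energy_kev(7.0, 0.6):.1f} keV")   # 19.6 keV (APS)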

7.1.6 Undulator sources

Bending magnets at synchrotron light sources are very bright, and have served as excellent sources for both full-field (TXM) and scanning (SXM) x-ray microscopes. However, it was made clear in Chapter 5 that quasi-monochromatic radiation is required for refractive (Eq. 5.9) and diffractive (Eq. 5.33) x-ray optics, and the highest resolution Kirkpatrick–Baez mirrors used for practical hard x-ray nanofocusing applications have multilayer reflective coatings (and thus limited spectral bandpass) so as to give larger grazing incidence reflection angles and higher spatial resolution [da Silva 2017]. Thus it is highly desirable for x-ray microscopy to take the broad spectrum of a bending magnet (Fig. 7.6) and somehow compress most of the output into a narrow spectral line, to decrease the horizontal source size relative to the extended source of a bending magnet, and to decrease the angular divergence. This is what undulators do.

Undulators were first discussed [Ginzberg 1947, Motz 1951] as microwave sources, and they were first demonstrated at Stanford [Motz 1953] as a way to generate microwave radiation from a 100 MeV electron linear accelerator (linac). The history of their development at storage rings has been nicely described [Winick 1981], with a landmark for x-ray science being the installation in 1980 of a permanent magnet undulator at the SPEAR ring at Stanford, which was able to produce keV photon energies [Halbach 1981]. The idea quickly spread, so that all synchrotron light sources built since then have included undulators for delivering bright x-ray beams.



Figure 7.6 Spectral intensity Is for a standard bending magnet source at the Advanced Photon Source at Argonne National Laboratory. Bending magnet sources have strong output at lower photon energies which increases toward a critical energy Ec (Eq. 7.17). The brightness is found by dividing by Σx,BM (Eq. 7.15) and σy (Eq. 7.7), giving in this case a numerical factor of 340 mm⁻², or a spectral brightness of about Bs ≈ 3 × 10¹⁶ photons/s/mm²/mrad²/0.1% BW at Ec = 19.6 keV. Calculation provided by Roger Dejus of the Advanced Photon Source.

An undulator is named for the motion of the electron beam in the device's magnetic field. By creating a sinusoidal magnetic field with Nu periods, each of period length λu (usually using permanent magnet blocks to produce a field that oscillates up and down), a relativistic electron in a straight section of the storage ring lattice is made to oscillate side to side in an undulating motion. When viewed head-on, the Nu periods look like a set of Nu in-phase dipole radiation emitters, so that one has a spectral concentration into a bandwidth of ΔE/E ≈ 1/Nu, and a coherent superposition of radiation within an angular range of 1/Nu. However, these properties are changed by special relativity in two important ways. The first is that from the electron's perspective, the magnetic field lattice period λu is contracted to λu/γ, where γ is the Lorentz factor of Eq. 4.286. The second is that when one goes from the frame of the electron's average motion back into the laboratory frame, a relativistic Doppler shift applies which upshifts the frequency of the radiation by another factor of about 2γ. The net effect goes like λu/(2γ²), which explains how one can obtain nanometer-scale radiation wavelengths from centimeter-scale magnetic periods using multi-GeV storage rings.

A more detailed calculation of the emission wavelength involves considering the time difference between the emission of electric field peaks (at the location of magnetic field peaks, at undulator half-periods), which travel at the speed of light, and transit of the electron to the next field peak, which is at a velocity slightly less than that of light (Eq. 7.6, with an added time delay due to the slight transverse sinusoidal motion of the electron, which is in turn affected by the magnetic field strength).

Figure 7.7 Tuning curve of brightness versus photon energy for the λu = 3.3 cm undulator ("Undulator A") at the Advanced Photon Source at Argonne National Laboratory. Shown here is the tuning range for each undulator harmonic (m = 1, 3, 5, 7, 9) as given by Eq. 7.18, over a range of undulator K values (Eq. 7.19) of K = 0.5 to 2.61. Calculation provided by Roger Dejus of the Advanced Photon Source.

The end result is that the undulator emits radiation at a series of harmonic wavelengths λm given by [Hofmann 1978, Coisson 1982, Krinsky 1983]

λm = (λu/(2mγ²)) (1 + K²/2 + γ²θ²),  (7.18)

with only the odd harmonics (m = 1, 3, 5, ...) appearing on-axis (θ = 0) if the electron beam emittance is sufficiently small. In this expression, K is a measure of the peak magnetic field strength B0 given by

K = eB0λu/(2πme c),  (7.19)

where e is the charge and me is the rest mass of the electron. The value of B0 and therefore K can be adjusted by changing the mechanical separation between the top and bottom halves of a permanent magnet structure. Thus one can tune the wavelength and therefore the photon energy of the radiation, so as to place undulator spectral peaks at the desired photon energy over quite a wide range (an example of an undulator radiation tuning range for the Advanced Photon Source at Argonne is shown in Fig. 7.7). The parameter K gives the maximum deflection angle θe of the electron from its normal trajectory according to

θe = K/γ.  (7.20)
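The on-axis harmonic energies follow directly from Eq. 7.18; the sketch below evaluates them for a λu = 3.3 cm device in a 7 GeV ring (Undulator A-like parameters), setting K directly rather than computing it from B0 via Eq. 7.19.

    m_e_c2_kev = 511.0
    hc_ev_m = 1.23984e-6   # hc in eV·m

    def harmonic_energy_kev(m, lambda_u_m, e_gev, K, theta=0.0):
        """Photon energy of undulator harmonic m, from Eq. 7.18."""
        gamma = 1.0 + e_gev * 1e6 / m_e_c2_kev
        lam = (lambda_u_m / (2 * m * gamma**2)) * (1 + K**2 / 2 + (gamma * theta)**2)
        return hc_ev_m / lam / 1e3

    for m in (1, 3, 5):
        print(f"m={m}: {harmonic_energy_kev(m, 0.033, 7.0, K=1.5):.1f} keV")
    # roughly 6.6, 19.9, and 33.2 keV, consistent with the K = 1.5 peaks of Fig. 7.8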

Figure 7.8 Spectral brightness of a λu = 3.3 cm undulator (referred to locally as "Undulator A") at the Advanced Photon Source at Argonne National Laboratory. Undulator sources have a series of harmonic peaks at wavelengths given by Eq. 7.18, which are tunable by adjusting the mechanical gap between top and bottom halves of permanent magnet undulators (thus tuning the on-axis magnetic field strength, and undulator K value of Eq. 7.19); the plot here is for K = 1.5. The even harmonics only show up on-axis due to convolution of the undulator output with the electron beam divergence. Calculation provided by Roger Dejus of the Advanced Photon Source.

Since the radiation wavefield from one magnetic period extends out to an angle of about 1/γ about the electron's forward motion, this means that large values of K start to produce discontinuities in time of the electric field received on axis, which moves one from a more sinusoidal field pattern over time with no harmonics when K ≪ 1, to a more square-wave-like field pattern with increasing strength of high harmonics as K is increased beyond 1. This explains the increase in the brightness of high harmonics at the expense of the first harmonic as shown in Fig. 7.7. (At much higher K values like K = 3 or more, one moves from the properties of an undulator to what is called a "wiggler," which produces higher-energy X rays but with less brightness [Krinsky 1983]; wigglers are generally of less interest for x-ray microscopists.)

Returning to undulators, if one ignores electron beam emittance, the angular width of the m = 1 harmonic is given by

Δθ = (1/γ) √((1 + K²/2)/(2Nu)),  (7.21)

and the spectral bandwidth is given by

Δλ/λ ≈ 1/(mNu).  (7.22)
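For the same assumed undulator, Eqs. 7.21 and 7.22 give the central cone width and bandwidth; here Nu = 72 periods (a 2.4 m long, λu = 3.3 cm device) is an assumed example value.

    import math

    def central_cone_rad(e_gev, K, n_u):
        gamma = 1.0 + e_gev * 1e6 / 511.0
        return (1 / gamma) * math.sqrt((1 + K**2 / 2) / (2 * n_u))   # Eq. 7.21

    print(f"{central_cone_rad(7.0, 1.5, 72) * 1e6:.1f} urad")  # ~8.9 urad
    print(f"{1 / (1 * 72):.3f}")   # Eq. 7.22: ~1.4% bandwidth for m = 1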



Thus undulators reach the goal of compressing radiation into the desired spectral bandwidth and divergence angle. As was noted, the most common undulator magnet structure today uses a set of permanent magnet blocks mounted on upper and lower support girders, with mechanical adjustment of their separation distance used to tune the peak field strength and thus the K value. This arrangement generates linearly polarized beams, with the polarization in the horizontal plane. More complex arrangements of magnets can make the electron beam undulate in the vertical plane (not recommended in today's storage rings, since vacuum chambers are usually wider than they are tall to accommodate the large horizontal beam emittance), or follow a helical path. Elliptically polarizing undulators (EPUs) have four sets of magnets that can be shifted along the beam axis with respect to each other to provide a full variety of electron beam motions, creating any form of linear or circular polarization. One can also use currents in coils to produce the magnetic field, with superconducting undulators gaining in popularity at present.

As can be seen from Fig. 7.8, undulators give a time-averaged brightness Bs,ave that is orders of magnitude higher than what is available from bending magnet sources. They are the brightest sources of X rays available at synchrotron light sources, and are in the highest demand. However, their brightness is affected by the parameters of the electron beam in the storage ring. The presence of the θ² term in Eq. 7.18 means that finite electron beam divergences in the storage ring will effectively put some θ ≠ 0 radiation onto the undulator axis, which is why even harmonics m = 2, 4, . . . begin to appear in undulator spectra with third-generation storage rings, as shown in Fig. 7.8. As noted in Section 4.4.6, the phase space area of a coherent photon beam is approximately equal to λ; at the wavelength of peak emission power, undulator radiation from an infinitesimal electron beam can be approximated [Kim 1986, Onuki 2003, Kim 2017] as being from a Gaussian source characterized by a FWHM size (Fig. 4.4) of 2.35σr with

σr = √(2λL)/(2π) = √(2λ Nu λu)/(2π),   (7.23)

and a FWHM angular divergence of 2.35σr′ with

σr′ = √( λ/(2Nu λu) ).   (7.24)

The electron beam itself has an emittance, or phase space area, which leads to an electron beam size characterized by Eq. 7.7 and a divergence characterized by Eq. 7.8. In the Gaussian approximation, the combined net source size and divergence are as given by Eq. 7.12. In third-generation storage rings, the horizontal emittance is often about 100 times larger than the radiation wavelength (and the "natural" undulator emittance), so that Msource ≈ 100 in the x̂ direction, while the vertical emittance is a few times the wavelength, so that Msource ≈ 3–10 in the ŷ direction. Fourth-generation storage rings use a multibend achromat lattice design with near-wavelength emittances in both directions (and thus Msource ≃ 1, so that they are called diffraction-limited storage rings) [Eriksson 2014]. This continues a remarkable trend in the increase of available x-ray brightness, as shown in Fig. 7.2.
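
A minimal sketch of how these pieces combine: Eqs. 7.23 and 7.24 give the single-electron photon source, which is added in quadrature with the electron beam size and divergence in the Gaussian approximation (Eq. 7.12). The electron beam numbers below are assumed, third-generation-like horizontal values, and the mode count uses the rough criterion from Section 4.4.6 that a coherent beam occupies a phase space area of about λ.

```python
import numpy as np

wavelength = 1.0e-10     # 1 angstrom (about 12.4 keV); illustrative
lam_u = 0.028            # undulator period in meters (assumed)
N_u = 70
L = N_u * lam_u          # undulator length

# Single-electron ("diffraction-limited") photon source, Eqs. 7.23 and 7.24
sigma_r = np.sqrt(2 * wavelength * L) / (2 * np.pi)
sigma_rp = np.sqrt(wavelength / (2 * L))

# Assumed electron beam size and divergence (third-generation-like
# horizontal values; purely illustrative)
sigma_e, sigma_ep = 270e-6, 11e-6

# Combined source, added in quadrature (Gaussian approximation, Eq. 7.12)
sigma_tot = np.hypot(sigma_r, sigma_e)
sigma_totp = np.hypot(sigma_rp, sigma_ep)

# Rough mode count, normalized so a single-electron source gives 1
# (since 2*pi*sigma_r*sigma_rp = wavelength exactly)
M_source = 2 * np.pi * sigma_tot * sigma_totp / wavelength

print(f"sigma_r = {sigma_r*1e6:.2f} um, sigma_r' = {sigma_rp*1e6:.2f} urad")
print(f"combined: {sigma_tot*1e6:.0f} um x {sigma_totp*1e6:.1f} urad")
print(f"M_source ~ {M_source:.0f}")   # ~200 for these assumed values
```
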


Much information about undulator radiation can be found in recent books about the topic [Onuki 2003, Kim 2017].

7.1.7 Inverse Compton scattering sources

The approximately centimeter-scale magnetic period length λu in permanent magnet undulators is dictated by the achievable field strength in the magnetic materials, while in electromagnetic undulators the period is limited by the achievable current density in wire windings of decreasing size. This means that one needs GeV electron beam energies to produce keV photons. What if one could dramatically shrink λu from about a centimeter to about a micrometer? From Eq. 7.18 we see that we could then reduce γ² by a factor of 10⁴, and thus go from around 1 GeV beam energies to roughly 10 MeV, which can be obtained with much smaller and less expensive accelerators. It might seem that the way to achieve such a short period is to replace the mechanical structure providing a sinusoidal magnetic field with a way of producing an electromagnetic field with a periodicity of about a micrometer: an intense visible-light laser field! While the magnetic field associated with light is not very large (Appendix B.3 at www.cambridge.org/Jacobsen), the electric field introduces transverse velocity kicks to the electrons in a way that will be described in the next section. But just as one can use a classical description of light as electromagnetic waves on Mondays, Wednesdays, and Fridays, and a quantum description in terms of photon momentum the rest of the week, one can consider the transfer of momentum produced by backscattering a visible light photon from an energetic electron in a process known as inverse Compton scattering (Eq. 3.28). While this approach has not yet found application in the sub-micrometer x-ray microscopy that is the focus of this book, it has been developing rapidly [Hajima 2016], and it has been used for larger-scale x-ray imaging [Achterhold 2013] using a commercial source [Eggl 2016] from Lyncean Technologies that was compact enough for one (well-funded) laboratory to purchase and operate.
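
The scaling argument can be made concrete with the on-axis form of the undulator equation (Eq. 7.18), λ = λu (1 + K²/2)/(2γ²). The sketch below inverts this for the required electron energy; it neglects the additional factor of roughly two that a head-on collision geometry provides, and the period and K values are illustrative assumptions.

```python
import numpy as np

def electron_energy_for(photon_E_eV, period_m, K=0.0):
    """Electron energy (eV) needed to emit photons of energy photon_E_eV
    on axis, via lambda = lambda_u * (1 + K^2/2) / (2 gamma^2)."""
    wavelength = 1239.84e-9 / photon_E_eV       # E in eV -> lambda in m
    gamma = np.sqrt(period_m * (1 + K**2 / 2) / (2 * wavelength))
    return gamma * 0.511e6                      # m_e c^2 = 0.511 MeV

# 10 keV photons from a 2 cm period permanent magnet undulator...
print(f"{electron_energy_for(10e3, 2e-2) / 1e9:.1f} GeV")   # ~4.6 GeV
# ...versus from a ~1 micrometer laser "undulator" period
print(f"{electron_energy_for(10e3, 1e-6) / 1e6:.0f} MeV")   # ~32 MeV
```
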

7.1.8 X-ray free-electron lasers (FELs)

As we have seen above, the key to the increased brightness provided by undulators is the correlation of radiation wavefields produced by one electron traversing all the Nu periods in the undulator. This increases the radiation power only linearly with the length Nu λu of the undulator, though it does compress the radiation into narrow spectral peaks (Eq. 7.22) and a narrower angular distribution (Eq. 7.21). What about a large number of electrons together? In a storage ring, the Ne electrons in a bunch are stochastically distributed due to radiation emission over an orbit of the machine, so one adds up the electric fields from the radiation of individual electrons via an incoherent sum, leading to an intensity that scales with Ne (see Eq. 4.9). Most storage ring operating modes have many bunches circulating simultaneously, and again these bunches are uncorrelated on the timescale of λ/c, so the x-ray flux scales linearly with the electron beam current in the ring.


[Figure 7.9: semi-log plot of pulse energy (joules, 10⁻⁸ to 10⁻⁴) versus undulator length (0–14 m), showing spontaneous emission, self-amplification buildup through an exponential gain regime (with a simulation curve), and saturation.]

Figure 7.9 Illustration of the self-amplified spontaneous emission (SASE) principle in free-electron lasers (FELs). An electron bunch enters a long undulator with an initially random distribution of electron positions as shown at left, but as the undulator radiation field builds up, the electrons begin to be bunched in position, leading to a correlation of their electric fields. This continues through the exponential gain regime until saturation occurs. Shown here are the experimental measurements of per-pulse radiation energy versus undulator length for the first soft x-ray SASE FEL at DESY [Rossbach 2003], where the saturation regime agrees very well with simulation results [Saldin 1999]. Inset figures of the electron bunches provided by Rolf Treusch of DESY.

What if we could somehow produce some correlation of the radiation from the Ne electrons at a specific wavelength and viewing angle? There are a lot of electrons in a bunch (Ne = 1 × 10¹¹ in each bunch when the Advanced Photon Source at Argonne runs in 24-bunch mode), so the potential gains are huge! This is the name of the game for free-electron lasers.

In a typical visible-light laser, an optical cavity with high-reflectivity mirrors at either end is used to build up a strong light field within the gain medium, so as to use the mechanism of stimulated emission from atoms. Thinking mainly of visible light (though considering 14.4 keV X rays too), John Madey at Stanford proposed a scheme of using an undulator with an optical cavity to cause bunching of the electrons in the cavity's standing wave field distribution, in a process he termed "stimulated emission of Bremsstrahlung" [Madey 1971], with further theoretical developments [Hopf 1976, Colson 1977] providing insight from semiclassical theories. However, making an optical cavity for X rays using normal incidence mirrors is not practical, as Sections 3.6 and
4.2.4 made clear. How then to achieve the required bunching without an optical cavity? The answer arrived in the form of self-amplified spontaneous emission, or SASE (pronounced like "sassy") [Kondratenko 1980, Bonifacio 1984]. As an electron bunch traverses a long undulator, the spontaneous radiation in the upstream end eventually builds up enough of an electric field that individual electrons gain positive or negative velocity kicks, depending on whether they are "surfing downhill" or "climbing uphill" on the field; this causes progressive bunching of the electron beam, as shown in Fig. 7.9. Because the SASE process is affected by the particular initial configuration of the electrons in the bunch, there is a "startup from noise" characteristic to the SASE radiation pulse, such that there are bunch-to-bunch fluctuations in the details of the electron correlations, leading to fluctuations in the exact radiation wavelength, beam direction, and beam energy (one can "seed" the pulses using harmonics of longer-wavelength coherent radiation sources to improve things in seeded FELs). The SASE effect applies only to those electrons that are within a single spatial coherence mode (or Msource ≃ 1, as described in Eq. 4.198) of the emitted radiation, which means an electron beam emittance smaller than the photon emittance, or ≲ λ, is required.

In spite of early attempts to develop storage ring FELs, the transverse and longitudinal spreading of the bunch that occurs in repeated orbits of a storage ring means that linear accelerators offer overwhelming advantages for FEL operation. Thus, while one would like to use storage rings due to their economy of operation as noted in Section 7.1.4, FELs use linear accelerators or "linacs" in which many GeV of energy is invested in each electron in order to produce keV X rays from one pass through the undulator, after which the electron beam is "dumped." While one can switch successive electron bunches into separate long undulators, one can still operate at most a few experimental stations at once, unlike the 40–60 that operate simultaneously at some storage ring synchrotron light sources. This makes FELs much more expensive to build and operate, thus affecting the availability of beamtime. In addition, in order to maintain the required magnetic field uniformity in 100-meter-long undulators, most SASE FEL undulators run at fixed magnetic field strength, so that photon energy tuning requires adjustment of the linac accelerating energy, which is less straightforward.

In spite of their access limitations and the "startup from noise" fluctuations of SASE FELs, the achievable gains in instantaneous brightness are huge, approaching a factor of the number of electrons Ne in the bunch as one nears saturation. Therefore x-ray FELs (XFELs) have enabled radical and exciting new destroy-but-diffract approaches to x-ray microscopy, as described in Section 10.6. For approaches such as scanning microscopy, spectromicroscopy, or tomography, the specimen must survive multiple illumination pulses, so in these cases one must worry about the anti-Goldilocks condition and the "no-fly zone" discussed in Section 11.1.1.

The development of FELs, and the first demonstration of the SASE effect with EUV [Rossbach 2003] and hard x-ray radiation [Emma 2010], involves a rich interplay of personalities and big science politics.
One can get a glimpse of this from personal perspectives provided by Madey [Madey 2016], and by Claudio Pellegrini, who helped arrive at the SASE picture [Bonifacio 1984] and who played a major role in instigating hard x-ray FEL development [Pellegrini 2012, Pellegrini 2017]. The various stories are
so complex that they have even shaped how patent law applies to university research [Cai 2004].

7.2 X-ray beamlines

In a visible-light microscope, one would never simply place the specimen on top of the filament of a light bulb or the exit aperture of a laser, mount an objective lens, and be done with it. The same applies to x-ray microscopes: some sort of optical transport system is used to deliver the x-ray beam from the source to the specimen, and to condition the illumination properties as described at the start of Section 7.1. This system is called a "beamline" at synchrotron light sources and XFEL facilities, and the same word is sometimes used to describe x-ray beam transport systems for other source types too. The beamline should provide some or all of the following:

• the required degree of monochromaticity (Section 7.2.1);
• the proper étendue and coherent phase space (Section 7.2.2);
• apertures and shutters (Section 7.2.3);
• radiation shielding (Section 7.2.4);
• management of thermal loads (Section 7.2.5);
• a vacuum or gas environment (Section 7.2.6).

Sometimes several of these functions are provided by a combined system, such as a spherical grating monochromator, which both images the source onto an exit slit in one direction and monochromatizes it at the same time. Given that there are entire books on beamline design [Peatman 1997], we will only give a short overview of the topic.

7.2.1 Monochromators and bandwidth considerations

X-ray microscopes using grazing incidence reflective optics (Section 5.2), like Kirkpatrick–Baez or Wolter mirrors without multilayer coatings, can use a broad spectral bandwidth (broadband radiation). Refractive optics require a higher degree of monochromaticity (Eq. 5.9), as do diffractive optics such as Fresnel zone plates (Eq. 5.33) and reflective optics with multilayer coatings (Section 4.2.4). Simple absorption contrast in contact (Section 6.1) or point projection microscopy (Section 6.2) can also make use of broadband radiation. Electron optical x-ray microscopes (Section 6.5) often require narrowband illumination so as not to "blur out" the energy spectrum of emitted electrons, which would be problematic for the types of electron lenses that are highly chromatic. Propagation-based phase contrast methods (Section 4.7.2) require that E/ΔE be larger than the number of Fresnel fringes used to obtain a phase contrast image, while holography and coherent diffraction imaging have much more demanding requirements on E/ΔE, as discussed in Chapter 10 (such as the requirement of Eq. 10.54 for ptychography). It is also preferable to be able to tune the photon energy so as to maximize contrast, or to be able to excite a desired x-ray fluorescence emission line (Section 9.2) or near-edge absorption resonance (Section 9.1.2).


One can always narrow the spectral bandpass ΔE of a source by using a monochromator as a spectral filter, though this comes at a cost in efficiency within the selected bandpass. X-ray monochromators come in three basic types: crystals, multilayers, and gratings. Crystals tend to be used above about 2 keV and gratings below, with multilayers spanning both ranges and offering greater spectral bandpass ΔE (or, equivalently, lower spectral resolution).

Crystal monochromators work on principles described in Section 4.2.3, though they are usually used in pairs: one crystal might deflect the beam at an upwards angle, and a second crystal then deflects it by the same angle so as to provide a beam that is offset in height but at the same angle as the x-ray beam emerging from the source. (Upward deflection is used at most of today's synchrotron light sources due to the smaller vertical source size, though at the newest, lowest-emittance light sources horizontal deflection is sometimes chosen based on mechanical stability considerations.) In a double-crystal monochromator (DCM) [Schwarzschild 1928, Smith 1934], the first crystal absorbs significant energy from the incident beam, so cooling must often be provided. For small energy tuning ranges, one can cut a rectangular channel in a monolithic silicon crystal and thus obtain a DCM with very high stability, but for larger tuning ranges one needs a mechanism to tilt two separate crystals to the desired Bragg angles while also translating the second crystal to keep it centered in the first crystal's diffracted beam. While one can use Si (111) crystals down to about 2 keV, and exotic materials such as YB₆₆ down to about 1.2 keV [Wong 1999], at even lower photon energies or longer wavelengths the lattice spacing of available crystals no longer allows for Bragg diffraction. The relatively low absorption of multi-keV or "hard" X rays in silicon leads to diffraction efficiencies of >90 percent in many cases (Fig. 4.12), and the high quality of readily available silicon crystals leads to good coherence preservation from polished crystals. For microscopy, they can provide a restrictively small spectral bandpass (for example, 1.4 eV at 10 keV for Si (111), as shown in Fig. 4.12), giving a monochromaticity of E/(ΔE) ≈ 7000. Since many x-ray focusing optics might need a monochromaticity of only about 1000 or less (see Eqs. 5.9 and 5.33), this overly restrictive spectral bandwidth reduces the flux that could otherwise be used. Because of the narrowness of the Darwin width, it can be very helpful to use a mirror optic to collimate the x-ray beam so that it is more nearly parallel when it reaches the crystal monochromator.
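
To see the bandwidth-matching argument in numbers: a Fresnel zone plate with N zones tolerates a monochromaticity of roughly E/ΔE ≳ N (Eq. 5.33). The sketch below compares this requirement against a Si (111) crystal pair and against a roughly 1 percent bandwidth double multilayer monochromator of the kind discussed next; all numbers are illustrative assumptions.

```python
# Bandwidth matching (illustrative numbers only)
N_zones = 1000             # assumed zone count of a Fresnel zone plate
required = N_zones         # optic wants E/dE of roughly N (Eq. 5.33)

monochromators = {"Si (111) crystal pair": 7000,  # ~1.4 eV at 10 keV
                  "~1% bandwidth DMM": 100}

for name, e_over_de in monochromators.items():
    verdict = ("sufficient" if e_over_de >= required
               else "too broad for this optic")
    print(f"{name}: E/dE ~ {e_over_de}, optic needs ~{required}: {verdict}")

# A crystal delivering E/dE = 7000 when only ~1000 is needed discards
# flux; for a smooth, broad source spectrum, relaxing the bandwidth to
# match the optic could pass roughly 7x more photons.
print("potential flux gain:", 7000 / required)
```
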
To work at longer wavelengths, one must increase the d spacing of the crystal lattice (Fig. 4.9), and the way that can be achieved is to use synthetic multilayers as described in Section 4.2.4. As with DCMs, one usually uses these multilayers in pairs in a double multilayer monochromator (DMM) so that the beam angle is unchanged. Double multilayer monochromators can produce illumination with a spectral bandwidth in the neighborhood of 1 percent (with considerable flexibility around that number according to the design choices made for the multilayer coating), thus providing more flux for those imaging systems that can tolerate the larger bandwidth. When using a DMM to increase the accepted spectral bandwidth, and thus the accepted flux from a nonmonochromatic x-ray source, one must keep in mind the dispersive limits to spatial resolution in compound refractive lenses, as considered in Eq. 5.9, or in Fresnel zone plates, as considered in Eq. 5.33. An especially clever arrangement is to combine the
function of a single multilayer monochromator and condenser lens into one optic for laboratory transmission x-ray microscopes [Hertz 1999, Berglund 2000].

At photon energies below about 2 keV, grating monochromators become the common choice. These are mirrors operated at grazing incidence because of the critical angle θc given by Eq. 3.115. The mirror surface may be flat in a plane grating monochromator, or curved in a spherical grating monochromator, in which case one can combine dispersive and focusing properties (as described in Section 5.2.1) in one optic. Because the grating is used at grazing incidence angles, one can achieve nanometer effective grating periods as seen from the beam direction by using micrometer-scale structures on the grating surface (see the sketch at the end of this subsection). This means that methods including laser interferometry and mechanical ruling can be used to produce the grating structure. Because the grating is operated at grazing incidence, it also tends to be somewhat long (several centimeters), so that one might need to slightly adjust the grating period between the upstream and downstream ends; such variable-line-space gratings are usually produced by mechanical ruling. The grating grooves can be blazed (arranged at a shallow angle relative to the substrate) so that one approaches specular reflectivity in the dispersion direction, thus improving efficiency, which can exceed 20 percent for soft X rays (note that the grating entrance and exit angles as shown in Fig. 4.8 are normally chosen not to be equal to the substrate grazing incidence reflection angle, so that one separates the zeroth or undiffracted order from the desired diffraction order). Of particular note is the Rowland circle condition for spherical or toroidal grating monochromators, where one matches the dispersion and focusing properties of the optic as worked out by H. A. Rowland in 1882, and described in more recent publications [Namioka 1959, Samson 1967, Peatman 1997]. Spherical grating monochromators (SGMs) in particular have been used for many scanning x-ray microscopes [Winn 2000, Warwick 2002], where spectral monochromaticities of λ/(Δλ) ≈ 3000 or more are obtained so as to match the intrinsic ∼1 eV spectral linewidth of carbon XANES resonances (Section 9.1.3) near 300 eV photon energy. With SGMs, one must pay attention to the presence of second-order diffraction, which can cause some 600 eV light to be present when rotating the grating to obtain 300 eV light at the exit slit; one therefore uses either order-sorting mirrors (Fig. 3.27) or transmission through a material with an absorption edge just below the energy of the second-order light, such as a gas filter [Winesett 1998, Warwick 2002].

Of special note is the use of large-diameter Fresnel zone plates as combined linear monochromators and condenser lenses, as discussed in Section 6.3.1. These were central to many of the transmission x-ray microscopes (TXMs) developed by the Göttingen group at the BESSY storage ring. However, as noted in Section 6.3.2, more recent TXMs have used grating monochromators and capillary condenser lenses, as these provide better power handling, higher monochromaticity, and increased working distance near the specimen [Schneider 2012, Sorrentino 2015].
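
The effective-period point made above for grazing incidence gratings (the sketch promised earlier in this subsection) is a one-line calculation: a period d ruled on the surface appears foreshortened to d sin θ along the beam direction. The values below are illustrative assumptions.

```python
import numpy as np

d = 1.0e-6                  # 1 micrometer ruled period (assumed)
theta = np.deg2rad(2.0)     # 2 degree grazing incidence angle (assumed)

d_eff = d * np.sin(theta)   # period as seen along the beam
print(f"Effective period along the beam: {d_eff*1e9:.1f} nm")   # ~35 nm
```
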

7.2.2 Coherence and phase space matching

Various x-ray microscopes make different demands on x-ray source size and divergence characteristics.


[Figure 7.10: phase space plots of x′ angle (μrad, ±40) versus x position (μm, ±80) for a source of width wx = 20 μm and divergence semi-angle θx = 30 μrad, shown at the slit and at several distances z downstream (a 40 μm wide slit is also indicated), along with the beam width versus z.]

Figure 7.10 Evolution of the phase space area occupied by a light beam focused on a slit, and at various distances from the slit. Shown here is a beam with a convergence semi-angle θ = 30 μrad focused onto a slit with a width of w = 20 μm. When the beam propagates to positions of z = 1 meter and z = 2 meters downstream of the slit, the phase space area becomes "tilted" in the clockwise direction due to geometric propagation of the beam. If a focusing optic were to be placed at z = 2 meters, it would tilt the phase space area in a counter-clockwise direction before propagation would again tilt it clockwise towards the vertical at the focus. Depending on the focal length of that second optic, it might lead to a phase space distribution with a different width and angular divergence, but the product of the two would remain the same due to Liouville's theorem (Eq. 4.189).

As discussed in Section 4.4.6, the size and angular distribution of an x-ray source is sometimes referred to as its étendue. The product of size times angle at a source or focus position, divided by the wavelength, gives the number of source modes Msource. While one can use optics to image light from a small source with large divergence onto a large focal spot with small divergence (and vice versa), the size–angle product is unchanged due to Liouville's theorem (Eq. 4.189). The exception is that one can reduce the number of source modes Msource through the use of a spatial filter: an aperture placed at a focus throws away light in exchange for a decreased mode count. Point projection microscopy requires a reasonably small source to minimize penumbral blurring (Section 6.2) or to produce a specified degree of spatial coherence, while a large divergence leads to a large field of view. In propagation-based phase contrast imaging methods (Section 4.7.2), the ratio of the imaging field width to the width of the total number of Fresnel fringes recorded gives an estimate of the number of source modes Msource that can be accepted in each of the x and y image directions. In full-field imaging (Section 6.3), one can accept as many source modes Msource as there are pixels in the detector in each direction, so that bending magnet sources at synchrotrons serve nicely as sources for full-field TXM systems [Feser 2012]. However, exposure times
depend on the number of photons in a single mode, since one mode maps onto one pixel; therefore brightness still comes into play, and especially at synchrotron radiation sources one often has fewer source modes Msource than image pixels N, so that one has to wobble or scan the condenser optic during exposure (Sections 6.3.1 and 6.3.2). In scanning microscopy (Section 6.4) one obtains the smallest focused beam size only when Msource ≃ 1 (see Section 4.4.6 and Eq. 4.198), and the coherent imaging methods of holography, CDI, and ptychography discussed in Chapter 10 also require Msource ≃ 1 (see for example Fig. 10.12).

The role of beamline optics is to transfer light from the source into the x-ray microscope with the proper number of source modes Msource, and with the appropriate ratio of size versus divergence. One very useful tool for thinking about how to do this with beamline optics is to consider how a light beam becomes rearranged in phase space [Hastings 1977, Pianetta 1978, Smilgies 2008, Ferrero 2008, Huang 2010a]. Consider a source with uniform illumination across a specified width and angular divergence, as shown in the x̂ direction in Fig. 7.10 (one must track changes in phase space separately in the x̂ and ŷ directions). As the beam propagates downstream by a given distance, the light rays with large divergence expand out to large positions, so the phase space box becomes tilted in the clockwise direction as indicated. If one were then to put a focusing lens in place, large positive angles would become large negative angles, so the phase space box would become tilted in the counter-clockwise direction. If the lens were to produce an image of the beam which is smaller in width, and therefore (necessarily) with larger divergence, the phase space box would become narrower in width and taller, while an enlarged image of the source would lead to a phase space box that was wider but not as tall.

It is also worthwhile considering the role of a slit. If the slit is narrower than the beam at a focus position, the divergence of the beam is unaffected (ignoring diffraction effects), so the phase space box becomes narrower in width but has an unchanged height. This gives rise to a smaller net phase space area and a reduced number of source modes Msource. If the slit were at a position other than a focus (such as at z = 2 m in the example of Fig. 7.10), even a slit that is wider than the original source (such as the 40 μm wide slit shown in Fig. 7.10) would intercept part of the phase space box, in such a way that a 1:1 re-focused beam would have an altered angular distribution. Therefore the phase space model becomes a good way to understand the effects of optics and apertures at various points in a beamline, with calculations handled easily using matrix methods in classical optics (see the sketch at the end of this section). If one extends it to incorporate partial coherence and diffraction effects, one essentially has the Wigner distribution approach that has been used with synchrotron radiation sources [Kim 1986].

One can manage both source monochromaticity and étendue at the same time with beamline optics. In a spherical grating monochromator (SGM) operated in the Rowland circle condition, polychromatic light focused on the entrance slit results in monochromatic light being focused on the exit slit of the monochromator.
That exit slit can then serve as both a spectral filter (by controlling how large a slice of the spectrally dispersed beam is accepted) and a spatial filter (by adjusting the size of a slit at the focus) in one direction, while in the orthogonal direction a separate optic might be used to image the source onto the exit slit, where again one can do spatial filtering.

[Figure 7.11: schematic of three slit types (knife-edge, rectangular (one shown misaligned), and cylindrical), with the intended beam path and refracted transmitted rays indicated.]

Figure 7.11 Slit types that can be used for aperturing an x-ray beam. The intended transmitted beam path is shown in red. For visible light and electrons, knife-edge slits are often preferred because they are relatively easy to prepare and are tolerant of angular alignment errors. However, rays just off the intended path are only partly absorbed due to the penetration of X rays into thin material (especially for higher-energy X rays), so the slit appears to be soft-edged and wider than intended. In addition, these rays can be refracted by the slit material (see Fig. 3.19). With rectangular slits, one solves the problem of x-ray penetration if the slit face is perfectly aligned, but any misalignment of the edge (or, with a perfect rectangle, angular misalignment of the rectangular block) will again produce a somewhat soft-edged slit transmission function with refraction. For this reason, hard x-ray slits are sometimes made from cylinders, so that one has a sharp cutoff of transmission and no requirement for rotational alignment along the axis of the cylinder.

This arrangement is well suited to soft x-ray scanning microscopes [Winn 2000, Warwick 2002]. One can open the slits for higher flux with lower spatial and temporal coherence (noting that the focus spot will increase in size, as was shown in Fig. 4.42), or close the slits for higher spatial resolution and better spectroscopy at the price of a decrease in flux.
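
The matrix bookkeeping behind Fig. 7.10 (the sketch referred to earlier in this section) is easy to reproduce: each ray is a column vector (x, x′), and a drift of length z or a thin lens of focal length f acts through the standard ray-transfer matrices. The sketch below uses the same w = 20 μm and θ = 30 μrad values as Fig. 7.10, propagates the corners of the phase space box, and confirms that the enclosed area is unchanged, per Liouville's theorem (Eq. 4.189).

```python
import numpy as np

def drift(z):
    """Free-space propagation by distance z (meters)."""
    return np.array([[1.0, z], [0.0, 1.0]])

def lens(f):
    """Thin lens of focal length f (meters)."""
    return np.array([[1.0, 0.0], [-1.0 / f, 1.0]])

# Corners of the phase space box at the slit: 20 um wide, +/-30 urad
w, theta = 20e-6, 30e-6
corners = np.array([[-w/2, -w/2, w/2, w/2],
                    [-theta, theta, -theta, theta]])

def box_area(c):
    """Area of the parallelogram spanned from corner 0."""
    v1, v2 = c[:, 1] - c[:, 0], c[:, 2] - c[:, 0]
    return abs(v1[0] * v2[1] - v1[1] * v2[0])

after_drift = drift(2.0) @ corners    # 2 m drift: box tilts clockwise
after_lens = lens(1.0) @ after_drift  # lens tilts it counter-clockwise

for label, c in [("at slit", corners),
                 ("after 2 m drift", after_drift),
                 ("after f = 1 m lens", after_lens)]:
    print(f"{label}: area = {box_area(c):.3e} m*rad")  # always the same
```
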

7.2.3 Slits and shutters

Slits are conceptually simple items in a beamline, but their implementation requires care. Consider the three different slit types shown in Fig. 7.11; especially for hard X rays, obtaining a clear slit cutoff for a well-defined x-ray beam width is not so straightforward. For coherent diffraction imaging methods, one must also consider scattering from any roughness on the edges of the slit (sometimes referred to as "hot lips") as seen by the x-ray beam, which sometimes leads to complex scatter shield arrangements like that shown in Fig. 10.17. Furthermore, slits are usually placed at intermediate focus positions of the x-ray beam, and sometimes they are used upstream of monochromators, so they must be able to absorb the entire spectral output of the source; this presents challenges with slit cooling so as to maintain a stable slit position and width.

Shutters serve to interrupt the beam entirely. Safety shutters in the front end of the


beamline serve to protect personnel from the x-ray beam, from the electron beam in case the orbit is lost at synchrotron light sources, and (again at synchrotron light sources) from gamma rays that can be created by collision of storage ring beam electrons with residual gas molecules. These safety shutters are interlocked so that one cannot access the x-ray beam path unless the safety shutter is closed. Safety shutters must be able to absorb the entire beam power, so they are often massive water-cooled blocks with slow opening and closing times. For this reason, smaller and lighter shutters are often added as part of the microscope instrumentation (see for example [Chapman 1999]) to protect the sample from unnecessary exposure, to define the exposure time, or to synchronize the exposure with some other stimulus. The beam incident on the microscope tends to be small in dimension and low in power, so that a small movement of a light object can suffice to intercept it. Fast shutters may be actuated by an electromagnet, or even a piezoelectric actuator, with millisecond or even faster response times.

7.2.4 Radiation shielding

From the calculations summarized in Section 4.9.1, it is clear that specimens in x-ray microscopes will be exposed to a significant radiation dose. However, as noted in Section 11.2.3, it's nice to minimize the radiation dose received by the experimenters! For soft X rays (≲1 keV), the thickness of vacuum pipes and windows is sufficient to completely absorb any stray x-ray beam, and even air is a pretty good absorber (Fig. 7.12). As one gets up to harder X rays at 10 keV and above, vacuum pipes still provide sufficient shielding, but lead (Pb) is used to shield areas where higher-energy X rays might strike and produce x-ray scattering. Since one might use electron acceleration voltages of many tens of keV in laboratory sources, or have an electron beam energy of many GeV in storage rings, radiation shielding must account for the possibility that the electron beam can go in unanticipated directions and create higher-energy X and gamma rays. For that reason, hard x-ray experiments at synchrotron light sources are often carried out inside steel- or lead-shielded rooms from which the experimenter is excluded; these "hutches" (a word someone from farm country in the USA will associate with enclosures for pet rabbits) have a security system involving keys and area search buttons to ensure that nobody is inside and the door is closed before a safety shutter can be opened.

7.2.5 Thermal management

The power of the x-ray beam emitted from synchrotron radiation sources typically ranges from hundreds of watts to many kilowatts. Undulators in particular concentrate much of this power into a narrow cone, so the power density can be higher than in an arc welder. These beams can melt through thick stainless steel valves and other equipment not designed to handle them, and unfortunately this fact has been demonstrated more than once. The first surfaces to see the beam are typically water-cooled and oriented at grazing incidence to the beam to spread out the heat load.

[Figure 7.12: log–log plot of the 1/e attenuation length (meters, from 10⁻⁸ to 10⁵, with 1 m, 1 mm, and 1 μm reference levels marked) versus photon energy (0.1–30 keV) for helium, air, 304 stainless steel, and lead (Pb).]

Figure 7.12 X-ray absorption length μ⁻¹ (Eq. 3.75) in air and helium, plus 304 stainless steel (used for ultra high-vacuum chambers and fittings) and lead (Pb). Soft X rays are effectively stopped by a few millimeters of air, though more caution must be used if helium gas is in the beam path. The walls of vacuum chambers are sufficient to stop x-ray beams in most cases, though lead shielding is used when extra protection is required, and in particular for locations where there is the potential for gamma ray scattering.

One can use the lowpass energy "filtering" property of grazing incidence mirrors (Fig. 3.27) to remove all the x-ray power above a certain photon energy, thus reducing the power load on all downstream components. Silicon, diamond, and the copper-based material Glidcop are preferred materials for dealing with high power densities because of their good thermal properties. In beamlines with double crystal monochromators, the first crystal is cooled either by water or by liquid nitrogen. Remarkably, the thermal expansion coefficient of silicon at liquid nitrogen temperature is vanishingly small, making such cryogenic monochromators popular at undulator beamlines in spite of their cost and complexity. Downstream of a monochromator the power in the beam is reduced, typically by three orders of magnitude, so dealing with the heat load is much easier. Still, with precision optical elements, avoiding damage is not enough: thermal distortions of an optic's figure must be carefully controlled. Beamline optical designs typically involve detailed finite element calculations to predict the local thermal response of all optical components, and to design the geometry, the materials used, and the cooling arrangement to keep such distortions within allowable limits.

7.2.6 Vacuum issues, and contamination and cleaning of surfaces

Since most x-ray sources involve electron beams accelerated in a vacuum environment, ultra high-vacuum (UHV) conditions (typically about 10⁻⁹ Torr or lower, or 10⁻⁷ Pascal or lower) are highly desirable so as to minimize electron scattering and beam degradation.
At most synchrotron light sources, UHV conditions are maintained in the storage ring vacuum chamber, and in x-ray beamlines up until one reaches a window of some sort. With soft x-ray beamlines this might be a 100 nm thick silicon nitride window (Section 7.5.1) separating the UHV beamline from the optics and specimen at atmospheric pressure, while for hard x-ray beamlines a 0.3–0.5 mm thick beryllium window is often used to separate upstream UHV regions from a downstream region which might be at vacuum (allowing for a thinner window) or which might be filled with helium gas. With coherent x-ray beams, these windows must be polished so that thickness variations do not impose phase structure on the beam, and considerable care is taken in their handling (solid beryllium is quite safe to handle, but fine beryllium dust is extremely hazardous and leads to the severely debilitating lung condition called berylliosis¹). Polyimide or Kapton windows are also sometimes used to separate vacuum from atmospheric pressure environments at hard x-ray beamlines.

Another advantage of maintaining a beamline at UHV conditions is that it minimizes the condensation of contaminants on surfaces (see Eq. 11.13). This helps keep grazing incidence optics clean, which is especially important for soft x-ray beamlines. If one instead has residual hydrocarbons in the vacuum chamber from fingerprints, organic solvents, and so on, the x-ray beam can ionize the molecules, and the damaged fragments can then stick to the surface where the beam has hit (a process called radiation-induced cracking). This can be bad news at a beamline, because one might then peer through a vacuum window and see that an expensive grazing incidence mirror or grating now has an ugly brown streak on it, producing a severe loss of reflectivity at the carbon edge and difficult incident flux normalization problems for carbon XANES spectromicroscopy.

The achievement of UHV conditions is a religion with its own bible [O'Hanlon 2003]. It involves careful choices of materials and pump types, and proper cleaning and handling of components. One important rite in this religion is "baking out" a vacuum system, where it is heated to elevated temperatures (e.g., 100 °C) to accelerate water and hydrocarbon desorption from surfaces while being pumped. If the "baked" system pressure gets down into the 10⁻⁸ Torr range, chances are good that the base pressure will drop another order of magnitude or more after the system is cooled down to room temperature. Resistive electrical heating tapes are sometimes wrapped around the vacuum chamber, and aluminum foil is often used to wrap the system and keep the heat in relative to the room temperature environment. This is why one sometimes sees some beamline components covered with aluminum foil, which some in the USA might otherwise associate with baking a turkey at the Thanksgiving holiday.

¹ The author's research lab head during college summers at Los Alamos was Herb Anderson, who had contracted berylliosis during a laboratory fire during the Manhattan Project days, so that by the early 1980s he toted around a portable oxygen system.

7.3 Nanopositioning systems

X-ray microscopes require nanopositioning to move the specimen into the focal region, and mechanical scanning of either the optic or (more typically) the specimen in scanning


microscopy. Once moved, the specimen needs to remain in a stable position relative to the optic with a tolerance of a small fraction of the desired spatial resolution. Like UHV technology, nanopositioning technology represents another religion with its own prophetic writings (see for example [Fleming 2014, Ru 2016]), so I will only convey here some brief points from my own belief system.

For coarse positioning over long distances (centimeters or more), micropositioning stages with ball-bearing slides and gear-reduction electrical motors are usually used. The motors are either stepping motors, where gear-like permanent magnets provide a set of stable positions at fixed rotational intervals that electrically driven coils must move past, or direct current (DC) motors, where the speed of rotation is proportional to the current supplied. With finer gear reduction systems, slower velocities can be traded off for finer motion precision, down to about 50–100 nm depending on the model chosen. Encoders provide a check of the actual motion achieved (sometimes by optical measurement of motor shaft rotation, or of the actual linear position of the stage), and this feedback is used by a motor controller to reach the desired position. Encoders are required when using DC motors; with stepping motor stages, they can provide reassurance against missed motor steps. Because micropositioning stages often have a screw pushing against a surface with a retention spring holding things in place, most motor controllers will employ a backlash correction strategy in which the last stage motion always involves pushing with that screw (moves in the other direction are accomplished by going past the desired endpoint, and then driving back). These stages can carry significant mass (kilograms or more, depending on the model chosen), and special vertical translation stages are sometimes made by using a horizontal motion to drive one wedge onto another, with ball-bearing guides between the wedges. The electric motors can generate heat (as can the infrared light sources in encoders) [Nazaretski 2013], so there can be some thermal drift of the stage after a motion has been carried out.

Finer motion is achieved by using piezoelectric crystals, which expand or contract in response to an applied voltage. These can be used to achieve sub-nanometer displacements, or, if a mechanical leveraging arrangement is used, one can expand the range to tens of micrometers at a cost in stiffness (that is, with increased compliance). Over these smaller ranges, the motion is usually constrained to the desired direction through the use of a flexure system. Because piezos are notorious for hysteresis, some sort of nanometer-precision encoder is used, which is often a capacitance micrometer, though laser interferometers and mechanical strain gauges can also be used. One can think of the pushing strength of these piezos in terms of capacitance: stiffer flexures and longer ranges require larger drive piezos, which then require voltage amplifiers with larger electrical current limits in order to quickly supply the charge imbalance needed to cause the piezo to expand with a short response time.

In order to do long-range coarse positioning with nanoscale fine movement, one often uses a stack of x–y–z motorized micropositioning stages with an x–y piezo stage on top (because the depth of field of most x-ray microscopes is several micrometers, stepper motor stages usually offer fine enough motion for focusing).
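
A minimal sketch of the backlash-correction strategy just described, in which every move finishes by pushing in the same direction; the stage object and its move_to method here are hypothetical stand-ins for a real motor controller interface.

```python
class DemoStage:
    """Hypothetical stand-in for a motorized stage controller."""
    def __init__(self):
        self.position = 0.0

    def move_to(self, target):
        print(f"  moving to {target*1e6:+.1f} um")
        self.position = target

def move_with_backlash_correction(stage, target, overshoot=10e-6):
    """Finish every move in the positive direction, so the drive screw
    always ends up pushing against the retention spring."""
    if target < stage.position:
        # Backward move: go past the target, then approach it forwards
        stage.move_to(target - overshoot)
    stage.move_to(target)

stage = DemoStage()
move_with_backlash_correction(stage, 100e-6)  # forward: single move
move_with_backlash_correction(stage, 50e-6)   # backward: overshoot first
```
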
Unfortunately, such stacked assemblies can be rather large, with long mechanical paths between the optic and the specimen. Let's consider the example of a U-shaped stage assembly that has two stacks
that are each 10 cm high, separated by 10 cm. If the U were to be made out of stainless steel, a 1 °C change in temperature would lead to an expansion of each of the three arms by a distance of

Δx = α x ΔT = (18 × 10⁻⁶ /°C) · (10⁻¹ m) · (1 °C) = 1.8 × 10⁻⁶ m,

or 1.8 μm, assuming a representative value for the coefficient of thermal expansion of stainless steel (one can also use materials with significantly lower thermal expansion coefficients, such as Invar metal or Zerodur ceramic). For the two vertical arms of the U, a 1 °C change of temperature would make both the optic and the specimen move upwards by 1.8 μm together, while the bottom arm of the U would have the two stacks move apart by 1.8 μm. This might be OK, in that the depth of field (Eq. 4.215) of most x-ray microscopes is much larger than 1.8 μm (except for high-resolution soft x-ray microscopes), and movements of both the optic and the specimen by 1.8 μm relative to an illumination width of typically 50–200 μm are inconsequential. However, the stage stacks are not usually made of uniform material (there might be a mix of materials, plus various interfaces for bearing slides and so on), so there will likely be some differential in the up–down thermal expansion. If this happens during acquisition of one full-field image, the image will be blurred, while in a scanning microscope the scan field will become distorted. If one takes a sequence of images, such as are required for spectromicroscopy or for tomography, one will have to worry about registration of the images to correct for thermal drift. For this reason, many high-quality experimental facilities at synchrotron light sources follow the practice of high-end electron microscopy labs by taking special care to install heating and cooling systems that can maintain temperature stability to within 0.1 °C.

Thermal drifts can also be compensated for using various image registration steps (as has been demonstrated in spectromicroscopy [Jacobsen 2000] and in tomography [Gürsoy 2017b]), but an even better approach is to use a laser interferometer to measure any remaining changes in relative position between optic and specimen. Laser interferometers have been used as absolute position encoders to correct for nonlinearities in piezo motion in x-ray microscopes [Shu 1988], and to correct for the relative position of specimen and optic in 2D scanning microscopy [Kilcoyne 2003] and in 3D ptychographic tomography [Holler 2012]. They have also been touted as solutions to correct for high-frequency motion such as vibration [Kilcoyne 2003], though it is difficult to correct for vibrations at frequencies above about 100 Hz when one considers the bandwidth of digital servo control systems and the combination of weight, piezo size, and piezo drive current limit in many nanopositioning stages. As a result, for vibration it can be argued that there is no substitute for solid mechanical design, so that one can have a system that is stable in the typical vibration environment of modern synchrotron light source facilities (Fig. 7.13) or well-designed microscopy laboratories. These points are discussed in more detail in connection with a recently developed compact STXM [Takeichi 2016].
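
The arithmetic of the U-shaped assembly above generalizes trivially, and makes it easy to compare candidate materials; the expansion coefficients below are commonly quoted room-temperature values, included here as illustrative assumptions.

```python
def thermal_expansion(alpha_per_C, length_m, dT_C):
    """Length change (meters) of an arm with expansion coefficient alpha."""
    return alpha_per_C * length_m * dT_C

# 10 cm arm, 1 degree C temperature change, several candidate materials
materials = {"stainless steel": 18e-6,   # per degree C
             "Invar": 1.2e-6,
             "Zerodur": 0.02e-6}
for name, alpha in materials.items():
    dx = thermal_expansion(alpha, 0.10, 1.0)
    print(f"{name}: {dx*1e9:.0f} nm per degree C")  # steel: 1800 nm
```
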

[Figure 7.13: log–log plot of RMS displacement (meters/√Hz) versus frequency f (1–300 Hz), with vertical and horizontal floor-vibration traces, an f⁻² trend indicated, and spikes near 60, 120, and 180 Hz.]

Figure 7.13 X-ray microscopes require a high degree of stability between the objective lens (often a zone plate optic) and the specimen, in spite of ambient vibrations. This plot shows the vibration on the experimental floor at the Advanced Photon Source at Argonne National Laboratory, in units of RMS displacement in meters per square root Hz. Vertical motion is often about ten times larger than horizontal motion. The spikes at 60, 120, and 180 Hz are likely due to vibration from electrical transformers, pumps, and other equipment operating at the USA alternating current electrical line frequency of 60 Hz, along with harmonics. Data courtesy of Curt Preissner, Argonne Lab.

When it comes to precision mechanical design, there are yet more holy writs [Jones 1988, Smith 1992], so we restrict ourselves to a few comments:

• One can first minimize vibration excitations by design of the laboratory. Modern light source and electron microscope facilities make significant investments in the construction of the floor, often using thick slabs that are mechanically isolated from the "regular" building floor, with its heating and cooling systems, mechanical vacuum pumps, electrical transformers, and so on. Even the dirt under the slabs must be paid attention to, with various schemes of compaction and deep support pillars being employed. The microscope itself should also be acoustically shielded, especially if there are noisy pumps or air outlets nearby. Commercially available optical tables mounted on air supports offer one way to minimize vibration for microscopes mounted on top, though at synchrotron light sources one must pay attention to the absolute position control of the air supports so that the whole table stays in a stable position relative to the 50–200 μm size that is typical for x-ray beams. One can also place a large granite or concrete block on top of foam mounts on the floor, and then build the microscope on this large, weakly coupled
mass, although the foam mounts can slowly compress over time, leading to a need to occasionally adjust the microscope to remain centered on the x-ray beam.

• One must be cognizant of minimizing mechanical strain. If two plates are bolted together, changes in temperature will demand that they both expand or contract, and the resulting strain can lead to mechanical creep. With optics such as grazing incidence mirrors, one must also pay particular attention to the clamping mechanism so that it does not create strain, and thus bumps or dips, on the surface of a very expensive, exquisitely figured and polished optic. A good solution is to use kinematic mount design principles [Smith 1992] such as balls in grooves.

• A design mantra worth repeating is "small, light, and stiff." Smaller objects undergo less thermal drift, and objects that are small, light, and stiff have higher mechanical resonance frequencies. This is useful for two reasons: the first is that the typical vibration excitation environment shows a rolloff in amplitude at higher frequencies (Fig. 7.13), and the second is that for "white noise" acoustical excitation (noise that has the same power at all frequencies) the amplitude of motion tends to decline with frequency to the negative third power (because the third derivative of x = x₀ sin(ωt), or d³x/dt³, corresponds to impulses, and impulses are reflective of the input noise).

• Have realistic and sensible goals for what needs to be highly stable, and what does not. One needs nanometer-scale stability of the optic relative to the specimen (image positions), but only micrometer-scale stability of the optic relative to the x-ray beam. Table-top atomic force microscopes provide great examples of this: the entire microscope might vibrate a bit as a unit when placed on "noisy" tabletops, but the important thing is that the scanning probe tip and the specimen do not vibrate relative to each other.

These points are not important to most scientific users of x-ray microscopes, though they certainly can appreciate the outcomes of good instrument design!

7.4 X-ray detectors

While the comic character Superman may have x-ray eyes, most of the rest of us do not; and besides, all the x-ray image analysis methods discussed in earlier chapters require quantitative, digital images rather than observations recorded as sketches in a notebook in the manner of van Leeuwenhoek more than three centuries ago. The topic of x-ray detectors is broad, ranging from x-ray detectors for astronomy on rockets and satellites [Fraser 1989] to the very large market of x-ray detectors for medical imaging [Spahn 2013, Panetta 2016], where in the latter case the energy range of interest is from about 20 keV to 100 keV or more. For x-ray microscopes in the 0.2–15 keV range, detectors have often been treated as a bit of an afterthought, and far less money has gone into their development than into the development of accelerators for storage ring light source facilities, or the development of x-ray nanofocusing optics. Fortunately there is enough of a crossover from x-ray detector requirements for
molecular crystallography and powder diffraction (both of which have significant commercial markets for both academic and industrial applications) that x-ray microscopy has benefitted tremendously from advanced detectors in recent years, and we have hope that this trend will accelerate with the concentrated investments now being made in detectors for x-ray free-electron laser (XFEL) facilities. X-ray detectors work by absorbing x-ray photons, and generating either electrons and holes in semiconductors, electrons from photocathodes, visible light in scintillators, or heat in superconducting calorimeters. We will discuss these in turn, but first let us outline the general characteristics that are valuable in x-ray microscopes: • Detective quantum efficiency (DQE): DQE will be formally defined in Eq. 7.34, but for a photon-counting detector with no dark noise it is in essence the fraction of incident photons that are recorded. • Dead time: the time tdead after arrival of one photon before the detector is ready to record a subsequent photon. This can be due to a shaping time in a pulse detection circuit, or the time required to restore an equilibrium charge distribution in a detector, or the time required to transfer a signal onwards during which photon arrivals cannot be recorded. • Dark noise: the signal that is (incorrectly) reported even though no x-ray photons are incident. This is essentially zero in most photon-counting detectors, while in charge-integrating detectors it can reflect both thermal excitation of electron–hole separations or readout noise in the analog charge measurement electronics. • Dynamic range: the ratio of the maximum to minimum signal value that can be successfully recorded. At the low end, this is affected by dark noise, while it can be limited at the high end by dead time (limiting the maximum flux rate) or by saturation (limiting the maximum signal that can be measured successfully, such as full well capacity for per-pixel charge collection in CCD cameras). • Solid angle: for the detection of radiation that goes into a large angular distribution, such as x-ray fluorescence, an important parameter is the solid angle of detection Ω which is given in Eq. 9.24 as Ω = 2π(1 − cos θ)  πθ2 , where θ is the semi-angle from the center to the outer radius of a circular detector. • Energy resolution: for x-ray fluorescence microscopy (Section 9.2) using energydispersive detectors, as will be discussed in Section 7.4.12, one requires that the detector be able to record the energy of detected photons with some specified energy resolution ΔE. This is discussed further in connection with Eq. 7.30. • Number of pixels: some detectors count all photons that arrive within their active area (this is the case for many energy-dispersive detectors for x-ray fluorescence microscopy, as well as basic detectors in scanning transmission x-ray microscopes). Other applications require area detectors (Section 7.4.4) with spatial resolution, which is most often in the form of a regular array of pixels N x × Ny . • Pixel size: for pixelated area detectors (Section 7.4.4), the size of a pixel Δdet is important for experimental design (see for example Fig. 10.18). Downloaded from https://www.cambridge.org/core. Stockholm University Library, on 29 Oct 2019 at 01:13:57, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.008


• Spatial resolution: in pixelated area detectors, there can be some distribution of signal from one photon into a set of neighboring pixels. In an ideal case, the spatial resolution is much less than the pixel size so that there is no spreading of the signal into neighboring pixels.

• Frame rate: for pixelated area detectors, this is the rate at which one can read out entire images. This can be either a burst rate for a limited number of images to be stored in an internal buffer and later transferred, or a sustained frame rate for continuous image transfer. High frame rate is especially important for imaging methods like ptychography (see Section 10.4.1).

These form the essential characteristics of x-ray detectors.
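As one illustration of these characteristics, the small-angle approximation for the fluorescence-detector solid angle is easy to check numerically. The following Python sketch (function names are ours, for illustration only) compares the exact expression Ω = 2π(1 − cos θ) of Eq. 9.24 with its approximation πθ²:

    import numpy as np

    def solid_angle_exact(theta):
        """Solid angle (steradians) of a circular detector subtending semi-angle theta (radians)."""
        return 2.0 * np.pi * (1.0 - np.cos(theta))

    def solid_angle_approx(theta):
        """Small-angle approximation, valid when theta << 1 radian."""
        return np.pi * theta**2

    # Example: a detector whose active radius subtends a 20 degree semi-angle
    theta = np.deg2rad(20.0)
    print(solid_angle_exact(theta))   # 0.379 sr
    print(solid_angle_approx(theta))  # 0.383 sr, about 1 percent high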

7.4.1 Detector statistics

In Section 4.8, we discussed the statistics of imaging in terms of an intensity measured with a feature present I_f, the intensity measured with a feature absent or the background intensity I_b, and the mean number of incident photons n̄ in the Gaussian approximation to Poisson statistics (Fig. 4.69). That also led to a discussion of false positives and false negatives, and minimum detection limits (Section 4.8.2). We need one other result from statistics to understand detector performance: the statistics of a chain process, where a primary event a causes a second event b before detection, so that one must account for the variance in both processes a and b. This is known as a Markov chain, with an expected signal S̄ of

    \bar{S} = \bar{S}_a \bar{S}_b    (7.25)

and a variance [Breitenberger 1955, Gillespie 1991] of

    \sigma_S^2 = \bar{S}_b^2 \sigma_a^2 + \bar{S}_a \sigma_b^2.    (7.26)
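As a quick numerical sanity check of Eqs. 7.25 and 7.26 (our own illustration, not from the text), one can simulate a two-stage cascade in which a Poisson-distributed number of primary events each produces a Poisson-distributed number of secondary quanta:

    import numpy as np

    rng = np.random.default_rng(0)
    n_trials = 200_000
    mean_a, mean_b = 50.0, 8.0   # mean primary events, mean secondaries per primary

    # Stage a: Poisson primaries; stage b: Poisson secondaries for each primary.
    # The sum of a independent Poisson(mean_b) draws is Poisson(mean_b * a).
    a = rng.poisson(mean_a, size=n_trials)
    s = rng.poisson(mean_b * a)

    print("mean:     simulated %.1f  vs  Eq. 7.25: %.1f" % (s.mean(), mean_a * mean_b))
    # Eq. 7.26 with sigma_a^2 = mean_a and sigma_b^2 = mean_b (Poisson variances):
    var_pred = mean_b**2 * mean_a + mean_a * mean_b
    print("variance: simulated %.0f  vs  Eq. 7.26: %.0f" % (s.var(), var_pred))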

We outline here some further statistical considerations that are specific to detectors. The first of these involves energy resolution. If one photon leads to the generation of a mean value q̄ of quanta in a detector, then the full-width at half maximum (FWHM) of the distribution of the number of quanta detected (essentially the resolution of the detector) is given by FWHM = 2.35σ = 2.35/√q̄, as shown in Fig. 4.4. However, this is based on the assumption that the detected quanta q̄ are produced directly by a photon with energy E in a process that takes an energy W to create detectable quanta, or

    \bar{q} = \frac{E}{W}.    (7.27)

If there are some secondary processes that follow (such as electrons produced by x-ray absorption undergoing a number of inelastic scattering events in the detector material), one might arrive at a distribution that should have been based on the statistics of the initial process, but is instead described by the final processes that lead to the detected quanta q̄. In order to account for these effects, Ugo Fano introduced [Fano 1947] the use of a parameter F (now called the Fano factor), which can in principle be calculated from detailed knowledge of the underlying physics but which also allows one to sweep


all of that under the rug of one empirical correction factor! The Fano factor F is then defined [Knoll 2010] as

    F = \frac{\text{observed variance in } q}{\text{Poisson-predicted variance in } q},    (7.28)

so that the variance σ_q² in the number of quanta is given by

    \sigma_q^2 = F \frac{E}{W}.    (7.29)

With this factor taken into account, the FWHM energy resolution of an x-ray detector is given by

    \frac{\Delta E_{\rm FWHM}}{E} = 2.35 \frac{\sigma_q}{\bar{q}} = 2.35 \sqrt{\frac{F}{E/W}}    (7.30)

before accounting for any additional degradation in energy resolution due to electronics noise in measuring the charge per photon. Consider the case of a 10 keV photon in silicon. Jumping ahead to employ a result for silicon at room temperature that will be discussed in Section 7.4.5, x-ray absorption produces one electron–hole pair per

    W_{\rm Si} = 3.65 \text{ eV}    (7.31)

deposited, with a Fano factor of

    F_{\rm Si} = 0.118    (7.32)

using recent results [Lowe 2007, Mazziotta 2008]. Therefore, one 10 keV photon produces on average q̄ = (10 × 10³)/3.65 = 2740 electron–hole pairs, giving

    \Delta E_{\rm FWHM} = E \cdot 2.35 \sqrt{\frac{F}{E/W}} = (10 \times 10^3 \text{ eV}) \cdot 2.35 \sqrt{\frac{0.118}{2740}} = 154 \text{ eV}    (7.33)

as the FWHM energy resolution of a silicon detector (with the 1σ value being lower by a factor of 2.35) if there are no other noise sources beyond Fano-modified Poisson statistics.
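The arithmetic of Eqs. 7.27–7.33 is compact enough to script; a minimal sketch (our own, assuming the silicon values of Eqs. 7.31 and 7.32):

    import math

    def fano_limited_resolution_ev(E_ev, W_ev, F):
        """FWHM energy resolution (eV) from Eq. 7.30, ignoring electronics noise."""
        q_mean = E_ev / W_ev              # mean quanta per photon, Eq. 7.27
        return E_ev * 2.35 * math.sqrt(F / q_mean)

    # Silicon at room temperature: W = 3.65 eV per pair, Fano factor F = 0.118
    print(fano_limited_resolution_ev(10e3, 3.65, 0.118))  # about 154 eV for 10 keV photons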

The next consideration involves the detective quantum efficiency (DQE). After several earlier explorations of metrics for detector performance [Rose 1946, Jones 1952, Shaw 1963], detector DQE became accepted as the fundamental approach [Jones 1959, Dainty 1974]. For a number n̄_a of incident quanta on the detector, the number of these events that are actually recorded by the detector is called the noise-equivalent quanta (NEQ), leading to a DQE of

    {\rm DQE} = \frac{\rm NEQ}{\bar{n}_a} = \frac{\rm SNR_{output}^2}{\rm SNR_{input}^2},    (7.34)

where the second form, written in terms of the signal-to-noise ratio (SNR), is true only for the case where the noise follows a Poisson distribution (Section 4.8.1). The NEQ reflects the initial x-ray photon absorption event without consideration of processes that have intrinsic gain in the number of quanta (examples where there is intrinsic gain


in the number of quanta include photomultipliers, where a cascading process is used to generate many electrons from one photoabsorption event). Let us examine first a detector that detects a fraction a of a mean value n̄_a of the quanta incident on it. In this case the mean signal is given by S̄_a = a n̄_a, and the variance is given by σ_a² = a n̄_a in the Gaussian approximation to Poisson statistics. The SNR of the input signal is

    {\rm SNR_{input}} = \frac{\bar{n}_a}{\sqrt{\bar{n}_a}} = \sqrt{\bar{n}_a},    (7.35)

while SNR_output is given by

    {\rm SNR_{output}} = \frac{a\bar{n}_a}{\sqrt{a\bar{n}_a}} = \sqrt{a\bar{n}_a},    (7.36)

so the DQE is given by

    {\rm DQE}_a = \frac{\rm SNR_{output}^2}{\rm SNR_{input}^2} = \frac{(\sqrt{a\bar{n}_a})^2}{(\sqrt{\bar{n}_a})^2} = \frac{a\bar{n}_a}{\bar{n}_a} = a.    (7.37)

The DQE is just given by the fraction a of photons that are detected, as expected. What about the case where there is a cascading gain? Let n̄_a be the number of incident photons as before, and a be the fraction that are absorbed. If an absorption event then leads to the production of n̄_b secondary quanta, and the probability of detecting these secondary quanta is b, then one has a Markov chain as described in Eqs. 7.25 and 7.26. In this case the mean detected signal S̄ is given by

    \bar{S} = \bar{S}_a \bar{S}_b = (a\bar{n}_a)(b\bar{n}_b).    (7.38)

If we assume that the individual variances are determined by the Gaussian approximation to Poisson statistics, we have σ_a² = a n̄_a and σ_b² = b n̄_b, and a net variance of

    \sigma_S^2 = b^2\bar{n}_b^2\, a\bar{n}_a + a\bar{n}_a\, b\bar{n}_b.    (7.39)

The SNR is then

    {\rm SNR_{out}} = \frac{\bar{S}}{\sigma_S} = \frac{a\bar{n}_a b\bar{n}_b}{\sqrt{ab^2\bar{n}_a\bar{n}_b^2 + ab\bar{n}_a\bar{n}_b}} = \frac{\sqrt{ab\bar{n}_a\bar{n}_b}}{\sqrt{1 + b\bar{n}_b}}.    (7.40)

The DQE then becomes

    {\rm DQE}_{ab} = \frac{(\sqrt{ab\bar{n}_a\bar{n}_b}/\sqrt{1 + b\bar{n}_b})^2}{(\sqrt{\bar{n}_a})^2} = \frac{ab\bar{n}_b}{1 + b\bar{n}_b} = a\,\frac{1}{1 + 1/(b\bar{n}_b)}.    (7.41)

That is, the DQE is reduced by a factor of 1/[1 + 1/(b n̄_b)] relative to the case of a single-stage process (DQE_a). Consider the example of x-ray photons with an energy of, say, 10 keV incident on a scintillator that produces green light with λ = 500 nm, or a photon energy (Eq. 3.7) of 1240/500 = 2.48 eV. If the scintillation process were to be 100 percent efficient, one would obtain a mean number of n̄_b = 4032 visible photons.


Even if the scintillation process were 10 percent efficient, and only 10 percent of the visible photons were detected by a visible light detector, one would have

    \frac{1}{1 + 1/(b\bar{n}_b)} = \frac{1}{1 + 1/(0.1 \cdot 0.1 \cdot 4032)} = 0.976,

so in this example the extra reduction in DQE caused by the secondary statistics is very small.

Now let us consider the case [Feser 2002] of a one-stage detector with DQE_a = a as above, but with "dark noise," or a signal that is recorded even with no incident x-ray photons. We assume that the mean dark noise N̄_d is subtracted from the signal via "background subtraction," but that there are fluctuations in the dark noise that are uncorrelated with the signal and that can be characterized by a variance σ_d². The SNR_d of this detector with dark noise is

    {\rm SNR}_d = \frac{a\bar{n}_a}{\sqrt{(\sqrt{a\bar{n}_a})^2 + \sigma_d^2}} = \frac{\sqrt{a\bar{n}_a}}{\sqrt{1 + \sigma_d^2/(a\bar{n}_a)}},    (7.42)

which along with Eq. 7.35 means that the DQE becomes

    {\rm DQE}_d = \frac{({\rm SNR}_d)^2}{(\sqrt{\bar{n}_a})^2} = \frac{a}{1 + \sigma_d^2/(a\bar{n}_a)}.    (7.43)

Therefore the DQE is affected primarily at low detected count values, when a n̄_a < σ_d². A more specific example of a type of dark noise is the equivalent noise charge (ENC), or q_ENC, of charge-integrating detectors (which can in fact vary with incident flux rate). The effect of q_ENC on DQE is given in Eq. 7.56.
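To make these results concrete, here is a small Python sketch (our own illustration) evaluating the cascade factor of Eq. 7.41 for the scintillator example above, and the dark-noise DQE of Eq. 7.43:

    def dqe_cascade(a, b, n_b):
        """DQE of a two-stage detector, Eq. 7.41: absorbed fraction a,
        n_b secondary quanta per absorbed photon, detection fraction b."""
        return a / (1.0 + 1.0 / (b * n_b))

    def dqe_dark_noise(a, n_a, sigma_d):
        """DQE of a one-stage detector with dark-noise variance sigma_d**2
        (in photon-equivalent units), Eq. 7.43."""
        return a / (1.0 + sigma_d**2 / (a * n_a))

    # Scintillator example: 10 percent conversion efficiency, 10 percent light
    # collection, with 4032 visible photons if conversion were 100 percent efficient
    print(dqe_cascade(a=1.0, b=0.1, n_b=0.1 * 4032))   # about 0.976

    # Dark noise matters mainly when a*n_a becomes comparable to sigma_d**2
    print(dqe_dark_noise(a=0.9, n_a=10, sigma_d=3.0))  # 0.45: halved at low counts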

7.4.2 Detector statistics: dead time

Many detectors have a "dead time" t_dead after the arrival of one photon, during which they cannot successfully record the arrival of another photon. For example, in a silicon detector an x-ray absorption event separates a certain number of electrons and "holes" (see Section 7.4.5 below) from each other. Inelastic scattering of these charges in the detector can lead to different travel times of individual charges to the detection circuit, so a pulse-shaping circuit is used to integrate over the maximum straggling time and give the same peak height for each x-ray photon of the same energy. However, if another x-ray photon arrives during this time, the pulse-shaping circuit will interpret the combination as a single photon (within the same pulse-shaping time) but with higher energy (more charge received during that time). As a result, the second x-ray photon will be "missed," while the first photon also has its energy over-estimated (which is not necessarily a problem if a single pulse energy threshold is used). This is the case of a "nonparalyzable" detector [Knoll 2010]. To calculate the effects of this loss, let us consider photons arriving at an actual average rate f_0 while the detector records a lower, incorrect rate f′. For each detected photon there is a dead time t_dead, so the fraction of time that the detector is in a "dead" state is given by f′ t_dead, so that the


Figure 7.14 Measured detector count rate f′ as a function of actual (incident) count rate f_0 for various values of detector dead time t_dead, shown both for a nonparalyzable model (Eq. 7.46) and for a paralyzable model (Eq. 7.48).

number of photons "missed" is given by f_0 f′ t_dead. In other words, we have [Knoll 2010]

    f_0 - f' = f_0 f' t_{\rm dead}.    (7.44)

One can use this to find the actual count rate f_0 from the measured count rate f′ as

    f_0 = \frac{f'}{1 - f' t_{\rm dead}},    (7.45)

or one can calculate the expected measured rate f′ from the actual count rate f_0 as

    f' = \frac{f_0}{1 + f_0 t_{\rm dead}},    (7.46)

which asymptotically approaches a limit of

    \lim_{f_0 \to \infty} f' = \frac{1}{t_{\rm dead}}    (7.47)

at very high count rates. In the case where the detector becomes paralyzed for a time that varies according to the signal level, one instead obtains a result [Knoll 2010] of

    f' = f_0 \exp[-f_0 t_{\rm dead}].    (7.48)

Examples of measured f′ versus actual f_0 count rates from Eq. 7.46, as well as from Eq. 7.48, are shown for several dead times t_dead in Fig. 7.14.


In the preceding paragraph, an implicit assumption was that the actual photon rate was continuous over time, whereas in Fig. 7.1 and Table 7.1 we saw that synchrotron light sources (among others) in fact have rather small duty cycles d_t. While the general case is well studied [Westcott 1948, Cormack 1962], an approximate treatment [Knoll 2010] is to say that the repetition time t_r gives an approximate dead time of t_dead,source ≈ t_r/2. At least for synchrotron sources and most pulse-counting x-ray detectors, we have t_r ≪ t_dead, so the results of Eqs. 7.44–7.47 are very good approximations of the correct answer, though more complete studies are available [Sobott 2013].

We can also think of detector dead time t_dead as providing a reduction in the DQE [Feser 2002]. As before, the fraction of time that the detector is in a "dead" state is given by f′ t_dead, so if we consider a counting interval t_dwell (such as a pixel dwell time in a scanning microscope, giving f′ = n̄′/t_dwell), then the mean number of counted photons n̄′ relative to the mean number of incident photons n̄_0 during that interval is given by

    \bar{n}' = \bar{n}_0 (1 - f' t_{\rm dead}) = \bar{n}_0 \left(1 - \frac{\bar{n}' t_{\rm dead}}{t_{\rm dwell}}\right),    (7.49)

from which we can find

    \bar{n}' = \frac{\bar{n}_0}{1 + \bar{n}_0 t_{\rm dead}/t_{\rm dwell}}.    (7.50)

If, as before, we also account for the detector only recording a fraction a of the photons incident upon it during the non-dead time intervals, the output signal is a n̄′. We can then solve for the SNR as

    {\rm SNR_{out}} = \frac{a\bar{n}'}{\sqrt{a\bar{n}'}} = \sqrt{a\bar{n}'},    (7.51)

leading to a DQE of

    {\rm DQE_{deadtime}} = \frac{({\rm SNR_{out}})^2}{({\rm SNR_{in}})^2} = \frac{a\bar{n}'}{\bar{n}_0} = \frac{a}{1 + \bar{n}_0 t_{\rm dead}/t_{\rm dwell}}.    (7.52)

That is, the DQE begins to be reduced if n̄_0 t_dead approaches the pixel dwell time t_dwell.
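These dead-time relations lend themselves to a compact numerical illustration; the sketch below (our own, with parameter values chosen only for illustration) evaluates the nonparalyzable and paralyzable responses of Eqs. 7.46 and 7.48, and the dead-time DQE of Eq. 7.52:

    import math

    def measured_rate_nonparalyzable(f0, t_dead):
        """Eq. 7.46: measured rate saturates at 1/t_dead at high incident rates."""
        return f0 / (1.0 + f0 * t_dead)

    def measured_rate_paralyzable(f0, t_dead):
        """Eq. 7.48: measured rate peaks and then falls at high incident rates."""
        return f0 * math.exp(-f0 * t_dead)

    def dqe_dead_time(a, n0, t_dead, t_dwell):
        """Eq. 7.52: DQE drops as n0*t_dead becomes comparable to t_dwell."""
        return a / (1.0 + n0 * t_dead / t_dwell)

    t_dead = 200e-9                     # 200 ns, roughly an Eiger-class counting detector
    for f0 in (1e5, 1e6, 1e7):          # incident photons per second
        print(f0, measured_rate_nonparalyzable(f0, t_dead),
              measured_rate_paralyzable(f0, t_dead))

    # 10,000 incident photons in a 1 ms dwell time, detecting fraction a = 0.95:
    print(dqe_dead_time(a=0.95, n0=1e4, t_dead=t_dead, t_dwell=1e-3))  # about 0.32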

7.4.3 Detector statistics: charge integration

Another approach to signal collection in x-ray detectors is to use a charge-integration circuit rather than a pulse-counting circuit. As can be seen from the above considerations of detector dead time t_dead, this is especially advantageous if one has incident x-ray flux rates f_0 that approach a reasonable fraction of 1/t_dead. It is also important if one begins to have an appreciable chance (in a Poisson statistics sense) of collecting more than one photon per "on" time t_0 in a pulsed source (Fig. 7.1), which is likely to be the case with scanning microscopes on diffraction-limited storage rings [Denes 2014]. The challenge is that detectors and their charge-integration circuits have their own sort of dark noise, which is referred to as the equivalent noise charge (ENC), expressed as a number of electrons q_n. We start with the initial photon absorption signal S_a = a n̄_a with variance σ_a² = a n̄_a, and this time include [Chen 2002] a subsequent step of conversion into


Figure 7.15 Detective quantum efficiency for photon-counting and charge-integrating detectors as a function of incident flux, with a = 0.95 in all cases. For a pixel array detector, the flux represents the flux incident upon one pixel. The photon-counting detector is assumed to be a nonparalyzable detector with t_dead = 200 ns, with a DQE calculated using Eq. 7.52. The charge-integrating detector is assumed to be a silicon detector with q_ENC = 50 electrons, with a DQE calculated using Eq. 7.56, for pixel dwell times of t_dwell = 1.00 ms and 0.01 ms. The values of t_dead = 200 ns and q_ENC = 50 electrons are roughly consistent with the Eiger and Jungfrau detectors, respectively, developed at the Paul Scherrer Institut in Switzerland. Note that with a pixel dwell time t_dwell = 1.00 ms (representative of some scanning microscopes today) the detector receives 10 photons per pixel at a flux of 10 kHz, while with a pixel dwell time of t_dwell = 0.01 ms the detector receives 10 photons per pixel at a photon flux of 100 kHz. This explains the flux values below which the DQE of charge-integrating detectors becomes quite low, so that q_ENC dominates. This plot is for a photon energy of E = 300 eV, for which one photon generates q̄ = E/W_Si = 82 electron–hole pairs (Eqs. 7.27 and 7.31), which is only a little bit larger than q_ENC; 10 keV photons generate a mean charge of q̄ = 2740, so the charge-integrating DQE for low photon flux would improve considerably. For photon counting, the detector dead time t_dead dictates the flux above which the DQE starts to drop, as was shown in Fig. 7.14. Figure inspired by one shown by Michael Feser [Feser 2002].

detected quanta with q̄ = E/W (Eq. 7.27) and a variance (Eq. 7.29) of σ_q² = FE/W. We add to the net variance in the Markov chain process (Eq. 7.26) an extra term of

    \sigma_{\rm ENC}^2 = q_{\rm ENC}^2,    (7.53)

which we assume is uncorrelated with the photon events or their subsequent conversion (ENC is, after all, a fluctuation in a dark signal). The net variance is then

    \sigma_S^2 = \frac{E^2}{W^2} a\bar{n}_a + a\bar{n}_a F \frac{E}{W} + q_{\rm ENC}^2,    (7.54)


so the SNR is

    {\rm SNR_{output}} = \frac{\bar{S}}{\sigma_S} = \frac{a\bar{n}_a E/W}{\sqrt{\frac{E^2}{W^2} a\bar{n}_a + a\bar{n}_a F \frac{E}{W} + q_{\rm ENC}^2}} = \frac{\sqrt{a\bar{n}_a}}{\sqrt{1 + F\frac{W}{E} + \frac{W^2}{E^2}\frac{q_{\rm ENC}^2}{a\bar{n}_a}}}.    (7.55)

Since the input SNR is just √n̄_a, the DQE is then

    {\rm DQE_{integrating}} = \frac{({\rm SNR_{output}})^2}{({\rm SNR_{input}})^2} = \frac{a}{1 + F\frac{W}{E} + \frac{W^2}{E^2}\frac{q_{\rm ENC}^2}{a\bar{n}_a}},    (7.56)

which differs from a previous result [Feser 2002] by the application of Markov chain statistics. In the DQE expression of Eq. 7.56, the term a in the numerator is the same noise-equivalent quanta (NEQ) term as appeared in the basic detector DQE expression of Eq. 7.37. For detection of n̄_a = 50 photons at 500 eV with a detector with a = 0.9, and using values for W and F from Eqs. 7.31 and 7.32, a detector readout circuit with q_ENC = 100 electrons leads to a denominator in Eq. 7.56 of

    1 + F\frac{W}{E} + \frac{W^2}{E^2}\frac{q_{\rm ENC}^2}{a\bar{n}_a} = 1 + (0.118)\frac{3.65}{500} + \frac{3.65^2}{500^2}\cdot\frac{100^2}{0.9 \cdot 50} = 1 + 0.000\,86 + 0.011\,84 = 1.012\,70,

whereas if the photon energy is increased to 5 keV the denominator becomes 1 + 0.000 09 + 0.000 20. This means that the DQE is almost not reduced at all compared to an ideal detector, where DQE = a as was found in Eq. 7.37, provided q_ENC is made small enough.
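A short script (ours, not from the text) reproduces this denominator arithmetic and the resulting DQE of Eq. 7.56:

    def dqe_charge_integrating(a, n_a, E_ev, W_ev=3.65, F=0.118, q_enc=100.0):
        """Eq. 7.56 for a silicon charge-integrating detector.
        Defaults use the silicon W and Fano values of Eqs. 7.31 and 7.32."""
        denom = 1.0 + F * W_ev / E_ev + (W_ev / E_ev)**2 * q_enc**2 / (a * n_a)
        return a / denom

    # 50 photons at 500 eV, a = 0.9, q_ENC = 100 electrons: denominator about 1.013
    print(dqe_charge_integrating(a=0.9, n_a=50, E_ev=500.0))   # about 0.889
    # At 5 keV the q_ENC term becomes negligible and the DQE approaches a = 0.9
    print(dqe_charge_integrating(a=0.9, n_a=50, E_ev=5000.0))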

Another potential challenge with charge integration involves dynamic range. Counting photons demands that digital counters (with especially simple circuitry) have more bits to count more photons per frame time. In charge integration, one might face limits such as the full well capacity of CCD detectors, as noted in Section 7.4.5, or the equivalent in charge-integrating pixel array detectors. If, however, charge is collected on a capacitor C to yield a voltage V = q/C as it is integrated, one can trigger a reset circuit when V_reset = q_reset/C is reached, at which point one quickly drains the capacitor and then lets it continue to integrate further charge (possibly with further reset events, which can be digitally counted as N_reset) until the end of the measurement. If the voltage V_analog = q_analog/C is then measured, one can calculate the total charge accumulated as

    q_{\rm total} = C V_{\rm analog} + N_{\rm reset} \cdot C V_{\rm reset} = q_{\rm analog} + N_{\rm reset} \cdot q_{\rm reset},    (7.57)

and if photons at a fixed energy are detected one can then calculate their number. In this way one has the potential to combine single-photon sensitivity with high dynamic range. This is the approach used in a series of detectors developed at Cornell University [Ercan 2006, Weiss 2017], while a group at the Paul Scherrer Institut has developed an approach where successive capacitors are "switched in" to increase dynamic range [Bergamaschi 2011].
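The reset-counting scheme of Eq. 7.57 can be illustrated with a toy simulation (entirely our own sketch; names and values are illustrative only):

    def integrate_with_resets(charges, q_reset):
        """Toy model of Eq. 7.57: accumulate per-photon charges on a 'capacitor',
        draining it and counting one reset each time q_reset is reached."""
        q, n_reset = 0.0, 0
        for dq in charges:
            q += dq
            while q >= q_reset:
                q -= q_reset
                n_reset += 1
        return q, n_reset  # (q_analog, N_reset)

    # 1000 photons at 10 keV in silicon, about 2740 electrons each (Eqs. 7.27, 7.31)
    charges = [2740.0] * 1000
    q_analog, n_reset = integrate_with_resets(charges, q_reset=1.0e6)
    q_total = q_analog + n_reset * 1.0e6       # Eq. 7.57
    print(q_total / 2740.0)                    # recovers 1000 photons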


We have presented above the essential ingredients for understanding the DQE of a variety of x-ray detector systems. For detectors exposed to monochromatic illumination, the main choice one faces is whether to operate in pulse-counting mode or in charge-integrating mode. The answer depends on the flux one must detect, and the per-pixel dwell time in the case of a scanning microscope, as shown in Fig. 7.15.

7.4.4 Pixelated area detectors

Full-field microscopy (Section 6.3) and ptychography (Section 10.4) are imaging methods that require pixelated area detectors. Area detectors are characterized by their number of pixels N_x and N_y, and the detector pixel size Δ_det. In full-field microscopes, image magnifications of about 1000 are common, so it is preferable to have pixel sizes of Δ_det ≈ 5–30 μm in order to obtain nanometer pixel sizes in the image. Therefore either scintillator detectors (Section 7.4.7) or CCD detectors (Section 7.4.5) tend to be preferred for full-field microscopy. It is also advantageous to have pixel counts N_x × N_y of at least 1024 × 1024, so as to obtain a large field of view, with 2048 × 2048 being especially popular (N = 2^n is well matched to fast Fourier transforms (FFTs), as discussed in Section 4.3.3). In methods such as ptychography (Section 10.4), the detector pixel size Δ_det is less crucial because the detector is placed in the Fraunhofer or far-field regime (Section 4.3.2), so the main disadvantage of larger pixel sizes is the need for a longer flight tube leading from the specimen to the detector (this is either evacuated or filled with helium gas). This makes it easier to use pixel array detectors (Section 7.4.5) with their larger pixel size of Δ_det ≈ 50–200 μm. At the other extreme of small pixel size, photocathodes with electron optics and electronic readout have been used to make area detectors with sub-micrometer Δ_det [Polack 1980, Polack 1983, Kinoshita 1992, Shinohara 1996]. Fewer researchers have taken up this approach in recent years, perhaps due to concerns about DQE, linearity of response, and field distortions.

In scanning x-ray microscopes (see Section 6.4), spatial resolution is provided by the focused x-ray beam, and picture elements or "pixels" are provided by the set of beam positions on the specimen. Therefore the detector need not have any spatial resolution, allowing the use of detectors such as gas flow proportional counters with very low noise for soft x-ray detection at low flux levels [Feser 1998], or avalanche photodiodes with very fast response times for pump–probe experiments [Stoll 2004, Puzic 2010]. At the same time, controlling the sensitive area of the detector can be important, as discussed in Section 4.5.1, for the detector area plays the same role as the condenser aperture does in full-field imaging. One also needs to control the detector area for dark-field microscopy (Section 4.6, and in particular Fig. 4.59). Some degree of detector segmentation or even full pixelation is needed for techniques such as phase contrast, as discussed in Section 4.7, while ptychography (Section 10.4) requires an array detector. Pixelation can also be valuable in fluorescence microscopy, both for increasing the overall count rate of an energy-dispersive detector and for advanced data interpretation approaches [Ryan 2010].

One important negative characteristic of a pixelated area detector involves charge sharing. If a photon arrives near the boundary between two pixels, the charge can spread


into adjacent pixels (see Fig. 11.4 for an example in plastic; the charge spreading is smaller in the higher density of silicon). In a photon-counting detector, this could lead to none of the pixels having enough charge deposited to reach the pulse threshold needed to record the photon event. In a charge-integrating detector, one might mistakenly record this as the arrival of more than one photon (each with lower energy than that of the incident photon) in multiple adjacent pixels. With monochromatic illumination and single photons arriving per integration time, the charge sum adds up to that produced by one photon, and the center of the charge distribution shows the arrival position at sub-pixel resolution ("droplet analysis" provides this information [Livet 2000]). If the pixels are large so that few events are affected by charge sharing, one can use charge integration to measure both the position (pixel location) and energy (Eq. 7.30) of each photon. This requires low readout noise and a photon flux per pixel well below the frame rate so as to avoid pile-up, so in the past it has been applicable mainly to x-ray astronomy [Janesick 2001]. However, as frame rates have increased, one can start to use charge-integrating detectors in synchrotron experiments either for sub-pixel spatial resolution or for per-pixel energy resolution [Soman 2013, Cartier 2016, Ihle 2017]. One way to reduce charge sharing in a multi-element or pixelated area detector is to overlay on the detector a collimator grid that blocks sensitivity near pixel boundaries. This comes at the cost of area-averaged DQE.

Frame rates for area detectors are another important consideration for applications such as tomography or spectromicroscopy in full-field imaging, or for ptychography. Frame rates for CCD detectors are usually in the 20–400 Hz range, as limited by the rate of clocking charge through the CCD pixels to a small number of analog-to-digital converter circuits. Pixel array detectors and CMOS detectors have pulse-counting or charge-integrating electronics at each pixel, so frame rates of 500–5000 Hz are now commercially available, with faster frame rates on the horizon. Visible-light CMOS detectors (which can be coupled with scintillators) can reach sustained frame rates of 30 kHz or more [Mokso 2017], especially on smaller regions of the detector array. These cameras are enabling high-speed tomographic imaging of dynamic processes, with capabilities summarized in Fig. 8.10.

The area detectors above all offer electronic readout, which is strongly preferred in today's digitized world. Early x-ray microscopes used photographic film or nuclear emulsions with a pixel size Δ_det of about a micrometer or less [Niemann 1980], and contact microscopy (Section 6.1) and inline holography (Section 10.2) have used photoresists with an effective spatial resolution of about 50 nm. For film, nuclear emulsions, and photoresists, there is no particular limit on "pixel" count except for the field of view of subsequent image enlargement and/or digitization devices.
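To see why larger detector pixels force longer flight tubes in ptychography, one can use the standard Fraunhofer sampling relation, in which an N-pixel-wide detector region of pixel size Δ_det at distance z from the specimen yields a reconstructed real-space pixel size of λz/(NΔ_det). This relation is our own addition here (the text develops the far-field regime in Section 4.3.2), so treat the sketch below, including its parameter values, as illustrative only:

    def flight_tube_length_m(wavelength_m, n_pixels, det_pixel_m, target_pixel_m):
        """Distance z so that a far-field detector of n_pixels pixels of size
        det_pixel_m yields a real-space pixel of target_pixel_m:
        z = target * N * det / lambda (standard Fraunhofer sampling)."""
        return target_pixel_m * n_pixels * det_pixel_m / wavelength_m

    # 10 keV photons have a wavelength of about 0.124 nm (Eq. 3.7). Consider a
    # 256-pixel-wide region of a PAD with 172 um pixels, and a 5 nm target pixel:
    z = flight_tube_length_m(0.124e-9, 256, 172e-6, 5e-9)
    print(z)  # roughly 1.8 m of flight tube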

7.4.5 Semiconductor detectors

Semiconductor-based detectors are the most commonly employed detectors in x-ray microscopes today. This is particularly true for silicon detectors, which have the virtue of benefitting from the worldwide digital and analog circuit industries. This makes it relatively inexpensive and easy to obtain ultrapure, single-crystal silicon wafers for sensors, and low-noise, high-speed electronics for signal processing and readout. The literature on semiconductor detectors is voluminous, with many excellent books [Spieler 2005, Knoll 2010, Lowe 2014] outlining the key concepts and a plethora of research papers describing the most recent developments. As a result, we make only a few comments of particular relevance to the developers and users of x-ray microscopes:

• For transistors and diodes, dopants are added at low concentration to silicon to provide donor energy levels just below the conduction band in negatively doped or n-type semiconductors (where phosphorus is frequently chosen as the dopant), or acceptor energy levels just above the valence band in positively doped or p-type semiconductors (where boron is a typical dopant). The energy gap E_g between these levels is only about E_g = 1.12 eV in silicon and E_g = 0.67 eV in germanium at room temperature, so with silicon one can promote an electron from the valence band to a donor level in n-type silicon using a wavelength shorter than about λ = hc/E_g (Eq. 3.7), or λ ≈ 1100 nm. This means that most charge-integrating and electrical-current-mode silicon detectors must be shielded against seeing visible light, and furthermore with germanium detectors one can have unacceptably high dark current if the sensor is not kept at low temperature and shielded from infrared light. A film of about 100 nm of aluminum can do the trick for visible-light shielding, with perhaps 10 nm of chromium lying underneath to "wet" the surface and thereby help prevent pinhole defects.

• When one moves into the regime of ionizing photons (≳ 50 eV) with appreciable radiation momentum, conservation of momentum in the Si lattice changes the process to one involving the creation of electron–hole pairs, where a "hole" is a positively charged pseudo-particle representing a "missing" electron (the "hole" can travel through the silicon lattice as one electron hops into a hole on one side, leaving a hole on its other side). The transport of both electrons and holes involves coupling to Raman modes in the semiconductor lattice, which are the highest-energy collective vibrational modes of nuclear displacements from equilibrium positions (phonon modes). This leads to a simple estimate [Shockley 1961] of the threshold energy for creating an electron–hole pair of

    W \approx 2.2 E_g + r E_r,    (7.58)

where E_g is the energy gap noted above, r is the mean number of inelastic collisions of an electron with Raman modes (Shockley estimated r ≈ 17.5 for Si), and E_r is the energy of a Raman mode (E_r = 0.063 eV for Si). The factor of 2.2 in Eq. 7.58 includes the fact that one must excite both an electron and a hole (thus, in a way, explaining the factor of 2), and the parabolic shape of the energy surfaces in the Brillouin zone. Shockley employed some fortuitous guesswork to arrive at Eq. 7.58, which gives a fairly good estimate of W_Si (Eq. 7.31) of about 3.5 eV; a quick evaluation is sketched after this list. More exact predictions involve Monte Carlo modeling of electron, hole, and phonon interactions [Fraser 1994], and these calculations are in strong agreement with the observed value of W_Si = 3.65 eV (Eq. 7.31), as well as the Fano factor of F_Si = 0.118 (Eq. 7.32).

• Electrons have about three times the mobility of holes (about 1350 versus 450 cm²V⁻¹s⁻¹ in Si). Therefore it takes an average electron a time of about 20 ns to traverse a 300 μm thick silicon sensor layer with a bias voltage of 30 V [Spieler 2005]. In order to collect all electrons in one pulse, charge amplifiers make use of a pulse-shaping time of tens of nanoseconds. This leads to the detector dead time t_dead and its consequences, as discussed in Eqs. 7.44–7.52.

• The persistence of electron–hole separation is temperature dependent, as is the buildup of a dark signal. Therefore it is advantageous to cool silicon detectors to temperatures of about 240 K if they are used for integration times of tens of seconds or longer. At colder temperatures, one begins to interfere with the Fermi–Dirac distribution function (Eq. 3.19), which is the mechanism providing the necessary quiescent current in semiconductor junctions.

• One can arrange for signal gain via an "avalanche" effect, based on a bias voltage near but just below the breakdown voltage in avalanche photodiodes (APDs), which produce a linear signal amplification effect, or a bias just above the breakdown voltage in single-photon avalanche diodes (SPADs) [Cova 2004]. These detectors can deliver very fast response for pump–probe experiments of magnetic spins [Stoll 2004, Puzic 2010].

• Silicon sensors that are 200–500 μm thick (limited by electron and hole transport distances) work well for x-ray energies up to about 15 keV. At higher energies, sensors made of higher-density materials such as Ge, GaAs, or CdTe become advantageous, even though these materials are less readily available in high-purity single-crystal form.
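Equation 7.58 is easy to evaluate; a minimal sketch (ours) using Shockley's silicon numbers as quoted above:

    def shockley_pair_energy_ev(E_g_ev, r, E_r_ev):
        """Shockley's estimate (Eq. 7.58) of the energy W to create one electron-hole pair."""
        return 2.2 * E_g_ev + r * E_r_ev

    # Silicon: E_g = 1.12 eV, r about 17.5 Raman-mode collisions, E_r = 0.063 eV
    W_si = shockley_pair_energy_ev(1.12, 17.5, 0.063)
    print(W_si)  # about 3.57 eV, close to the measured 3.65 eV of Eq. 7.31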

Finally, as hard materials with controllable electrical conductivity, silicon sensors are relatively robust against radiation damage (Section 11.4), and one can sometimes "free" radiation-induced trapped charge distributions by periodically heating up or annealing a silicon detector chip [Doering 2011]. Of course this should only be attempted after consultation with the manufacturer of the detector chip! Integrated circuits in silicon are not always so robust against radiation damage, in particular because they employ thin oxide layers as electrical insulators, and these layers are known to be more susceptible. Among industry-standard processes for making complementary metal-oxide semiconductor circuitry, some are known to be more radiation resistant than others, and not all of these processes are compatible with high-performance silicon sensor fabrication, as will be noted below. Therefore the process choice for fabricating application-specific integrated circuits (ASICs) for detector signal processing should take radiation damage into account, at least if the circuitry layer is subject to x-ray dose (this is less of a concern in pixel array detectors with separate sensors and ASICs, for reasons discussed below).

Pixelation and readout in direct-exposure semiconductor-based area detectors is accomplished in several ways:

• In charge-coupled devices (CCDs), bias voltages supplied to an array of electrodes


create something like "charge buckets" which contain photon-generated charge within each pixel during data collection [Janesick 2001]. These bias voltages are then manipulated to "clock out" the charge from pixel to pixel (like water being tipped from one bucket to the next), finally reaching a charge-sensitive preamplifier (fabricated as part of the CCD). There can be one preamplifier per CCD, in which case all pixels have to be clocked through that one amplifier, or multiple amplifiers [Denes 2009], with the ultimate limit being the location of an amplifier on each side of each row. The sequence of per-pixel charges can then be transferred to an off-CCD-chip digitizing circuit. This means that photons that arrive during the charge transfer process can be incorrectly interpreted as if arriving at a different position, so CCDs are often used with a shutter that can be closed during readout. (An alternative is to make a frame-store CCD where one uses a 2N_x × N_y pixel array, shields the outer N_x/2 pixels on either side from seeing light, and first quickly clocks charge out from each of the center N_x/2 regions of the array into their respective outer N_x/2 regions; one can then allow the center regions to again see X rays while the outer regions are clocked out to the preamplifiers at a more leisurely rate [Denes 2009].) The fact that there are no per-pixel electronics means that the CCD pixel size Δ_det can be reasonably small, such as 5–30 μm; however, larger pixel sizes offer greater "full well capacity" (height of the bucket walls) before one reaches saturation and charge starts to leak over the voltage barrier into adjoining pixels. Full well capacities approach one million or so electrons per pixel, which for 10 keV X rays creating (10,000/3.65) = 2740 electron–hole pairs per photon means that only about 365 photons can be recorded per pixel per integration time. The CCD chip can be thinned to expose the backside (or non-circuit-containing side) to radiation, provided the exposed layer is "passivated" to remove dangling silicon bonds (typically done using ion implantation and annealing); these backside-thinned CCDs give high DQE as well as increased radiation damage resistance, since their oxide layers are restricted to the unexposed, circuit side of the chip. (A variant involves p–n junctions [Meidinger 2006].) The comments on charge sharing, droplet analysis, and energy resolution given for charge-integrating detectors in Section 7.4.4 all apply to direct x-ray detection with CCDs. Visible-light CCDs can also be used with scintillators [Gruner 2002] for x-ray area detectors, using either fiber-optic bundles or lens imaging to transfer the signal to the CCD. Finally, when only single photons are recorded per pixel per integration time, one can use the droplet analysis and energy resolution capabilities noted for charge-integrating area detectors in Section 7.4.4. The status of x-ray CCD capabilities is presented in a recent book chapter [Strüder 2016].

• In pixel array detectors (PADs) one separates the sensor and the readout electronics into two separate chips: a sensor chip (Section 7.4.6), and an ASIC chip. This allows one to use a separately optimized process for fabricating each chip (such as high resistivity for the sensor), and in particular the readout electronics chip (an ASIC) can be fabricated using standard integrated circuit fabrication processes and facilities. The two chips are usually connected by a "bump bond" process in which matching metal pads are fabricated on the bottom of the


sensor and the top of the ASIC, small blobs of a ductile metal such as indium are placed on the metal pads on one chip, and then the two chips are mechanically pressed together to electrically connect each sensor pad to its respective ASIC pad (this process is also used for some flip-chips in commercial electronics). This bump bonding process usually limits the pixel size to the Δ_det = 50–200 μm range (and leaves each sensor pixel with relatively high capacitance), though 25 μm has been demonstrated in one x-ray PAD [Ramilli 2017]. Because most of the X rays are stopped in the sensor chip, the ASIC receives less x-ray dose, which allows a broader array of ASIC processes to be considered. Of particular importance has been a series of x-ray pixel array detectors developed at the Paul Scherrer Institut (PSI) in Switzerland, starting with the Pilatus detector with per-pixel photon counting [Broennimann 2006], followed by the Jungfrau detector with charge-integrating electronics [Jungmann-Smith 2016], and more recently the AGIPD detector with analog storage to allow up to 352 frames to be recorded at a burst frame rate of 5 MHz before chip readout [Henrich 2011]. Work at PSI also led to the establishment of the company Dectris in Switzerland, which supplies commercial versions of several of these detectors. As noted in the discussion of high dynamic range and charge integration (Eq. 7.57), Cornell University has developed a series of charge-integrating PADs, including one developed with the former USA company ADSC [Angello 2004] and one that later became the basis for the CSPAD detector at the free-electron laser at SLAC [Philipp 2011, Herrmann 2013]. The Medipix series of ASICs developed at CERN [Campbell 1998, Ballabriga 2007, Llopart 2002] have also served as the basis for several research and commercialized x-ray PADs. With per-pixel electronics, one can use either pulse-counting or charge-integrating readout schemes (or even a hybrid of the two [Weiss 2017]) and have a detector optimized for the expected flux rate (see Fig. 7.15). In addition, with charge-integrating PADs, the droplet analysis and energy resolution capabilities described in Section 7.4.4 become an option. Recent book chapters provide greater detail on pulse-counting [Brönnimann 2016] and charge-integrating [Graafsma 2016] PADs.

• In CMOS detectors the sensor and CMOS per-pixel readout electronics are fabricated together in one monolithic device. Because no bump bonding is required, the pixel size Δ_det can be smaller than with PADs. However, most standard CMOS processes are not compatible with high-quality x-ray sensors (see Section 7.4.6), and they also show sensitivity to radiation damage in the oxide layers, as noted in Section 11.4. If one uses front-side illumination (illumination on the same side as the circuitry), one can lose some of the signal due to absorption in the circuitry, leading to a limited fill factor (along with increasing risk of radiation damage to the circuitry). The approach of the PERCIVAL detector developed by a collaboration led by DESY in Germany [Wunderer 2015] is to use backside thinning as has been done for CCDs, with Δ_det = 27 μm pixel size and frame rates up to 120 Hz. Another approach to overcome these limitations is to use silicon-on-insulator (SOI) technology, so that one can separate the high-resistivity sensor layer from the electronics readout layer. This approach has been used for both


charge-integrating detectors with Δ_det = 17 μm pixel size [Nishimura 2016] and photon-counting detectors with Δ_det = 60 μm pixel size [Arai 2011], with use in application experiments planned for the future. Another approach is to have an epitaxially grown layer surrounded by two layers of p-doped silicon; in this case one can have a response that depends on which layer an x-ray photon was absorbed in, so that one can use droplet analysis for charge sharing amongst pixels, as well as per-photon energy resolution analysis, for single-photon-per-pixel exposures as discussed in Section 7.4.4. This has led to the development of charge-integrating detectors with Δ_det = 12 and 25 μm pixel sizes [Doering 2016], which again are planned to be tested in application experiments in the future. Apart from CMOS sensors for direct x-ray detection, what is far more common is the use of visible-light CMOS detectors together with scintillators, as will be discussed in Section 7.4.7.

These three detector types have been compared against each other for x-ray crystallography, where PADs were found to be superior [Allé 2016]. There are additional semiconductor x-ray detector types beyond CCDs, PADs, and CMOS detectors. One example involves the use of thin-film transistors and amorphous selenium films [Parsafar 2015], but these are detectors with performance optimized for medical x-ray imaging at 50–150 keV photon energies, and they are less well suited for x-ray microscopy.

7.4.6 Sensor chips for direct x-ray conversion

As noted above, in PADs one can separate the function of x-ray absorption and charge generation into a sensor chip, which is then coupled with a separate ASIC chip for signal readout using methods such as bump bonding. The sensor chip can then have its fabrication properties, such as resistivity, optimized for its role, without concern for the material properties and processing steps needed for the ASIC readout chip. If the sensor is thick enough, it can also greatly reduce the x-ray dose received by the ASIC chip. Most sensors for PADs are made out of high-resistivity silicon, in thicknesses ranging up to about 1 mm. For higher-energy X rays, the low density of silicon can mean that not all photons are absorbed in the sensor chip, reducing DQE and increasing the dose on the ASIC chip. For energies well above 10 keV, one can instead use sensors made of higher-density materials such as germanium, which has been coupled with the Medipix3 ASIC in the LAMBDA detector [Pennicard 2014], or CdTe, which has been used with Medipix2 ASICs [Aamir 2011] and is available as a sensor material on several commercial detectors.

7.4.7 Scintillator detectors: visible-light conversion

We use the term "scintillator detector" to refer to the coupling of a luminescent material (which converts x-ray absorption into visible-light emission) with a visible-light detector.


Luminescent materials absorb ionizing radiation and release the energy as visible light. The generic term "luminescence" refers to the case in which the absorption event does not lead to a blackbody emission spectrum (Eq. 7.5) [Garlick 1958]. Luminescence therefore includes both fluorescence (which is a rapid, direct electron transition process permitted by a spin-allowed transition with Δj = 0; see Eqs. 3.16 and 3.17), and also the spin-disallowed process of phosphorescence [Blasse 1994, Appendix 3], which occurs at times longer than 1 ns after the absorption of the ionizing radiation trigger. Luminescent materials in powder form tend to be called phosphors, while single crystals tend to be called scintillators [Blasse 1994, Chapter 8], though there is not a high degree of uniformity of terminology in the literature (for example, structured scintillators [Olsen 2009] do not use single-crystal materials). Because most (but not all!) x-ray microscopy work employs single-crystal luminescent materials, x-ray microscopists tend to simply speak of scintillator detectors for all cases. (After all, being sloppy about the terminology gives one a chance to see a luminescence specialist cringe!)

After ionizing radiation has been absorbed, there can be a competition between nonradiative return to the material's ground state by phonon modes (heat), versus the transfer of at least some energy to one or more luminescent centers (also called activators). As an example, the gemstone ruby is made of crystalline aluminum oxide with chromium atoms replacing some (typically about 1 percent) of the aluminum atoms (written as Al2O3:Cr). These dopant atoms form Cr3+ ions in the aluminum oxide lattice with orbital distortions due to the lattice, so that excitations of the aluminum oxide lattice can stimulate transitions at wavelengths that are different from transitions in Cr alone. Therefore, while pure aluminum oxide is colorless, the Cr dopant in ruby gives rise to emission at the ruby red wavelength of λ = 694 nm. One can use different activators to obtain emission at different wavelengths so as to match the peak efficiency of a visible-light photodetector or camera, and other materials to reduce effects such as afterglow, which is caused by electrons or holes becoming trapped at defects or contaminants in the material, with slow, thermally excited release long after the exciting ionization event. X-ray scintillators used in x-ray microscopy include CsI:Tl, and Lu3Al5O12:Eu (lutetium aluminum garnet, or LuAG or LAG, doped with europium as one example activator) crystals grown by liquid phase epitaxy, with other materials used in certain cases as well [Martin 2006]. One tabulation of the properties of various scintillators, including luminosity (photons per MeV of energy deposited), decay time, and visible-light emission peak, is available from an internet search of scintillator.lbl.gov.

While phosphors and scintillators are sometimes used as visible-light converters for single area detectors in scanning transmission x-ray microscopes [Kilcoyne 2003], luminescence allows one to make pixelated area detectors. One approach for soft x-ray detection is to simply coat the surface of a visible-light CCD with a thin phosphor [McNulty 1992] (with P41 having especially favorable properties amongst soft x-ray phosphors [Yang 1987b]). For hard x-ray applications, thicker luminescent materials are required in order to stop an appreciable fraction of the x-ray beam, but one must then consider transfer of the visible-light intensity pattern through the thickness of the material and onto the visible-light camera. One of the first approaches was to create


a columnar structure that would confine visible light laterally, and fill it to an extended depth with a luminescent material. This was first done with ∼10 μm column pitch [Duchenois 1985, Bigler 1985, Thacker 2009] and then at just below 1 μm pitch [Deckman 1989, Olsen 2009]. At ∼10 μm resolution, another approach is to coat a luminescent material onto a tapered fiber-optic bundle to transfer light to the camera [Gruner 2002]. For x-ray microscopists, the most common approach for realizing micrometer-scale spatial resolution in pixelated area detectors is to use a luminescent screen followed by visible-light microscope objectives to project a magnified image onto a CCD or CMOS detector. (A 45° mirror is often used in the optical path so that the visible-light camera does not get damaged by exposure to the direct x-ray beam.) This was done first with structured scintillators [Flannery 1987, Deckman 1989], but today most researchers prefer the uniformity of response of high-quality single-crystal scintillators. Because there is a large commercial market for sensitive and high-speed pixelated area detectors for visible light, one can obtain burst frame rates of hundreds of kHz using on-camera frame storage [Fezzaa 2008], or sustained frame rates of tens of kHz using specialized camera interfaces [Mokso 2017]. Finally, because many 2–3 eV visible-light photons are generated from each X ray, the DQE of the two-stage process is dominated by the efficiency of absorption in the luminescent material (Eq. 7.41), even though the efficiency for collecting the visible-light photons might be low.

When using single-crystal scintillators with microscope objective optics, thin scintillators are preferred so that all of the visible light is generated within the depth of field (DOF; Eq. 4.215) of the microscope objective, while thick scintillators are preferred to stop a large fraction of the X rays. This trade-off is shown in Fig. 7.16, where the absorption of LuAG is shown along with the spatial resolution based on a simplifying formula [Koch 1998] of

    \Delta_{\rm det} \approx \sqrt{\left(\frac{0.70\ \mu{\rm m}}{\rm N.A.}\right)^2 + \left(0.28\, t \cdot {\rm N.A.}\right)^2},    (7.59)

where t is the thickness of the scintillator (in μm). The first term is the diffraction limit to resolution, the second term is due to DOF, and the numerical coefficients are for capturing 90 percent of the signal within the line spread function of a microscope objective with a given N.A. As this figure shows, one has to decide whether to emphasize high DQE or high spatial resolution at harder x-ray energies.

Scintillator-based detectors tend to have relatively high dark noise from stray light scattering, and low dynamic range. However, they offer a spatial resolution in the sub-micrometer range, which is one to two orders of magnitude smaller than what can be achieved with CMOS or PAD area detectors.
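The trade-off of Eq. 7.59 is easy to explore numerically; a small sketch (ours, using the equation's coefficients as reconstructed above):

    import math

    def scintillator_resolution_um(na, t_um):
        """Eq. 7.59: detector resolution (um) for relay optics of numerical aperture
        na viewing a scintillator of thickness t_um (um)."""
        diffraction = 0.70 / na          # diffraction-limited term
        defocus = 0.28 * t_um * na       # depth-of-field blur across the thickness
        return math.sqrt(diffraction**2 + defocus**2)

    for t_um in (2, 10, 50):
        print(t_um, [round(scintillator_resolution_um(na, t_um), 2)
                     for na in (0.2, 0.5, 1.0)])
    # Thin scintillators keep roughly micrometer resolution even at high N.A.;
    # thick ones are dominated by the depth-of-field term.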

7.4.8 Gas-based detectors

Gas-based detectors have a long history in x-ray science, dating from Röntgen's experiments showing that X rays cause ionization in gases. In x-ray microscopy they were the first type of detector used for scanning transmission x-ray microscopes (Fig. 2.4). They are less in favor in today's semiconductor era, but they still offer some very nice

Figure 7.16 Spatial resolution and x-ray absorption in Lu3Al5O12 (LAG or LuAG) scintillators of 2, 5, 10, 20, and 50 μm thickness. The spatial resolution versus relay optic N.A. (see Eq. 7.59) shows that thin scintillators are preferred so that higher resolution and more light collection can be obtained with high-N.A. optics. Absorption versus x-ray energy shows that thick scintillators are preferred so as to stop a larger fraction of the x-ray beam at multi-keV x-ray energies, thus improving DQE. Therefore one has to make trade-offs between resolution and DQE for a given experiment.

properties: a dark noise of essentially zero, due to the large activation energy W (Eq. 7.27) for creating ion–electron pairs (W = 33.9 eV for air, 24.8 eV for nitrogen, 25.5 eV for argon, and 26.8 eV for methane [Weiss 1955, Wolff 1974]), and near 100 percent DQE for detection of photons that are absorbed in the gas. With a low electric field applied across the gas, one can use the measured ionization current created in a volume sealed using Kapton or polyimide chamber windows as a monitor of the x-ray flux transmitted through the ionization chamber. If one instead uses a wire at a bias voltage relative to the chamber wall to produce a large electric field near the wire's small radius, one can begin to get a gas multiplication effect. The initial ion is accelerated enough to cause ionization in another gas molecule, so that at voltages of about 1000–2000 V one x-ray absorption event generates an ionization pulse involving W · M ion–electron pairs. The gas multiplier M ranges from about 10² to 10⁴ depending on the gas and the voltage [Hendricks 1972, Wolff 1974]. (One often uses a gas such as Ar so that electrons and ions do not get trapped by the closed-shell noble gas prior to an ionization event, with a small quantity of an organic molecule such as methane serving as a quench gas to terminate a pulse discharge [Collinson 1963, Agrawal 1988]; 90 percent argon and


10 percent methane, or P-10, is one common gas mixture for proportional counters.) It is therefore possible to measure the energy of the absorbed photon in this "proportional counter" regime, though the energy resolution is not very high (Eq. 7.30) due to the large value of W (the Fano factor for several gases is around F ≈ 0.17 [Alkhazov 1967]). As the voltage is raised further, lightning can strike: one obtains a complete breakdown of the gas upon an initiating ionization event, so that one x-ray pulse creates a very large charge pulse independent of the x-ray photon's energy. This is the Geiger–Müller regime of gas detectors. In both the proportional and Geiger–Müller regimes, there is a recovery time for ions to recombine with electrons before the counter is ready for another x-ray event, and this is affected by space-charge limitations near the bias wire. This can limit the maximum count rate of gas proportional counters to less than 1 MHz, though higher counting rates can be achieved by operating at below-atmosphere pressures and with multiple bias voltage wires arranged along an extended x-ray beam path [Feser 1998]. It is also advisable to have a low but steady gas flow to compensate for the outgassing of contaminants and to remove molecules that have been not just ionized but fragmented after x-ray absorption.
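Since the Fano-limited resolution formula of Eq. 7.30 applies to gas detectors as well, one can estimate a proportional counter's intrinsic energy resolution. The sketch below is our own, and it deliberately neglects the additional variance of the gas multiplication process, which degrades real-world resolution further; it uses the argon values quoted above:

    import math

    def gas_fano_resolution_ev(E_ev, W_ev, F):
        """Fano-limited FWHM resolution (Eq. 7.30) for a gas detector;
        neglects gas-gain fluctuations, so real resolution is worse."""
        return E_ev * 2.35 * math.sqrt(F / (E_ev / W_ev))

    # Argon: W = 25.5 eV per ion-electron pair, F about 0.17, for 5.9 keV photons
    print(gas_fano_resolution_ev(5900.0, 25.5, 0.17))  # about 376 eV FWHM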

7.4.9 Superconducting detectors

The energy resolution of silicon detectors is limited by the WSi = 3.65 eV needed to create a single electron–hole pair, as indicated by Eq. 7.33. One approach to reduce W dramatically, and thus dramatically improve the energy resolution of energy-dispersive detectors (Section 7.4.12), is to use the energy associated with the sharp onset of electrical resistance in a superconducting film operated near its critical temperature. Consider the case of aluminum, which has a critical temperature of 1.175 K corresponding to a thermal energy of kB T = 0.0010 eV. In principle this should lead to the creation of 3.65/0.0010 = 3650 times more quasiparticle events than in a silicon detector, with a decrease in the energy resolution by a factor of 1/√3650 = 0.016, so that the 154 eV energy resolution of silicon (Eq. 7.33) would be reduced to about 2.5 eV. Thus the strategy is to have an x-ray photon stopped in an absorbing material which will heat up by a minuscule amount, and detect that heat by the change in resistance in a superconducting film [Moseley 1984]. This is best accomplished by using the electrothermal feedback of a transition edge sensor [Irwin 1995], leading to an estimate of the fundamental limit to energy resolution of

ΔE_FWHM = 2.36 √[ 4 kB T² C (1/αs) √(n/2) ],   (7.60)

where C is the heat capacity of the superconductor, n ≈ 4–6 characterizes the thermal impedance between a superconducting film and its substrate, and

αs = (T/R) (dR/dT)   (7.61)

is a unitless measure of the sharpness of the change in electrical resistance R of the superconductor at its critical temperature (values of αs in the hundreds are representative for many superconducting alloys). The first x-ray detectors to use this principle


achieved an energy resolution of 2.6 eV with 1 keV photons [Irwin 1996]. A major limitation has been the time required for a detector element to “cool down” once again after it has absorbed an x-ray photon, thus limiting count rate, but this can be addressed by providing many detector elements in a pixelated array with multiplexed signal readout [Chervenak 1999, Irwin 2004]. X-ray detectors that can combine high energy resolution with high count rate are under active development, and could offer a path to imaging differing chemical states of elements in scanning x-ray fluorescence microscopy by exploiting XANES-like shifts in fluorescence emission energies.
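A minimal calculator for Eq. 7.60 might look like the following sketch; the operating temperature, heat capacity, αs, and n values are assumed, illustrative numbers rather than parameters of any particular device:

```python
import math

K_B = 1.381e-23  # Boltzmann constant, J/K

def tes_dE_fwhm(T, C, alpha_s, n):
    """Eq. 7.60: energy resolution (J) for sensor temperature T (K), heat
    capacity C (J/K), resistance sharpness alpha_s, and thermal exponent n."""
    return 2.36 * math.sqrt(4.0 * K_B * T**2 * C * (1.0 / alpha_s) * math.sqrt(n / 2.0))

# Illustrative values: T ~ 0.1 K, C ~ 1 pJ/K, alpha_s ~ 100, n = 5
dE = tes_dE_fwhm(T=0.1, C=1e-12, alpha_s=100.0, n=5)
print(f"Delta E_FWHM ~ {dE / 1.602e-19:.2f} eV")  # of order 1 eV
```

Values of this magnitude (an eV or so) are what make the detectors described above so attractive relative to the ~150 eV resolution of silicon.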

7.4.10 Energy-resolving detectors

While the method of x-ray emission spectroscopy [Meisel 1989] is ripe for combination with microscopy (there are a few early examples), the detection of element-specific x-ray fluorescence (Section 9.2) is the main method in x-ray microscopy that requires detectors that can report the energy of detected photons. There are two main types of energy-resolving x-ray detector systems now in use in x-ray microscopes: wavelength-dispersive detectors, which use Bragg diffraction in crystals or from gratings, and energy-dispersive detectors, which measure the number of quanta created in a detector after an x-ray absorption event. We outline only a few key ideas of each.

7.4.11 Wavelength-dispersive detectors

Wavelength-dispersive detectors are essentially diffraction spectrometers, with a detector that has pixels at least along the dispersion direction (that is, a linear pixel array detector). No energy resolution from the detector is required, as different photon energies are translated into different arrival positions (spatial resolution) on the detector. Because in scanning x-ray microscopy they are used to collect the signal from a small excitation spot, their design is somewhat simplified relative to x-ray spectrometers used to collect signals from larger areas. Because x-ray fluorescence yields are larger at multi-keV x-ray energies and above (Fig. 3.7), wavelength-dispersive detectors used for x-ray microscopy tend to use Bragg diffraction from crystals (though there are very successful grating-based spectrometers for soft x-ray emission [Nordgren 1989, Chuang 2017]). One of the challenges of volume gratings like crystals is that Bragg's law requires one to match both the input and the output angle with the crystal's d spacing, as shown in Fig. 4.9. Because the Darwin width or angular spread within which one has strong diffraction is so narrow (Fig. 4.12), one faces special challenges not present with simple plane gratings (Fig. 4.8). One way to overcome this limit is to scan the grating through a rotation range to collect a spectrum, but this is not practical in scanning x-ray microscopes where short pixel dwell times are desired in order to image large fields of view at high resolution. Instead, it is more common to bend the crystal [DuMond 1930] both to provide a degree of imaging from the source to a focus like in the Rowland circle condition for spherical gratings, and to keep a larger width of a crystal near the Bragg condition. This can be done in reflection mode [Johann 1931], or in transmission mode [Cauchois 1932]. These older


designs have been summarized [Sandström 1957], while newer designs with multiple crystals have appeared [Alonso-Mori 2012, Honkanen 2014]. As noted in Section 9.2, an especially interesting wavelength-dispersive spectrometry system at the European Synchrotron Radiation Facility in France uses a polycapillary array to increase the solid angle of collection to Ω = 1.48 sr (though with only 14 percent efficiency), while also providing the crystal with a parallel rather than a diverging beam. This system delivers 4–40 eV energy resolution [Szlachetko 2010], and because different x-ray energies arrive at different pixels on a linear array detector one can accommodate higher overall flux rates than in single-element energy-dispersive detectors.

7.4.12 Energy-dispersive detectors

Energy-dispersive detectors (sometimes called energy-dispersive spectroscopy detectors or EDS detectors) work by using Eq. 7.27 of E = qW to measure the photon energy E for each photon (q = 1) as it arrives. The resulting energy resolution for silicon EDS detectors of ∼150 eV FWHM is usually sufficient to separate fluorescence lines, though there can be overlaps as discussed in Section 9.2.1. As can be seen from Eq. 7.30, one will obtain improved energy resolution using materials with lower values of W (recall from Eq. 7.31 that WSi = 3.65 eV, while WGe = 2.96 eV), though silicon remains popular because of the quality of material that can be obtained and its more highly developed processing technologies. Considerable detail on the various technologies for EDS detectors such as lithium-drifted silicon or Si(Li), or silicon drift diodes (SDDs), is provided elsewhere [Spieler 2005, Lowe 2014], so we will only make a few general comments here:

• Because measuring the energy of a photon requires collecting all of the electrons or holes it liberates, one must allow enough time for all charges to reach an amplifier (not leaving enough time leaves one with a "ballistic deficit" in charge collection [Loo 1988]). This would make one prefer a smaller active detector area, while increasing the solid angle of the detector (Eq. 9.23) would make one prefer a larger active detector area. The required compromise affects the detector dead time tdead, thus limiting the maximum count rate to the MHz range (Eq. 7.44) per detector element even if dead time corrections are made. One way to go beyond this limit is to essentially have a pixelated energy-resolving detector so that the aggregate rate can be much higher; this is the approach taken by the 384 "pixels" of the MAIA detector system [Ryan 2010].
• If some of the electron–hole separations created by the absorbed photon fail to reach the amplifier, one obtains an incorrect, lower value for the photon energy. This can happen when some charges are trapped by defects or impurities in the detector material. Incomplete charge collection manifests itself as a "step" and a "tail" in the spectral response from a single photon energy, as illustrated in Fig. 7.17.
• In order to minimize the contribution of thermal excitations adding "extra" charge to a photon's measurement, the detector is often cooled below room temperature so as to "sharpen" the Fermi–Dirac distribution function of Eq. 3.19.


Figure 7.17 Factors in the response of an energy-dispersive spectroscopy (EDS) detector. Shown here is the representative response to 8.00 keV photons, with a peak broadening given by Eq. 7.30, as well as both a "step" and a "tail" on the low-energy side of the spectrum, which are both due to incomplete collection of the photon-absorption-produced electron–hole charge separation. (Plot: normalized intensity versus photon energy from 0 to 10 keV, with curves for G, Gaussian broadening; S, step; T, tail; and their sum.) Figure adapted from [Sun 2015], following the notation of [Van Grieken 2002].

• Incoming photons can trigger the process of x-ray fluorescence in the materials that make up the detector. If the fluorescence occurs deep within the detector material, the fluorescent photon will be re-absorbed (Section 9.2.4) in the detector so the correct total charge will still be reported. If, however, that fluorescence event takes place near the surface of the detector material, the energy of that fluorescent photon can be lost. This gives rise to "escape peaks" in the fluorescence spectrum; for example, with 10.00 keV photons into Si, where the Si Kα fluorescence line is at 1.74 keV, there will be a small escape peak at 10.00 − 1.74 = 8.26 keV in the x-ray fluorescence spectrum.
• As was noted above Eq. 7.44, if two photons arrive within the pulse-shaping time they can be interpreted as a single photon with twice the energy. This is called pile-up, and it means that strong fluorescence at 4 keV can produce a pile-up peak at 8 keV.
• In order to avoid the buildup of contaminants on the active detector surface as well as to shield from visible light, most detectors use a thin beryllium or aluminized silicon nitride entrance window. This affects how close the detector can be placed to the specimen, thus limiting collection solid angle; it also limits the ability to detect x-ray fluorescence lines below 1–2 keV, depending on the window material and thickness. X-ray fluorescence self-absorption is strong in these cases (Section 9.2.4), and fluorescence yields are low (Fig. 3.7), though there are successful demonstrations of the use of sub-1 keV x-ray fluorescence in x-ray microscopy [Kaulich 2009, Kaulich 2011, Hitchcock 2012, Bufon 2017].

Together these effects lead to the characteristics of actual energy-dispersive spectra, such as that shown in Fig. 9.12.
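The arithmetic behind several of these artifacts is simple enough to script. The sketch below assumes a commonly quoted Fano factor for silicon (F ≈ 0.115, an assumed literature value not stated in the text) and ignores electronic noise; the broadening formula is the standard Fano-statistics form consistent with what the text attributes to Eq. 7.30:

```python
import math

W_SI = 3.65        # eV per electron-hole pair in silicon
FANO_SI = 0.115    # assumed Fano factor for silicon
SI_K_ALPHA = 1.74  # keV, Si K-alpha fluorescence energy

def fano_fwhm_eV(e_keV):
    """Fano-limited FWHM (eV) of a photopeak at e_keV (no electronic noise)."""
    return 2.355 * math.sqrt(FANO_SI * W_SI * e_keV * 1000.0)

e = 10.00  # keV incident photons
print(f"Fano-limited FWHM at {e} keV: {fano_fwhm_eV(e):.0f} eV")   # about 150 eV
print(f"Si escape peak at {e - SI_K_ALPHA:.2f} keV")               # 8.26 keV
print(f"pile-up peak for a 4 keV line at {2 * 4.0:.1f} keV")       # 8 keV
```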

7.5 Sample environments

X-ray microscopy is all about the specimen under study, and one of the advantages of using X rays for microscopy is the high penetrating power relative to electron beams. While environmental electron microscopy offers one the chance to image specimens in partial gas pressures and in liquid environments that are up to a few hundred nanometers thick [Ross 2016], inelastic and plural elastic scattering set fundamental limits (Fig. 4.81), while x-ray microscopes are able to study specimens that are micrometers or even millimeters thick. Therefore one can consider sample environments that are much closer to what is "natural" for the study at hand, provided one is aware of the limitations set by radiation damage (Chapter 11). For soft and biological materials, radiation damage can be minimized by maintaining the specimen at cryogenic temperatures using the methods discussed in Section 11.3.1. For many other materials, one can observe them in native conditions (in situ) or even as they normally operate (operando3), though as noted in Section 11.4 at doses of about 10⁹ Gy one does begin to see changes in some lithium battery materials [Nelson 2013] and in silicon-on-insulator materials [Polvino 2008]. Even with these limits, there is considerable headroom for x-ray microscopy to accommodate a wide range of sample environments.

Two of these environmental conditions involve gases and fluids. As was shown in Fig. 7.12, X rays are able to penetrate significant distances in air and in helium gas, depending on the x-ray energy. Therefore while it is helpful to provide a helium gas environment where possible so as to minimize x-ray beam absorption and scattering, and while the presence of oxygen can increase radiation damage effects in soft materials studies [Coffey 2002, Braun 2009] as noted in Section 11.2.1, there is no absorption problem in providing an air or other gas environment for a specimen if required. Fluids are also easily accommodated: Fig. 2.5 showed that the 290–540 eV "water window" spectral range [Wolter 1952] provides great absorption contrast for the study of hydrated organic materials, and there is appreciable phase contrast for such specimens at multi-keV energies, as was shown in Fig. 4.77. One can also incorporate microfluidics systems to provide for fluid exchange and reactions within an x-ray experiment [Ghazal 2016]. However, for studies of fluids and gases one usually needs a way to obtain an isolated environment, which leads to the need for x-ray windows.

3 This is often written as in operando as one would expect from in vivo and in vitro, but apparently the correct Latin usage is operando. Consult your local Jesuit to be sure.


Figure 7.18 Attenuation lengths μ⁻¹ (Eq. 3.75) in several materials used for thin x-ray windows, plotted as 1/e attenuation length in meters versus photon energy from 100 eV to 10 keV. Since the transmission through a thickness t goes as exp[−μt] (Eq. 3.76), and x-ray window thicknesses tend to be between 100 and 1000 nm, those thicknesses are indicated at right on the plot. Shown here are the curves for silicon nitride (Si3N4; ρ = 3.17 g/cm³), borosilicate glass (81 percent SiO2, 13 percent B2O3, 3.5 percent Na2O, 2 percent Al2O3, and 0.5 percent K2O; ρ = 2.23 g/cm³), silicon (Si; ρ = 2.329 g/cm³), and Kapton (stoichiometry H10C22N2O5; ρ = 1.42 g/cm³).
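Since Fig. 7.18 is used by reading off an attenuation length and applying Eq. 3.76, a one-line transmission estimate suffices; the attenuation length below is an assumed illustrative value, not a tabulated one:

```python
import math

def transmission(t_nm, mu_inv_nm):
    """Transmission exp(-mu*t) (Eq. 3.76) through a window of thickness t_nm,
    given the 1/e attenuation length mu_inv_nm = 1/mu."""
    return math.exp(-t_nm / mu_inv_nm)

# e.g. a 100 nm window at an energy where the attenuation length is ~1 um:
print(f"T = {transmission(100.0, 1000.0):.3f}")  # about 0.905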

There are several materials that one can use for x-ray windows with different absorption properties, as shown in Fig. 7.18. Polyimide (of which Kapton is one variant) was used for x-ray windows in early zone plate x-ray microscopes [Schmahl 1980], but its relatively high absorption in the soft x-ray range and its lack of stiffness can be problematic. Silicon windows offer great properties and have been used with much success in soft x-ray microscopy, but the methods needed to produce them [Medenwaldt 1995] are somewhat costly in terms of equipment required and manual monitoring. Thin-walled borosilicate glass capillaries (pulled to a wall thickness of about 100 nm) have proven to be very successful [Weiß 2000] for soft x-ray tomography of samples that can be loaded inside, such as cells or particles in liquid suspension. However, they are not easily manufactured for ∼100 nm thin flat substrates. For that requirement, silicon nitride has nearly perfect properties of low absorption, high stiffness, great radiation damage resistance, and reasonably low-cost production methods (as will be described in Section 7.5.1). However, other alternatives are worth considering, such as silicon carbide membranes, which are said to be tougher and with better compatibility for cell culturing [Altissimo 2018]. Graphene is another interesting material, since monolayer films can support high pressure differentials and block the transport of water [Bunch 2008, Liu 2015a].


Figure 7.19 A process for producing thin silicon nitride windows [Pawlak 1987]. It begins with a double-side-polished ⟨100⟩ silicon wafer upon which has been grown an Si3N4 layer using low-pressure chemical vapor deposition (LP-CVD). A positive photoresist is then spun on each side, and one side is exposed through a contact mask so that liquid development exposes regions of the Si3N4 surface. A reactive ion etch (RIE) with a fluorinated gas (for example, 8.5 percent O2 with CF4) is then used to selectively etch the nitride layer and expose the underlying silicon only at the desired locations. Next, an anisotropic wet etch along the ⟨111⟩ planes is made to remove silicon in the exposed areas, creating free-standing windows. (Diagram steps: 1, UV exposure through the photomask; 2, develop photoresist; 3, silicon nitride RIE; 4, strip photoresist; 5, KOH etch of the silicon, with sidewalls at 54.7° set by the wafer orientation.)

This has allowed graphene to be used as a very thin membrane for encapsulating protein crystals [Wierman 2013], for liquid cells in electron microscopy [Park 2016, Textor 2018], and indeed for multiple imaging modalities [Matruglio 2018]. With the right window material, one can leverage the capabilities of lithographic fabrication methods to make sample environments with the right combination of fluids or gases and temperature. This has been important for x-ray microscopy studies of battery and catalysis materials, as will be discussed in Section 12.4 as well as in recent reviews [Weker 2016, Lin 2017]. The details of what makes for an ideal sample environment are very application-specific, so we emphasize here the basic considerations of window materials.

7.5.1 Silicon nitride windows

Silicon nitride windows were first made in connection with early plans to use x-ray shadow printing of absorption masks for proximity x-ray lithography (an approach that caused much excitement for possible future integrated circuit production [Spears 1972, Spiller 1993] at a time when optical lithography was thought to be limited in spatial resolution to many micrometers). The original process for silicon nitride window fabrication [Bassous 1976] was developed at the IBM Research Center in New York in the 1970s, and was later simplified [Pawlak 1987] to the one shown in Fig. 7.19. The process steps are as follows:

• One starts with a double-side-polished silicon wafer with the surface oriented along


the ⟨100⟩ direction. This wafer is then placed in a low-pressure chemical vapor deposition (LP-CVD) system in which a silicon nitride layer of the desired thickness is grown. It is important to use a gas mixture and temperature combination to produce films with low internal stress [Sekimoto 1982, Temple-Boyer 1998, Toivola 2003]. The resulting film might have a stoichiometry that differs slightly from Si3N4.
• A positive photoresist is then spun on both sides, after which UV exposure through a contact mask is used to define the top surface silicon etch areas. The photoresist is then developed to expose these areas on the silicon nitride film.
• A reactive ion etch (RIE) is carried out using a fluorinated gas such as 8.5 percent O2 with CF4. This removes the silicon nitride in the exposed areas to expose the underlying silicon wafer, with the differential etch rate of photoresist in the RIE gas protecting the other areas of silicon nitride.
• The photoresist is then stripped using O2 RIE, which does not harm the silicon nitride.
• The wafer is then wet-etched in a solution of ethylene diamine, pyrocatechol, and water, which produces a highly anisotropic etch along the ⟨111⟩ planes of silicon [Finne 1967]. This wet-etch step can take several hours, and it does not affect the silicon nitride layer.

The result of this process is to produce silicon nitride windows on silicon wafer frames. These have been used as low-electron-backscatter, x-ray-transparent windows for the fabrication of Fresnel zone plates, and as the windows in specimen environmental chambers for in situ and operando studies [Yang 1987a, Neuhäusler 2000, de Groot 2010]. Mini-chambers with silicon nitride windows and flow tubes for periodic flushing with culture medium have been used for studies of initially living cells [Pine 1992], including the study shown in Fig. 11.8, while silicon nitride windows offer one option in microfluidic chambers used in x-ray studies [Ghazal 2016]. Silicon nitride windows are now commercially available from several vendors, including with electrodes for specimen heating and for electrochemistry. They are also used for electron microscopy [Ring 2011], and in microelectromechanical systems (MEMS) devices where their mechanical properties have been studied [Zwickl 2008] and where circular-area windows have been fabricated [Serra 2016].

Silicon nitride windows can be used to separate the UHV environment of an x-ray beamline from the atmospheric-pressure environment of an x-ray microscope: 100 nm thick silicon nitride windows can withstand a pressure difference of 1 atmosphere over an area of 1 mm² with no radiation-induced weakening over years of operation (scaling to larger areas has been studied in detail [Sekimoto 1982]). This took some courage to try at first, including convincing the people in charge of the vacuum system at synchrotron light sources that the windows would not break and vent the accelerator to air! With precautions such as differential pumping arrangements and electropneumatically driven gate valves interlocked to pressure gauges, this was eventually permitted [Rarback 1984] and this approach is now widely used in x-ray microscopes.


7.6 Concluding limerick

Instrumentation may seem prosaic, but it's the poetry through which we express our goals of learning more about the world at the nanoscale.

If we want to do great x-ray science
on our tools we must place our reliance
Our detector and source
we choose wisely; of course!
Our instruments are in compliance.


8 X-ray tomography

Doğa Gürsoy contributed to this chapter.

Up until now we have concentrated on two-dimensional (2D) imaging of thin specimens. However, one of the advantages microscopy with X rays offers is great penetrating power. This means that X rays can image much thicker specimens than is possible in, for example, electron microscopy (as discussed in Section 4.10). For this reason, tomography (where one obtains 3D views of 3D objects) plays an important role in x-ray microscopy. There are entire books written on how tomography works [Herman 1980, Kak 1988], and on its application to x-ray microscopy [Stock 2008], so our treatment here will be limited to the essentials. Examples of transmission tomography images are shown in Figs. 12.1, 12.6, and 12.9, while fluorescence tomography is shown in Fig. 12.3.

Our discussion of x-ray tomography will be carried out using several simplifying assumptions:

• We will assume parallel illumination, even though there are reconstruction algorithms [Tuy 1983, Feldkamp 1984] for cone beam tomography where the beam diverges from a point source.
• We will assume that we start with images that provide a linear response to the projected object thickness t(x, y) along each viewing direction. In the case of absorption contrast transmission imaging, this can be done by calculating the optical density D(x, y) = −ln[I(x, y)/I0] = μt(x, y) as given by Eq. 3.83, with μ being the material's linear absorption coefficient (LAC) of Eq. 3.75. In phase contrast imaging, one may have to use phase unwrapping [Goldstein 1988, Volkov 2003] methods to first obtain a projection image which is linear with the projected object thickness, since ϕ = kδt (see Fig. 3.17).
• We will assume that there is no spatial-frequency-dependent reduction in the contrast of image features as seen in a projection image. That is, we will assume that the modulation transfer function (MTF) is 1 at all frequencies u (see Section 4.4.7). One can always approach this condition by doing deconvolution (Section 4.4.8) on individual projection images before tomographic reconstruction, or by building an actual MTF estimate into optimization approaches (Section 8.2.1).
• We will assume that the first Born approximation applies (Section 3.3.4): we can approximate the wavefield that reaches a downstream plane in a 3D object as being essentially the same as the wavefield reaching an upstream plane.


• We will assume that the 3D object volume lies entirely within an x-ray microscope's depth of field limit of DOF ≈ 5.4 δr²/λ as given in Eq. 4.215. This is known as the pure projection approximation. Cases where this does not apply are discussed in greater detail in Section 10.5.

Together, these assumptions allow us to assume that we obtain pure projections through the object, with no information encoded regarding differences along the illumination direction. We will furthermore limit our discussion here to standard tomography with a single axis of rotation (at least until we consider double-tilt tomography and laminography in Section 8.5.2). Single-axis tomography is also referred to as computed axial tomography, which in medical imaging is referred to as a CAT scan.1

In tomography based on absorption contrast, one wants to find a balance between having enough absorption to produce contrast in the projection images, but not so much that one has trouble getting the beam to emerge from the thick specimen or that one absorbs too much radiation dose in the specimen. The optimum linear absorption coefficient (LAC) μ for absorption contrast tomography is

μopt = 2/D,   (8.1)

where D is the diameter of the specimen [Grodzins 1983b]. One possibility is to tune the photon energy to satisfy Eq. 8.1 as indicated by Eq. 3.75.

As is often the case in scientific discoveries, several people contributed to the origins of what we know today as tomography. Important mathematics advances relevant to tomography were made by Johann Radon in 1917 [Radon 1917, Radon 2007]. Some ideas of determining object slices from line projections were developed by Allan Cormack [Cormack 1963, Cormack 1964], though with only limited experimental demonstration. In January 1968, David De Rosier and Aaron Klug submitted the first [De Rosier 1968] of several [Crowther 1970a, Crowther 1970b, Klug 1972] papers on tomography in electron microscopy, while in August 1968 Godfrey Hounsfield of the company EMI in the UK submitted a patent filing for medical CAT scan methods (his first publication demonstrating its operation came several years later [Hounsfield 1973, Ambrose 1973]). Cormack and Hounsfield shared the 1979 Nobel Prize in Medicine, and Klug won the 1982 Nobel Prize in Chemistry.
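As a quick numerical illustration of Eq. 8.1 (the specimen diameter here is an arbitrary example, not a value from the text):

```python
# Optimal linear absorption coefficient for absorption-contrast tomography.
D_um = 50.0            # example specimen diameter in micrometers
mu_opt = 2.0 / D_um    # Eq. 8.1, in inverse micrometers
print(f"mu_opt = {mu_opt:.3f} /um, i.e., attenuation length = {1.0 / mu_opt:.1f} um")
# One then tunes the photon energy (via Eq. 3.75) so that the material's
# attenuation length is near D/2 = 25 um.
```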

8.1 Tomography basics

In Fig. 8.1 we show the collection of a particular row from a 2D projection image p(x′, z)θ as an object slice f(x, y)z is rotated. This row is called a slice projection p(x′)z,θ, and it records the transmission image through a particular object slice f(x, y)z chosen along the z direction (the key tomographic variables are listed in Table 8.1). Mathematically, a slice projection is given by the famous Radon transform [Radon 1917, Radon 2007] of

p(x′)z,θ = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y, z) δ(x cos θ − y sin θ − x′) dx dy,   (8.2)

where δ() is the Dirac delta function of Eq. 4.84. Our goal is to use this set of slice projections to reconstruct the 2D object slice f(x, y)z at the object row z, and then to combine all the reconstructed object slices to produce a 3D view of the object.

1 Veterinarians might request cat CAT scans; those who like sporting dogs might want to see the lab results.


Figure 8.1 Schematic representation of the collection of 2D projection images from a 3D object as it is rotated. A parallel beam is assumed to illuminate the object, with transmission images p(x′, z)θ being recorded on an area detector at each angle θ. Because the object is rotated in θ about the z axis, each object slice f(x, y)z is imaged onto a separate line projection on one row z′ on the detector (with height Δz = Δdet). Since z′ on the detector maps exactly onto z on the object, we can write the line projection at the particular detector row z′ as p(x′)z,θ. The set of line projections (or one-dimensional images in the x′ direction, as recorded by the detector) as the object is rotated by θ gives rise to a sinogram for one slice position z, as will be shown in Fig. 8.2.

Table 8.1 Notation used for tomographic quantities, both for the Fourier representation discussed in Section 8.1, and for the matrix representation discussed in Section 8.2, where it is explained in Eqs. 8.5 and 8.6 how multidimensional images are represented by 1D arrays.

Name                               | Fourier formula  | Matrix formula | Matrix dimensions
Object                             | f(x, y, z)       | f              |
Object slice at z                  | f(x, y)z         | fz             | 1
Object slice pixel indices         | [x, y]           | x              | 1 (from x)
Object slice in Fourier space      | F{f(x, y)z}      |                |
Projection image                   | p(x′, z)θ        | pθ             |
Slice projection at z              | p(x′)z,θ         | pz,θ           | 1 (from x′)
Slice projection pixel indices     | x′               | x′             | 1
Slice projection in Fourier space  | F{p(x′)z,θ}      |                |
Projection matrix                  |                  | Wθ             | 2 (from fz, pz,θ)
Rotation angles                    | θ                | θ              | 1

Figure 8.2 Two ways to represent slice projections p(x′)z,θ in tomography. At left is shown one object slice f(x, y)z at a particular z location in the 3D object (Fig. 8.1), both in an oblique view of the object slice and in a top-down view of the slice. Different rotation angles θ allow one to record different slice projections p(x′)z,θ as 1D images; the axes of these slice projections for a few specific rotation angles θ are shown in separate colors. Features in the object appear at right-of-center slice projection positions x′ for half of the projection angles, and at left-of-center positions x′ for the other half of the projection angles. One can assemble the set of all slice projections for all angles θ to form a sinogram, which is a 2D image with axes x′ for positions along each slice projection p(x′)z,θ, and θ for each rotation angle. (The name sinogram is drawn from the fact that one feature will appear to trace out a sinusoidal curve along the vertical axis, as shown.) Because the slice projection taken at θ = 350° should just be the same as the slice projection taken at θ = 170° except that the x′ direction is reversed, rotation of the object over 180° is sufficient to construct the entire sinogram (this is why half of the sinogram is shown as having "redundant angles"). Each slice projection with Nx′ pixels is assumed to be a true projection with no depth information about the object. This means that its Fourier transform yields information over Nx′ pixels in the ux direction in Fourier space, and over only one pixel (the zero-spatial-frequency pixel) in the orthogonal direction. This leads to a filling of information in the Fourier space representation of the object slice, or F{f(x, y)z}. This happens both for the selected, colored angles shown at left in the top-down view of the slice, and for any additional angles (shown in gray) over which slice projections were acquired, with an angular spacing Δθ. The Fourier space coordinates [ux, uy] (with a spacing Δuy in the uy direction indicated) correspond to the object slice real-space coordinates [x, y].

Because we assume that there is no blurring of the projection image from row to row z′, we can

map the detector row z′ directly onto the object row z, or z′ = z. In addition, because we assume these rows are imaged separately, we can process each of the Nz object slices separately; this makes standard tomography data processing a trivially parallelizable problem in computing.

If you were to view several photos of a simple 3D object taken from different views, you might try to construct a 3D model by simply extending (or backprojecting) the object along your viewing direction in each view. For an object slice f(x, y)z at position z, you would take the 1D slice projections and backproject them within the f(x, y)z image array. This approach will be shown in the top row of Fig. 8.4. As more and more projections are obtained (that is, as Nθ is increased), you would expect the reconstructed


object slice image f(x, y)z to improve, yet you would not have to wait for all projection angles to be acquired before starting to see an image appear. However, this simple real-space backprojection operation produces an inaccurate representation of the object slice for reasons that will be discussed below.

As we rotate an object slice through θ, the set of slice projections p(x′)z,θ we obtain can be put together in a 2D "image" in which the horizontal axis is the distance x′ along each slice projection, and the vertical axis is the rotation angle θ. This image is called a "sinogram" because a feature at one position in the object slice traces out a sine curve along the vertical axis, as shown in Fig. 8.2. Because the Radon transform of Eq. 8.2 is invertible, once we have assembled a sinogram from a complete set of slice projections p(x′)z,θ we can, in principle, recover the object slice f(x, y)z.

8.1.1 The Crowther criterion: how many projections?

How fine a rotation step Δθ should we use in tomography? It is easiest to think about this question by considering tomographic data in transverse–axial Fourier space. Let's consider one slice projection p(x′)z,θ in the Fourier plane. As shown in Fig. 8.2, we re-map the information from Nx′ pixels in the slice projection to Nx′ discrete spatial frequencies (see Section 4.3.3). Because the slice projection has no depth information (due to our assumption of obtaining pure projections), its extent in the orthogonal direction in Fourier space is precisely one pixel (the center, zero-spatial-frequency pixel). If we then assemble the Fourier transforms of the slice projections F{p(x′)z,θ} over all rotation angles θ, we wind up with a representation of the object slice in the Fourier plane F{f(x, y)z}, as shown at right in Fig. 8.2. If we can re-map each Fourier transform of each slice projection F{p(x′)z,θ} onto a regular grid in the Fourier plane (this Fourier plane interpolation operation is conceptually straightforward, but tricky in detail as we will see below), we can simply carry out an inverse Fourier transform F⁻¹{} of that re-mapped data in the coordinates [ux, uy] to obtain the object slice image f(x, y)z. That is, we obtain a full 2D view of the slice from the set of 1D line projections, or a 3D view of the object from a set of 2D projection images. That is the beauty of tomography!

For those who are familiar with crystallography, or with techniques for reconstructing images from diffraction plane intensities as will be discussed in Chapter 10, it is good to remember that our Fourier space information is obtained computationally from images. When one takes the Fourier transform of a 2D projection image, or a 1D line projection, one obtains complex information in the Fourier plane. The Fourier plane magnitudes (which when squared give diffraction intensities) and phases can be inverse Fourier transformed with no loss of information. In other words, there is no phase problem in tomography.

If we acquire projections at too few rotation angles Nθ, we will not be able to fill information into all pixels in the Fourier plane. These "unfilled pixels" will by default have an information content of zero, and this will lead to artifacts in the reconstructed image, as will be shown in Fig. 8.4. To avoid this problem and provide information at all points in Fourier space, one should make the angular step Δθ be about equal to the distance of one pixel in Fourier space, or Δθ ≈ Δuy, so that there are no "unfilled pixel" gaps between polar angle projection slice lines at the periphery (see Fig. 8.3).


Figure 8.3 Schematic representation of the Crowther criterion in conventional tomography. Each slice of the object is mapped onto one row of a detector, as shown in Fig. 8.1. One then obtains 1D pure projection images of the object with Nx′ transverse pixels of width Δt in the transverse direction (a), and a depth of precisely one pixel at zero spatial frequency in the axial direction (because there is no way to distinguish between different axial positions in a pure projection). For an angle θ = 0°, the Fourier transform of this image yields an array with Nux = Nx′ pixels in the transverse or ux direction and Nuz = 1 pixels in the axial or uz direction in transverse–axial Fourier space (b). As the object is rotated, so is the information obtained in Fourier space, so (ux, uz) Fourier space is filled in as shown in (c). The Crowther criterion of Eq. 8.3 is effectively a statement that one must provide complete, gap-free coverage of all pixels around the periphery in transverse–axial Fourier space. Figure from [Jacobsen 2018].

Now if we have square object slices with Nx pixels on a side, we can write r = Nx/2 and note that the distance around the periphery of the array edge is 8r. But in fact the true periphery of our sampling is a circle, not a square, so the correct number of angular sampling points required is not 8r but 2πr. That is, if the object slices have Nx × Nx pixels, we need to acquire projections over not 4Nx but 4(2π/8)Nx angles, giving a requirement for Nθ = πNx if rotating the specimen over 360°. Because projections taken 180° apart yield the same information in the pure projection approximation, we in fact need only

Nθ = (π/2) Nx   (8.3)

rotation angles. The condition of Eq. 8.3 is known as the Crowther criterion [Crowther 1970b]. One can be a bit more exact and make these angles exactly match rectangular voxel positions at the outer surface of a 3D data cube (thus changing the spacing to be even not in θ but in 1/cos θ), and for data sampled in this way one can obtain a further improvement in the tomographic reconstruction by using polar Fourier transforms in an approach called equal slope tomography [Miao 2005]. One can also relax the Crowther criterion by a factor of 1/NA when using multislice methods (Section 10.5) to reconstruct NA axial or depth planes from one viewing direction [Jacobsen 2018]. Finally, as we will see when we compare filtered backprojection with algebraic reconstruction tomography methods in Section 8.2, one can in fact frequently get away with far fewer


than the Crowther number of Nθ projections when using algebraic approaches with their additional object constraints, though it is important to emphasize that these constraints only minimize artifacts that would otherwise appear due to the missing information without actually substituting for that missing information.
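The Crowther criterion is simple to evaluate; here is a minimal sketch (Eq. 8.3, assuming rotation over 180°):

```python
import math

def crowther_angles(n_x):
    """Return (N_theta, angular step in degrees) for an N_x-pixel-wide slice."""
    n_theta = math.ceil(math.pi / 2.0 * n_x)   # Eq. 8.3
    return n_theta, 180.0 / n_theta

for n_x in (256, 1024, 4096):
    n_theta, dtheta = crowther_angles(n_x)
    print(f"N_x = {n_x:5d}: N_theta = {n_theta:5d}, step = {dtheta:.3f} deg")
```

For example, a 1024-pixel-wide slice already calls for 1609 projection angles, which is part of why the relaxations discussed above matter in practice.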

8.1.2 Backprojection, filtered backprojection, and gridrec

Examination of the Fourier space representation of an object slice F{f(x, y)z} shown at right in Fig. 8.2 makes clear another detail in Fourier-based tomographic reconstruction methods. The overlap of information from each projection is very high at low spatial frequencies [ux, uy] (near the center of the Fourier space representation), and it becomes low or even undersampled at high spatial frequencies (near the periphery of the Fourier plane representation, with undersampling taking place when the Crowther condition of Eq. 8.3 is not met). Therefore it is essential to correct for these variations in data weighting by applying a radially dependent filter g(ur) (sometimes called a "ramp filter")

g(ur) = |ur|   (8.4)

to the data in the Fourier plane before carrying out the inverse Fourier transform operation to recover the object slice f(x, y)z. This filter normalizes out the weighting of all spatial frequencies (the density of polar sampling otherwise goes as 1/|ur|), leading to the filtered backprojection method. However, we must exercise care, because unthinking application of the filter of Eq. 8.4 will lead us to magnify the presence of noise in the reconstructed image. Recall examples like that of Figs. 4.19 and 4.49, where we found that image signals tend to decline with spatial frequency as ur⁻ᵃ where a ≈ 3–4, while noise due to Poisson statistics has a "flat" floor independent of spatial frequency u. Therefore, just as in image deconvolution (Section 4.4.8), we need to multiply the backprojection filter function g(ur) with either a Wiener filter W(u) like in Eq. 4.207, or with some other function such as a Hamming filter [Harris 1978].

Another thing to note in the Fourier space representation of an object slice shown at right in Fig. 8.2 is that the Fourier-transformed slice projections F{p(x′)z,θ} provide data in a 1D array along a polar angle, while the Fourier transform of the object slice F{f(x, y)z} is on a Cartesian grid. The pixels in one representation do not sit exactly on top of the pixels in the other, so some scheme of re-mapping the data must be found. If one were to do this in real space, a simple bilinear or cubic interpolation might suffice with only local-to-one-pixel errors present; however, one pixel in Fourier space contributes to all pixels in real space, so imperfect interpolation can affect the entire object slice image f(x, y)z.

The approach used to handle this Fourier space grid mapping is one inspired by work in synthetic aperture radio astronomy [Brouw 1975], which was then brought over to x-ray tomography [O'Sullivan 1985]. The idea is this: one wishes to convolve the Fourier space polar line projection data F{p(x′)z,θ} with a smooth, but limited-in-extent, convolution kernel W(ux, uy) that will provide a data sampling on the Cartesian grid. Now the discrete Fourier transform has the property that the real space data are assumed to be cyclically periodic (Eq. 4.95), when in fact we normally assume that the object fits fully within the slice projection p(x′)z,θ; in other words, we assume that the object can be contained within a compact support.


Figure 8.4 Comparison of simple backprojection, filtered backprojection using gridrec, and the maximum likelihood expectation maximization (MLEM) algorithm [Richardson 1972, Lucy 1974] (Section 8.2.2), which is one of a class of algebraic, iterative reconstruction algorithms. In all cases, projections were generated from a Shepp–Logan phantom [Shepp 1974], with the number of rotation angles θ varying from Nθ = 4 at left to Nθ = 256 at right. Since this phantom was digitized over Nx = Ny = 256 pixels, the reconstructions for Nθ = 256 at right approximately satisfy the Crowther criterion of Eq. 8.3, so good reconstructions are obtained in both filtered backprojection and MLEM cases. When fewer angles are recorded, backprojection and filtered backprojection show severe artifacts which could be erroneously interpreted as image features; in such cases, algebraic iterative reconstruction methods such as MLEM are strongly preferred.

For this reason, one also desires that the real space representation of the convolution kernel W(x′) be able to greatly suppress the line projection at its edges (since a narrow function in Fourier space produces a broad function in real space, this condition is reasonably easy to satisfy). So as to have constant results at all polar angles θ where one calculates the 1D Fourier transform of the slice projection F{p(x′)z,θ} and maps it into the 2D Fourier grid of F{f(x, y)z}, the kernel should be nearly symmetric in 2D, and separable so that W(x, y) = W(x)W(y). A good choice is to use a prolate spheroid function,2 or a polynomial approximation to such a function for computational speed [Nuttall 1981, Xiao 2001]. After regridding, the effects of this convolution kernel can be removed by dividing the object slice f(x, y)z by the kernel W(xs, ys) [O'Sullivan 1985]. This fast computational approach for Fourier space re-gridding and filtered backprojection reconstruction has been named the gridrec algorithm [Dowd 1999], with additional papers describing its performance [Marone 2012]. The combination of the polar weighting filter function g(ur) of Eq. 8.4, and the gridrec re-gridding operation, make filtered backprojection reconstruction approaches an excellent choice for many uses. The computation has no iterative sequences, and furthermore it exploits the fast Fourier transform (FFT) algorithm, which involves ∼2N² log(N) computational steps to process an N² array (rather than the N⁴ steps that the brute-force discrete Fourier transform would require; see Eq. 4.89). Gridrec delivers fast, high-quality reconstructions, but only if one has met the Crowther criterion (Eq. 8.3) for angular sampling. If instead one has a smaller number Nθ of projection directions, both simple backprojection and filtered backprojection deliver low-quality tomographic reconstructions. The good (large Nθ) and the bad (small Nθ) results for filtered backprojection via gridrec are shown together in Fig. 8.4.

2 At the Advanced Photon Source in the USA, this prolate spheroid might carry a Chicago Bears football logo and have rather pointy ends, while at the Australian Synchrotron it might carry St. Kilda Saints markings and have more rounded ends.
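To make the filtered backprojection recipe concrete, here is a minimal sketch in Python/NumPy. It assumes a parallel-beam sinogram of shape [Nθ, Nx], applies the ramp filter of Eq. 8.4 with crude nearest-pixel interpolation, and does not implement gridrec's Fourier-space regridding; the function names are our own:

```python
import numpy as np

def filtered_backprojection(sinogram, angles_deg):
    n_angles, n_x = sinogram.shape
    ramp = np.abs(np.fft.fftfreq(n_x))                     # ramp filter |u_r| (Eq. 8.4)
    filtered = np.real(np.fft.ifft(np.fft.fft(sinogram, axis=1) * ramp, axis=1))
    c = np.arange(n_x) - n_x / 2.0
    X, Y = np.meshgrid(c, c)                               # pixel center coordinates
    recon = np.zeros((n_x, n_x))
    for p, theta in zip(filtered, np.deg2rad(angles_deg)):
        xp = X * np.cos(theta) - Y * np.sin(theta) + n_x / 2.0  # x' as in Eq. 8.2
        idx = np.clip(np.round(xp).astype(int), 0, n_x - 1)     # nearest-pixel lookup
        recon += p[idx]                                         # backproject (smear)
    return recon * np.pi / (2.0 * n_angles)                # approximate normalization

# demo: a single bright pixel, forward-projected by the same nearest-pixel rule
n_x = 64
angles = np.linspace(0.0, 180.0, 101, endpoint=False)      # ~Crowther number for 64 px
phantom = np.zeros((n_x, n_x)); phantom[40, 25] = 1.0
c = np.arange(n_x) - n_x / 2.0
X, Y = np.meshgrid(c, c)
sino = np.array([np.bincount(
        np.clip(np.round(X*np.cos(t) - Y*np.sin(t) + n_x/2.0).astype(int), 0, n_x-1).ravel(),
        weights=phantom.ravel(), minlength=n_x)
    for t in np.deg2rad(angles)])
recon = filtered_backprojection(sino, angles)
print("brightest reconstructed pixel:", np.unravel_index(np.argmax(recon), recon.shape))
```

Production codes replace the nearest-pixel steps with proper interpolation (or with gridrec itself), but the structure of filter-then-backproject is the same.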

8.2 Algebraic (matrix-based) reconstruction methods

The reconstruction methods described above are based on a Fourier optics understanding of the tomography reconstruction problem. An entirely different approach was formulated by Gordon and Herman in 1970 (their first paper in fact appeared in print the following year [Gordon 1971]), leading to the development of algebraic reconstruction tomography (ART) [Gordon 1970] as the first of a class of iterative matrix-based methods.

Normally we would write the object slice f(x, y)z as a two-dimensional array in the coordinates [x, y]. However, we can also choose to index the 2D pixels in this array by a single index x, which follows a sequence (ix, iy) of

x = {(0, 0), (1, 0), …, (Nx − 1, 0), (0, 1), (1, 1), …, (Nx − 1, Ny − 1)}
  = {0, 1, …, (N² − 1)},   (8.5)

where in the second form of x we have assumed that Nx = Ny = N. This allows us to write the object slice as a 1D array fz with N² elements. Let us also write the slice projection p(x′)z,θ as a 1D matrix pz,θ, where the indices ix′ go as

x′ = {0, 1, …, (Nx′ − 1)}.   (8.6)

How are these two matrices related to each other? Well, one pixel in the slice projection involves a weighted sum of a subset of pixels from the object slice as shown in Fig. 8.5, which naturally involves the dimensionality of both the object slice x and the slice projection x′. That is, there is a 2D projection matrix Wθ with Nx′ × N² matrix elements that allows us to relate the object slice fz to the slice projection pz,θ as

pz,θ = Wθ fz   (8.7)

for a given angle θ. The projection matrix Wθ is illustrated in Fig. 8.5. Let's consider the deceptively simple expression of Eq. 8.7 in greater detail. Because the 2D projection matrix Wθ is something that can be calculated exactly, we can obtain the Nx′ values of the slice projection pz,θ if we know the N² values of the object f.

Figure 8.5 Algebraic reconstruction tomography (ART) methods are matrix-based methods for recovering the object slice fz via solution of the equation pz,θ = Wθ fz for all values of θ (Eq. 8.7). Shown here is the weighting matrix Wθ that tells which of the x pixels in the 2D object slice fz contribute to one of the x′ pixels in the 1D image of the slice projection pz,θ. The weighting matrix Wθ will be sparse, because for any one pixel in pz,θ there will only be a few object pixels that appear along the projection direction (the red column through the object slice in the figure).

Now there is no point to having Nx′ > 2N due to the Nyquist sampling theorem of Eq. 4.88; as a result, we have an overdetermined problem in that the calculation of pz,θ = Wθ fz of Eq. 8.7 involves obtaining Nx′ values in pz,θ from N² values in fz. However, the inverse is certainly not true: even if we could obtain the pseudoinverse Wθ⁺ of the matrix Wθ and thus write

fz = Wθ⁺ pz,θ,   (8.8)

we would not arrive at an unambiguous answer, since we would be trying to determine N² values from Nx′ measurements. That is, we would have far more unknowns than knowns. Obviously as slice projections from more and more projection angles Nθ were added to our data, we could get closer and closer to a deterministic measurement. However, while it is easy to manipulate two equations to solve for two unknowns, it is far more difficult to algebraically manipulate Nx′Nθ knowns to solve for N² unknowns. The better solution is to turn to numerical optimization approaches, which leads us to a little detour.
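Before that detour, a toy construction of Wθ makes Eq. 8.7 concrete. The nearest-pixel binning below is a sketch of the idea rather than a production system matrix (real codes use more careful ray weighting), and the function name is our own invention:

```python
import numpy as np
from scipy.sparse import coo_matrix

def projection_matrix(n, theta):
    """Crude W_theta (Eq. 8.7): object pixel (ix, iy) contributes to the
    detector bin whose coordinate is x' = x cos(theta) - y sin(theta),
    following the convention of Eq. 8.2."""
    c = np.arange(n) - n / 2.0
    X, Y = np.meshgrid(c, c)                      # pixel center coordinates
    xp = X * np.cos(theta) - Y * np.sin(theta)    # detector coordinate x'
    cols = np.arange(n * n)                       # flattened object index (Eq. 8.5)
    rows = np.clip(np.round(xp + n / 2.0).astype(int), 0, n - 1).ravel()
    return coo_matrix((np.ones(n * n), (rows, cols)), shape=(n, n * n)).tocsr()

n = 8
f = np.random.rand(n * n)             # object slice as a 1D array f_z
W0 = projection_matrix(n, theta=0.0)  # theta = 0: sums each pixel column
p = W0 @ f                            # slice projection p_{z,theta} = W_theta f_z
print(np.allclose(p, f.reshape(n, n).sum(axis=0)))  # True
```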

8.2.1 Numerical optimization

Matrix equations of the form pθ = Wθ f of Eq. 8.7 appear in a wide range of problems (we shall see another example in Section 9.3). As a result, an entire branch of applied mathematics has arisen around their solution, calling them numerical optimization problems.

Figure 8.6 Common "norms" used in optimization problems. The one-norm or ℓ1 (Eq. 8.12) has a sharp minimum, while the two-norm or ℓ2 (Eq. 8.13) has a softer minimum. When combining a main cost function C0 with several regularizers λiCi such as in Eq. 8.15, the two-norm provides a better balance between various costs towards finding an overall minimum.

A significant subset of this literature deals with the generic matrix equation

y = Ax,   (8.9)

where y is usually some measurement, A is a matrix modeling the measurement process (with a pseudoinverse A⁺), and x is a model of the object. It is worth noting that discrete Fourier transforms can be written as a matrix operation (Eq. 4.96). In the algebraic reconstruction tomography example seen above, we were not able to solve the equivalent (Eq. 8.8) of x = A⁺y exactly. This is characteristic of situations where numerical optimization methods are used. We might reasonably seek to find a minimum residual to the difference between the two sides of the equation, or a minimum value of

min ‖y − Ax‖p   (8.10)

generally, or in the case of algebraic tomography methods based on Eq. 8.7, a minimum of

min ‖pz,θ − Wθ fz‖p,   (8.11)

where the subscript p indicates the specific measure of the minimum, as will be illustrated below. In other words, we can define a data-matching cost function C0 or objective function that should be minimized. The function C0 is often formed from one of several ℓ-norms [Tarantola 2005], including

ℓ1: C0 = ‖y − Ax‖1 = Σi |yi − (Ax)i|   (8.12)

ℓ2: C0 = ‖y − Ax‖2 = ( Σi |yi − (Ax)i|² )^{1/2}   (8.13)

where ℓ1 is called the one-norm and ℓ2 is called the two-norm. These different norms drive different behaviors in the convergence of the solution, as illustrated in Fig. 8.6, so that the two-norm ℓ2 is usually preferred.
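For concreteness, the two cost functions of Eqs. 8.12 and 8.13 evaluated on an arbitrary toy problem:

```python
import numpy as np

# Toy measurement y, model x, and process matrix A (values are arbitrary).
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
x = np.array([0.5, -0.25])
y = np.array([0.1, 0.6, 1.2])

r = y - A @ x                                # residual vector y - Ax
print("l1 cost:", np.sum(np.abs(r)))         # Eq. 8.12
print("l2 cost:", np.sqrt(np.sum(r**2)))     # Eq. 8.13 (same as np.linalg.norm(r))
```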


While we may not know exactly what the object x is, we may have some knowledge of its characteristics. In coherent diffraction imaging (Chapter 10) as solved using an optimization approach, this might include knowledge that the object fits entirely within a subregion of a sampled real-space array from which we have diffraction intensities; this is known as a finite support constraint. In spectromicroscopy (Chapter 9), we might know that materials with known spectra comprise part of the specimen, so these spectra should play a role in the solution [Mak 2014]. Yet another constraint can be non-negativity [Lee 1999], such as in the case of processes A that are based on x-ray absorption where negative absorption (that is, addition of energy to the transmitted beam) would violate basic physics. For those items of prior knowledge that can be quantified in terms of additional error terms Ci, we can incorporate them into an overall cost function by adding them as regularizers. Example regularizers include the following:

• Sparsity: one may have reason to favor models of the specimen x which are "sparse," meaning that many entries are zero. In the case of spectromicroscopy, one may have regions of a specimen that are phase-segregated so that certain pixels in the image contain only one material with one spectroscopic signature. Ideally one would then like to minimize the zero-norm or ℓ0, which measures how many entries in x are non-zero. Since this turns out to be an "NP-hard"3 regularizer to minimize in optimization [Natarajan 1995], it has been shown that the one-norm ℓ1 serves as a good proxy for sparsity [Tibshirani 1996], leading to a regularizer of Csparsity = ‖x‖1.
• Total variation (TV): the most "safe" or "conservative" version of a reconstructed object x is the one that has the least amount of structural variation, consistent with what is demanded by the recorded data. Therefore a common regularizer is to minimize the total variation V of the object, which is defined as

V = Σ_{i=0}^{N−2} |x_{i+1} − x_i|.   (8.14)

The concept of TV minimization⁴ first arose in Fourier analysis [Jordan 1881], and there are a variety of approaches to its definition in multiple variables [Clarkson 1933]. The application of TV regularization in an optimization approach often allows one to obtain reconstructed images even in cases where one might seem to have incomplete data, such as in compressive sensing [Candès 2006, Donoho 2006], where one first transforms the data onto a basis set where its information is nicely sparsified and separable (principal component analysis provides one such transform, as will be discussed in Section 9.3.1).

The basic error minimization cost of C_0 = ||y − Ax||₂ (Eq. 8.13) might not be on the same numerical scale as any of the separate regularizer costs.

³ NP-hard means non-deterministic polynomial-time hard; think "slow to compute."
⁴ TV minimization is almost always a good approach, unless there's something really good on television tonight!



Figure 8.7 Local versus global minima in a cost function C_total, plotted against the variable being adjusted. One of the challenges of optimization strategies involves avoiding traps in local minima. Numerous strategies can be applied to avoid being trapped in a local minimum, including multiscale optimization and simulated annealing.

As a result, the net cost is written using a set of regularization terms λ_i for each of the regularizers, leading to a total cost of

C_{\rm total} = C_0 + \lambda_1 C_1 + \lambda_2 C_2 + \cdots \qquad (8.15)
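As a minimal sketch of how Eq. 8.15 might look in code (with arbitrary placeholder values for the regularization weights λ_i, which in practice must be tuned), one can assemble the data-matching term of Eq. 8.13 with the sparsity and total variation regularizers described above:

```python
import numpy as np

def total_cost(x, y, A, lam_sparsity=0.1, lam_tv=0.1):
    """Total cost C_total = C0 + lambda_1*C_sparsity + lambda_2*C_TV (Eq. 8.15).
    The lambda weights here are placeholders; they put the separate cost
    terms on a common numerical scale."""
    c0 = np.linalg.norm(y - A @ x, ord=2)   # data matching, Eq. 8.13
    c_sparsity = np.sum(np.abs(x))          # one-norm sparsity proxy
    c_tv = np.sum(np.abs(np.diff(x)))       # total variation V, Eq. 8.14
    return c0 + lam_sparsity * c_sparsity + lam_tv * c_tv
```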

Think of making a multi-item purchase where each individual item i is bought using a separate currency (dollars, euros, yen, pesos, and so on); obviously the total cost must include factors λ_i that account for currency exchange rates in order to know the true cost in a single currency. As one learns in calculus, the minimum of a simple function f(x) can be found by setting its first derivative to zero, and ensuring that its second derivative is positive. However, when it comes to finding a minimum of Eq. 8.15, we have a plethora of variables (each of the pixel values in an object slice f_z, in the example of tomography) even before we consider any regularizers λ_i C_i. Therefore, taking a derivative along a single variable is not guaranteed to lead us in the right direction, and we must also take care not to get trapped in any local minima as we try to find our way to a global minimum (Fig. 8.7). In fact, because the problem is usually underdetermined, local minima might be abundant, and there may be only small differences between many of the local minima and the global minimum. Furthermore, the solution might be so unknown at the outset that one starts with solution parameters chosen at random (in fact, using incomplete knowledge rather than random starts can often lead one to getting trapped in local minima; random starts are often better). So how do we tweak a myriad of variables to find a global minimum of C_total when we start with random numbers? Some of us might throw up our hands in exasperation, but applied mathematicians will rub their hands in glee: this is a rich playground! Of course there are many books written on the topic (see for example [Nocedal 2006]), so we will make only a few remarks here:

• The calculations become iterative. From the random start, one tries to tweak the solution towards smaller individual cost terms C_1, C_2, and so on in the total cost C_total of Eq. 8.15. But since tweaks affecting one cost term will usually affect another, one must iterate through this process and see how the cost function becomes minimized.


• The simplest approach is to go in the direction of reducing one cost term C_1, and then in the direction of reducing another cost term C_2, and so on, in a successive optimization approach. However, one can obtain better convergence, at the cost of more calculations, by using a simultaneous optimization approach where the vector sum of all individual minimization steps is used. The successive and simultaneous optimization schemes are compared against each other in Fig. 8.8. While simultaneous optimization involves more calculations, it also involves a more direct path to the global minimum, without zig-zagging in a way that could lead one into local minima. It turns out that the original ART algorithm [Gordon 1970] involves a successive optimization approach (which can be traced back to work by the Polish mathematician Stefan Kaczmarz [Kaczmarz 1937]), while the simultaneous iterative reconstruction tomography (SIRT) algorithm [Gilbert 1972] involves a simultaneous optimization approach (a minimal numerical sketch of both update styles appears after this discussion). These two fundamentally different approaches arise again in coherent diffraction imaging (Chapter 10), where the classical error reduction algorithm [Fienup 1978] inspired by Gerchberg and Saxton [Gerchberg 1972b] and its variants such as the hybrid input–output algorithm [Fienup 1982a] are successive approaches, while the difference map algorithm [Elser 2003] represents a simultaneous approach.

• One can first downsample the data to lower resolution, and carry out a more rapidly computed optimization where many of the local minima might be "blurred out," after which optimization can be carried out on the full dataset to refine the solution. This is now known as multiscale optimization, though multiresolution image processing has an older history [Rosenfeld 1984]. If one thinks of this in Fourier space, one starts by reconstructing the image only at low spatial frequencies, and then one gradually "marches" out to higher spatial frequencies in iterative reconstructions, in an approach called "frequency marching" [Chen 1999].

• If you took a calculus class, you probably learned how to use the chain rule to differentiate a more complex function in terms of a series of elementary differentiation operations. Well, computers have taken calculus classes too! Automatic differentiation [Griewank 2008] refers to an approach in which a computer applies the chain rule to the mathematical operations of a cost function as written in computer code. This gives the computer an efficient way to estimate a steepest descent path, and it is available in various toolkits such as TensorFlow by Google. In the case of iterative phase retrieval, as will be discussed in Chapter 10, this approach was first suggested by Jurling and Fienup [Jurling 2014] and later applied to x-ray ptychography data reconstruction [Nashed 2017, Kandel 2019].

Figure 8.8 When trying to solve for the global minimum between multiple terms in a total cost function C_total, one approach is to take a step towards the minimum of one cost term at a time (successive optimization, as in ART), while another is to take the vector sum of steps towards all cost terms at once (simultaneous optimization, as in SIRT). Shown here is the iteration process for ART and SIRT on a two-dimensional solution space: ART converges rapidly, while SIRT has somewhat slower convergence but is more robust to measurement noise.

Computer optimization problems are strongly related to machine learning and pattern recognition. Consider the example of artificial neural networks, where a set of "stimuli" x produce various weighted responses A in various "neural" pathways to produce differing outcomes y. If one has a set of known correct outcomes y from a set of known inputs x, one can use optimization to find the response matrix A in y = Ax, and approaches of this sort have been applied to tomography reconstruction problems [Pelt 2013, Parkinson 2017]. As an example, if one has a "training set" of images x of people which one knows can be sorted into three categories y (images of Jane, or of Diane, or of Robert), one can "train" the neural network by finding the matrix A using optimization. With this "trained neural network," one can then supply a new dataset x of images and have high confidence that y = Ax will recognize the new images as being of Jane, Diane, or Robert (though sometimes one can spoof [Sharif 2016] machine learning!). However, this is a gross oversimplification of how convolutional neural networks (CNNs) work. For example, CNNs might mimic the brain by having multiple computational "layers" with decision points made after certain layers, in which case the operations of a CNN clearly are not represented by a single matrix A nor with linear operators. Problems where one would like to solve y = Ax by minimizing ||y − Ax|| will appear again in Section 9.3.2, and in Section 10.3.6.
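The successive versus simultaneous distinction discussed above can be made concrete with a small sketch. The following is a minimal illustration (not the actual ART or SIRT implementations) of a Kaczmarz-style successive update and a SIRT-like simultaneous update for y = Ax; classical SIRT additionally applies row and column normalizations that are omitted here for brevity.

```python
import numpy as np

def kaczmarz(A, y, n_sweeps=20):
    """Successive updates in the spirit of ART/Kaczmarz: project the current
    solution onto one measurement equation's hyperplane at a time."""
    x = np.zeros(A.shape[1])
    for _ in range(n_sweeps):
        for a_i, y_i in zip(A, y):
            x = x + (y_i - a_i @ x) / (a_i @ a_i) * a_i
    return x

def sirt_like(A, y, n_iters=200):
    """Simultaneous updates in the spirit of SIRT: take the vector sum of
    all per-equation corrections in a single step."""
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, ord=2) ** 2   # conservative step size
    for _ in range(n_iters):
        x = x + step * (A.T @ (y - A @ x))
    return x
```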

8.2.2

Maximum likelihood and expectation maximization

Another approach to solve the matrix equation p_{z,θ} = W_θ f_z of tomography (Eq. 8.7) is to seek to maximize the likelihood of the solution for the object slice f_z, rather than minimizing a cost function C_total. If the object is assumed to be f(x) and the ideal observed image is p̃(x′) in the one-dimensional case, one should be able to write the observed image [Lucy 1974, Lucy 1994] as

\tilde{p}(x') = \int f(x) \, g(x'|x) \, dx, \qquad (8.16)

where g(x′|x) is the probability of getting a signal at the observed image pixel p̃(x′) based on the object signal at pixel x. That is, g(x′|x) is a blurring function in the measurement process, which could be a Gaussian blur given by Eq. 4.14 as P(x′, x), or it could be some more complicated blurring function such as an intensity point spread function of an imaging optic (Section 4.4.3).


However, we should also assume that the actual observed image p(x′) has noise n(x′) so that

p(x') = \tilde{p}(x') + n(x'), \qquad (8.17)

where n(x′) might often be Poisson noise due to limited photon statistics (Section 4.8.1). It would therefore be more likely that sharp fluctuations in the actual observed image p(x′) are due to photon statistical fluctuations n(x′) rather than sharp structure in the object f(x). In other words, the blurred image with the maximum likelihood (in a Bayesian statistics sense [Richardson 1972]) is p̃(x′). Using a statistical method called expectation maximization (EM) [Dempster 1977] to seek the solution with maximum likelihood, one can write the next iterate j + 1 of the guess of the object f_j(x) using a convolution notation [Fish 1995] as

f_{j+1}(x) = f_j(x) \left[ g(-x) * \frac{p(x)}{f_j(x) * g(x)} \right]. \qquad (8.18)

This update rule is at the heart of the maximum-likelihood expectation maximization (MLEM) method as applied to tomography [Shepp 1982], though in fact there is a great diversity of notation used to write Eq. 8.18, as shown in Appendix C online at www.cambridge.org/Jacobsen. The update rule of Eq. 8.18 was first proposed by Richardson [Richardson 1972] and by Lucy [Lucy 1974] based on a Bayesian statistics approach with no assumption on the characteristics of the noise n(x′). However, the EM approach stipulates Poisson noise, yet arrives at the same update rule of Eq. 8.18; this has led to the statement [Carasso 1999] that "the equivalence of these two methods is curious in view of their different underlying hypothesis." In addition, the act of dividing the observed image p(x) by the convolution of the present guess of the object f_j(x) and the probe or blurring function g(x) in Eq. 8.18 can lead to noise amplification in the reconstruction process if too many iterations are carried out. In real space there may be true "zeroes" in the object, and moreover the Wiener filter approach used to avoid divide-by-zero errors in Fourier deconvolution (Section 4.4.8) is not applicable in real space. For this reason, MLEM is often combined with regularization schemes such as minimizing total variation (Eq. 8.14).
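A minimal one-dimensional sketch of the Richardson–Lucy/MLEM update of Eq. 8.18 follows; it assumes p and g are real-valued NumPy arrays of equal length, uses circular convolution via the FFT for compactness, and adds a small eps to guard the division (a crude stand-in for the more careful regularization just described).

```python
import numpy as np

def richardson_lucy(p, g, n_iters=20, eps=1e-12):
    """Iterate f_{j+1}(x) = f_j(x) [ g(-x) conv ( p(x) / (f_j conv g)(x) ) ],
    the update rule of Eq. 8.18, in one dimension."""
    f = np.full(len(p), float(np.mean(p)))   # flat starting guess
    G = np.fft.fft(g)
    for _ in range(n_iters):
        blurred = np.real(np.fft.ifft(np.fft.fft(f) * G))    # f_j conv g
        ratio = p / np.maximum(blurred, eps)                 # p / (f_j conv g)
        # Multiplying by conj(G) in Fourier space is convolution with g(-x):
        corr = np.real(np.fft.ifft(np.fft.fft(ratio) * np.conj(G)))
        f = f * corr
    return f
```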

8.3

Analysis of reconstructed volumes

The goal of tomography is to arrive at a 3D representation of the specimen, often with each voxel quantified in terms of its linear absorption coefficient (Section 3.3.3) or its electron density (Eq. 10.72). What does one do with these data? The simplest method is to view object slices f(x, y)_z in succession in a movie, where each slice is a grayscale image in the movie sequence. However, often this is only an intermediate step in the analysis of a specimen. If the specimen is composed of materials with distinctly different densities, one can form isodensity surfaces (surfaces that contain a set of voxels with approximately the same numerical values for optical density) and then "spin" the resulting 3D rendering on a computer display to obtain an amazingly realistic view of the object.


Figure 8.9 Segmentation of tomographic reconstructions into identifiable features allows one to carry out important quantitative analyses of their volume, distribution, and connectivity. However, renderings of segmented volumes are usually represented by clear boundaries between features, which can give an impression of higher spatial resolution and feature separation than might actually exist in the reconstructed optical or electron density. This is illustrated here in a subregion of an x-ray tomography dataset of a charcoal sample [Vescovi 2018], where the raw reconstructed optical density is shown at top left. Otsu thresholding [Otsu 2007] was first applied to the data to find three somewhat distinct values of optical density in the reconstruction, after which a 3D connected-component analysis was used to remove few-voxel blocks. The resulting segmentation masks were then smoothed by convolution with a Gaussian kernel, after which isodensity surfaces (surfaces at threshold values of each of the three classes) were generated and rendered in separate colors for each class of optical density using Vaa3D [Peng 2010]. These isosurfaces are shown overlaid on the optical density data at top right, and by themselves at bottom. The point of this illustration is that one can gain a false impression of the sharpness and feature separation in a tomographic dataset if only isosurface renderings are shown; representative gray-scale images of the actual reconstructed optical density should also be shown to provide full disclosure of the characteristics of the tomographic reconstruction. Figure made using images provided by Ming Du, Northwestern University.

There is a wide range of more sophisticated approaches to segmentation (such as watershed methods and adaptive thresholding) which have been used for some


time in electron tomography [Sandberg 2007] and in medical imaging [Taha 2015], and the throughput of these approaches has been improved through the use of graphical processing units (GPUs) [Smistad 2015]. More recently, deep learning or multilayered convolutional neural network approaches [LeCun 2015] have emerged as offering improved performance following “training” by manual annotation of features in a small number of images; progress in this area is summarized in a recent review [Rehman 2018]. Application of these various methods has enabled quantitative studies of the pore structures which determine reactant access in catalysts [da Silva 2014], the changes in the lacunar system of bone with different osteoporosis drug treatments [Mader 2013], and the volume distribution of subcellular organelles [Parkinson 2008]. A tutorial on segmentation in soft x-ray tomography of cells is available in the Journal of Visualized Experiments [Darrow 2017]. While segmentation gives one an all-important ability to quantify volumes, connectivity, and other characteristics of distinct features within a specimen, one should be aware that renderings of the surface boundaries of segmented regions also give one the appearance of sharper boundaries within the specimen than is justified based on the examination of the actual reconstructed density. In other words, the segmented volume can give the appearance of higher spatial resolution and feature separation than is present in the actual reconstructed volume (Fig. 8.9).
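A minimal sketch of the threshold-and-label style of segmentation described above (a simplified, single-threshold version of the kind of pipeline used for Fig. 8.9, with arbitrary parameter choices) might look like the following, using common SciPy and scikit-image calls:

```python
import numpy as np
from scipy import ndimage
from skimage.filters import threshold_otsu

def segment_volume(vol, min_voxels=27, sigma=1.0):
    """Otsu-threshold a reconstructed volume, remove few-voxel islands with a
    3D connected-component analysis, and Gaussian-smooth the resulting mask
    (for example, prior to isosurface rendering)."""
    mask = vol > threshold_otsu(vol)        # global Otsu threshold
    labels, _ = ndimage.label(mask)         # 3D connected components
    counts = np.bincount(labels.ravel())    # voxels per component
    keep = counts >= min_voxels             # drop few-voxel blocks
    keep[0] = False                         # label 0 is the background
    cleaned = keep[labels]
    return ndimage.gaussian_filter(cleaned.astype(float), sigma)
```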

8.4

Tomography in x-ray microscopes

As was noted at the beginning of this chapter, the high penetration power of X rays makes tomography an obvious method to implement in x-ray microscopes, because without it the overlap of structures in depth can make 2D images difficult to interpret. Examples of the use of tomography in x-ray microscopy are shown in Figs. 12.1, 12.6, and 12.9 for transmission imaging, while fluorescence tomography is shown in Fig. 12.3. One very simple yet powerful way to carry out high-resolution x-ray tomography is to use a scintillator screen and a high numerical aperture (N.A.) microscope objective to image the x-ray intensity distribution downstream of an object onto a visible-light camera [Flannery 1987]. This allows one to exploit the rapid development of high-frame-rate visible-light cameras, so that projection images can be obtained at sustained frame rates of 30 kHz [Mokso 2017] and above, thus enabling x-ray studies of dynamic processes in materials (see Fig. 8.10; this involves complications in data processing for time-evolving samples [Ruhlandt 2017], while opening new opportunities in materials science [Villanova 2017]). The use of scintillators limits the spatial resolution to something just below 1 μm, where several complicating factors arise as discussed in Section 7.4.7. One can improve the spatial resolution further by using a scintillator/microscope objective/visible-light camera system in the point projection geometry discussed in Section 6.2, an approach that is employed in several commercially available x-ray microtomography systems.


Figure 8.10 A sampling of experimental results reported for x-ray tomography at various voxel sizes (approximately equal to the 3D spatial resolution in most cases), and the time required to acquire one tomogram; the plot shows tomogram acquisition time in seconds versus voxel size in μm. Because of the increased photon flux required for high-resolution imaging (Section 4.9.1), there is usually a trade-off between resolution and tomography acquisition time. Data compilation organized by Francesco De Carlo of Argonne Lab, with similar plots presented and discussed elsewhere [Maire 2014, Villanova 2017].

It took a longer time for nanoscale x-ray tomography to be realized, and different developments took place with scanning and full-field microscopes:

• The first demonstration of x-ray nanotomography used a scanning transmission x-ray microscope (STXM; Section 6.4) to acquire N_θ = 9 projections over a 105° tilt range at 345 eV, delivering images of a two-plane test structure at 100 nm xy and 600 nm z resolution [Haddad 1994]. Three-dimensional imaging was combined with spectromicroscopy (Chapter 9) for the first time in STXMs, first by imaging a set of serial sections of a 3D object [Hitchcock 2003] (which, of course, is not the same as tomography) and then by rotating an object over N_θ = 61 projections over 180° as it was imaged at two XANES resonance energies (530.0 and 532.2 eV) near the oxygen K edge [Johansson 2007].

• Transmission x-ray microscopes (TXMs; Section 6.3) generally offer much faster imaging times, therefore making it easy to collect more projections. As a result, the first TXM tomography demonstration [Lehr 1997] involved an increased number of projections (N_θ = 33) over a 160° tilt range at 517 eV, delivering 50 nm xy resolution. Most transmission tomography in x-ray microscopes today is done using TXMs.

• Soft x-ray microscopes operating at ...

... for E2 > E1 we can make the approximation

\frac{\mu_{m,2}}{\mu_{m,1}} = \left( \frac{E_1}{E_2} \right)^{s_E}, \quad\text{i.e.,}\quad \mu_{m,2} = \mu_{m,1} \left( \frac{E_1}{E_2} \right)^{s_E}, \qquad (9.10)

in which case Eq. 9.9 becomes

\rho_x = \frac{ \ln\left[ \dfrac{I_1}{I_{0,1}} \left( \dfrac{E_1}{E_2} \right)^{s_E} \right] - \ln\left[ \dfrac{I_2}{I_{0,2}} \left( \dfrac{E_2}{E_1} \right)^{s_E} \right] }{ \mu_{x,2} - \mu_{x,1} }. \qquad (9.11)

Finally, if (μ_{x,2} − μ_{x,1}) is accurately known and all non-specimen-dependent background signals are properly subtracted, the error in the measurement will be dominated by photon statistics. Let us assume that (ρ_x μ_x) ≪ (ρ_m μ_m), so that most of the absorption is due to the matrix. In this case we can write

I_1 \simeq \exp[-\rho_m \mu_{m,1}] \, I_{0,1}. \qquad (9.12)

If we assume that the illumination is provided by a mean exposure of n̄ photons, the ratio I_1/I_{0,1} becomes

\frac{I_1}{I_{0,1}} = \frac{ \exp[-\rho_m \mu_{m,1}] \, (\bar{n} \pm \sqrt{\bar{n}}) }{ \bar{n} \pm \sqrt{\bar{n}} }, \qquad (9.13)

where we have used the Gaussian approximation for Poisson statistics discussed in Section 4.8.1. Assuming the errors √n̄ to be uncorrelated so that they add in a root-sum-of-squares fashion, Eq. 9.13 becomes

\frac{I_1}{I_{0,1}} \simeq \exp[-\rho_m \mu_{m,1}] \left( 1 \pm \frac{\sqrt{2}}{\sqrt{\bar{n}}} \right) \qquad (9.14)

and (within the simplifying approximations we have made) Eq. 9.11 becomes

\rho_x \simeq \frac{ \ln\left[ e^{-\rho_m \mu_{m,1}} \left( \frac{E_1}{E_2}\right)^{s_E} \left(1 \pm \frac{\sqrt{2}}{\sqrt{\bar{n}}}\right) \right] - \ln\left[ e^{-\rho_m \mu_{m,2}} \left( \frac{E_2}{E_1}\right)^{s_E} \left(1 \pm \frac{\sqrt{2}}{\sqrt{\bar{n}}}\right) \right] }{ \mu_{x,2} - \mu_{x,1} }
\simeq \frac{ s_E \ln(E_1/E_2) + \ln\!\left(1 \pm \frac{\sqrt{2}}{\sqrt{\bar{n}}}\right) + s_E \ln(E_1/E_2) - \ln\!\left(1 \pm \frac{\sqrt{2}}{\sqrt{\bar{n}}}\right) }{ \mu_{x,2} - \mu_{x,1} }
\simeq \frac{ 2 s_E \ln(E_1/E_2) \pm 2/\sqrt{\bar{n}} }{ \mu_{x,2} - \mu_{x,1} }, \qquad (9.15)

where in the last step we have made use of the Taylor series approximation ln(1 + x) ≈ x for x ≪ 1, and added uncorrelated errors in a root-sum-of-squares fashion. Separating this into the measurement and its error, we have

\rho_x \pm \Delta\rho_x \simeq \frac{2 s_E \ln(E_1/E_2)}{\mu_{x,2} - \mu_{x,1}} \pm \frac{2/\sqrt{\bar{n}}}{\mu_{x,2} - \mu_{x,1}}, \qquad (9.16)

so the fractional error in the measurement is given by

\frac{\Delta\rho_x}{\rho_x} \simeq \frac{1}{\sqrt{\bar{n}}} \, \frac{1}{s_E \ln(E_1/E_2)}. \qquad (9.17)

Figure 9.2 X-ray absorption near-edge structure (XANES) or near-edge x-ray absorption fine structure (NEXAFS) in x-ray absorption spectroscopy. A schematic representation of an idealized carbon absorption edge (absorption cross section σ versus photon energy) is shown on the left, and a spectrum (optical density versus photon energy, 280–310 eV) from the amino acid tyrosine is shown at right (data from [Kaznacheyev 2002]). An x-ray absorption edge occurs at a photon energy sufficient to completely remove a core-level electron (Fig. 3.3). Because chemical binding happens at energies within a few eV of the Fermi energy (Box 3.2), there can be unoccupied or partially occupied electronic states just a few eV below an absorption edge. In XANES, a photon with an energy below the ionization potential can promote an inner-shell electron into such an electronic state, leading to an absorption resonance at an energy a few eV below the absorption edge. In fact, one can have many resonances of this type, for example as shown in Fig. 9.5. There can also be continuum resonances above the absorption edge; electrons that get promoted into these states don't stay there very long, leading to broad resonances due to the Heisenberg uncertainty principle (ΔE)·(Δt) ≥ ℏ (Eq. 3.24).


With s_E = 3 and ln(E_1/E_2) ≈ 1, the fractional error reaches 1 percent when one uses n̄ ≈ 1100 photons per pixel in the illumination. One can increase the illumination n̄ to achieve higher sensitivity provided all background signals are properly controlled, but this illustrates why it is hard to measure concentrations below about 1 percent using differential absorption.
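One can check this number directly by inverting Eq. 9.17 for n̄, as in the following sketch (the values of s_E and the logarithmic energy ratio are the illustrative assumptions just stated):

```python
def photons_needed(frac_err, s_E=3.0, log_energy_ratio=1.0):
    """Mean photons per pixel needed for a given fractional error in rho_x,
    from Eq. 9.17: frac_err = 1 / (sqrt(n) * s_E * ln(E1/E2))."""
    return (1.0 / (frac_err * s_E * log_energy_ratio)) ** 2

print(photons_needed(0.01))  # roughly 1100 photons for a 1 percent error
```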

9.1.2

Living near the edge: XANES/NEXAFS

The schematic representation of an x-ray absorption edge shown in Fig. 3.3 represents one basic transition for an isolated atom. However, atoms are neither that simple, nor are they always so lonely, leading to two effects in x-ray absorption spectroscopy. The first is XANES (x-ray absorption near-edge structure, a term that comes from the EXAFS community), which is also called NEXAFS (near-edge x-ray absorption fine structure, a term favored by chemists who consider the discrete electronic states involved). The second is EXAFS (extended x-ray absorption fine structure; Section 9.1.7). While XANES happens within a few eV of the ionization potential, EXAFS can extend to energies several hundred eV higher.


Figure 9.3 Carbon and oxygen edge inner-shell spectra of carbon monoxide (CO) in the gas phase, measured under electron scattering conditions equivalent to x-ray absorption [Hitchcock 1980] (higher resolution x-ray absorption spectra have also been published [Domke 1990]). Spectra and upper-level orbital plots from ab initio quantum calculations performed using the program GSCF3 [Kosugi 1980, Kosugi 1987] are also shown. The state energy diagram in the center shows the four main transitions (C 1s⁻¹π* at 287.4 eV, C 1s⁻¹σ* at 305.0 eV, O 1s⁻¹π* at 532.0 eV, and O 1s⁻¹σ* at 551.0 eV). The hatched line indicates the core-level ionization potential.

Fortunately for our existence, atoms can form chemical bonds, and those bonds have energies of about 1.5–11 eV (see Box 3.2). This produces available electronic states that are a few eV above the last occupied atomic orbital (as set by the Fermi energy, discussed in Section 3.1.3), yet these states may not be fully occupied within an ensemble of molecules. As a result, an x-ray photon with an energy of a few eV below the ionization potential has the chance to excite an inner-shell electron into occupying such an electronic state. This leads to an absorption resonance a few eV below the absorption edge, as shown in Fig. 9.2. The typical features of a XANES spectrum can be placed into the following categories, with most of these features shown in the carbon and oxygen near-edge spectra of carbon monoxide (CO) shown in Fig. 9.3 along with illustrations of σ and π orbitals.

1. Valence excitations involve core shell or ground state electrons being excited to valence states, such as the C 1s → π* transition in CO at 287.4 eV. In this context, valence refers to compact orbitals constructed from atomic levels of the outermost occupied principal quantum number (n = 2 for CO).


2. Rydberg transitions involve promotion of a core shell electron to levels constructed from atomic levels with n greater than the valence shell (n > 2 for CO). Examples are the 3s and 3p Rydberg features at 292.4 and 293.8 eV in C 1s excited CO.

3. Ionization continuum corresponds to direct ionization of the core electron. The threshold or ionization potential for the C 1s and O 1s edges of CO are indicated in Fig. 9.3 by a hatched line. For atoms and molecules with weak XANES features, this corresponds to the position of the absorption edge (Fig. 3.3).

4. Multi-electron excitations have valence (occupied) to valence (unoccupied) excitations occurring simultaneously with core to valence excitations. The relatively sharp feature at 300 eV in CO is a two-electron excitation: (C1s² ... π² π*⁰) → (C1s¹ ... π¹ π*²).

5. Continuum resonances involve valence electronic states above the core level ionization potential, so that promotion of a core electron to such a state rapidly decays into direct ionization. From the Heisenberg uncertainty principle (Eq. 3.24) of ΔE ≥ ℏ/(Δt), these short-lived states produce broad spectral features above the ionization potential. The C 1s → σ* transition at 305 eV and the O 1s → σ* transition at 551 eV are both examples of continuum resonances.

While this categorization is couched in the orbital language of atoms and molecules, corresponding features exist in the XANES spectra of solids. In this case there are valence excitations (corresponding to core → conduction band excitation), excitons (core excited electrons temporarily trapped by the core hole potential), direct ionization transitions, multi-electron features (such as shake-up and shake-off features [Mukoyama 1994]), and multiple-scattering resonances in the ionization continuum.

The energy locations of XANES resonances can shift with the atom's oxidation state, sometimes by several eV. Consider the case of an atom becoming more oxidized, such as iron going from Fe²⁺ to Fe³⁺ or, equivalently, from Fe(II) to Fe(III). The higher oxidation state number means that the iron atom has "given away" more of its electrons to another atom, so that fewer remain in its own inner-shell orbitals. As a result, there are fewer electrons to help in partial screening of the nuclear charge, and those that remain are more tightly bound. Those electrons then face a longer (in energy) jump out to a partially occupied molecular or valence orbital, so that the XANES peak shifts to higher energy at a higher oxidation state. The shift scales with the square of the oxidation state [Johnson 1936], as might be expected from the Z_screen term in the modified Bohr model of Eq. 3.12.

9.1.3

Carbon XANES

Given the variety of ways that carbon atoms can form molecular bonds, it is not surprising that the carbon absorption edge is particularly rich in XANES resonances. This can lead to a powerful way to image chemical speciation in organic materials, as was first demonstrated for imaging polymer blends [Ade 1992]. As an example, consider the immiscible polymers polycarbonate (PC) and poly(ethylene terephthalate) (PET), for which carbon XANES spectra are shown in Fig. 9.4. At about 285 eV (which is about 5 eV below the carbon ionization potential at about 290 eV), these two materials have aromatic bond resonances that appear about 0.3 eV apart from each other, so that images taken at the two resonance energies show a complete contrast reversal, as shown in Fig. 9.4.



Figure 9.4 Carbon near-edge or XANES spectra and images of a blend of two immiscible polymers: polycarbonate (PC) and poly(ethylene terephthalate) (PET). The carbon XANES spectra at left show that PET has higher absorption at 285.36 eV, while PC has higher absorption at a photon energy (285.69 eV) that's only 0.33 eV higher. This leads to a complete reversal of contrast between images taken at those two photon energies. Microtome-sectioned 100 nm thick films were prepared by G. Mitchell and R. Cieslinksi, Dow Chemical, and images and spectra were acquired [Ade 1994] with H. Ade, North Carolina State University, using a soft x-ray scanning transmission x-ray microscope (STXM).

Because of this, carbon near-edge spectromicroscopy has become quite popular, with a number of synchrotron light source microscopes optimized just for this. While one could probably write an entire book just on this topic, we limit ourselves to some key comments here:

• Detailed understanding of the spectrum of a specific molecule requires highly accurate quantum mechanical calculations of the electronic states of the molecule and their occupancy. This is often done by methods such as density functional theory (DFT) [Jones 2015], though one can also gain insights by studying the progression of spectral peaks between molecule types with added chemical binding states [Stöhr 1992].

• One can excite core-level electron transitions to near-edge states using either x-ray absorption as discussed here, or electron energy loss spectroscopy (EELS) at the near-edge (energy loss near-edge structure or ELNES) as discussed in Section 4.10.1. Compared to EELS, x-ray XANES provides a lower-dose way to use core-level electrons for spectromicroscopy [Isaacson 1978, Rightor 1997] because one excites only the desired transition. At the same time, the similarities between ELNES and XANES mean that one can gain valuable insights into XANES spectra from consulting EELS data [Hitchcock 1994], as shown in Fig. 9.3.

• Because the core-level electron state undergoes little modification due to chemical binding effects, XANES core-level electron transitions into valence orbitals are generally easy to understand and interpret. Transitions from a low-energy valence state to a higher-energy valence state that are driven by UV or visible light involve a more complex initial state. The simplicity of x-ray XANES comes at a cost of ionizing radiation damage.

Figure 9.5 Carbon XANES spectra (mass absorption coefficient in 10⁴ cm²/g versus photon energy in eV) of several amino acids, including alanine (aliphatic), cysteine (side chain –SH), arginine (C=N π*), glutamine (–NH₂), and tyrosine (aromatic), along with calculations of the specific electronic transitions and organic functional groups that produce the observed resonances [Kaznacheyev 2002]. These resonances allow one to identify major biochemical organizational themes in biological specimens, but the ∼1 percent local mass concentration limit of soft x-ray spectromicroscopy and the ratio of natural peak widths to the overall carbon XANES spectral range mean that one is far from being able to detect individual proteins within the complex environment of a cell.

• Carbon XANES spectromicroscopy has proven to be of tremendous value in studies of polymers [Ade 1992]. Because polymers such as Kevlar exhibit strong molecular alignment, one can use the difference in x-ray absorption between different beam polarization directions for linear absorption dichroism studies (Fig. 12.5) to observe the polymer orientation in the plane of the thin section [Ade 1993]. Several reviews describe the details of carbon XANES spectroscopy of a large number of polymer types [Urquhart 1999, Dhez 2003, Watts 2011b]. Associations of functional groups with specific energies in the carbon near-edge spectral region are shown in Fig. 9.6.

• Carbon XANES has the potential for imaging major biochemical organization motifs in biological specimens. The 20 amino acids that serve as the functional units for proteins have had their carbon XANES spectra carefully measured and compared with theoretical calculations [Kaznacheyev 2002], so that one can associate specific resonances with specific functional groups, as shown in Fig. 9.5. As an example, tyrosine has a particularly distinctive spectrum with especially strong pre-edge resonances, as shown in Fig. 9.2. However, given that the intrinsic energy width of most carbon XANES resonances is about 0.2 eV and the range over which they appear is only about 5 eV, there are a limited number of "energy channels" available without significant spectroscopic overlap. In addition, the ∼1 percent sensitivity limit of differential absorption mapping (Section 9.1.1) means that one is far from detecting individual proteins in the complex environment of a cell. Finally, radiation damage to carbon XANES spectra (Section 11.2.1) and the lack of availability of many soft x-ray microscopes with cryogenic specimen transfer and imaging conditions (Section 11.3) limit what can be done. The best examples of the use of carbon XANES for biological studies published thus far involve studies of protein-mediated packing of DNA in sperm [Zhang 1996], and overall sperm biochemical organization [Mak 2014].


Figure 9.6 Bonding motifs (aromatic, phenol, aliphatic, carbonyl, carboxyl, carbonate, and others) and their photon energies over the 284–291 eV carbon near-edge (XANES) spectral region, as reported by several authors. These include carbonyl core transitions in polymers (Urquhart and Ade [Urquhart 2002]), and organic components of soil and environmental specimens by one set of authors (Schumacher/Scheinost et al. [Scheinost 2001, Schumacher 2005]) and by another (Schäfer et al. [Schäfer 2003]). Many additional tables of peak assignments exist (see for example [Solomon 2012]), but definitive assignments usually require careful spectroscopic studies of thin films of the isolated components, along with density functional theory calculations, for reliable interpretation.

• Because of strong absorption at the carbon K edge, most carbon XANES studies are carried out on specimens that are 100–200 nm thick (approximately equalling the absorption length μ⁻¹ of Eq. 3.75, for reasons described in Section 9.1). However, 3D images employing carbon XANES contrast have been obtained by using serial sectioning of materials [Hitchcock 2003], and polymers have been studied in 3D using tomography combined with oxygen-edge XANES spectroscopy [Johansson 2007] and also with tomographic spectromicroscopy at the fluorine K edge [Wu 2018].

• In carbon XANES spectroscopy, care must be taken to understand the radiation dose at which spectroscopic resonances begin to be modified. This is addressed in more detail in Section 11.2.1, and low-dose data acquisition strategies are discussed in Section 11.2.7.

• One can combine the insights gained by radiation-dose-limited x-ray spectromicroscopy with those obtained using resonant soft x-ray scattering, which can


probe finer length scales but only by averaging over larger illuminated areas [Ade 2008]. These comments only touch the surface of this very important application of x-ray microscopy, with recent applications revisited in Chapter 12.

9.1.4

XANES in magnetic materials

Electrons have spin, so for atoms with partially filled orbitals (Table 3.1) one can have unpaired electrons which can couple with their partners in nearby atoms. This can take the form of parallel coupling, leading to ferromagnetism, or antiparallel coupling, leading either to antiferromagnetism if the opposite spins are equal or ferrimagnetism if not. Paramagnetism and diamagnetism arise when spins are coupled only when induced by an external magnetic field, while spintronic materials have long-range correlations that can be associated with structuring of the material [Stöhr 2006]. These include skyrmions, where the coupling is between orthogonal spin orientations.

When circularly polarized light is incident with its helicity aligned with or against the direction of electron spins, one can have a change in the x-ray absorption cross section, with the degree of enhancement depending on the density of states for the specific spin direction. This difference in absorption at XANES resonances for magnetic moments parallel or antiparallel to the x-ray beam direction is termed x-ray magnetic circular dichroism (XMCD), as shown in Fig. 9.7. While there were earlier observations of x-ray magnetic linear dichroism (XMLD) [van der Laan 1986], the first observations of XMCD were for the Tb M edge [van der Laan 1986] and the Fe K edge [Schütz 1987]. The first observation of strong L edge XMCD using Fe reflectivity [Kao 1990] and especially Ni absorption [Chen 1990] inspired the development of magneto-optical sum rules [Thole 1992], so that one can obtain a quantitative determination of element-specific spin and orbital magnetic moments.

While there are other ways to remove non-magnetic image contrast, by taking images with opposite circular polarization and subtracting them one can obtain XMCD images that show the degree of magnetization along the x-ray beam direction. This was first done [Stöhr 1993] with a photoelectron emission microscope (PEEM; see Section 6.5). However, because the mean free path for inelastic scattering of low-energy electrons is typically under 5 nm (Fig. 6.9), one obtains a clean PEEM signal from only the outer few nanometers of the material, and surface properties such as topography can add complexity to image interpretation. Transmission imaging removes those limitations: the 100–150 nm penetration of 700 eV soft X rays is well matched to the thickness of the magnetic thin films used in hard disk drives, while the sensitivity extends down to 1 nm thick layers in Co [Macià 2012]. Transmission XMCD imaging was first demonstrated [Fischer 1996] using a TXM (Section 6.3) on a bending magnet beamline where radiation was collected above and below the plane of the orbit so as to obtain a high degree of partial circular polarization (an example image is shown in Fig. 12.10). Another way to achieve this is to use electromagnetically driven elliptically polarizing undulators (EPUs; see Section 7.1.6) to dynamically adjust the direction of circular polarization.


Figure 9.7 X-ray magnetic circular dichroism (XMCD) in iron with an applied external magnetic field B leading to an auxiliary field H in the material (see Eq. B.12 in online Appendix B at www.cambridge.org/Jacobsen). This effect can be used for high-contrast x-ray transmission imaging of magnetic domains in thin films (as will be shown in Fig. 12.10). When circularly polarized light is incident upon magnetic domains aligned parallel or antiparallel to the x-ray beam direction, a significant difference is observed in the L₂ (2p₁/₂) and L₃ (2p₃/₂) XANES resonances depending on whether the helicity of the beam points in the same or the opposite direction as the electron spin. The difference between these two is the dichroism effect. The inset diagram shows how the density of states (DOS) for the two spin directions differ near the Fermi edge [Mathon 2001, Stöhr 2006], thus giving rise to dichroism. Data courtesy of Elke Arenholz, then of Lawrence Berkeley National Laboratory.

With a spatially coherent beam and a pixelated area detector, one can use ptychography from an EPU beamline for imaging magnetic materials [Shi 2016]. Fourier transform x-ray holography offers yet another approach to XMCD imaging (Fig. 10.9), with the advantage that one does not need to place an optic near the specimen, so that there are fewer space constraints for pole pieces for providing external magnetic fields. The dynamics of magnetic spins can be studied using the pulse structure of the electron beam in synchrotron light sources (Table 7.1) along with fast-gating detectors. In scanning microscopes (Section 6.4), this can be done using fast-response avalanche photodiodes [Stoll 2004], while time-gated CCD detectors have been used in TXM systems [Wessels 2014]. The application of x-ray microscopy to the study of magnetic materials will be discussed further in Section 12.4, with an example image shown in Fig. 12.10.
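The polarization-difference imaging described above reduces, in its simplest form, to a normalized difference of two images; the following sketch (an illustration, not any beamline's actual processing code) assumes registered transmission images i_plus and i_minus recorded with opposite helicity:

```python
import numpy as np

def xmcd_asymmetry(i_plus, i_minus):
    """Normalized difference of images taken with opposite circular
    polarization; contrast common to both helicities largely cancels,
    leaving the magnetic (XMCD) signal."""
    i_plus = np.asarray(i_plus, dtype=float)
    i_minus = np.asarray(i_minus, dtype=float)
    return (i_plus - i_minus) / (i_plus + i_minus)
```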

Figure 9.8 Carbon near-edge spectrum of the complex oscillator strength (f₁ + i f₂) for the amino acid tyrosine. The experimentally measured x-ray absorption spectrum shown in Fig. 9.2 was used to determine f₂(E), with a strong aromatic absorption resonance at 285.09 eV and a strong COOH π* transition at 288.59 eV; the f₂(E) data were then spliced into tabulated data over a larger energy range, after which f₁(E) was calculated using a numerical implementation [Jacobsen 2004] of the Kramers–Kronig transform of Eq. 3.111. As the photon energy increases towards the aromatic absorption resonance, the f₁(E) value reaches zero at 284.59 eV, which is where f₂(E) begins its sharp rise, and it reaches its most negative value at 284.87 eV. Unfortunately the sharp negative resonance in f₁(E) comes at the mid-point in the rise of absorption f₂(E), so that one can reduce but not avoid radiation damage by using phase contrast near-edge spectroscopy to detect selected molecule types.

9.1.5

XANES in phase contrast

When discussing the x-ray refractive index n = 1 − αλ²(f₁ + i f₂) of Eq. 3.65, we noted that the Kramers–Kronig relationship (Section 3.4.1) of Eq. 3.111 provides a way to calculate the phase-shifting part of the complex oscillator strength f₁(E) from knowledge of the absorptive part f₂(E) over a wide range of photon energies E. One can "splice in" a near-edge absorption spectrum into tabulated data for f₂(E) over a larger energy range [Palmer 1998, Jacobsen 2004, Yan 2013, Watts 2014], and thus obtain a near-edge spectrum for the real or phase-shifting part of the complex oscillator strength [f₁(E) + i f₂(E)]. We showed in Fig. 3.21 a comparison of calculated versus interferometrically measured near-edge f₁(E) values at the carbon K edge, and how this departs from tabulated values of f₁(E).

One challenge in using XANES spectromicroscopy at high spatial resolution is the possibility of radiation-induced modifications to the underlying material, as will be discussed in Section 11.2.1. Could one carry out phase contrast XANES spectromicroscopy


with a lower radiation dose than one has with absorption contrast XANES? There are some possibilities, but several complicating factors arise [Jacobsen 2004]. To illustrate the possibilities, we show in Fig. 9.8 the f₂(E) near-edge spectrum of the amino acid tyrosine (obtained from the absorption data shown in Fig. 9.2), and the f₁(E) phase-shifting spectrum obtained by the "splicing-in" method. Unfortunately the sharp dips in the f₁ spectrum lie half-way up the strong absorption resonances, so one can reduce but not eliminate the extra absorption due to the XANES resonance. One could instead exploit the zero-crossing value of f₁(E) at an energy below the absorption resonance, but that does not undergo a sharp change with energy. Finally, absorption spectra arise via Fermi's Golden Rule of Eq. 3.18 directly from the overlap of quantum mechanical states, leading to simple physical interpretation, while phase spectra have the added complication of the Kramers–Kronig integral to deal with for their interpretation. Still, near-edge phase contrast effects have been explored experimentally in differential phase contrast [Hornberger 2007a, Figs. 3.8 and 3.9] and in ptychography [Maiden 2013, Shapiro 2014, Farmand 2017, Hirose 2017], so it is possible they could end up being practical and important. In the meantime, other approaches that exploit XANES phase-shift resonances include resonant soft x-ray reflectivity (RSoXR) [Wang 2005, Wang 2007], since it involves both δ = αλ²f₁ and β = αλ²f₂, as shown by Eq. 3.120.
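For readers who want to experiment with the "splicing-in" idea, the following is a crude numerical sketch of a Kramers–Kronig evaluation of f₁(E) from f₂(E) on a discrete grid. It uses one common form of the dispersion relation and simply excludes the singular point from the quadrature; sign and offset conventions vary between references (see Eq. 3.111), so treat this as a starting point only, not as the implementation used for Fig. 9.8.

```python
import numpy as np

def kramers_kronig_f1(E, f2):
    """Estimate f1(E) from f2(E) via a principal-value sum over
    f1(E) = (2/pi) P-integral of E' f2(E') / (E'^2 - E^2) dE'.
    E and f2 are float arrays on the same (preferably wide) energy grid."""
    f1 = np.zeros_like(f2, dtype=float)
    for k, Ek in enumerate(E):
        denom = E**2 - Ek**2
        denom[k] = 1.0                  # placeholder to avoid divide-by-zero
        integrand = E * f2 / denom
        integrand[k] = 0.0              # crude principal-value exclusion
        f1[k] = (2.0 / np.pi) * np.trapz(integrand, E)
    return f1
```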

9.1.6

Errors in XANES measurements

When carrying out absorption spectromicroscopy in a scanning transmission x-ray microscope or STXM using Fresnel zone plate optics, one must worry about blocking the "zero order" or undiffracted light that might be transmitted through the central stop, as shown in Fig. 5.17. As a result, the central stop should be made quite thick, as indicated by Eq. 5.45. In addition, if a grating monochromator used to select the photon energy delivers some second-order light, and if the zone plate has a mark:space ratio other than 1:1, second-order light from the monochromator (at twice the desired photon energy) can be delivered by second-order diffraction by the zone plate (at half the first-order focal length) to exactly the same focal position as the desired photon energy. This can reduce the apparent strength of XANES peaks, as illustrated in Fig. 9.9. So as to minimize the presence of higher-energy photons, some synchrotron beamlines use double-bounce mirrors to suppress second-order light, as shown in Fig. 3.27. An alternative approach is to use a gas cell or a thin filter made of a material with an absorption edge above the desired photon energy E, but below the second-order photon energy 2E, to selectively absorb second-order light. Finally, one can also attempt to correct for the presence of higher monochromator orders in the analysis of a spectrum by measuring over an extended range and doing least-squares fitting [Yang 1986].


Figure 9.9 Effect of the presence of second-order monochromator light on the measurement of a XANES spectrum, shown for assumed second-order ("2× energy") fractions of 0, 1, 3, 10, and 30 percent. Shown here is a subset of the carbon near-edge absorption spectrum of the amino acid tyrosine (a larger spectral range was shown in Fig. 9.2) with an assumed thickness of 230 nm, representing an optical density OD = μt (Eq. 3.83) of about 1 at a photon energy of 290 eV. This spectrum was assumed to contain a specified fraction of light at twice the photon energy, such as from the second diffraction order of a grating monochromator. As can be seen, the presence of even a small amount of second-order light can reduce the apparent height of XANES resonances, and limit the maximum optical density that can be observed using a specific x-ray beamline.
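The depression of apparent optical density by second-order light is easy to model: if a fraction of the incident photons arrive at 2E, the measured transmission mixes exp(−OD_E) and exp(−OD_2E). The sketch below is illustrative only; the optical density values used are arbitrary.

```python
import numpy as np

def apparent_od(od_E, od_2E, fraction):
    """Apparent optical density when a fraction of the incident flux is
    second-order (2E) light that is weakly absorbed by the specimen."""
    I0 = 1.0 + fraction                              # incident first + second order
    I = np.exp(-od_E) + fraction * np.exp(-od_2E)    # transmitted flux
    return -np.log(I / I0)

print(apparent_od(2.0, 0.1, 0.01))  # a 1 percent 2E component pulls OD = 2 down to about 1.95
```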

9.1.7

Wiggles in spectra: EXAFS

As the incident photon energy E is tuned above an x-ray absorption edge so that a core-level electron is removed from an atom, the excess energy above the ionization potential E_I goes into kinetic energy of the electron, or

E_k = E - E_I. \qquad (9.18)

This liberated electron has a non-relativistic momentum p given by E_k = p²/(2m_e), where m_e is the electron's mass, so that the momentum becomes

p = \sqrt{2 m_e E_k}, \qquad (9.19)

giving a de Broglie wavelength (Eq. 3.5) λ_e of

\lambda_e = \frac{h}{p} = \frac{h}{\sqrt{2 m_e E_k}}. \qquad (9.20)

Now let's assume that there is an occupied electron orbital in a neighboring atom that is a distance r away. The ejected electron's wavefunction can be reflected by this charge "surface" back onto itself, so that it forms a standing wave when

n_x \lambda_e = r. \qquad (9.21)

Figure 9.10 Extended x-ray absorption fine structure (EXAFS) spectrum (optical density versus photon energy, 9,600–10,200 eV) of a zinc foil. The above-edge "wiggles" in EXAFS are due to constructive and destructive self-interference of the de Broglie wave nature of ejected electrons reflecting off of electron shells in a neighboring atom. Data from Matt Newville of the University of Chicago, via the XAFS spectra library found at cars.uchicago.edu/xaslib.

Therefore one expects there to be slight constructive and destructive modifications to the x-ray absorption cross section at electron kinetic energies E_k that go like

E_k = \frac{n_x^2 h^2}{2 m_e r^2} = \frac{n_x^2 (hc)^2}{2 m_e c^2 r^2} = \frac{n_x^2 \, (1240~{\rm eV \cdot nm})^2}{2 (511 \times 10^3~{\rm eV}) \, r^2}, \qquad (9.22)

where in the latter version we have made use of hc from Eq. 3.7, and the relativistic energy corresponding to the mass of the electron of m_e c² = 511 keV. If we assume a representative inter-atom spacing of r = 0.25 nm, we find that the first spectral "wiggle" comes at a photon energy E − E_I = E_k of 24 eV above an absorption edge, with subsequent wiggles at energies spaced farther apart according to the n_x² dependence in Eq. 9.22. Because these modulations occur over an extended energy range above the absorption edge (as shown in Fig. 9.10), they are referred to as extended x-ray absorption fine structure or EXAFS.

Of course there is a deeper story to EXAFS analysis, and its historical origins go back to the early days of x-ray physics [Stumm von Bordwehr 1989]. A key advance was to realize that by considering the absorption spectrum on a scale of √E_k, the relationship of Eq. 9.22 becomes linear in n_x, the number of de Broglie waves between an atom and nearby electron shells. One can then apply a Fourier transform to the spectrum so


as to measure the distance r to a nearby occupied electron shell with a high degree of accuracy [Sayers 1971]. Analysis of EXAFS spectra is practically an industry unto itself, and it is covered in detail in monographs [Teo 1986]. In a crystal, the arrangement of neighboring atoms is replicated precisely over many unit cells, so the EXAFS modulations are strong and one can measure interatomic distances to very high precision. The orientation of neighboring atoms is not precisely repeated in liquids, yet there is still a nearest-neighbor distance that can be measured from weaker EXAFS modulations [Eisenberger 1975]. However, in gases, the distances to neighboring atoms are random, and no EXAFS wiggles are observed. Because EXAFS analysis usually requires a careful measurement of the spectrum over a range of several hundred eV above an absorption edge, we are not aware of it being combined with high-resolution imaging for applications in EXAFS spectromicroscopy, but it would not be surprising if this were done in the future.
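Equation 9.22 is simple enough to evaluate directly; the following sketch reproduces the 24 eV estimate quoted above for r = 0.25 nm:

```python
def exafs_wiggle_energy(n_x, r_nm):
    """Electron kinetic energy (eV) of the n_x-th standing-wave condition,
    Eq. 9.22, using hc = 1240 eV nm and m_e c^2 = 511 keV."""
    return n_x**2 * 1240.0**2 / (2.0 * 511e3 * r_nm**2)

print(exafs_wiggle_energy(1, 0.25))  # about 24 eV above the edge
```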

9.2 X-ray fluorescence microscopy

Our discussion of XANES/NEXAFS and EXAFS has involved measurements of changes in x-ray transmission as the incident photon energy is tuned (absorption spectroscopy). Another very powerful approach in x-ray microscopy is to use a focused x-ray beam to excite the emission of x-ray fluorescence photons as discussed in Section 3.1.1, so that one can detect the presence of specific elements with atomic number Z by observing the emission of fluorescence X rays at energies approximated by Eq. 3.14 and its equivalents for other electronic transitions. In practice, one uses the experimentally determined energies of x-ray fluorescent emission lines from tabulations [Bambynek 1972, Krause 1979a, Elam 2002], including in computer-readable formats as described in Appendix A. Because one is looking for distinctive signals (x-ray fluorescence lines) against a dark background (low scattering, as will be described below), x-ray induced x-ray fluorescence provides one of the best combinations of sensitivity for trace element analysis and minimum damage, as was shown in Fig. 4.79.

If one had an achromatic imaging optic and an energy-resolving area detector, one could perform full-field x-ray microscopy using x-ray fluorescence lines. However, both optics and detectors present challenges for realizing this approach:

• Achromatic optics for nanoscale full-field x-ray imaging are not yet readily available. Fresnel zone plates (Section 5.3.1) and compound refractive lenses (Section 5.1.1) are strongly chromatic, so that a single energy-resolving area detector would be at the in-focus image position for only one x-ray fluorescence energy. Grazing incidence reflective x-ray optics are achromatic if they do not have multilayer coatings, but as discussed in Section 5.2 they tend to have very small fields of view due to off-axis aberrations. In theory, the Wolter geometry shown in Fig. 5.10 could overcome this limitation, but it is challenging to realize with grazing incidence optics. Earlier attempts at fabricating two grazing incidence Wolter optics for
2D focusing proved challenging [Onuki 1992, Hasegawa 1994]. More recently, Wolter-type optics have been developed using a Kirkpatrick–Baez-type geometry so that a separate optic pair is used to focus in each direction (thus requiring four optics for 2D focusing) [Matsuyama 2014, Yamada 2017]. These optics have demonstrated 50 nm resolution with unspecified efficiency [Matsuyama 2017], so this limitation might be overcome.

• It is difficult to obtain detectors that provide both spatial and spectral information, at least for high signal rates. As will be discussed in Section 7.4, most x-ray detectors work by converting an absorbed x-ray into a number of quanta (such as electrons and holes in semiconductor detectors). If readout of a detector pixel's quanta occurs over a time T (whether due to a choice in data collection time, or fundamentals such as capacitance in a detector element), one has a choice of interpreting the number of quanta in terms of the energy of a single x-ray photon, or of the number of photons at an already-known x-ray energy, but not both. How then to measure the energy of single photons while also allowing for an appreciable overall signal rate? This requires a large number of pixels with charge integration. The MAIA x-ray fluorescence detector [Ryan 2010, Ryan 2014] offers 384 pixels, though this is not sufficient for most imaging applications. At very low signal rates where one can expect no more than one photon per pixel per acquired image, standard CMOS cameras have been used for energy-resolved image detection [Scharf 2011, Ordavo 2011, Zhao 2017]. However, there is not yet a good solution available that combines many pixels, high spatial resolution, and high spectral resolution while also accommodating high signal rates.

Difficult does not mean impossible, and full-field x-ray fluorescence microscopy at several-micrometer resolution has already been demonstrated [Takeuchi 2009, Garrevoet 2014] with a "light sheet" illumination approach to 3D imaging. More recently, 0.5–1.0 μm spatial resolution was obtained using Kirkpatrick–Baez optics as noted above, and a charge-integrating CCD camera [Matsuyama 2019]. However, the above challenges remain for extension to nanoscale imaging with high throughput.

Therefore the main approach to nanoscale imaging using x-ray fluorescence is to use a scanning microscope to provide spatial resolution, and a non-spatially-resolved energy-dispersive detector to measure the energy of each emitted fluorescent photon. (While the MAIA detector mentioned above has energy-resolving pixels, its many detector elements are used primarily to increase overall count rate, and per-pixel collimation is used to preferentially detect fluorescent photons emitted from the focused x-ray beam spot position while reducing the contribution of scattered photons from other locations.) This scanning approach is known by several names, including scanning x-ray fluorescence (SXF), x-ray fluorescence microscopy (XFM), scanning x-ray fluorescence microscopy (SXFM), or scanning fluorescence x-ray microscopy (SFXM). In keeping with the nomenclature of scanning transmission electron microscopes (STEMs) and STXMs, we will use SFXM here.² A schematic of a typical layout for SFXM at a synchrotron light source is shown in Fig. 9.11.

² If you're sufficiently sophisticated, you can pronounce SFXM as "sficks-em."

[Figure 9.11 schematic labels: undulator, monochromator, zone plate objective, order sorting aperture (OSA), sample (raster scanned), transmission detector, and XRF detector.]

Figure 9.11 Schematic representation of a scanning fluorescence x-ray microscopy (SFXM) experiment with an undulator source at a light source facility. The undulator delivers an x-ray beam polarized in the horizontal direction, so there is a minimum of elastic scattering for a detector at 90◦ horizontal to the incident beam. An energy-dispersive detector (Section 7.4.12) is used to record the x-ray energy of each fluorescent photon at each specimen scan position, and a transmission detector can also be used which might be a simple silicon photodiode, or a segmented transmission detector for differential phase contrast [Hornberger 2008], or a pixelated detector for a variety of contrast modes [Thibault 2009b] including ptychography (Section 10.4). Figure modified from [Deng 2015b].

As described in Section 7.1.4, these facilities use dipole magnets to steer the circulating electron beam in the horizontal direction, and these "bending magnets" are one type of x-ray source for experimental end stations; another type is an undulator, and most undulators also deflect the electron beam back and forth in the horizontal direction. As a result, the x-ray radiation in the plane of both bending magnet and undulator sources is very strongly linearly polarized in the horizontal direction, and Eq. 3.34 then tells us that there is zero elastic scattering at 90° horizontally from the incident beam direction. This means that an x-ray detector placed at that angle relative to the specimen will receive a minimum of elastic scattering [Dzubay 1974], thus reducing the overall flux on the fluorescence detector and minimizing the strong elastic peak in the detected spectrum (see Fig. 9.12); there are other fluorescence detector placement options, as will be discussed in Section 9.2.2. For a detector with a circular sensitive area of radius r located a distance z from the specimen, such that it extends over a semi-angle of θ = tan⁻¹(r/z), the detected solid angle Ω will be given by

Ω = ∫_{φ=0}^{2π} ∫_{θ′=0}^{θ} sin θ′ dθ′ dφ = 2π(1 − cos θ)   (9.23)
  ≃ πθ²  for θ ≪ π/2.   (9.24)
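As a quick numerical check of Eqs. 9.23 and 9.24, a minimal sketch (the detector radius and distance are made-up illustration values):

```python
import math

def detector_solid_angle(r, z):
    """Solid angle (sr) of a circular detector of radius r at distance z,
    from Eq. 9.23 with semi-angle theta = arctan(r/z)."""
    theta = math.atan2(r, z)
    return 2.0 * math.pi * (1.0 - math.cos(theta))

# Hypothetical geometry: a 10 mm radius detector face 25 mm from the specimen.
theta = math.atan2(10.0, 25.0)
print(f"exact:  {detector_solid_angle(10.0, 25.0):.3f} sr")   # ~0.45 sr
print(f"approx: {math.pi * theta**2:.3f} sr (Eq. 9.24)")      # ~0.45 sr
```

Both forms agree to within about one percent at this semi-angle, and the result sits within the 0.1–0.7 sr range quoted below for typical energy-dispersive detectors.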

Energy-dispersive detectors (Section 7.4.12) typically offer detection solid angle coverage of about Ω = 0.1–0.7 steradians (full coverage over 180° involves a solid angle of 2π sr out of 4π possible), and one can use two fluorescence detectors on either side of the illuminating beam to double the collected solid angle [De Samber 2012]. The 384-element MAIA detector [Ryan 2010] offers a solid angle of collection of up to 1.2 sr in the forward or backward direction, as shown in Fig. 9.14. One impressive alternative

approach [Szlachetko 2010] for transverse detection has been to use a polycapillary array or "Kumakhov lens" (Section 5.2.5) to collect a solid angle of 1.48 sr and deliver a highly parallel beam onto a crystal in a wavelength-dispersive detector (Section 7.4.11) with 4–40 eV energy resolution. However, in this case the polycapillary optic had an efficiency of only 14 percent, leading to an efficiency·solid angle product of 0.21 sr.

In order to allow for clearance of the detector front face relative to the specimen, as well as to minimize fluorescence self-absorption (Section 9.2.4), the specimen scan axis might be inclined by some angle θ_scan relative to the usual perpendicular-to-incident-beam direction, as shown in Fig. 9.14. This must be accounted for in setting scan parameters and displaying the image.

Because one single detector system records the fluorescence signals from all detected elements during one pixel's exposure time, the "maps" or images of elemental composition that one obtains are intrinsically registered with each other (that is, one does not need to do anything to align the different images to a common position). For reasonably thin specimens, it is also useful to record the transmission signal simultaneously with the fluorescence signal, essentially "for free." The transmission detector might be a simple silicon photodiode for absorption contrast (which is often quite weak at the multi-keV x-ray energies used for SFXM; see Figs. 3.16 and 4.61). One can instead use a segmented transmission detector for differential phase contrast [Hornberger 2008]. If a pixelated area detector is used, one can use it to display a variety of contrast modes [Thibault 2009b], including ptychography [Schropp 2010a, Deng 2015b], as will be discussed in Section 10.4. Again, the transmission images are intrinsically registered to the fluorescence images.

Scanning fluorescence x-ray microscopy (SFXM) provides one of the most sensitive methods for imaging the distribution of elements present at low concentrations within inorganic and biological materials. Its comparison to other methods is discussed in Section 4.10.1.

9.2.1 Details of x-ray fluorescence spectra

When using an incident x-ray energy E in a SFXM system, one has the potential to excite x-ray fluorescence emission from all lower-energy fluorescence lines, with an intensity per line given by Eq. 3.10. The lines with energies closest to the incident energy will be excited with the greatest efficiency due to the jump ratio of Eq. 3.8 just above an absorption edge, but the lower-energy fluorescence lines will still be excited. As a result, in a SFXM experiment one acquires a spectrum like the example shown in Fig. 9.12.

Figure 9.12 Example spectrum from a scanning fluorescence x-ray microscope (SFXM) [plotted as image-integrated photons per 50 msec per 19.3 eV bin versus x-ray emission energy, 1–12 keV]. This is from a 10 μm thick cryo section of bovine articular cartilage that was freeze-dried prior to x-ray fluorescence analysis. This spectrum was obtained by summation of all pixel spectra in an image. It contains contributions mainly from Kα and Kβ lines, plus the elastic scattering peak at the incident photon energy of 10.2 keV and a Compton scattering peak at a slightly lower energy. The spectrum fitting includes the effects of detector energy response as shown in Fig. 7.17. The weak fluorescence lines in the 11–12 keV region are excited by a small presence of 20.4 keV photons due to second-order diffraction from the beamline monochromator. From a study by Markus Wimmer, Rush University, with Olga Antipova, Argonne Lab. The spectrum fitting was carried out by Stefan Vogt, Argonne Lab, using his program MAPS [Vogt 2003a].

This spectrum contains a number of features that are worth considering in detail:

• In most cases an energy-dispersive detector is used, which might have an intrinsic energy resolution of about 150 eV depending on the x-ray energy (Eq. 7.33). This means that spectral peaks appear much wider in the measured spectrum than one would have expected from their intrinsic width.
• At the incident x-ray energy E, one can observe a Rayleigh or elastic peak which is
due to elastic scattering of a fraction of the incident photons into the fluorescence detector. As noted above, a small detector located at 90° to the incident beam in the horizontal plane will record a very small elastic peak due to the minimization of elastic scattering at that angle; as the solid angle of the detector is increased, more and more of the elastic peak will appear in the detected spectrum.
• At an energy of E − ΔE_Compton as given by Eq. 3.28, there will be a Compton peak due to inelastic x-ray scattering. The strength of this peak is proportional to the electron density in the specimen, so that one can use the Compton peak as a proxy for the projected or areal mass in the specimen along the incident beam direction.
• One will then see the various fluorescence peaks from chemical elements present in the specimen. The natural linewidth of fluorescence lines from the lighter elements is ∼1 eV [Krause 1979b], but they are broadened considerably by the resolution of the energy-dispersive detector (Eq. 7.33). One must therefore pay careful attention to nearby fluorescence peaks such as the L₁, L₂, and L₃ peaks (see for example Fig. 3.8) when analyzing fluorescence spectra. One can also have near-
overlaps of L lines from heavier elements with K lines from lighter elements, for example.
• When using energy-dispersive detectors, several complicating factors can arise in the observed spectra, including dead-time correction, pile-up, escape peaks, and incomplete charge detection. The origins of these effects are described in more detail in Section 7.4.12, but their effects on observed spectra are as follows:
– Dead time: because the detector must collect the signal from electron–hole separation in the detector material (typically silicon, though sometimes germanium is used), there is a "dead time" t_dead when the detector is insensitive to the arrival of another photon. At high count rates, this means one must apply a dead time correction to the observed intensity, as given by Eq. 7.45 and illustrated in Fig. 7.14.
– Pile-up: when two fluorescent photons reach the detector within its pulse integration time (that is, within the dead time), the detector electronics might report them as a single photon with an energy given by the sum of the two.
– Incomplete charge detection: defects in the detector's crystalline lattice can trap some of the signal from one photon, leading to a broad signal "floor" underneath the fluorescence peaks. This is shown in Fig. 7.17.
– Escape peak: with silicon-based energy-dispersive detectors, the electron generated by absorption of an incident photon of energy E′ can occasionally excite emission of an Si Kα fluorescent photon with an energy of E(Si, Kα) = 1.74 keV. If that fluorescent photon escapes, the remaining electron–hole charge separation will correspond to an energy of E′ − E(Si, Kα). If there is an especially strong Rayleigh or Compton peak in the spectrum, or even an overwhelmingly strong fluorescence line from one major constituent in the specimen, this means one can see an "echo" of this line at an energy 1.74 keV lower.
• At photon energies below about 2 keV, the ability to detect x-ray fluorescence lines begins to diminish. Energy-dispersive detectors are often equipped with thin, visible-light-opaque windows to separate the cold vacuum environment of the detector material from the specimen region, and these windows can preferentially absorb lower-energy fluorescence lines. The energy resolution of silicon-based detectors can also make it difficult to separate these low-energy lines. Finally, low fluorescence yield (Figs. 3.5 and 3.7) and self-absorption of low-energy fluorescence lines (Section 9.2.4) can also affect their detectability. These are merely challenges rather than roadblocks; several scanning x-ray microscopes have been used for successful fluorescence imaging at energies down to 280 eV [Kaulich 2009, Kaulich 2011, Hitchcock 2012].

Again, these effects can be seen in the experimental example of Fig. 9.12. To see these complicating factors in a more simplified example, Fig. 9.13 shows a simulation result for the fluorescence spectrum one might expect from Zn in a biological specimen. Simulations of this sort can be carried out using Monte Carlo methods [Schoonjans 2012, Golosio 2014], or a semi-analytical approach [Sun 2015].

Figure 9.13 Simulated SFXM spectrum from a biological specimen consisting of a 20 nm protein layer with 0.01 percent Zn added, contained within an overall specimen thickness of 100 nm of amorphous ice [plotted as I(E)/I₀(E₀) versus emission energy E in keV]. Shown here relative to an incident beam intensity of 1 are the fluorescence contributions of S and O within the model protein (Box 4.8), and various fluorescence lines from Zn, as well as the Rayleigh or elastic scattering background and the Compton background at 9.80 keV (Eq. 3.28). Signals at these x-ray energies are all affected by the energy-dispersive detector energy resolution and response as shown in Fig. 7.17, and a detector solid angle of 0.024 sr was assumed. Modified from [Sun 2015].
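Two of the detector bookkeeping effects listed above are easy to compute. The sketch below uses the standard non-paralyzable dead-time model, which may differ in detail from the text's Eq. 7.45, along with illustrative count rates; the 1.74 keV Si Kα escape energy is from the discussion above:

```python
def deadtime_corrected_rate(measured_rate, t_dead):
    """Non-paralyzable dead-time correction: true rate (counts/s) from a
    measured rate and a per-event dead time t_dead (s)."""
    return measured_rate / (1.0 - measured_rate * t_dead)

def si_escape_peak(line_keV, si_ka_keV=1.74):
    """Apparent energy of the Si escape-peak 'echo' of a strong line."""
    return line_keV - si_ka_keV

print(f"{deadtime_corrected_rate(2.0e5, 1.0e-6):.3g} counts/s")  # 2.5e5
print(f"Zn Ka escape peak at {si_escape_peak(8.64):.2f} keV")    # 6.90 keV
```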

9.2.2 Fluorescence detector geometries

As was noted in Fig. 9.11, most SFXM systems at synchrotron light sources use a detector placed at 90° to the incident beam in the horizontal plane so as to minimize elastic or Rayleigh scattering. However, as the detector solid angle (Eq. 9.23) is increased, a detector at this position will nevertheless begin to collect elastically scattered photons. It is of course advantageous to increase the solid angle that the fluorescence detector subtends, but in this case one may begin to consider other detector geometries such as those shown in Fig. 9.14.

Figure 9.14 Geometries for locating energy-resolving detectors for SFXM [panels: forward (0°), backward (180°), and transverse (90°) detector geometries]. The most common geometry is the transverse geometry, where the detector is oriented at 90° to the x-ray beam in the horizontal plane. At this angle, one will detect a minimum of elastically scattered photons due to the horizontal polarization of x-ray beams from bending magnets and most undulators at synchrotron light sources; one must also incline the scan direction by some angle θ_scan relative to the usual perpendicular orientation so as to allow for clearance between the specimen and the detector and to minimize self-absorption of the fluorescence signal within the specimen. As one increases the solid angle collected by the detector, the advantages of this geometry are muted, so that one may consider other geometries such as the forward (0°) or backward (180°) positions. In the case of the 180° position, the detector must have a hole through which the incident beam can pass; this is the case for the MAIA detector [Ryan 2010, Kirkham 2010, Siddons 2014]. Figure adapted from [Sun 2015].

One notable example of this is the MAIA detector [Ryan 2010, Kirkham 2010, Siddons 2014], developed jointly by CSIRO in Australia and Brookhaven National Laboratory in the USA. In present versions, this detector is composed of 384 "pixels" of energy-dispersive detector elements, each with its own pulse-processing circuitry, so that the detector can handle a much higher aggregate fluorescence count rate. These pixels are also equipped with collimators pointing towards the expected specimen/beam focus position. Because of the large planar extent of this detector, it is not convenient
to mount in the 90° geometry; instead, it is mounted in the backwards-emission or 180° geometry as shown in Fig. 9.14, and the detector has a hole in its center to allow for the passage of the incident x-ray beam. This detector also has another unique characteristic: it delivers data in the form of a list of individual photon events (tagging the time of the event, and the energy of the photon) rather than integrating all photon events over a pre-selected integration time.

The relative merits of various detector positions, specimen tilt angles, and solid angles can be considered using either Monte Carlo [Hodoroaba 2011] or semi-analytical [Sun 2015] approaches. As one example, one might worry that for very low element concentrations there could be a competition between different physical effects:

• As the detection solid angle is increased, one will collect more "signal" photons, which are emitted isotropically.
• As the detection solid angle is increased, more "background" photons are collected as one moves off of the polarization-produced zero of elastic scattering.

It turns out that signal collection dominates, so that large solid angle is always preferred (see [Sun 2015, Fig. 16]). In addition, the relative geometric ease that a planar detector at 180° has for collecting large solid angles outweighs the elastic scattering minimum of the 90° geometry for nearly all specimens. An exception is for low-mass specimens that are only a few micrometers thick, such as for detecting low-concentration elements in biological specimens; in this case, a large solid angle detector in the transverse geometry offers advantages (see [Sun 2015, Fig. 14]).

Table 9.1 Parameters for estimating the main K shell fluorescent flux produced by zinc.

K absorption edge                            E_K = 9.659 keV
Absorptive part of the oscillator strength   f₂ = 3.694 at 10 keV
Jump ratio (Eq. 3.8)                         r_K = 7.543
Fluorescence energies                        E_{Kα1} = 8.637 keV, E_{Kα2} = 8.614 keV
Net K fluorescence yield                     ω_K = 0.46937
Fractional yields F                          F_{Kα1} = 0.57606, F_{Kα2} = 0.29435
Electron–hole transfer factor T              T = 1

9.2.3 Elemental detection limits using x-ray fluorescence

Exact quantitation of trace element quantities using SFXM is a topic deep enough to fill entire books [Müller 1972, Russ 1984, Janssens 2000d], and complicated enough to warrant Monte Carlo methods for ab initio analysis [Vincze 1995b, Vincze 1995a, Vincze 1999, Schoonjans 2012, Schoonjans 2013, Golosio 2014]. In practice, most researchers carry out quantitative analysis by first recording the fluorescence emission from a "standard" sample which (ideally) has similar absorption and scattering characteristics to the main mass of the specimen under study, and fluorescing elements added at a known concentration that approximates what might be expected from the specimen under study. For example, one might start with a high-purity glass to which known quantities of trace elements are added, followed by thorough mixing when molten; the cooled sample should then contain a known concentration of trace elements with uniform distribution. By measuring the standard in the same apparatus as used for the unknown specimen, one can measure the concentration of a trace element by the ratio of emitted fluorescence signals.

Originally the emitted fluorescence signal was measured by integrating the signal over an "energy window" set to incorporate most of the fluorescence line, but today the preferred practice is to measure the entire fluorescence spectrum delivered by the detector (an approach that was first used in electron [LeFurgey 1992] and proton [Ryan 1993] microprobes, and then introduced to SFXM [Vogt 2003b, Twining 2003]). Several computer codes are available to carry this out [Ryan 2000, Vogt 2003a, Solé 2007, Schoonjans 2012, Schoonjans 2013, Crawford 2019], and some of the general principles involved in full-spectrum analysis are discussed in Section 9.3.

Exact calculation of elemental detection limits requires detailed knowledge of effects including incomplete charge collection in the fluorescence detector (see Fig. 7.17). However, if these background limits are sufficiently low, then the sensitivity is dominated simply by the number of fluorescent photons detected from a trace element, as discussed in Section 4.8.2. In Section 3.1.1 we gave the following expression (Eq. 3.10) for the flux into one x-ray fluorescence line:

I_{Kα1}(E) = ω_K F_{Kα1} T_{Kα1} (1 − 1/r_K)(1 − e^{−μ_Z(E) t_Z}) I₀(E).

For elemental analysis it is more convenient to consider the areal mass density ρ_Z and mass absorption coefficient μ_Z of Eq. 9.3 for element Z, leading to an alternative form
for the fluorescence flux of

I_{Kα1}(E) = ω_K F_{Kα1} T_{Kα1} (1 − 1/r_K)(1 − e^{−ρ_Z μ_Z(E)}) I₀(E).   (9.25)

The product ρ_Z μ_Z(E) can be found from Eqs. 9.2 and 9.3 as

ρ_Z = m_Z/Δr² = N_Z A_Z/(Δr² N_A)   and   μ_Z = 2 r_e λ f₂ N_A/A_Z,

thus

ρ_Z · μ_Z = (2 N_Z/Δr²) r_e λ f₂.   (9.26)

If we had N_Z = 10⁴ zinc atoms in an area of (Δr = 50 nm)², the numerical value of ρ_Z·μ_Z using 10 keV incident X rays is 1.03 × 10⁻⁵, given the parameters listed in Table 9.1. Thus we are well justified in making the approximation

(1 − e^{−ρ_Z μ_Z(E)}) ≃ ρ_Z μ_Z(E),   (9.27)

which along with Eq. 9.25 and Eq. 3.7 lets us write the fraction of Kα1 fluorescent photons per incident flux as

I_{Kα1}(E)/I₀(E) ≃ 2 ω_K F_{Kα1} T_{Kα1} (1 − 1/r_K) r_e f₂(E) (hc/E) (N_Z/Δr²),   (9.28)

with an obvious equivalent for the Kα2 fluorescence line. Since the Kα1 and Kα2 fluorescence lines are only 23 eV apart, and most SFXM systems use silicon-based energy-dispersive detectors with an energy resolution of about 150 eV (Eq. 7.33), the detected signal will be the sum of the Kα1 and Kα2 lines. Thus in practice we are interested in the numerical result of

[I_{Kα1}(E) + I_{Kα2}(E)]/I₀(E) = (9.15 × 10⁻²⁵ m²) (N_Z/Δr²),

which for N_Z = 10⁴ atoms in an area of (Δr = 50 nm)² gives

[I_{Kα1}(E) + I_{Kα2}(E)]/I₀(E) = 3.66 × 10⁻⁶.

If we have an incident flux of I₀(E) = 10⁹ photons/s, an x-ray fluorescence experiment with a per-pixel dwell time of 0.1 s and a detector solid angle coverage of 0.2 sr will detect a total of

10⁹ · 0.1 · 3.66 × 10⁻⁶ · [0.2/(4π)] = 58 photons

in a SFXM experiment. We found in Section 4.8.2 that one can have very low false positive and negative error rates for detecting an element even if only 10 photons are detected (assuming a sufficiently small background normalized intensity I_b), so this small number of zinc atoms in a (50 nm)² area should be detectable. If the specimen is made of carbon with ρ = 2.26 g/cm³ and is t = 5 μm thick (an exceedingly simple model for a biological cell!), the sampled pixel has ρ t Δr² N_A/A = 1.42 × 10⁹ carbon atoms, so the detection of 10⁴ Zn atoms represents a concentration of 7.06 parts per million (ppm) and a detected mass of N_Z A/N_A = 1.11 × 10⁻¹⁸ grams, or 1.11 attograms. These numbers are representative of what is achieved at several synchrotron light source facilities [De Samber 2016]; for example, the absolute mass detection limit for ESRF beamline ID22NI with 0.3 ms pixel dwell time was reported to be about 2 attograms [Adams 2011].

With low photon counts in fluorescent signals, the spatial resolution is not necessarily given by the full probe resolution. A good approach to evaluate the achieved spatial resolution is to examine the power spectrum of the image, as illustrated in Fig. 4.19. Since one can use techniques like ptychography (Section 10.4) to measure the actual shape of the focused x-ray beam, it is natural to try to use deconvolution to obtain a sharper view of an x-ray fluorescence image [Vine 2012, Deng 2017b]. If deconvolution is combined with a Wiener filter as described in Section 4.4.8, one may well have different spatial resolution values for different fluorescence maps according to their signal strength, as shown in Fig. 4.49.

Since many incident photons are required to detect elements present at low concentration, it is useful to estimate the radiation dose imparted to the specimen in SFXM. Consider the above flux of 10⁹ photons/s and a pixel dwell or transit time of 0.1 s, so that one has 10⁸ photons incident per (Δr = 50 nm)² pixel. One can estimate the radiation dose D_C imparted to the carbon mass (with μ⁻¹ = 2.14 mm at 10 keV according to tabulations) using Eq. 4.281 as

D_C = n̄ E μ/(ρ Δr²)
    = (10⁸ photons) · (10⁴ eV/photon) · (1.602 × 10⁻¹⁹ J/eV) · (2.14 × 10⁻³ m)⁻¹ / [(2.26 g/cm³) · (10⁻³ kg/g) · (10² cm/m)³ · (50 × 10⁻⁹ m)²]
    = 1.3 × 10⁷ Gy, or 13 MGy.

This dose can cause significant changes in the chemical bonding state of organics (as will be discussed in Chapter 11). However, especially if cryogenic conditions are used so that molecular fragments do not "fly away" in solution or in a vacuum, the element in question might remain in place along with the overall mass of the specimen region. That overall mass provides contrast in transmission imaging (especially phase contrast at higher x-ray energies) at length scales much larger than atomic, so x-ray microscopes may not show very much change in image contrast, as demonstrated in Fig. 11.11.

Figure 9.15 Self-absorption can give rise to errors and artifacts in fluorescence tomography (as well as incorrect ratios of elements) if not corrected for. Shown here is the self-absorption calculated for fluorescence from a number of biologically interesting elements [Zn (8.6 keV), Cu (8.0 keV), Fe (6.4 keV), Mn (5.9 keV), Ca (3.7 keV), K (3.3 keV), Cl (2.6 keV), S (2.3 keV), and P (2.0 keV); plotted as transmission versus thickness in μm] due to varying thicknesses of a mix of 25 percent generic protein (Box 4.8) and 75 percent water to represent the cytosol of a typical cell [Fulton 1982, Luby-Phelps 2000].

As was noted in Section 4.10.1, SFXM represents one of the most sensitive methods for non-destructive detection of elements present at low concentrations; it has a sensitivity roughly 1000 times better than electron microprobes due to very small background signals relative to the desired fluorescence signal [Janssens 2000a]. However, for biological studies one must exercise care in specimen preparation, since it is known that chemical fixation can be associated with the loss of certain diffusible elements; rapid freezing followed either by freeze-drying or, better yet, imaging in the frozen hydrated state offers better chemical fidelity [Matsuyama 2010, Perrin 2015, Jin 2017, Jones 2017, De Samber 2018]. A recent paper summarizes the factors one must consider in sample preparation, experimental data collection, and analysis for the accurate detection of elements present at low concentration [Lemelle 2017].
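The worked numbers of this section can be cross-checked with a short script; this is a minimal sketch using the Table 9.1 values, with constants rounded as in the text:

```python
import math

# Zinc parameters from Table 9.1, for an incident energy of 10 keV.
omega_K, r_K, f2 = 0.46937, 7.543, 3.694
F_Ka1, F_Ka2, T = 0.57606, 0.29435, 1.0

r_e = 2.818e-15            # classical electron radius (m)
E = 10.0e3                 # incident photon energy (eV)
lam = 1239.84e-9 / E       # wavelength from hc = 1239.84 eV.nm, in m
N_Z, dr = 1.0e4, 50.0e-9   # Zn atoms in one (50 nm)^2 pixel

# Eq. 9.26: the (dimensionless) product of areal density and mass absorption
rho_mu = 2.0 * N_Z * r_e * lam * f2 / dr**2
print(f"rho_Z*mu_Z = {rho_mu:.2e}")             # ~1.0e-5, justifying Eq. 9.27

# Eq. 9.28, summed over the unresolved Kalpha1 + Kalpha2 lines
frac = 2 * omega_K * (F_Ka1 + F_Ka2) * T * (1 - 1/r_K) * r_e * f2 * lam * N_Z / dr**2
print(f"(I_Ka1 + I_Ka2)/I_0 = {frac:.2e}")      # ~3.7e-6

# Dose to the 5 um carbon matrix (Eq. 4.281): 1e8 photons per pixel,
# absorption length 2.14 mm at 10 keV, carbon density 2.26 g/cm^3.
n_photons, inv_mu, rho = 1.0e8, 2.14e-3, 2260.0
dose = n_photons * E * 1.602e-19 / (inv_mu * rho * dr**2)
print(f"dose = {dose:.2g} Gy")                  # ~1.3e7 Gy (13 MGy)
```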

9.2.4 Fluorescence self-absorption

One of the advantages of x-ray microscopy is its ability to work with quite thick specimens. However, in fluorescence microscopy there arises a potential complicating factor: after an incident photon penetrates the specimen and is absorbed by an atom, the lower-energy x-ray fluorescence photon that might result has a chance of being absorbed within the specimen before it can escape and be detected. This is known as self-absorption, and it of course affects the detection of lighter elements more than of heavier elements (due to the Z² dependence of x-ray fluorescence emission energies; see Eq. 3.12). An example in the case of biological imaging is shown in Fig. 9.15. One can correct for this effect with assumptions about the nature of the specimen [Janssens 2000b], or by measuring the absorption in the transmitted beam, as discussed in the next section (Section 9.2.5).
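One can sketch curves like those of Fig. 9.15 from tabulated cross sections; the example below assumes the xraylib package [Schoonjans 2012], whose CS_Total_CP call takes a compound formula and an energy in keV and returns a mass cross section in cm²/g, and it uses a pure-water matrix for simplicity rather than the 25 percent protein mix of the figure, so the numbers are only indicative:

```python
import math
import xraylib  # tabulated cross sections; energies in keV, CS in cm^2/g

rho = 1.0      # g/cm^3: water as a crude stand-in for cytosol
t_cm = 20e-4   # 20 um fluorescence escape path, expressed in cm

for line, E_keV in [("P Ka", 2.0), ("Cl Ka", 2.6), ("Fe Ka", 6.4), ("Zn Ka", 8.6)]:
    mu = xraylib.CS_Total_CP("H2O", E_keV) * rho   # linear coefficient (1/cm)
    print(f"{line}: transmission {math.exp(-mu * t_cm):.2f} over 20 um of water")
```

As in Fig. 9.15, the low-energy lines (P, Cl) are attenuated far more strongly than the harder Fe and Zn lines over the same escape path.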

9.2.5 Fluorescence tomography

Figure 9.16 Schematic of fluorescence tomography data collection in SFXM. At each scan position, the transmission detector records the integrated absorption along the beam column as required by the pure projection assumption in standard tomography. The fluorescence detector records the integrated signal from each fluorescence emission line in much the same way, also satisfying the projection assumption. Figure adapted from [de Jonge 2010b].

Fluorescence tomography provides a way to view the 3D distribution of elements in a specimen, as shown in Fig. 9.16. As the beam in a SFXM is scanned across one object slice (see Fig. 8.1), the transmission detector records information on the net absorption along the beam direction, and the fluorescence detector records information on
the net fluorescence signal along that same beam direction. One can therefore use an energy-dispersive detector to record what is equivalent to a pure projection image for each fluorescence line, and carry out a standard tomographic reconstruction of the object [Boisseau 1986, Boisseau 1987, Cesareo 1989]. This approach has been extended to sub-micrometer-resolution 3D imaging of low-concentration elemental distributions using SFXM [de Jonge 2010a], as will be shown in Fig. 12.3.

An alternative approach is to use a detector that is sensitive to only one plane transverse to the x-ray beam direction, as shown in Fig. 9.17. This is called confocal x-ray fluorescence, even though the optical arrangement is much different from that in visible-light confocal microscopes. The confocal optic should be non-dispersive, so single capillaries (Section 5.2.3) or polycapillary optics (Section 5.2.5) are usually used, typically with a depth resolution of several micrometers. If a wavelength-dispersive detector such as a flat crystal spectrometer is used, the confocal optic can deliver a suitably collimated beam to it. The confocal approach was proposed [Gibson 1992] and then demonstrated [Ding 2000, Kanngießer 2003] for imaging a selected depth within a thicker specimen, after which scanning the specimen along the x-ray beam direction led to 3D imaging [Vincze 2004].

Figure 9.17 Confocal method for x-ray fluorescence imaging of a selected depth plane in a 3D specimen. In this example, the incident x-ray beam comes from above to stimulate x-ray fluorescence emission along its path through the specimen, while a non-wavelength-dispersive optic (such as a capillary, or a polycapillary lens as shown here) orthogonal to the beam limits fluorescence signal detection to one depth plane in the specimen. If one is using a wavelength-dispersive detector (Section 7.4.11), it is advantageous to deliver a mostly collimated fluorescence signal beam to the detector (in fact, rays can diverge from the exit of the capillary optic by as much as ±θ_c, the grazing incidence critical angle, as given by Eq. 3.115 and discussed in Section 5.2.5). Note that the confocal arrangement is very different from what is used in visible-light confocal microscopes, where the illuminating and detecting lenses lie on the same optical axis (and in epi-illumination schemes one lens serves both functions).

The self-absorption problem in x-ray fluorescence described in the previous section (Section 9.2.4) becomes even more pressing as one goes to the thickness of 3D specimens. This can lead to severe artifacts in x-ray fluorescence tomography reconstructions, such that the fluorescence elements in the interior of the object simply do not appear in the reconstruction; this is illustrated in Fig. 9.18. However, the 3D information of tomography can also be put to advantage. If one could tune the x-ray beam energy to match each of the fluorescence line emission energies, a transmission tomogram could be taken at each energy and used to calculate the exact self-absorption characteristics of the specimen and thus correct for it (as long as self-absorption merely reduced but did not eliminate detection of fluorescence from that element) [Hogan 1991]. However, if one is trying to detect multiple elements in the specimen, the number of datasets required can become overwhelming and difficult to acquire (for example, some of the x-ray fluorescence energies might be below the lowest incident photon energy available on a particular light source beamline). Other approaches are usually used, sometimes following upon developments made for radionuclide emission tomography [Chang 1978, Nuyts 1999, Zaidi 2003]. In the case of objects with uniform absorption, and illumination at a single x-ray energy, analytical approaches have been developed [La Rivière 2004, Miqueles 2010], and these have been shown [Miqueles 2011] to provide a good starting point for iterative methods. One iterative approach has been to use transmission tomography data at a single x-ray energy to estimate the absorption at all x-ray fluorescence energies (using the fact that, in the absence of x-ray absorption edges, x-ray absorption scales with x-ray energy as shown in Fig. 3.20), and thereby correct for self-absorption [Schroer 2001, La Rivière 2006, Yang 2014]. One can also add the Compton scattered signal as another measurement of overall specimen electron density, and use the tabulated absorption coefficients μ_e(E) of all elements e at each fluorescence energy E [Golosio 2003, De Samber 2016]. Other approaches classify the specimen as being composed of a finite number of material phases for the calculation of self-absorption [Vekemans 2004]. The above methods have usually used specific energy windows for fluorescence analysis, but more recently full-spectrum fluorescence analysis has been combined with transmission tomography to correct for self-absorption of x-ray fluorescence in an optimization approach [Di 2017].

Figure 9.18 Illustration of self-absorption in x-ray tomography, in a simulation. At left is shown an object slice of a 200 μm diameter simulated glass rod surrounded by two 10 μm diameter wires, one of tungsten (W) and one of gold (Au); in the bottom row, the glass rod is assumed to be hollow with a wall thickness of 30 μm. Sinograms (Fig. 8.2) from the simulated 12.1 keV transmission images (right column) clearly show the difference between the solid and hollow glass rod, since the absorption length μ⁻¹ of 12.1 keV x-rays in the glass is 150 μm. However, even the hollow glass rod is large compared to the 1.66 μm absorption length μ⁻¹ of Si Kα1 X rays in the glass, so Si fluorescence is only detected from the side of the glass cylinder facing the fluorescence detector. If there were no self-absorption of the Si fluorescence signal, one would obtain an Si XRF sinogram as shown in the top row, where the incident x-ray beam is partially absorbed in the small W and Au wires as they rotate into positions to intercept the incident beam before it reaches the glass cylinder. However, when self-absorption is included, the sinograms for the solid and hollow glass rods are nearly indistinguishable, as shown in the middle and bottom images, respectively. By combining information from the fluorescence (XRF) and transmission (XRT) sinograms, one can in principle obtain a better reconstructed image of the specimen in the case of strong fluorescence self-absorption [Schroer 2001, Golosio 2003, Di 2017]. This simulation figure is adapted from [Di 2017], while [De Samber 2012, Fig. 3] provides a nice experimental demonstration.
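The one-sided sinogram bias described in the Fig. 9.18 caption is easy to reproduce in a toy calculation; this minimal NumPy sketch uses a uniform disk phantom and made-up absorption values rather than the published simulation, and assumes SciPy's ndimage.rotate for the tomographic rotation:

```python
import numpy as np
from scipy.ndimage import rotate

# Toy fluorescence sinogram with self-absorption: pencil beam along +x,
# fluorescence detector collecting along -y (transverse geometry).
n = 128
yy, xx = np.mgrid[-1:1:n*1j, -1:1:n*1j]
disk = ((xx**2 + yy**2) < 0.8**2).astype(float)   # uniform object slice

mu_inc = 0.5 / n   # per-pixel absorption at the incident energy (made up)
mu_fl = 5.0 / n    # much stronger absorption at the fluorescence energy

def fluorescence_projection(obj):
    inc = np.exp(-mu_inc * np.cumsum(obj, axis=1))  # incident-beam attenuation
    esc = np.exp(-mu_fl * np.cumsum(obj, axis=0))   # escape path to detector
    return (obj * inc * esc).sum(axis=1)            # signal versus scan row

angles = np.linspace(0.0, 180.0, 90, endpoint=False)
sino = np.stack([fluorescence_projection(rotate(disk, a, reshape=False, order=1))
                 for a in angles])
# With mu_fl >> mu_inc, the sinogram is weighted toward the detector-facing
# side of the object, as in the middle and bottom rows of Fig. 9.18.
```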

9.3 Matrix mathematics and multivariate statistical methods

The tradition among experts in x-ray absorption or fluorescence spectroscopy has been to carefully acquire one high-statistics spectrum from a specimen, and then carry out
exhaustive analysis that might include density functional theory calculations so as to understand the physical origins of all features in the spectrum. This detailed level of understanding is of course to be celebrated, but it becomes impractical in spectromicroscopy, where one has not one or a few but thousands or even millions of spectra in one dataset. We therefore turn to the consideration of matrix mathematics and multivariate statistical analysis methods for x-ray spectromicroscopy. The notation in what follows is centered on x-ray absorption spectroscopy, but of course the mathematics is the same if one considers x-ray fluorescence.

In x-ray absorption spectroscopy, one measures the specimen transmission I(E) as a function of energy, which of course decreases exponentially with increases in specimen thickness t as given by the Lambert–Beer law (Eq. 3.76). To obtain a representation that is linear with the projected thickness, this is normalized relative to the incident spectrum I₀(E), leading to an optical density of

D(E) = − ln[I(E)/I₀(E)] = μ(E) t,   (9.29)

as given by Eq. 3.83. In 2D spectromicroscopy, the optical density is spatially resolved, so in fact one has a 3D dataset D(x, y, E) (and with tomography one would have D(x, y, z, E)). However, when it comes to analyzing the set of spectra obtained, we do not care about their arrangement in an image, so we instead consider this dataset to be a 2D array D_{N×P}, where we use n = 1, . . . , N to index the set of photon energies of the spectra, and p = 1, . . . , P to index the pixels according to

p = i_col + (i_row − 1) · n_rows,   (9.30)

where i_col and i_row are both indexed from a starting value of 1. Now since Eq. 9.29 gives D = μt, the optical density matrix D_{N×P} should be the product of a set S of absorption spectra μ_{N×S} and a matched set S of thickness maps t_{S×P}, leading to a matrix equation for the optical density of

D_{N×P} = μ_{N×S} t_{S×P}.   (9.31)

In principle there could be a unique spectrum at each pixel, in which case S = P, but we will hope that there will be some common spectroscopic signatures S among at least some of the pixels P so that S < P. Therefore we will refer to S as the set of chemical components of the specimen, and in fact it is this set that we are trying to deduce from the data. With that understanding, we can write out Eq. 9.31 as

[ D_11 … D_1P ]   [ μ_11 … μ_1S ]   [ t_11 … t_1P ]
[  ⋮       ⋮  ] = [  ⋮       ⋮  ] · [  ⋮       ⋮  ]   (9.32)
[ D_N1 … D_NP ]   [ μ_N1 … μ_NS ]   [ t_S1 … t_SP ]

(rows index the N spectral energies in D and μ, while columns index the P pixels in D and t and the S components in μ; rows of t index the S components) to make more explicit the meaning of the rows and columns in all three matrices. In some cases we might in fact know the exact set of absorption spectra μ_{N×S} for
all the chemical components S that, combined together, make up our specimen. With x-ray fluorescence, we may know the set of elements present in the specimen, and the spectral response of the detector, so that we can again represent any measured spectrum as a linear combination of the spectra produced by all fluorescence lines plus Rayleigh, Compton, and incomplete charge collection backgrounds [Ryan 1993]. In these cases, we can simply use the rules of linear algebra to calculate directly the "thickness maps," or images of the thickness of each chemical component at each pixel, as

t_{S×P} = (μ_{N×S})⁺ D_{N×P} = μ⁺_{S×N} D_{N×P},   (9.33)

where μ⁺_{S×N} is a pseudoinverse of μ_{N×S}. Numerical matrix pseudoinversion can be done by a number of methods, but the most common approach involves singular value decomposition (SVD). Based on the Eckart–Young theorem of linear algebra, SVD states that an array A_{N×S} with N ≥ S can be decomposed into

A_{N×S} = U_{N×S} · W_{S×S} · V^T_{S×S},   (9.34)

where the matrix U_{N×S} has orthogonal columns, the matrix W_{S×S} is zero everywhere except for its diagonal elements, which are all zero or positive (these diagonal elements are called the singular values), and the matrix V_{S×S} has orthonormal rows. That is, these matrices have the properties that U^T_{S×N} · U_{N×S} = 1_{S×S} and V^T_{S×S} · V_{S×S} = 1_{S×S}. The singular value decomposition algorithm [Press 2007] or SVD routine that is present in many linear algebra subroutine libraries can be used to numerically construct these arrays. With them, one can find the pseudoinverse of A_{N×S} as

A⁺_{S×N} = V_{S×S} · W⁻¹_{S×S} · U^T_{S×N},   (9.35)

where the inverted matrix W⁻¹_{S×S} is again a diagonal matrix, with elements W⁻¹_{i,i} that are the inverse of the singular values W_{i,i}, or zero when W_{i,i} = 0. Use of SVD for x-ray spectromicroscopy analysis with a set of known spectra has proven to be useful in soft x-ray XANES [Zhang 1996, Koprinarov 2002]; it is particularly useful in the analysis of immiscible polymers (Fig. 9.4), where one indeed has only a few known spectra that can be determined in advance by spectroscopy on pure thin films.
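A minimal NumPy sketch of Eqs. 9.33–9.35, using random stand-ins for the reference spectra and thickness maps (np.linalg.pinv would give the same pseudoinverse in one call):

```python
import numpy as np

# Recover per-pixel thickness maps t from optical density data D given
# known component spectra mu. Shapes follow the text: D is (N energies,
# P pixels) and mu is (N energies, S components).
rng = np.random.default_rng(0)
N, P, S = 120, 64 * 64, 3
mu = rng.random((N, S))                     # stand-in reference spectra
t_true = rng.random((S, P))                 # stand-in thickness maps
D = mu @ t_true + 1e-3 * rng.standard_normal((N, P))   # noisy data

U, w, Vt = np.linalg.svd(mu, full_matrices=False)      # Eq. 9.34
w_inv = np.where(w > 1e-12, 1.0 / w, 0.0)   # invert nonzero singular values
mu_pinv = (Vt.T * w_inv) @ U.T              # Eq. 9.35: V . W^-1 . U^T
t = mu_pinv @ D                             # thickness maps, Eq. 9.33
print(np.allclose(t, t_true, atol=0.05))    # True within the noise level
```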

9.3.1 Principal component analysis

In many cases (particularly in biology or environmental science), the specimen cannot be assumed to be made up of a simple combination of a limited number of components for which reference spectra are known a priori. One approach to handle these cases involves the use of principal component analysis (PCA) to characterize the dataset in terms of its most significant variations, without prior knowledge of their characteristics. There is a long tradition of using PCA in the social sciences and in chemistry [Malinowski 1991], and its first use in x-ray microscopy was in connection with photoelectron-based imaging [King 1989], with scanning transmission x-ray microscopy applications coming later [Osanna 2000]. The goal in PCA is to describe the specimen by a set of s = 1, . . . , S_abstract abstract
components (where S_abstract ≤ N). These abstract components describe the main spectroscopic signatures in the data; these signatures may in fact arise from a linear combination of several different chemical species, so that there is not a simple, direct relationship between one particular abstract component and one particular chemical component of the specimen. While one can use SVD to carry out a PCA calculation, in many cases the numerical implementation involves creating square arrays of dimension N × N or P × P, whichever is larger, which is of course inefficient if N ≪ P (in soft x-ray XANES spectromicroscopy, one might have N = 100 energy points, and P = 512² pixels). A more efficient approach is to first calculate the spectral covariance from the optical density data times its transpose, or

Z_{N×N} = D_{N×P} · D^T_{P×N},   (9.36)

which measures the correlation between images at various energies. Because the correlation of the image at energy n₁ with the image at energy n₂ is the same as the correlation of n₂ with n₁, the covariance matrix Z_{N×N} is symmetric. One can then use an eigenvalue routine from a linear algebra subroutine library to find a matrix of eigenvectors (which we will henceforth call eigenspectra C_{N×S_abstract}) and eigenvalues λ(s) that fully span the covariance matrix:

Z_{N×N} · C_{N×S_abstract} = C_{N×S_abstract} · Λ_{S_abstract×S_abstract},   (9.37)

where S_abstract = N, and Λ_{S_abstract×S_abstract} is a diagonal matrix whose diagonal elements are given by the eigenvalues λ(S_abstract) for S_abstract = 1, . . . , N. We can also find a corresponding matrix (which we will henceforth call the eigenimage matrix R_{S_abstract×P}) from

R_{S_abstract×P} = C^T_{S_abstract×N} · D_{N×P},   (9.38)

where we have used the fact that C is orthogonal (being composed of eigenvectors) so that its inverse is its transpose, or C⁻¹ = C^T. Finally, it is obvious that we can also re-write Eq. 9.38 as

D_{N×P} = C_{N×S_abstract} R_{S_abstract×P},   (9.39)

so that we can represent the full dataset with the matrix product of the eigenspectra times the eigenimages, and the most significant information as

D_{N×P} = C_{N×S̄_abstract} R_{S̄_abstract×P},   (9.40)

where the expression of Eq. 9.40 takes advantage of data size reduction and noise suppression, as we will now discuss.

To understand the power of PCA, it is helpful to look at the example shown in Fig. 9.19. The dataset shown [Lerotic 2004] is one with lutetium serving as a stand-in for americium in interactions with ferrihydrite and humic acids, the latter of which are organics in soil that can affect the transport of radionuclides [Dardenne 2002]. An oxygen K edge XANES spectromicroscopy dataset was acquired, and a set of eigenvalues, eigenspectra, and eigenimages was calculated using Eqs. 9.36–9.38.

Figure 9.19 Example use of principal component analysis (PCA) in x-ray spectromicroscopy [panels: (a) eigenvalues, (b) eigenspectra, (c) eigenimages]. Shown here is an oxygen K edge XANES dataset [Lerotic 2004] of a specimen with lutetium used as a stand-in for americium in a study of radionuclide transport in groundwater [Dardenne 2002]. The first four eigenvalues λ(S_abstract) contain most of the significant variation in the eigenspectrum/eigenimage representation of the data, as indicated by the fact that they have much stronger values than all other eigenvalues even when displayed on a logarithmic scale. The importance of the first four eigenvalues is also seen in the fact that the first four (or the reduced set S̄_abstract = 4) eigenspectra show XANES-spectrum-like features, while the spectra for S_abstract = 5 and above show mostly uncorrelated noise with low eigenvalue weightings. The eigenimages tell a similar story; the first four (or S̄_abstract = 4) show recognizable image features, while eigenimages S_abstract = 5 and beyond show only the "salt and pepper" appearance of random noise (weakly visible here because images 1–6 are shown on the same intensity scale). However, the eigenspectra and eigenimages beyond the first one have both positive and negative values, with the negative values in the eigenimages shown here on a red instead of grey color scale. Because both the eigenspectra and eigenimages show negative values that resemble successive orthogonal differences from the first eigenspectrum and eigenimage, it is difficult to interpret these higher eigenspectra and eigenimages on their own; it is only in linear combination that they reproduce measured x-ray absorption spectra at various image pixels.

As one can see in Fig. 9.19, these quantities have the following characteristics:

• They are obtained from the data “as is,” with no a priori biases on the nature of the data.

• The eigenvalues λ(S_abstract) decrease rapidly on a logarithmic scale. Because the eigenvalues represent overall weightings of eigenspectra and eigenimages in the dataset, they indicate that most of the significant variations in the data can be represented by using only the first four components. This represents a tremendous degree of data compression: in essence, one can represent most of the significant variations in the data with a reduced set of S̄_abstract = 4 abstract components, rather than (in this case) N = 120 photon energies. The separation between the S̄_abstract = 4 components and the full set of S_abstract = N components is not always as clear as in this example, so one should also examine the eigenspectra and eigenimages as described next.

• Among the set of eigenspectra C_{N×S_abstract}, the first eigenspectrum shows what is effectively an average of all optical density spectra present in the dataset, with all positive values as one would expect for an optical density (negative values would represent negative x-ray absorption μt, or the addition rather than removal of energy from the x-ray beam—clearly unphysical!). However, the subsequent eigenspectra have a mean value of zero, with excursions to both positive and negative values. In effect, the subsequent eigenspectra represent successive differences from the first eigenspectrum, as required to represent all of the observed variations in the dataset. This means that it is very difficult to interpret the spectroscopic signatures of individual eigenspectra; instead, one only matches observed spectra when forming an appropriate linear combination of the reduced number S̄_abstract of eigenspectra. The eigenspectra beyond S̄_abstract = 4 in this example show mostly stochastic variations that are characteristic of noise (plus, in this case, a slight non-linearity of response in the strongly absorbing energy range of 540–543 eV, which is likely due to incomplete suppression of higher monochromator orders as discussed near Eq. 5.45 and illustrated in Fig. 9.9).

• Among the set of eigenimages R_{S_abstract×P}, the average over all individual optical density images is shown in the S_abstract = 1 eigenimage, and successive images then show different positive and negative variations. As with the eigenspectra C_{N×S_abstract}, it is only through a linear combination of the S̄_abstract = 4 most significant eigenimages that one can represent the individual optical density images acquired at the various photon energies N. The eigenimages for S_abstract = 5 and beyond show the “salt and pepper” appearance of random noise, except for a slight shadow in the most strongly absorbing region near the center due to non-linearities (the same non-linearity as shown in the 540–543 eV energy range in eigenspectrum S_abstract = 5).

On the positive side, PCA gives us a way to significantly compress the dataset for subsequent analysis (by reducing the dimensionality from N × P to S̄_abstract × P, where in this case S̄_abstract = 4 is much smaller than N = 120), and the compressed data are delivered as a set of orthogonal, successive-difference eigenspectra and eigenimages; a minimal numerical sketch of this decomposition is shown below. On the negative side, because the eigenspectra and eigenimages beyond S_abstract = 1 show successive differences with positive and negative values, they are difficult to interpret on their own. In addition, one can spoof image classifiers that use PCA as a data pretreatment step by deliberately adding in weak image features designed to “tickle” weak eigenimages (see [Sharif 2016] for one amusing example).
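To make the decomposition concrete, here is a minimal numpy sketch of PCA via the singular value decomposition, in the spirit of Eqs. 9.36–9.38 (the exact conventions of those equations may differ, and mean-centering is omitted for simplicity); the array names D, C, and R echo the matrix notation above, while the function name and test data are illustrative only.

```python
import numpy as np

def pca_decompose(D, s_keep):
    """PCA of a spectromicroscopy data matrix via SVD.

    D      : (N, P) array of optical densities, N photon energies x P pixels
    s_keep : number of abstract components to retain (S-bar_abstract)

    Returns eigenvalues, eigenspectra C (N, s_keep), and per-pixel
    weightings R (s_keep, P), analogous in spirit to Eqs. 9.36-9.38.
    """
    # Singular value decomposition: D = U @ diag(w) @ Vt
    U, w, Vt = np.linalg.svd(D, full_matrices=False)
    eigenvalues = w**2                       # eigenvalues of D @ D.T
    C = U[:, :s_keep]                        # eigenspectra, one per column
    R = w[:s_keep, None] * Vt[:s_keep, :]    # eigenimage weightings per pixel
    return eigenvalues, C, R

# Example: N = 120 energies and P = 100 x 100 pixels, reduced to the
# first 4 abstract components as in Fig. 9.19 (random stand-in data).
rng = np.random.default_rng(0)
D = rng.random((120, 100 * 100))
eigenvalues, C, R = pca_decompose(D, s_keep=4)
eigenimages = R.reshape(4, 100, 100)         # each row of R viewed as an image
```

A rapid falloff of `eigenvalues` beyond the first few components is the numerical signature of the data compression described above.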

9.3.2 Cluster analysis and optimization methods

The downsides of PCA listed above mean that it is usually used not as a final step in analysis, but as a pre-treatment step for a variety of follow-on methods. The first approach demonstrated in x-ray spectromicroscopy was cluster analysis [Lerotic 2004],


[Figure 9.20 panels: pixel weighting in S_abstract = 3 versus pixel weighting in S_abstract = 4 (left); pixel weighting in S_abstract = 2 versus pixel weighting in S_abstract = 4 (right); pixel groupings labeled 1–5.]

Figure 9.20 Scatterplots of the weightings R_{S_abstract×P} (Eq. 9.38) of pixels P for two components S_abstract compared against each other: S_abstract = 3 versus 4 at left, and 2 versus 4 at right. These weightings show the relative strength of the associated eigenspectra present in a pixel, and one can only show pairwise comparisons in an individual 2D plot. By using a k-means clustering algorithm on the Euclidean distance between pixels over the full set of S_abstract dimensions, one can classify pixels as belonging predominantly to one eigenspectrum or another. Figure adapted from [Lerotic 2004].

where one seeks dense groupings of pixels in an S_abstract-dimensional search space. Because each pixel P has a weighting R_{S_abstract×P} for the set S_abstract of components as given by Eq. 9.38, one can use a k-means clustering algorithm to classify pixels based on their spectral similarity, as shown in Fig. 9.20. With a “hard” clustering method, which assigns each pixel to one and only one group, one then obtains a classification map as shown in Fig. 9.21(a). One can then average together the spectra of the pixels in each cluster to obtain a set of spectra μ_{S×N,cluster} from which a pseudoinverse can be obtained and used in Eq. 9.33 to obtain thickness or weighting maps corresponding to these spectra. This provides a good starting point for analysis, even though one can still arrive at non-physical negative values for the reconstructed optical density, for reasons shown in [Mak 2014, Fig. 2]. A minimal sketch of this clustering step follows below.

One can improve upon the basic cluster analysis approach described above in a number of ways, including by using “soft” clustering methods in which each pixel is given a weighting of how strongly it belongs in one cluster or another [Ward 2013], or by using an angle rather than a Euclidean distance measure [Lerotic 2005]. Additional classification approaches have been developed for spectrum imaging in transmission electron microscopy [Bonnet 1999] and x-ray fluorescence analysis in electron microprobes [Kotula 2003].
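As a sketch of this clustering step, assuming scikit-learn is available for the k-means step (the helper name cluster_pixels is illustrative, and not part of any published package):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_pixels(D, R, n_clusters=5):
    """Classify pixels by spectral similarity, then fit thickness maps.

    D : (N, P) optical densities; R : (S_abstract, P) PCA weightings.
    """
    # "Hard" k-means clustering on Euclidean distance in S_abstract dimensions
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(R.T)

    # Average the measured spectra of the pixels within each cluster
    mu = np.stack([D[:, labels == k].mean(axis=1) for k in range(n_clusters)],
                  axis=1)                  # (N, n_clusters) cluster spectra

    # Thickness/weighting maps via the pseudoinverse, as in Eq. 9.33;
    # these can go (unphysically) negative, as discussed in the text.
    t = np.linalg.pinv(mu) @ D             # (n_clusters, P)
    return labels, mu, t
```

A “soft” clustering variant or an angle-based distance measure [Ward 2013, Lerotic 2005] would change only the classification step here.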


[Figure 9.21 panels: (a) categorized map; (b) spectra of categorized map, optical density versus photon energy (525–550 eV); (c) similarity measure of spectra; (d) weighting images of categorized spectra, clusters 1–5.]

Figure 9.21 Cluster analysis of the data of Fig. 9.19. By assigning pixels exclusively to one cluster or another as shown in Fig. 9.20, one obtains a simple map (a) of regions with similar spectroscopic response, and (b) the corresponding spectra. A “genetic relationship” plot of the similarity of the cluster spectra and their merge points is shown in (c). From the set of spectra μ_{N×S} shown in (b), one can obtain the pseudoinverse and use Eq. 9.33 to obtain thickness or weighting maps as are shown in (d). The negative (unphysical) values that can appear in these thickness maps point out a shortcoming in this approach, but it can still provide a good starting solution which can then be refined using optimization methods. Figure adapted from [Lerotic 2004].

In fact, one can do much more. What we have done in Eq. 9.31 as well as in Eq. 9.40 is to write our data analysis problem as one of solving a simple matrix equation of the form

$$ y = A x, $$

which, to the alert reader, should ring a bell: it is Eq. 8.9, the basic equation for the class of numerical optimization problems discussed in Section 8.2.1 in the context of iterative tomography reconstruction algorithms. This means that all of the approaches discussed in terms of image reconstruction, including compressive sensing, are in principle applicable to spectromicroscopy data analysis. Inspired by the method of non-negative matrix factorization [Lee 1999], which exploits the fact that both the absorption spectra μ_{N×S} and thickness maps t_{S×P} should have only positive values, one can use optimization approaches where the basic cost function is given by matching the spectra and thicknesses to the data, or

$$ C_0 = D_{N\times P} - \mu_{N\times S}\, t_{S\times P} \tag{9.41} $$

so that one wishes to minimize the two-norm (Eq. 8.13) of this cost; that is, one wishes to find min ‖C_0‖_2. One can add regularizers such as the one-norm (Eq. 8.12) ‖t_{S×P}‖_1 to seek a “sparse” solution (that is, to seek a set of spectra that make the thickness maps as different as possible from each other, so that a pixel is more likely to be dominated by one of the set of spectra μ_{N×S} rather than a more even weighting of all the spectra). One can also add a positivity constraint on μ_{N×S}. This approach has been used with success in XANES spectromicroscopy [Mak 2014, Mak 2016], and it is implemented in a software package available for download [Lerotic 2014]; a minimal sketch of such a non-negative refinement appears at the end of this section.

There's one important difference between tomography reconstructions based on p_θ = W_θ f (Eq. 8.7) as well as x-ray fluorescence analysis, versus x-ray absorption spectroscopy based on D_{N×P} = μ_{N×S} t_{S×P} (Eq. 9.31):

• In tomography one usually knows the set of angles θ from which projection images were obtained, and in x-ray fluorescence analysis one knows the set of x-ray fluorescence lines from all of the naturally occurring elements.

• In XANES spectromicroscopy, however, one may not know ahead of time the set S of chemical components with distinguishably different absorption spectra.

Principal component analysis can be used to obtain the number S̄_abstract of distinguishable spectra (Eq. 9.40) with no prior knowledge, after which one can carry out cluster analysis followed by non-negative and sparse refinement [Mak 2014, Mak 2016] to “discover” what these S̄_abstract spectra look like. That is, in XANES spectromicroscopy one can recover the “hidden” organizing factors S̄_abstract present in the data. This approach is also used in single-particle imaging methods in electron microscopy (subject of the 2017 Nobel Prize in Chemistry), where one seeks to recover the set θ of viewing directions from which projection images of identical macromolecules were obtained [Frank 1975a, Frank 1988]. Electron microscopists can recover additional “hidden” organizing factors from a dataset, such as molecular conformation states [Scheres 2007, Spahn 2009]. We will see these single-particle methods again in Section 10.6 when we discuss coherent diffraction imaging with free-electron lasers.
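One concrete way to carry out the non-negative refinement of Eq. 9.41 is with the multiplicative updates of [Lee 1999]; the sketch below is an illustration under that choice, seeded from the cluster-analysis result, and is not the actual implementation of [Lerotic 2014]. Adding a sparsity regularizer on t, as discussed above, would modify these updates.

```python
import numpy as np

def nonneg_refine(D, mu0, t0, n_iter=200, eps=1e-12):
    """Refine spectra mu (N x S) and thickness maps t (S x P) so that
    D ~ mu @ t with mu, t >= 0, reducing the cost of Eq. 9.41.

    Uses Lee-Seung multiplicative updates for || D - mu t ||_2,
    seeded (for example) from the cluster-analysis spectra and maps.
    """
    mu = np.clip(mu0, eps, None)   # clip starting guesses to positive values
    t = np.clip(t0, eps, None)
    for _ in range(n_iter):
        # Multiplicative updates preserve non-negativity at every step
        t *= (mu.T @ D) / (mu.T @ mu @ t + eps)
        mu *= (D @ t.T) / (mu @ t @ t.T + eps)
    return mu, t
```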

9.4 Concluding limerick

Adding spectroscopy to x-ray microscopy to make spectromicroscopy lets us understand what we are seeing, ranging from trace element distributions to chemical binding states in complex materials. It's all cool enough to inspire a limerick!

X-rays have spectra with wiggles
from electrons; they shakes and they jiggles
I could be confused
but instead am amused
Such images give me the giggles!


10 Coherent imaging

Janos Kirz contributed to this chapter.

As discussed in Chapter 5, there have been considerable advances in the development of x-ray lenses using diffraction, reflection, and refraction. However, all of these x-ray lenses have their limitations: they typically have numerical apertures (N.A.) of 0.005 or less, and efficiencies ranging from 60 percent down to as low as single-digit percentages. Compare that with what is available in objective lenses for visible-light microscopes, where oil immersion objectives go up to N.A. = 1.4 with efficiencies near 100 percent and excellent fields of view and achromatic properties. In electron microscopy, high-end microscopes have spherical and even chromatic aberration correctors, an efficiency limited only by scatter-limiting apertures and detectors, and a spatial resolution as low as 50 picometers [Erni 2009]. Among these techniques, x-ray microscopy suffers the most from lens-imposed limitations, so there is an especially strong motivation to consider lensless x-ray imaging methods such as holography, coherent diffraction imaging (CDI; sometimes called diffraction microscopy), and ptychography. Besides the images shown in this chapter, x-ray ptychography images are shown later on in 2D (Fig. 12.2) and 3D (Fig. 12.6), while an example of a Bragg CDI image is shown in Fig. 12.8.

10.1 Diffraction: crystals, and otherwise

X rays have been used for atomic resolution imaging, without lenses, for more than a century via the method of x-ray crystallography (from the Greek word κρύσταλλος or krystallos for “ice”). As discussed in Section 2.1, von Laue had realized in 1912 that the regular spacing of atoms in crystals could provide evidence of the wavelength of X rays. Later that year, Lawrence Bragg and his father William worked out their eponymous law describing crystalline diffraction (Eq. 4.33), and used it to determine the structure of several simple cubic lattice salts, including NaCl and KCl [Bragg 1913a, Bragg 1913b]. Another key advance was provided in 1934 by Arthur Lindo Patterson, who realized that the autocorrelation map A of a diffraction pattern gives direct information on interatomic distances in crystals, in an approach that is now called the Patterson map [Patterson 1934b, Patterson 1934a]. Recall from Eq. 4.83 that we can describe convolution as a shift–multiply–add operation, as illustrated in Fig. 4.18. By carrying out a shift–multiply–add sequence on a crystal's diffraction pattern, one gets a reinforcement of the double-slit-like grating patterns arising from any two point sources in the


[Figure 10.1 panels: object; diffraction pattern (+inset); Patterson map A (autocorrelation); Patterson map vectors.]

Figure 10.1 Patterson map of a three-atom simulated crystal. At top left is shown the object, consisting of three spheres, all with the same electron density; the line segments separating these spheres are shown in color. The diffraction pattern at top right has rings similar to an Airy pattern, as shown in Fig. 4.22 (except that the projected thickness of a sphere is different from that of a uniform circular disk); the pattern is then modulated by interference grating patterns from the interatomic spacings. This can be seen in the inset in the diffraction pattern, which shows an enlarged region near the optical axis. By squaring the diffraction magnitudes and taking the Fourier transform, one obtains the Patterson or autocorrelation map A(x′, y′, z′) (Eq. 10.1) at bottom left, which shows all of the interatomic distance vectors contained in the unit cell at bottom right. A Patterson map of a non-crystalline object is shown in Fig. 10.16.

object. Since the electron density of an atom acts like a point source scatterer, the self-convolution or autocorrelation of the diffraction pattern then produces strong signals at positions corresponding to the interatomic distances, as shown in Fig. 10.1. Crystallographers label the diffraction spots by Miller indices hkl, as was shown in Fig. 4.11, so in their notation the diffraction amplitudes are written as F_{hkl}. Thus for a crystallographer the autocorrelation of the diffraction pattern magnitudes—the Patterson map A(x′, y′, z′), where {x′, y′, z′} represent positions within the unit cell of dimension {a, b, c}—is written as [Patterson 1934b, Eq. 6]

$$ A(x', y', z') = \sum_{hkl} |F_{h,k,l}|^{2} \exp\left[ i 2\pi \left( h \frac{x'}{a} + k \frac{y'}{b} + l \frac{z'}{c} \right) \right], \tag{10.1} $$

which is the inverse Fourier transform of diffraction intensities rather than of the properly phased diffraction amplitudes. As shown in Fig. 10.1, the Patterson map A(x′, y′, z′) provides direct information on the vector distance between atoms in the unit cell. Aided by this and other advances, by the 1930s crystallography was being applied to understand biological structures, and today the method has advanced so far that many labs obtain crystallographic structures by using airborne shipment of newly made crystals
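On a sampled grid, the content of Eq. 10.1 reduces to an inverse fast Fourier transform of the diffraction intensities; a minimal numpy sketch, with an arbitrary three-“atom” test object echoing the simulation of Fig. 10.1, is:

```python
import numpy as np

def patterson_map(rho):
    """Patterson (autocorrelation) map of a sampled density, per Eq. 10.1:
    the inverse Fourier transform of the diffraction *intensities* |F|^2,
    requiring no knowledge of the phases of F.
    """
    F = np.fft.fftn(rho)              # diffraction amplitudes (with phase)
    A = np.fft.ifftn(np.abs(F)**2)    # transform of the intensities alone
    return np.fft.fftshift(A.real)    # center the interatomic-vector peaks

# Example: three point-like "atoms" on a 2D grid (arbitrary positions)
rho = np.zeros((256, 256))
for (i, j) in [(100, 100), (140, 110), (120, 150)]:
    rho[i, j] = 1.0
A = patterson_map(rho)   # peaks at the origin and at +/- interatomic vectors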

[Figure 10.2: schematic of structural heterogeneity versus size in biology, increasing from atom and molecule through ribosome, mitochondrion, and bacterium to eukaryotic cell.]

Figure 10.2 Structural heterogeneity versus size in biology. Crystallography relies on a very low degree of heterogeneity among the many molecules in their unit cells. As one gets to larger structures, one can find a greater range of conformational variations in macromolecules; one can sort for a number of these variations in single particle electron microscopy [Leschziner 2007, Scheres 2010, Spahn 2009, Cossio 2013]. Individual variations are seen in the interiors of larger viruses (see for example [Xiao 2005, Xiao 2009, Okamoto 2017]), and by the time one reaches the length scale of bacteria one has a large degree of structural heterogeneity, even before reaching the variability of eukaryotic cells. Figure adapted from [Jacobsen 2016a].

to synchrotron light sources for near-automatic, remote-control structure determination. These developments in x-ray crystallography are so significant in science that they deserve a full telling of their history and conceptual development, which others already provide [Finch 2008, Authier 2013, Jaskolski 2014].

As powerful as it is, crystallography requires crystals, where many identical structures have been persuaded to line up in an orderly lattice. Simple molecules have mostly identical structures, though subunits can twist and swing during chemical reactions (a classic example on a larger, biologically important molecule is the cis–trans conformational change of rhodopsin [Palczewski 2006]). Even so, large macromolecules—such as subunits of the ribosome [Yonath 1980]—can be persuaded to form crystals. However, the interaction with nearest neighbors can slightly affect the positioning of the outermost atoms in a unit cell, and large molecules can have multiple conformational states (which can be sorted out in single particle electron microscopy [Leschziner 2007, Scheres 2010, Spahn 2009, Cossio 2013]). Virus capsids can also be crystallized, though their interiors can show considerable individual variation (see for example [Xiao 2005, Xiao 2009, Okamoto 2017]). By the time one reaches the size of bacteria, all bets are off with crystallization, and eukaryotic cells show even more variation. Indeed, in biology there is a trend of increasing structural heterogeneity as one goes to larger objects (Fig. 10.2). The consequence is that one loses the ability to form crystalline arrays, as well as the required degree of having identical copies of the same structure. Therefore even though X rays have excellent properties for imaging micrometer-sized and larger specimens (Fig. 4.82), crystallography is no longer relevant for these larger individual objects.

We are then left with a dilemma: there can be situations where we want to see things beyond the limit of what x-ray lenses can deliver, yet where we do not have enough


Figure 10.3 X-ray diffraction pattern from a single object, versus various periodic arrays of that object. The “object” here is a photograph of David Sayre (1924–2012), a cherished colleague of ours who provided key insights into the inversion of diffraction patterns [Sayre 1952a]. At left is shown the real space image, while the corresponding diffraction intensity is shown at right (with an inset showing a zoomed-in view of a subregion). The diffraction pattern from a single object is quite continuous, with a characteristic size of “speckles” corresponding to the size of the object as given by Eq. 10.5. As more copies of the object are placed in a regular, periodic array, the diffraction intensity begins to be concentrated into Bragg spots with the continuous diffraction pattern (what crystallographers call “diffuse scattering”) becoming more obscured. Note that the 1 × 2 case shown at upper right produces something like a double-slit interference modulation of the single object diffraction pattern.

structural regularity to use the methods of crystallography to obtain an “image” of a unit cell using Bragg diffraction spots. What about the diffraction pattern from a non-crystalline object? An example of such a far-field diffraction pattern is shown in Fig. 10.3, in a simulation where one goes from one isolated object, to an increasing number of copies of the same object in a regular array. As this figure shows, the single object diffraction pattern is much different than the set of Bragg spots that result from a large crystal: it is a much more continuous function, with information accessible across an entire far-field detector array rather than just at certain pixels that coincide with Bragg spots.

The far-field diffraction patterns of Fig. 10.3 can be calculated by taking a Fourier transform of the object's complex transmittance (see Box 10.2), which from Eq. 4.76 is

$$ g(x, y) = \exp\left\{ \int_{z} k \left[ i\,\delta(x, y, z) - \beta(x, y, z) \right] dz \right\}, $$

and then squaring the result to obtain the intensity.
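A minimal numpy sketch of this calculation, assuming δ and β are sampled on a voxel grid and that the thin, weakly scattering conditions behind Eq. 4.76 hold:

```python
import numpy as np

def farfield_intensity(delta, beta, k, dz):
    """Far-field diffraction intensity of a weakly scattering object.

    delta, beta : (Z, Y, X) arrays of the refractive index terms
    k           : wavenumber 2*pi/lambda;  dz : voxel size along the beam
    """
    # Complex transmittance g(x, y) = exp{ integral_z k [i*delta - beta] dz },
    # with the integral approximated as a sum along the beam direction
    g = np.exp(k * (1j * delta - beta).sum(axis=0) * dz)
    # Far-field (Fraunhofer) pattern: squared modulus of the Fourier transform
    F = np.fft.fftshift(np.fft.fft2(g))
    return np.abs(F)**2
```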


Box 10.1 The phase problem in crystallography

One early statement of the phase problem in crystallography comes from William Duane at Harvard in 1925. Drawing upon Duane's earlier work on a momentum-based picture of quanta (particles, or photons) interacting with crystals [Duane 1923], Epstein and Ehrenfest had shown that one can calculate the intensities present in Bragg peaks from the density distribution of the diffracting material [Epstein 1924]. Duane then stated [Duane 1925]: “If we reverse the line of thought, and attempt to deduce the density, ρ(x, y, z), of the diffracting power (or the density of the electron distribution) in a crystal from the measured intensities of the various reflected beams by adding together the corresponding terms in the Fourier series, we find that these intensities do not determine the phase angles, δ. In other words, an indefinitely large number of distributions of diffracting power will produce beams of rays of precisely the same intensities in the same directions.” A simple example of this is shown from the shift theorem of Fourier transforms given by Eq. 4.86; all sidew
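This consequence of the shift theorem is easy to verify numerically: translating an object changes only the phases of its Fourier transform, so the measured intensities cannot distinguish the shifted distributions. A minimal sketch with an arbitrary test object:

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.random((64, 64))                             # any test "object"
f_shifted = np.roll(f, shift=(5, -3), axis=(0, 1))   # translated copy

I = np.abs(np.fft.fft2(f))**2                        # diffraction intensities
I_shifted = np.abs(np.fft.fft2(f_shifted))**2

# Identical intensities: the phases that encode the shift are lost
assert np.allclose(I, I_shifted)
```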