X-Ray Microscopy

Written by a pioneer in the field, this text provides a complete introduction to x-ray microscopy, giving all of the technical background required to use, understand, and even develop x-ray microscopes. Starting from the basics of x-ray physics and focusing optics, it goes on to cover imaging theory, tomography, chemical and elemental analysis, lensless imaging, computational methods, instrumentation, radiation damage, and cryomicroscopy, and includes a survey of recent scientific applications. Designed as a "one-stop" text, it provides a unified notation and shows how computational methods in different areas are linked with one another. Including numerous derivations, and illustrated with dozens of examples throughout, this is an essential text for academics and practitioners across engineering, the physical sciences, and the life sciences who use x-ray microscopy to analyze their specimens, as well as those taking courses in x-ray microscopy.

Chris Jacobsen is Argonne Distinguished Fellow at Argonne National Laboratory, and Professor of Physics and Astronomy at Northwestern University. He is also a Fellow of the American Association for the Advancement of Science, the American Physical Society, and the Optical Society of America.
Advances in Microscopy and Microanalysis

Microscopic visualization techniques range from atomic imaging to visualization of living cells at near-nanometer spatial resolution, and advances in the field are fueled by developments in computation, image detection devices, labeling, and sample preparation strategies. Microscopy has proven to be one of the most attractive and progressive research tools available to the scientific community, and remains at the forefront of research in many disciplines, from nanotechnology to live cell molecular imaging. This series reflects the diverse role of microscopy, defining it as any method of imaging objects of micrometer scale or less, and includes both introductory texts and highly technical and focused monographs for researchers and practitioners in materials and the life sciences.

Series Editors
Patricia Calarco, University of California, San Francisco
Michael Isaacson, University of California, Santa Cruz

Series Advisors
Bridget Carragher, The Scripps Research Institute
Wah Chiu, Baylor College of Medicine
Christian Colliex, Université Paris Sud
Ulrich Dahmen, Lawrence Berkeley National Laboratory
Mark Ellisman, University of California, San Diego
Peter Ingram, Duke University Medical Center
J. Richard McIntosh, University of Colorado
Giulio Pozzi, University of Bologna
John C. H. Spence, Arizona State University
Elmar Zeitler, Fritz-Haber Institute
Books in Series Published
Heide Schatten, Scanning Electron Microscopy for the Life Sciences
Frances Ross, Liquid Cell Electron Microscopy
Joel Kubby, Sylvain Gigan, and Meng Cui, Wavefront Shaping for Biomedical Imaging
Chris Jacobsen, X-Ray Microscopy
X-Ray Microscopy

CHRIS JACOBSEN
Argonne National Laboratory, Illinois
Northwestern University, Illinois
University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025, India
79 Anson Road, #06–04/06, Singapore 079906

Cambridge University Press is part of the University of Cambridge. It furthers the University's mission by disseminating knowledge in the pursuit of education, learning, and research at the highest international levels of excellence.

www.cambridge.org
Information on this title: www.cambridge.org/9781107076570
DOI: 10.1017/9781139924542

© Chris Jacobsen 2020

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2020
Printed in Singapore by Markono Print Media Pte Ltd

A catalogue record for this publication is available from the British Library.

ISBN 978-1-107-07657-0 Hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
Contents

Contributors
Foreword

1 X-ray microscopes: a short introduction
   1.1 How to read this book
   1.2 Online appendices
   1.3 Key mathematical symbols and formulae

2 A bit of history
   2.1 Röntgen and the discovery of X rays
   2.2 Einstein and mirrors
   2.3 Cold War microscopes
   2.4 Zone plates
   2.5 Synchrotrons and lasers
   2.6 Lensless microscopes
   2.7 The dustbin of history
   2.8 Concluding limerick

3 X-ray physics
   3.1 The Bohr model, energy levels, and x-ray shells
      3.1.1 X-ray fluorescence and Auger emission
      3.1.2 X-ray transitions: fluorescence nomenclature
      3.1.3 Beyond the core: the Fermi energy, valence electrons, and plasmon modes
   3.2 Atomic interactions, scattering, and absorption
      3.2.1 Scattering by a single electron
      3.2.2 Scattering by an atom
   3.3 The x-ray refractive index
      3.3.1 Electromagnetic waves in media
      3.3.2 The great frequency divide and the refractive index
      3.3.3 X-ray linear absorption coefficient
      3.3.4 The Born and Rytov approximations
      3.3.5 Oscillator density in molecules, compounds, and mixtures
   3.4 Anomalous dispersion: life on the edge
      3.4.1 The Kramers–Kronig relations
   3.5 X-ray refraction
   3.6 X-ray reflectivity
   3.7 Concluding limerick

4 Imaging physics
   4.1 Waves and rays
      4.1.1 Adding up waves
      4.1.2 Rayleigh quarter wave criterion
      4.1.3 Connecting waves and rays
   4.2 Gratings and diffraction
      4.2.1 Slits and plane gratings
      4.2.2 Volume gratings and Bragg's law
      4.2.3 Bragg's law and crystals
      4.2.4 Synthetic multilayer mirrors
      4.2.5 Momentum transfer and the Ewald sphere
   4.3 Wavefield propagation
      4.3.1 The Huygens construction
      4.3.2 Fraunhofer approximation
      4.3.3 Fourier transforms: analytical and discrete
      4.3.4 Power spectra of images
      4.3.5 Fraunhofer diffraction
      4.3.6 Fresnel propagation by integration, and by convolution
      4.3.7 Fresnel propagation, distances, and sampling
      4.3.8 Propagation and diffraction in circular coordinates
      4.3.9 Multislice propagation
   4.4 Imaging systems
      4.4.1 Field of view
      4.4.2 Optical system via propagators
      4.4.3 Diffraction and lens resolution
      4.4.4 Beating the diffraction limit in light microscopy
      4.4.5 Cylindrical (1D by 1D) optics
      4.4.6 Coherence, phase space, and focal spots
      4.4.7 Transfer functions
      4.4.8 Deconvolution: correcting for the transfer function
      4.4.9 Depth resolution and depth of field
   4.5 Full-field imaging
      4.5.1 TXM condensers, STXM detectors, and reciprocity
   4.6 Darkfield imaging
   4.7 Phase contrast
      4.7.1 Phase contrast in coherent imaging methods
      4.7.2 Propagation-based phase contrast
      4.7.3 Zernike phase contrast imaging
      4.7.4 Differential phase contrast
      4.7.5 Grazing incidence imaging
   4.8 Image statistics, exposure, and dose
      4.8.1 Photon statistics and the contrast parameter Θ
      4.8.2 Minimum detection limits
      4.8.3 Signal to noise and resolution from experimental images
      4.8.4 Estimating the required photon exposure
      4.8.5 Imaging modes and diffraction
   4.9 From exposure to radiation dose
      4.9.1 Dose versus resolution
   4.10 Comparison with electron microscopy and microanalysis
      4.10.1 Elemental mapping
      4.10.2 Transmission electron microscopy
      4.10.3 A comparison of transmission imaging with electrons and with X rays
   4.11 See the whole picture
   4.12 Concluding limerick

5 X-ray focusing optics
   5.1 Refractive optics
      5.1.1 Compound refractive lenses
   5.2 Reflective optics
      5.2.1 Grazing incidence spheres and toroids
      5.2.2 Kirkpatrick–Baez and Montel mirrors
      5.2.3 Ellipsoidal and Wolter mirrors, and single capillaries
      5.2.4 Multilayer mirrors
      5.2.5 Non-imaging grazing incidence optics
   5.3 Diffractive optics
      5.3.1 Fresnel zone plates
      5.3.2 Focusing efficiency
      5.3.3 Order sorting
      5.3.4 Fabrication
      5.3.5 Making zone plates thicker
      5.3.6 Thick zone plates and multilayer Laue lenses
      5.3.7 Multilayer Laue lenses: practical considerations
   5.4 Combined optics
   5.5 Resolution over the years
   5.6 Concluding limerick

6 X-ray microscope systems
   6.1 Contact microscopy
   6.2 Point projection x-ray microscopes
   6.3 Full-field microscopes, or transmission x-ray microscopes
      6.3.1 Zone plate condensers
      6.3.2 Capillary condensers
   6.4 Scanning x-ray microscopes
   6.5 Electron optical x-ray microscopes (PEEM and others)
   6.6 Concluding limerick

7 X-ray microscope instrumentation
   7.1 X-ray sources
      7.1.1 Photometric measures
      7.1.2 Laboratory x-ray sources: electron impact
      7.1.3 Unconventional laboratory x-ray sources
      7.1.4 Synchrotron light sources
      7.1.5 Bending magnet sources
      7.1.6 Undulator sources
      7.1.7 Inverse Compton scattering sources
      7.1.8 X-ray free-electron lasers (FELs)
   7.2 X-ray beamlines
      7.2.1 Monochromators and bandwidth considerations
      7.2.2 Coherence and phase space matching
      7.2.3 Slits and shutters
      7.2.4 Radiation shielding
      7.2.5 Thermal management
      7.2.6 Vacuum issues, and contamination and cleaning of surfaces
   7.3 Nanopositioning systems
   7.4 X-ray detectors
      7.4.1 Detector statistics
      7.4.2 Detector statistics: dead time
      7.4.3 Detector statistics: charge integration
      7.4.4 Pixelated area detectors
      7.4.5 Semiconductor detectors
      7.4.6 Sensor chips for direct x-ray conversion
      7.4.7 Scintillator detectors: visible-light conversion
      7.4.8 Gas-based detectors
      7.4.9 Superconducting detectors
      7.4.10 Energy-resolving detectors
      7.4.11 Wavelength-dispersive detectors
      7.4.12 Energy-dispersive detectors
   7.5 Sample environments
      7.5.1 Silicon nitride windows
   7.6 Concluding limerick

8 X-ray tomography
   8.1 Tomography basics
      8.1.1 The Crowther criterion: how many projections?
      8.1.2 Backprojection, filtered backprojection, and gridrec
   8.2 Algebraic (matrix-based) reconstruction methods
      8.2.1 Numerical optimization
      8.2.2 Maximum likelihood and expectation maximization
   8.3 Analysis of reconstructed volumes
   8.4 Tomography in x-ray microscopes
      8.4.1 Tomographic mapping of crystalline grains
      8.4.2 Tensor tomography
   8.5 Complications in tomography
      8.5.1 Projection alignment
      8.5.2 Limited tilt angles and laminography
      8.5.3 Pixel intensity errors and ring artifacts
      8.5.4 Beam hardening
      8.5.5 Self-absorption in fluorescence tomography
   8.6 Limiting radiation exposure via dose fractionation
   8.7 Concluding limerick

9 X-ray spectromicroscopy
   9.1 Absorption spectromicroscopy
      9.1.1 Elemental mapping using differential absorption
      9.1.2 Living near the edge: XANES/NEXAFS
      9.1.3 Carbon XANES
      9.1.4 XANES in magnetic materials
      9.1.5 XANES in phase contrast
      9.1.6 Errors in XANES measurements
      9.1.7 Wiggles in spectra: EXAFS
   9.2 X-ray fluorescence microscopy
      9.2.1 Details of x-ray fluorescence spectra
      9.2.2 Fluorescence detector geometries
      9.2.3 Elemental detection limits using x-ray fluorescence
      9.2.4 Fluorescence self-absorption
      9.2.5 Fluorescence tomography
   9.3 Matrix mathematics and multivariate statistical methods
      9.3.1 Principal component analysis
      9.3.2 Cluster analysis and optimization methods
   9.4 Concluding limerick

10 Coherent imaging
   10.1 Diffraction: crystals, and otherwise
   10.2 Holography
      10.2.1 In-line or Gabor holography
      10.2.2 Off-axis or Fourier transform holography
      10.2.3 Holography, ankylography, and 3D imaging
   10.3 Coherent diffraction imaging and phase retrieval
      10.3.1 X-ray speckle and object size
      10.3.2 Coherent versus incoherent diffraction
      10.3.3 Iterative phase retrieval algorithms
      10.3.4 Coherent diffraction imaging with X rays
      10.3.5 CDI geometry and notation
      10.3.6 Iterative phase retrieval algorithm details
      10.3.7 Focus and resolution in CDI
      10.3.8 Bragg CDI
   10.4 Ptychography
      10.4.1 Ptychography geometry and resolution gain Gp
      10.4.2 Ptychography reconstruction algorithms
      10.4.3 Focus and resolution in ptychography
      10.4.4 Ptychography experiments
      10.4.5 Bragg ptychography
      10.4.6 Beyond strict Nyquist sampling
   10.5 Coherent imaging beyond the pure projection approximation
   10.6 CDI at XFELs: diffract before destruction
   10.7 Concluding limerick

11 Radiation damage and cryo microscopy
   11.1 Specimen heating
      11.1.1 Anti-Goldilocks and the "no-fly zone"
      11.1.2 Heating and ionization with short, intense pulses
   11.2 Radiation damage
      11.2.1 Radiation damage in soft materials
      11.2.2 Radiation damage in water and in hydrated organic materials
      11.2.3 Radiation risk in humans
      11.2.4 Radiation damage in initially living specimens
      11.2.5 Dose rate effects
      11.2.6 Specimen size effects
      11.2.7 Low-dose strategies
   11.3 Cryo microscopy
      11.3.1 Vitrification and amorphous ice
      11.3.2 Radiation damage to organics at cryogenic temperatures
      11.3.3 Bubbling in frozen hydrated specimens
      11.3.4 Radiation damage limits to resolution in cryo x-ray microscopy
   11.4 Radiation damage in hard materials
   11.5 Concluding limerick

12 Applications, and future prospects
   12.1 Life science
   12.2 Geoscience and environmental science
   12.3 Astrobiology
   12.4 Materials science
   12.5 Cultural heritage
   12.6 Future prospects
   12.7 Concluding limerick

Appendix A X-ray data tabulations

References
Index
Contributors

Janos Kirz, Lawrence Berkeley National Laboratory, USA (retired)
Malcolm Howells, Lawrence Berkeley National Laboratory, USA (retired)
Michael Feser, Lyncean Technologies, USA
Doğa Gürsoy, Advanced Photon Source, Argonne National Laboratory, USA
Adam Hitchcock, McMaster University, Canada
Foreword
X-ray microscopy is an interdisciplinary topic, both in terms of its technical details and in terms of the scientific and engineering problems it is applied to. While there are a number of books that provide excellent coverage of certain aspects of x-ray physics, optics, and microscopy, it is my opinion that there has not been a single book that one can hand to someone new in the field of x-ray microscopy to give them an introduction to most of the key aspects they should know about. This book is an attempt to fill that need.

Are you a new PhD student entering a research group who will use x-ray microscopy for part of your research? If so, you have probably had at least a year or so of university physics during your studies. You are whom I have written the book for! At times I may push you a bit further in mathematics or physics than what you have learned thus far, but if you are in a PhD program you are a serious enough student that this should be OK. Besides, you can always skim over some of the more detailed points.

Are you an established researcher or engineer who is new to x-ray microscopy? This book is also for you! Your expertise might be with microscopes using other radiation, or on materials you hope to understand better using x-ray microscopy. What I hope to do in this book is to give you a feel for the fundamental ideas that come into play in a variety of x-ray microscopy approaches and applications, and to do so with enough detail to allow you to go off and invent new approaches of your own. I look forward to seeing your contributions to x-ray microscopy!

What do I mean by x-ray microscopy? I have decided to focus on imaging at a spatial resolution of a few micrometers down to nanometers. This is not a book on medical radiology at 0.1 mm resolution as limited by acceptable radiation exposure, and it is not a book on crystallography.
I consider X rays to be photons with an energy well above the plasmon resonance (20–50 eV for most solids) and in particular above about 100 eV, and I tend to concentrate on energies below 20 keV, since at higher energies the fine structure that one hopes to see in a microscope has reduced contrast. While much useful research is done in an approach where X rays illuminate an area and magnetic or electrostatic lenses image the electrons that come off of the surface, these photoelectron emission microscopes (PEEM and its variations) are based on electron, not x-ray, optics, so they are given only brief treatment in Section 6.5. However, I do discuss x-ray microscopy approaches where one uses the properties of x-ray scattering to recover images without the use of lenses in Chapter 10, and I also include the combination of x-ray microscopy with absorption- and fluorescence-based spectroscopy in Chapter 9. I discuss three-dimensional imaging or tomography as a natural extension of two-dimensional microscopy in Chapter 8. Chapter 7 covers what I consider to be essential points on x-ray microscope instrumentation. X rays are ionizing radiation, so Chapter 11 is devoted to radiation damage as well as cryo microscopy methods that can help in minimizing damage. While Chapter 12 discusses applications of x-ray microscopy, these applications ultimately involve detailed knowledge in their respective scientific specialties, which may be undergoing rapid development. Therefore the coverage here is rather brief, while pointing out recent review papers when possible.

I expect that I have made many sins of commission, and of omission. Cambridge University Press has a web page, www.cambridge.org/Jacobsen, associated with this book (one can also reach this web page with www.cambridge.org/9781107076570). This web page will host errata, as well as online Appendices B and C.

This book was originally undertaken as a team effort with one of my favorite people in the world: Janos Kirz, who is one of the real pioneers in x-ray microscopy. However, the book has taken longer to complete than we had hoped, and Janos has rightfully been enjoying his retirement more completely as of late. His fingerprints are all over the earliest chapters, and he has provided valuable feedback on the entire tome. However, as the book has grown and developments in later chapters have motivated rewrites of earlier ones, all of the warts and blemishes in what remains have become my fault alone. Therefore at Janos' request he is no longer listed as a coauthor, which means, I guess, that you can't blame him for anything that's wrong or incomplete! A number of other people have provided wonderful input. Some are listed as contributors to specific chapters, in which case I will not thank them again here.
But people like Marc Allain, Elke Arenholz, Lahsen Assoufid, Anton Barty, Anna Bergamaschi, Sylvan Bohic, Anibal Boscoboinik, Virginie Chamard, Henry Chapman, Si Chen, Yong Chu, Marine Cotte, Björn De Samber, Peter Fischer, Manuel Guizar-Sicairos, Mirko Holler, Young Pyo Hong, Xiaojing Huang, Sarah Köster, Florian Meirer, Nino Miceli, Günter Schmahl (1936–2018), Xianbo Shi, Pierre Thibault, Stephen Urquhart, Ivan Vartanyants, Pablo Villanueva-Perez, Stefan Vogt, Michael Wojcik, Russell Woods, and Hanfei Yan have taken the time to read various sections of the book and give important critical comments and suggestions, or contributed figures. Several of my Northwestern University PhD students (Sajid Ali, Ming Du, and Saugat Kandel in particular) have given me great feedback on specific sections. Joshua Zachariah made early versions of several figures. Again, you can't blame any of the above for my mistakes, but you can thank them for reducing their number.

One can only undertake the project of writing a book like this with lots of support. The Advanced Photon Source at Argonne National Laboratory (a U.S. Department of Energy Office of Science user facility) has generously supported me in devoting considerable time to this effort, since x-ray microscopy is one of its widely used methods. My wife, Holly, has been patient with me in so many ways, and has helped keep me in balance as the project progressed by joining me on many activities, adventures, and travels that have kept me refreshed and enthusiastic! Some income from this book is being directed to a student prize at the international conference series on x-ray microscopy.
1 X-ray microscopes: a short introduction
X-ray microscopes are systems in which an x-ray beam is used to illuminate a specimen, and some sort of x-ray image is obtained with a spatial resolution δr of micrometers to nanometers.1 Some microscopes use an x-ray lens such as a Fresnel zone plate to produce a magnified image (Fig. 1.1), while others use an x-ray lens to produce a small beam spot through which the sample is raster-scanned while image data are collected (Fig. 1.2). When using microscopes, one of the first questions asked is this: "What is the magnification?" This is an eminently sensible question to ask when one is looking at an image through the eyepieces of a visible light microscope, and indeed the objective lenses and eyepiece lenses in visible light microscopes are usually labeled in terms of their respective magnification, such as 40× and 10× to yield a net magnification of 400×. However, we do not recommend that you somehow contrive to make an x-ray eyepiece! Your eye does not directly register x-ray images, and in any case you do not wish to expose any part of yourself to such high doses of X rays (this will be discussed in Chapter 11). Because we are likely to view the same image at vastly different magnifications (ranging from printed images in journal papers, to images on computer screens, to very large images projected in conference rooms), it is far more convenient to instead talk about the spatial resolution δr of images (see Section 4.4.3), and a field of view, which is the viewable width and height at the specimen's location. (A common practice that we strongly recommend is to place a scale bar on the image, which shows how large some defined distance would appear; see for example Fig. 4.60.) That is, an image might have a resolution of δr = 20 nanometers (or nm), and a field of view of 10 micrometers (or μm) on a side.
The continuous intensity variations I(x, y) in Cartesian coordinates are almost always sampled onto a regular discrete array I[ix, iy] with spacing Δx and array indices ix = 0, 1, . . . , (Nx − 1), and corresponding values in y. Thus one might encounter an image with a picture element or pixel size of Δx,y = 10 nm, and Nx = 1024 and Ny = 768 pixels, giving a field of view of 10.24 × 7.68 μm. The extension to 3D imaging involves volume elements or voxels, and the z direction. Of course the image is just a particular representation of the object under study; for incoherent brightfield imaging, the image intensity I(x, y) represents the magnitude squared of the wavefield at a particular plane, which hopefully is the downstream side of the specimen, so that one obtains a pixel-by-pixel mapping of x-ray absorption in
1 We follow the convention of the American Institute of Physics style guide, so that the noun is "X ray" and the adjective is "x-ray."
[Figure 1.1 not reproduced; panel labels: large source, condenser zone plate, object, objective zone plate, N.A., image detector]
Figure 1.1 Schematic of a transmission x-ray microscope, delivering a full-field image. An x-ray source illuminates a specimen (often by using a condenser lens to image the source onto the specimen), and the transmitted wavefield is imaged by an objective lens onto a pixelated detector. The numerical aperture (N.A.) of the objective lens is indicated (lens radius divided by focal length; see Eq. 4.172).
[Figure 1.2 not reproduced; panel labels: undulator, monochromator, order sorting aperture, zone plate objective, N.A., sample (raster scanned), fluorescence detector, transmitted beam detector]
Figure 1.2 Schematic of a scanning x-ray microscope, delivering a scanned image. An x-ray source such as an undulator at a synchrotron light source is (optionally) monochromatized, and an objective lens with numerical aperture N.A. images this source (or an illuminated aperture) to produce a small focal spot through which a specimen is scanned. One detector might record the transmitted signal (either measuring the total signal, or measuring its redistribution such as with a pixelated detector), and other detectors such as an energy-resolving detector for x-ray fluorescence signals can be used. Figure modified from [de Jonge 2010a].
the specimen. In fact the image usually consists of this signal S due to the presence of contrast in the specimen, and noise N due to stray light, statistical fluctuations, or other causes; the signal-to-noise ratio (SNR) weighs their relative contributions (this is discussed in Section 4.8). As we shall see, some x-ray microscopes deliver multiple image signals, such as simultaneous absorption and phase contrast, or energy-dependent signals as will be discussed in Chapter 9.
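The sampling bookkeeping described above (pixel size Δ times pixel count giving the field of view) can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the text; the helper name `field_of_view_um` is ours, not the book's:

```python
# Field of view from pixel size and pixel count, as described in the text.
# The helper name `field_of_view_um` is illustrative, not from the book.

def field_of_view_um(pixel_size_nm: float, n_pixels: int) -> float:
    """Viewable width (or height) at the specimen, in micrometers."""
    return pixel_size_nm * n_pixels / 1000.0  # convert nm to um

# The example from the text: 10 nm pixels on a 1024 x 768 array
fov_x = field_of_view_um(10.0, 1024)  # 10.24 um
fov_y = field_of_view_um(10.0, 768)   # 7.68 um
```

The same product, taken along z with the voxel size, gives the depth extent of a 3D reconstruction.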
1.1 How to read this book

Of course you should really read every word of this book, or even memorize it! But not all readers will need to be concerned with every detail, so here are a few suggestions:
• If you just want to get a feel for what x-ray microscopes can do, look at Chapter 12,
which summarizes a number of recent scientific applications of x-ray microscopy while providing representative images. You can then put this book on your shelf and pull it down when you need to read up on certain details.
• If you are wondering what type of x-ray microscope to use for a specific application, Chapter 6 provides an overview.
• If your main interest is in using x-ray absorption spectroscopy combined with imaging, see Section 9.1. If you are mostly interested in fluorescence imaging of the distribution of chemical elements in a specimen, see Section 9.2.
• Be aware of limitations due to ionizing radiation damage, as discussed in Chapter 11.
Many of the later chapters refer back to discussions of the fundamentals of x-ray physics in Chapter 3 and imaging physics in Chapter 4. In most cases there will be a cross-reference to the exact section or equation number.
1.2 Online appendices

Two short appendices available online at www.cambridge.org/Jacobsen2 supplement this book:
• Online Appendix B contains further detail on how to calculate the visible and x-ray refractive index, and properties derived from it.
• Online Appendix C provides examples of the many different ways that the key formulae for maximum likelihood and expectation maximization approaches are written in the literature.
These are short enough to print out, if so desired.
1.3 Key mathematical symbols and formulae

One of the beauties of physics is that there is widespread agreement on the basic notation: we all know what F = ma means, for example. This is mostly but not completely true in x-ray optics and microscopy. Therefore we list our notation for some of the most important mathematical terms in x-ray microscopy in Table 1.1. There are certain instances where key terms and formulae are written in different ways within the community. Some write the x-ray refractive index as n = 1 − δ + iβ whereas we use n = 1 − δ − iβ, as discussed in Box 3.4. Our usage of the terms "magnitude" and "amplitude" is discussed in Box 4.1. The definition of momentum transfer q varies in the literature, as discussed in Box 4.2. We discuss depth resolution δz and depth of field (DOF) in Box 4.7.
2 See also www.cambridge.org/9781107076570
Table 1.1 Key mathematical symbols and their meaning in this book, along with the section where they first are described. See also Box 4.1 for our usage of the words "magnitude" and "amplitude." Additional photometric quantities are shown in Section 7.1.1.

Symbol | Meaning | Location
λ = hc/E | X-ray wavelength λ and photon energy E, with Planck's constant h (Eq. 3.2) multiplied by the speed of light c (Eq. 3.55) | Eq. 3.7
na and ne | Atom (na) and electron (ne) number density | Eqs. 3.21 and 3.22
Λ and σ | Mean free path (Λ) and cross section (σ) | Eq. 3.25
(f1 + i f2) | Complex number of oscillator modes per atom, which varies with x-ray energy E | Eq. 3.42
α | As used in n = 1 − αλ²(f1 + i f2) | Eq. 3.66
μ | Linear absorption coefficient (μ) and inverse attenuation length (μ⁻¹) | Eqs. 3.45 and 3.75
μ | Mass absorption coefficient | Eqs. 3.78 and 9.3
n = 1 − δ − iβ | X-ray refractive index n with its phase-shifting part δ and absorptive part β | Eq. 3.67
θc | Critical angle for grazing incidence reflectivity | Eq. 3.115
δr | Spatial resolution | Eq. 4.173
N.A. | Numerical aperture of an optic | Figs. 1.1 and 1.2, and Eq. 4.172
drN | Outermost zone width in a Fresnel zone plate | Eq. 5.27
δz | Depth resolution (depth of field is DOF = 2δz; see Eqs. 4.214 and 4.215) | Eq. 4.213
Δr | Pixel size (picture element size) at the object. The subscript r can be thought of as referring to real space rather than Fourier space, or a vector coordinate r with components in x̂ and ŷ | Eq. 4.87
Δu | Pixel size in Fourier space | Eq. 4.92
Δdet | Size of a pixel on an area detector | Eq. 4.93
N | Number of image pixels (as in Nx and Ny) | Eq. 4.87
ux, uy | Spatial frequencies, or wavelength-normalized diffraction angles ux = θx/λ, uy = θy/λ | Eqs. 4.32 and 4.88
F | Fourier transform, as in G(ux) = F{g(x)} | Eq. 4.80
SNR | Signal-to-noise ratio, or S/N | Section 4.8.1
DQE | Detective quantum efficiency | Eq. 7.34
Θ | Contrast parameter | Eq. 4.238
Φ | Flux, in photons/second | Section 7.1.1
F | Fluence, in photons/area or photons/m² | Section 7.1.1
I | Intensity | Eq. 4.3
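As a quick worked example of the first entry in Table 1.1, λ = hc/E: expressing hc in eV·nm lets photon energies in eV give wavelengths in nm directly. The sketch below uses hc ≈ 1239.842 eV·nm, a standard value assumed here rather than one quoted in this chapter:

```python
# Photon energy to x-ray wavelength via lambda = hc/E (Table 1.1, first row).
# hc is expressed in eV*nm so that energies in eV give wavelengths in nm;
# the constant's value is a standard one, assumed here rather than taken from the text.
HC_EV_NM = 1239.842

def wavelength_nm(photon_energy_ev: float) -> float:
    """X-ray wavelength in nanometers for a photon energy in electron volts."""
    return HC_EV_NM / photon_energy_ev

lam_soft = wavelength_nm(500.0)    # a soft x ray: about 2.48 nm
lam_hard = wavelength_nm(10000.0)  # a hard x ray: about 0.124 nm
```

Note how the energy range discussed in the Foreword (roughly 100 eV to 20 keV) corresponds to wavelengths from about 12 nm down to well below 0.1 nm.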
2 A bit of history

"Those who cannot remember the past are condemned to repeat it." – George Santayana, Reason in Common Sense (Vol. 1 in The Life of Reason), 1905.

Janos Kirz contributed to this chapter.
2.1 Röntgen and the discovery of X rays

The words of discovery are rarely those of Archimedes' legendary shout of "Eureka!" or "I have found it!" as he supposedly leaped naked from his bathtub (good thing there weren't webcams in those days!). Instead, the words of discovery are more likely to be along the lines of "hmm . . . that's odd." Such is the case of the discovery of X rays [Glasser 1933, Mould 1993]. At the time of their discovery, many investigators were carrying out experiments with various types of cathode ray tubes, but it was only Wilhelm Conrad Röntgen, Professor and Director of the Physical Institute at the University of Würzburg, who noticed some curious phenomena and decided to investigate further. Röntgen was 50 years old at the time, with a reputation for care in experiments even though his research in the physics of gases and fluids was not particularly cutting-edge. Cathode rays (which we would now call electron beams) were all the rage at the time, so Röntgen decided to investigate whether they would exit thin-walled Hittorf–Crookes tubes. To make it easier to use a phosphor to try to observe this, he surrounded a tube with black paper and worked in a darkened room. While setting up the experiment late on a Friday afternoon (November 8, 1895), he noticed that the phosphor was flickering in synchrony with the fluctuations of the glowing filament in the tube – even though the phosphor was some distance away, and with black paper in between! The odd phenomena immediately captured Röntgen's attention to the point that he did not notice an assistant entering the room later on to retrieve some equipment. When Röntgen's wife Bertha finally succeeded in getting a servant to coax him upstairs to their apartment on the top floor of the Institute, Röntgen ate little of his supper and spoke even less before returning that evening to the puzzle in the lab.
Downloaded from https://www.cambridge.org/core. Stockholm University Library, on 29 Oct 2019 at 01:10:01, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.003
6
A bit of history
Over the course of the following weeks, Röntgen did precisely what a careful investigator would do: while suspecting that he was seeing a new type of radiation, he tried to rule out alternative explanations for his curious observation, and he explored with great care the characteristics of the phenomenon. He noticed that the "rays" seemed to travel in straight lines with no Fresnel fringes, no mirror reflection, and no refraction, and that they were absorbed preferentially by materials with higher atomic numbers. At one point during his experiments while he was picking up a lead brick, he noticed something startling: he could see the bones of his fingers on his phosphor screen! Along with his discovery that these "rays" could darken photographic plates, he soon brought Bertha into the lab and recorded a famous radiograph of her hand with her wedding ring, though she is said to have remarked unhappily "Now I have seen my death!" [Mould 1993]. With quite some trepidation, Röntgen produced a handwritten manuscript summarizing his observations in 17 numbered sections, which included his observation that "if one holds the hand between the discharge apparatus and the screen, one sees the darker shadows of the bones within the much fainter picture of the hand itself" (translation by Glasser [Glasser 1933]).
The title of his manuscript was "Über eine neue Art von Strahlen (Vorläufige Mitteilung)" or "On a new kind of rays (preliminary message)," and within the manuscript he stated "Der Kürze halber möchte ich den Ausdruck 'Strahlen' und zwar zur Unterscheidung von anderen den Namen 'X-Strahlen' gebrauchen" or "for the sake of brevity I would like to use the term 'rays,' and to distinguish them from others I will use the name 'X-rays'." This manuscript [Röntgen 1895], which contained no images, was submitted to the Sitzungsberichte der Physikalisch-medicinischen Gesellschaft zu Würzburg or the Transactions of the Würzburg Physical Medical Society on Saturday, December 28, 1895, or seven feverish weeks after that fateful afternoon. We must now pause to consider the notion that our internet-connected world is frenetic compared to a stately past. Can you imagine what might happen if you were to submit a paper in the Americas or in Europe on the Saturday between Christmas and New Year's Day, and how long it would be before it appeared in print even if it didn't have to go through peer review? If this is within your experience, you may find the following sequence hard to believe. Starting from Röntgen's handwritten manuscript delivered on Saturday, December 28, the journal issue became available on New Year's Day (Wednesday, January 1, 1896), at which time Röntgen was able to pick up printed copies of his article. He then mailed copies to several respected physicists for their comment, along with photographs (not part of the publication) including the famous one of his wife's hand. Since Röntgen was still not confident that he had not overlooked a more mundane explanation for his observations, he is said to have stated "Now the devil will be to pay." Perhaps instead the angels danced a jig!
One of the recipients of Röntgen's preprint was Franz Exner in Vienna, who showed it to some friends at a party, including Ernst Lecher, who was the son of the editor of Die Presse in Vienna. Thus it was that the sensational news came to appear on the front page of the Sunday edition of the newspaper on January 5, 1896, and in the grand tradition of media echo chambers the Daily Chronicle in London reported this on the following day:
The noise of the war’s alarm should not distract attention from the marvelous triumph of science which is reported from Vienna. It is announced that Prof. Routgen [sic] of the Wurzburg University has discovered a light which for the purpose of photography will penetrate wood, ﬂesh, cloth, and most other organic substances. The professor has succeeded in photographing metal weights which were in a closed metal case, also a man’s hand which showed only the bones, the ﬂesh being invisible.
This was followed by brief commentaries in the New York Electrical Engineer on January 8, in the New York Medical Record on January 11, in Nature on January 16 (p. 253), and in Science on January 24 (p. 131). An English translation of the paper was printed in Nature on January 23 [Röntgen 1896a], and in Science on February 24 [Röntgen 1896b] – both of which included a reprint of the photo of Bertha's hand. The New York Times started out with skepticism, first commenting on January 19 in a page 1 report about Röntgen's "alleged discovery of how to photograph the invisible," but then on January 26 the report on page 1 stated "Röntgen's photographic discovery increasingly monopolizes scientific attention. Already numerous successful applications of it to surgical difficulties are reported from various countries. . ." [Bakalar 2009]. In the meantime, Röntgen had been summoned to appear in Berlin before Emperor Wilhelm II on Monday, January 13 (he remarked, "I hope I shall have 'Kaiser luck' with this tube, for these tubes are very sensitive and are often destroyed in the very first experiment, and it takes about four days to evacuate a new one"). The tube worked, and the Kaiser awarded Röntgen the Prussian Order of the Crown, Second Class. In an age of multiple conferences to present new results to, it is remarkable to think of how many public lectures Röntgen gave on his results: one! This was at his institute on Thursday, January 23; the talk was introduced by the anatomist Albert von Kölliker, whose hand Röntgen recorded a radiograph of during the lecture. The response was rousing, of course, and von Kölliker responded at the end of the talk by leading three cheers and saying that the rays should not be called "X rays" as Röntgen had written in his paper, but "Röntgenstrahlung" (this term remains in use in Germany today).
Röntgen refused additional opportunities to present his results in public, including a request to make a presentation to the Reichstag and even at his ceremony for receiving the first Nobel Prize in Physics in 1901! (Alfred Nobel passed away on December 10, 1896, but it took until 1900 for the Nobel Foundation to be formally established.) Since Röntgen had not observed significant refraction or reflection of his new rays, there seemed to be no optics to deliver magnified images (we now know about compound refractive lenses; see Section 5.1.1). The lack of optics would seem to thwart the development of x-ray microscopes. How to see finer detail? One early approach was to use a visible light microscope to magnify the x-ray radiograph recorded on micrometer-grain-size photographic emulsions [Ranwez 1896]. X rays found application before their nature was understood. They even found early misapplication: an example is x-ray irradiation of the brain as a treatment for epilepsy in experiments by Mihran Kassabian in Philadelphia in 1903–1904. Two patients died, though symptoms of epilepsy were reduced. . . . Kassabian lost several fingers due to radiation burns from handling tubes in operation and is said to have used an assumed
name when checking into a hospital for treatment so as not to cast a dark light on X rays [Brown 1995]. He died in 1910. Röntgen published only two more incremental observations on the nature of his new rays, on March 9, 1896 (including the observation that they ionize air and other gases) and on March 10, 1897. However, later on he had an indirect role in solving the puzzle of their nature. Röntgen was lured away in 1900 to be the Chair of Experimental Physics at the Ludwig Maximilians University of Munich, where in 1906 he helped recruit Arnold Sommerfeld to be the Chair for Theoretical Physics. Sommerfeld became convinced that X rays were pulsed electromagnetic waves and estimated their wavelength to be of order 1 Å based on diffraction by a slit, but the evidence was considered to be inconclusive, so that even Sommerfeld himself wrote in 1905 "it is a shame that, ten years after Röntgen's discovery, one still doesn't know what Röntgen rays really are" [Authier 2013]. Meanwhile, Sommerfeld had recruited Max von Laue. Convinced that X rays must be a short-wavelength version of visible light, von Laue drew upon the presence in Munich of mineralogist Paul von Groth (who was a disciple of Auguste Bravais' notion that crystals are formed from atoms organized in a lattice of unit cells) to propose that the regular spacings of atoms in a crystal might act like the regular bars of a diffraction grating for light, and set Paul Knipping and Walter Friedrich to carry out the experiment. On April 23, 1912, they found success, with a diffraction pattern that by modern standards shows barely discernible diffraction blobs rather than spots,1 though the experimental results were immensely improved by the time publications appeared in the literature [Friedrich 1912, Friedrich 1913, Laue 1912, Laue 1913]. This provided firm evidence of the wavelength of X rays, and led to von Laue receiving the 1914 Nobel Prize in Physics.
By November 1912, the father–son team of William Henry and William Lawrence Bragg in Leeds had worked out [Bragg 1913a] the simple relationship for diffraction from atomic planes that we now refer to as Bragg's law (Eq. 4.33); they were jointly awarded the Nobel Prize in Physics in 1915. Earlier studies by Charles Barkla had shown that X rays are polarized (1904 in Liverpool) and that X rays emitted from different targets include characteristic radiation (1906), a component of radiation with a penetration dependent on the atomic number of the material. Barkla went on to find two series of such radiation, which he first labeled A and B [Barkla 1909] but then labeled K and L [Barkla 1911] in case there might be a series before A (Barkla received the 1917 Nobel Prize in Physics). The work of Barkla, as well as that of the Braggs, inspired Henry Moseley of Oxford to build in 1913 a crystal spectrometer and find that the energy of Barkla's characteristic X rays scaled as atomic number squared [Moseley 1913], as we shall see in Eqs. 3.11–3.13. If Moseley had not been killed by a sniper in 1915 while serving with the British Army at Gallipoli, one can imagine that he might have shared the Nobel Prize honors with Barkla.
1. See for example http://www.iucr.org/publ/50yearsofxraydiffraction/fulltext/lauesdiscovery
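Moseley's scaling can be checked with a quick numerical sketch. The Rydberg prefactor and the screening constant of 1 used below are standard textbook values rather than anything given in this chapter (the full treatment appears in Eqs. 3.11–3.13), so the predicted Kα energies are only approximate:

```python
# Moseley's law for K-alpha emission: E ~ (3/4) * Ry * (Z - 1)^2,
# with Ry = 13.6 eV and a screening constant of 1 (textbook values;
# assumptions, not taken from this chapter).
RYDBERG_EV = 13.6

def k_alpha_energy_eV(Z):
    """Approximate K-alpha emission energy in eV for atomic number Z."""
    return 0.75 * RYDBERG_EV * (Z - 1) ** 2

for element, Z in [("Ca", 20), ("Fe", 26), ("Cu", 29)]:
    print(f"{element} (Z={Z}): ~{k_alpha_energy_eV(Z) / 1000:.2f} keV")
```

For copper (Z = 29) this gives roughly 8.0 keV, close to the measured Cu Kα line; it is exactly this systematic near-Z² trend that Moseley's spectrometer revealed.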
2.2 Einstein and mirrors

The next important conceptual insight into the nature of X rays came from none other than Albert Einstein [Einstein 1918]. He speculated that bright regions at the edges of radiographs of human limbs made by A. Köhler of Wiesbaden might be due to grazing incidence x-ray reflection. Einstein then stated:2

According to the classical theory of dispersion we have to expect that the index of refraction n for x-rays is close to 1 but in general different from 1. Whether n will be greater or less than 1 will depend on whether the electrons dominating the dispersion have eigenfrequencies smaller or larger than the frequencies of the x-rays. The difficulty in determining n lies in the fact that (n − 1) is very small (about 10⁻⁶). But it is obvious that for almost grazing incidence there must be detectable total reflection for x-rays in the case n < 1.
The radiograph was not shown. It might have actually displayed Fresnel fringes from a non-contact radiograph, as Einstein himself suggested in a note added in proof. Still, as we will see in Section 3.3.2, the refractive index n for X rays is indeed slightly less than 1. To our knowledge this is the first time the x-ray refractive index was expressed as n = 1 − ε with ε ≪ 1, though of course Einstein was surely aware of the Drude model [Drude 1902] for the visible refractive index, and Charles Galton Darwin (grandson of the naturalist Charles Robert Darwin) had predicted some years earlier [Darwin 1914] that the x-ray refractive index would be n = 1 + ε with ε ∼ 10⁻⁶. It also seems that Einstein's brief comment was missed by Compton, who wrote a more widely read paper [Compton 1923] concluding that the x-ray refractive index is less than 1 based on two experimental results:

1. The 1919 PhD dissertation [Stenström 1919] of Wilhelm Stenström in Lund, Sweden, who was the first to demonstrate experimentally a refractive correction to Bragg's law and write it in terms of n = 1 − δ.
2. Additional measurements on a deviation from Bragg's law, from Duane and Patterson at Harvard [Duane 1920], who were in turn aware of Stenström's results.

This led Compton to modify Bragg's equation to the form shown in Eq. 4.34. A refractive index of n < 1 implies a wave velocity faster than the speed of light in vacuum c, and it is curious that Einstein, author of the theory of special relativity, did not comment on this fact. Did he feel it was obvious that while the phase velocity of X rays in media is faster than c (Eq. 3.56), the group velocity would turn out (Eq. 3.73) to be less than c? In any case, as we will see in Section 5.2, an index of refraction of the form n = 1 − δ implies a critical angle for grazing angle reflectivity of √(2δ), a result that was first calculated and demonstrated by Compton [Compton 1923] and is in line with Einstein's
2. In the original German: "Nach der klassischen Dispersionstheorie müssen wir erwarten, daß der Brechungsexponent n für Röntgenstrahlen nahe an 1 liegt, aber im allgemeinen doch von 1 verschieden ist. n wird kleiner bzw. größer als 1 sein, je nachdem der Einfluß derjenigen Elektronen auf die Dispersion überwiegt, deren Eigenfrequenz kleiner oder größer ist als die Frequenz der Röntgenstrahlen. Die Schwierigkeit einer Bestimmung von n liegt darin, daß (n − 1) sehr klein ist (etwa 10⁻⁶). Es ist aber leicht einzusehen, daß bei nahezu streifender Inzidenz der Röntgenstrahlen im Falle n < 1 eine nachweisbare Totalreflexion auftreten muß." The translation shown is due to Dr. Angelika Osanna.
suggestion. This makes it possible to make mirror optics, including focusing optics, with curved surfaces. The problem is that it is very hard to make grazing incidence optics with sufficiently smoothly polished surfaces and exactly the desired figure. The first to begin to succeed and generate excitement was Paul Kirkpatrick, a professor at Stanford University. Together with his Mexican-American student Albert Baez (father of Joan, who became a famous folksinger in the USA), they developed a scheme of using two orthogonal cylindrical mirrors (now known as Kirkpatrick–Baez or simply KB mirrors [Kirkpatrick 1948a, Kirkpatrick 1948b, Baez 1950]; see Fig. 2.1 and Section 5.2.2) to achieve 2D focusing. Kirkpatrick even published an article in Scientific American in 1949 entitled "The X-ray microscope" [Kirkpatrick 1949b], where the synopsis in the table of contents proudly announced:

It would be a big improvement on microscopes using light or electrons, for X-rays combine short wavelengths, giving fine resolution, and penetration. The main problems standing in the way have now been solved.
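The √(2δ) critical angle quoted above is easy to evaluate; in this sketch the δ values are illustrative round numbers of the 10⁻⁶ magnitude Einstein cited, not tabulated material data (actual values appear in Section 3.3.2):

```python
import math

def critical_angle_rad(delta):
    """Grazing-incidence critical angle theta_c = sqrt(2 * delta) for n = 1 - delta."""
    return math.sqrt(2.0 * delta)

# Illustrative round-number deltas of the magnitude quoted in the text.
for delta in (1e-6, 1e-5):
    theta = critical_angle_rad(delta)
    print(f"delta = {delta:.0e}: theta_c = {theta * 1e3:.2f} mrad ({math.degrees(theta):.3f} deg)")
```

Angles of a few milliradians, i.e. small fractions of a degree, are why x-ray mirrors must work at grazing incidence, and why figure and polishing errors are so punishing.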
This work opened the floodgates. A variety of groups, especially in the USA, England, Sweden, and Germany, began developing x-ray microscopes. As it turned out, there were indeed still a few more problems to be solved. The Kirkpatrick–Baez design has serious off-axis aberrations, so that it is usually not used as a lens for full-field imaging; however, it does work well for imaging a collimated beam to a small spot in scanning microscopes, as will be described in Section 5.2. In addition, the surfaces of grazing incidence optics must be made very smooth, as will be shown in Eq. 3.124. One particularly notable set of advances dating back to the late 1980s [Mori 1987, Higashi 1989] has been made by a group led by Kazuto Yamauchi at Osaka University, leading to several landmarks [Yamamura 2003, Mimura 2010] in x-ray nanofocusing with grazing-incidence reflective optics. Their work is summarized in a recent book chapter [Yamauchi 2016], and they have helped found the Japanese company JTEC, which sells mirrors that have been used by others for their own spectacular results [da Silva 2017]. A conceptually simple alternative to using two optics is to use an ellipsoid of revolution to produce a small focal spot from one optic. Given the need for grazing incidence, the shape of the ellipsoid is like a very skinny cigar. The group of Kunz built the first microscope using this scheme [Voss 1992a], but the mirror, painstakingly ground and polished, was nowhere close to delivering the specifications [Kunz 1995]. Many years later, Bilderback came up with the idea of shaping glass capillaries in an oven to form the right optics, and his team was able to demonstrate a submicron focus [Bilderback 1994b, Bilderback 1995]. The idea was also developed by others, including Xradia/Carl Zeiss X-ray Microscopy, who use it as a condenser system in many of their microscopes [Zeng 2008].
In 1952, Hans Wolter in Kiel, Germany developed aberration-free designs for full-field imaging [Wolter 1952], as shown in Fig. 5.10. One of these schemes involves a confocal pairing of a paraboloid and a hyperboloid. However, these designs were much too difficult to implement at the time. (Incidentally, Wolter was the first to articulate the potential of the "water window" spectral range between the carbon and oxygen K
[Figure 2.1 shows three panels: "Kirkpatrick and Baez, 1948" (Kirkpatrick 1949), "Pattee, 1953," and "Kirkpatrick and Pattee, 1953."]

Figure 2.1 The crossed-1D-lenses focusing scheme developed by Kirkpatrick and Baez using elliptical-profile mirrors [Kirkpatrick 1948a]. At top left is an illustration from Kirkpatrick's 1949 contribution to Scientific American [Kirkpatrick 1949b]. At bottom is a three-mirror system constructed by Kirkpatrick and Pattee [Kirkpatrick 1953], and at top right is an improved, concentric-cylinder mounting scheme developed in 1953 by Pattee (picture from a 1983 letter by Pattee to M. Howells). More recent focusing schemes involving compound refracting lenses (Section 5.1.1) and multilayer Laue lenses (Section 5.3.6) also sometimes use the crossed-1D-lens focusing geometry. Adapted from a photograph sent in 1983 from Howard Pattee to Malcolm Howells; used with permission from Pattee, and also shown in [Kirz 2009].
edges at 290 eV and 540 eV respectively; see Fig. 2.5.) Nested Wolter mirrors are in use today in state-of-the-art x-ray telescopes orbiting the Earth, but Wolter mirrors have not had much impact in microscopy in spite of considerable effort [Onuki 1992, Aoki 1992, Aoki 1998]. To overcome the challenges of grazing incidence optics, Eberhard Spiller of IBM Research decided in 1972 to coherently combine the individually weak reflectivity of many refractive interfaces via multilayer mirrors [Spiller 1972, Spiller 1974], as will be discussed in Section 4.2.4. Spiller's approach was to use electron beam evaporation to deposit the layers. Very successful multilayer mirrors were produced using sputtering soon after at Stanford and Caltech by Underwood and Barbee for x-ray astronomy applications [Underwood 1979, Underwood 1981a, Underwood 1981b]. For normal incidence reflectivity, the layers must be about half a wavelength apart, so clearly this can only be achieved for very long wavelength X rays. Microscope objectives based on two concentric spherical mirrors operating at near-normal incidence had been developed by Schwarzschild for visible light [Schwarzschild 1905]. Spiller attempted
to use this geometry with multilayer coatings to build a microscope for soft X rays [Spiller 1980, Spiller 1984], but it did not perform very well. This did not slow down the USA space agency NASA, which in 1992 gave their NASA Scientist of the Year Award [Hoover 1992, NASA 1993] to Richard Hoover for inventing, according to NASA, a "revolutionary new microscope [that] should enable researchers to see in great detail high contrast x-ray images of proteins, chromosomes and other tiny carbon structures inside living cells. Resolution of the microscope could be so high that it may produce detailed images of the building blocks of life—tiny DNA molecules." Several of us got together and sent a letter to the head of NASA, pointing out that radiation damage would make it impossible to do what was being claimed, as will be discussed in Chapter 11. We received an answer from the top lawyer at the agency, who assured us that no laws were violated (apparently the laws of nature were not of concern). Subsequent publications didn't include any x-ray micrographs [Hoover 1993, Hoover 1994]. In any case, a successful multilayer mirror Schwarzschild XUV microscope was developed by Cerrina et al. [Ng 1990, Capasso 1991] for scanned imaging using photoelectrons as the imaging signal.
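The half-wavelength layer spacing mentioned above follows from the Bragg condition 2d sin θ = λ evaluated at normal incidence (θ = 90°), giving d = λ/2. A minimal sketch; the photon energies below are chosen for illustration and are not values from the text:

```python
# Normal-incidence multilayer: the Bragg condition 2*d*sin(theta) = lambda
# reduces to d = lambda / 2 at theta = 90 degrees.
HC_EV_NM = 1239.842  # h*c in eV*nm

def bilayer_period_nm(energy_eV):
    """Required multilayer d-spacing (nm) for normal-incidence reflection."""
    return (HC_EV_NM / energy_eV) / 2.0

# Illustrative energies: an EUV photon, and a harder soft x-ray photon.
for E in (92.5, 500.0):
    print(f"E = {E:.1f} eV: d = {bilayer_period_nm(E):.2f} nm")
```

At roughly 92.5 eV the required period is about 6.7 nm, which thin-film deposition can achieve; at 500 eV it shrinks toward a nanometer, which is why normal-incidence multilayers are practical only at very long x-ray wavelengths.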
2.3 Cold War microscopes

Following an earlier demonstration using a pinhole [Sievert 1936], the development of microfocus x-ray sources during the 1930s and 1940s opened the way to the development of the "projection microscope" or "shadow microscope" (we refer to the modern version as a point projection x-ray microscope, as will be described in Section 6.2). By placing the specimen close to an x-ray point source, and an x-ray film at some distance, one obtains geometrical magnification with a resolution as fine as the size of the source. The real successes with this approach began to come about in work at the University of Cambridge in the 1950s [Cosslett 1951, Cosslett 1952], where researchers used an electron beam focused onto an approximately 1 μm thick tungsten film to obtain images of silver grids with a resolution of about 500 nm, along with few-micrometer resolution images of the head of the fruit fly Drosophila melanogaster, among numerous examples. Soon commercial instruments were produced and sold for a time by General Electric in the USA [Newberry 1956], Philips in Holland, and Microray Laboratories in England (see [Newberry 1987] for this and other aspects of the early history of x-ray microscopy). In more modern point projection instruments [Mayo 2003] the size of the point source is reduced by using the focus of a scanning electron microscope on a roughly 100 nm thick metal target (thus reducing the source size increase that would otherwise be caused by electron scattering) to generate the x-ray illumination. The Cambridge group was led by Vernon Ellis Cosslett, FRS, a leading early electron microscopist, and William C. Nixon, who together wrote a book summarizing the field at the time [Cosslett 1960]. Later on, the group included Theodore Hall, who wrote a book on medical applications of x-ray microscopy [Hall 1972].
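The projection geometry described above amounts to a one-line calculation: magnification is the ratio of the source-to-film distance to the source-to-specimen distance, while resolution is set by the source size. The distances in this sketch are hypothetical, chosen only to illustrate the trade-off:

```python
def projection_magnification(src_to_sample, sample_to_film):
    """Geometrical magnification M = (z1 + z2) / z1 for a point-projection microscope."""
    z1, z2 = src_to_sample, sample_to_film
    return (z1 + z2) / z1

# Hypothetical geometry: specimen 0.1 mm from the source, film 100 mm beyond it.
M = projection_magnification(0.1, 100.0)
print(f"magnification ~ {M:.0f}x")
# The image blur is roughly the source size, so a ~500 nm source limits
# resolution to ~500 nm regardless of magnification, as in the
# Cambridge-era results described above.
```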
It became known much later that back in the 1930s Cosslett "joined the undercover University Communist Group, which included the German refugee Klaus Fuchs who was later imprisoned for passing sensitive nuclear physics from AERE Harwell to the Soviet Union" [Mulvey 1994]. Hall, as it turned out, was the second Soviet spy working on the Manhattan Project at Los Alamos [Albright 1997]. You should draw a lesson from this: be suspicious (very, very suspicious!) of x-ray microscopists.
2.4 Zone plates

The development of the idea of holography by Gabor [Gabor 1948] inspired several people to think of approaches to x-ray microscopy that would be free of the resolution limits of Kirkpatrick–Baez mirrors available at the time. Inspired by Gordon Rogers' analogy between inline holograms and Fresnel zone plates [Rogers 1950], Baez considered the recording medium resolution limits of inline holography [Baez 1952a], and this led him to demonstrate the use of Fresnel zone plates as x-ray lenses [Baez 1960]. The basic properties of zone plates had been established more than 70 years earlier, and it was clear that they would work for any kind of wave. However, using them for X rays had particular challenges due to the short wavelength and high penetrating power. The basic zone plate is a circular diffraction grating, consisting of concentric rings with radially increasing ring density. Alternate rings should be opaque (or should reverse the phase) and transparent. The width of the finest (outermost) ring determines the numerical aperture, and the resolution (see Section 5.3.1). But how to make fine rings that are sufficiently thick to absorb, or reverse the phase, yet narrow enough to provide submicron resolution? Baez's first freestanding metal zone plate [Baez 1960], made by optical lithography, had just 19 zones with a finest zone width of 20 μm: certainly not a high-resolution x-ray optic by today's standards! It was not until 1969 that Günter Schmahl (1936–2018) and Dietbert Rudolph of the University of Göttingen came up with holographic approaches to zone plate fabrication [Schmahl 1969]; their original motivation was x-ray astronomy, but they immediately realized the potential of the approach for x-ray microscopy. By 1974, the Göttingen group (including Bastian Niemann) had demonstrated 2 μm resolution zone plate imaging using a laboratory source [Niemann 1974], and in 1976 they produced the first zone plate images using synchrotron radiation [Niemann 1976] (Fig. 2.2). For some time their efforts faced an uphill battle; no less a figure than Ernst Ruska, who pioneered the development of the transmission electron microscope in the 1930s, wrote [Ruska 1979] in 1979:

Röntgen had however already shown experimentally in his first communication that a large number of solid and liquid materials did not appreciably refract the new rays, so that lenses made from these materials would not appreciably affect the trajectories of these beams. In the meantime it had been recognised that no such materials are possible. Furthermore, one could hardly hope that the very weak interaction of x-rays with the atoms of the material irradiated would be adequate to render visible, with sufficient contrast, particles of sub-light-microscope dimensions.
(The quote is from a translation by Mulvey [Ruska 1980].) Ruska's towering stature for his pioneering developments would soon be recognized by his receiving the 1986 Nobel Prize in Physics (along with Binnig and Rohrer, who developed the scanning
tunneling microscope in 1981), so his lack of enthusiasm for x-ray microscopy carried some influence. It is somewhat ironic that phase contrast is the dominant contrast mechanism in electron microscopy (Section 4.10.2), yet Ruska probably did not consider the possibilities of phase contrast in x-ray microscopy (nor did anyone else until Schmahl and Rudolph made its potential clear in 1987, as will be discussed in Section 4.7). An alternative approach for zone plate fabrication was suggested by David Sayre in 1972 [Sayre 1972]: to use the newly developed electron-beam fabrication technology, available at IBM where he was employed at the time, to produce the required fine linewidth for high resolution. It was his seminar on zone plates that got Janos Kirz interested in x-ray microscopy. The seminar took place in Oxford in 1972 when both were on sabbatical stays, and was the origin of a decades-long collaboration. In fact, Sayre lived in St. James on Long Island in New York, just a few miles from Stony Brook University. Stony Brook was the longtime academic home of Kirz (and, later on, the author) – but it was in Oxford where Sayre and Kirz met for the first time. (Sayre's background included a foundational role in direct methods in x-ray crystallography, participation in the IBM team that developed FORTRAN as the first high-level language for mathematical computer calculations, and leadership of the IBM team that developed the first virtual memory operating system [Glusker 2012, Kirz 2012].) It took nearly a decade for Sayre's suggestion to be put into practice [Ceglio 1980, Shaver 1980, Ceglio 1983], but since that time it has been refined and implemented in several laboratories around the world. Today zone plates are the dominant focusing elements in x-ray microscopes, though reflective and refractive optics are playing increasingly important roles too (Chapter 5, with a historical trend of spatial resolution shown in Section 5.5).
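The zone-plate geometry described in this section can be made concrete with the standard relations r_n² = nλf (given fully in Section 5.3.1), so that the outermost radius is r_N = 2NΔr_N and the Rayleigh resolution is about 1.22Δr_N. In the sketch below, the zone count and finest zone width are Baez's 1960 numbers, but the wavelength is an illustrative choice, not a value from the text:

```python
def zone_plate_props(n_zones, dr_outer_m, wavelength_m):
    """Standard zone-plate relations: outermost radius r_N = 2*N*dr,
    focal length f = 4*N*dr^2 / lambda, Rayleigh resolution ~ 1.22*dr."""
    r_outer = 2 * n_zones * dr_outer_m
    focal_length = 4 * n_zones * dr_outer_m**2 / wavelength_m
    resolution = 1.22 * dr_outer_m
    return r_outer, focal_length, resolution

# Baez's 1960 plate: 19 zones, 20 um finest zone; 1 nm wavelength is illustrative.
r_N, f, res = zone_plate_props(19, 20e-6, 1e-9)
print(f"radius {r_N * 1e3:.2f} mm, focal length {f:.1f} m, resolution ~{res * 1e6:.1f} um")
```

A resolution of tens of micrometers and a focal length of tens of meters make plain why Baez's optical-lithography plate was a proof of principle; the electron-beam-written zone plates that followed shrank Δr_N by orders of magnitude.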
2.5 Synchrotrons and lasers

The first microscope to use synchrotron radiation (Section 7.1.4) as the source was built by Horowitz and Howell at the Cambridge Electron Accelerator (CEA) in 1971 [Horowitz 1972, Horowitz 1978, Horowitz 2015] (Fig. 2.3). It used a micron-size pinhole to define the probe size and hence the resolution. The sample was mechanically scanned in a raster fashion to acquire the image. Unfortunately, this light source closed down shortly thereafter as the particle physics program at the CEA came to a close. Janos Kirz's group made a few early x-ray tests of a similar sort using soft X rays at the SPEAR ring at Stanford as the 1970s ended [Rarback 1980, Kirz 1980c]. By that time, an early dedicated synchrotron light source (the National Synchrotron Light Source or NSLS) was under construction at Brookhaven National Laboratory, which is close to Stony Brook. Soon Kirz, Sayre, and Malcolm Howells of Brookhaven were planning for x-ray microscopy experiments at NSLS. In the meantime, researchers at IBM began a short-lived adventure in "contact microscopy" (Section 6.1). By using an x-ray sensitive polymer (such as poly(methyl methacrylate) or PMMA) rather than photographic film as the detector, the resolution could be improved significantly. The scheme involved placing the object to be imaged on the polymer, exposing it to X rays, then "developing" the polymer and examining it
Figure 2.2 The first zone plate TXMs (transmission x-ray microscopes) developed by Schmahl, Rudolph, and Niemann. At top left is shown an instrument operated at DESY in Hamburg in 1976 [Niemann 1976], while an instrument operated at ACO in Orsay in 1983 is shown at top right. The bottom image is an x-ray micrograph of a diatom obtained at Orsay [Schmahl 1982]. Images courtesy of the late G. Schmahl; also shown in [Kirz 2009].
using a scanning electron microscope [Feder 1976, Feder 1977]. The fine detail was encoded as a surface-relief pattern on the polymer. The method attracted a number of practitioners, some of whom became a bit carried away with enthusiasm. The contact micrograph of a "live" blood platelet [Feder 1985] was featured on the front page of the Science section of the New York Times ("New tool captures cells alive," January 15, 1985). Unfortunately it turned out that the platelet remained stuck on the polymer, so the image was not what it was advertised to be. It was also around this time that the first visible-laser-pumped x-ray lasers were demonstrated [Matthews 1985, Suckewer 1985]. The euphoria over this led to another cover article (April 2, 1985) in the Science section of the New York Times that gushed "But aside from its weapons applications, the X-ray laser has excited biologists, chemists and physicists because of its possible use in a super microscope, an instrument that will perhaps be capable of taking holographic three-dimensional movies of the genetic code of a living cell." The excitement of the New York Times reporter was not based just on visible-laser-pumped x-ray lasers, but on x-ray lasers using the more intense pumping
Box 2.1 Grazing incidence mirrors and thermonuclear weapons

Nuclear weapons ("atom bombs") are based on achieving a rapid fission chain reaction in specific isotopes of uranium and plutonium. Thermonuclear weapons ("hydrogen bombs" or "H bombs") use the energy from a fission bomb to compress light materials to the temperature and pressure required to achieve nuclear fusion, boosting the energy released by a weapon considerably. As the fission bomb trigger is detonated, it becomes a very hot object with a blackbody radiation temperature (Eq. 7.5) sufficient to serve as an intense source of soft X rays. Being massless, the X rays are able to reach the fusion components (which often have another fission element at their core) more rapidly than neutrons can, so they can preheat and compress those components for efficient fusion once the neutrons arrive (the penetration power of X rays also helps to even out the heating and compression, a fact that was exploited in laser-driven inertial confinement fusion experiments dating back to the 1970s [Lindl 1995]). This x-ray approach was proposed in early 1951 by Edward Teller and Stanislaw Ulam (see [Ford 2015] for a discussion of the provenance of this idea), and it has led to H bombs with yields up to tens of megatons of dynamite equivalent from fission bomb triggers with yields of tens of kilotons. It may be advantageous to focus the X rays from the fission bomb trigger onto the fusion components; this can be accomplished using grazing incidence x-ray optics (Section 5.2), which could in turn be fabricated by figuring and polishing the inside surfaces of an exterior casing. This casing can be made of materials such as the heavier isotope of uranium, ²³⁸U, which is referred to as "depleted uranium" because it is what remains after separation of the slow-neutron-fissionable ²³⁵U isotope from natural uranium.
This ²³⁸U "depleted uranium" can also undergo fission if flooded with the fast neutrons produced by the fission bomb trigger, though it cannot by itself sustain a chain reaction. (Depleted uranium is also very robust at high temperatures, making it well suited for use as a casing material in ballistic missile warheads, which suffer considerable heating upon atmospheric reentry.) The important role that soft X rays play in thermonuclear weapons design and testing might help explain the high level of expertise in grazing incidence optics manufacture that was developed at Lawrence Livermore and Los Alamos National Laboratories in the USA, and at equivalent laboratories in other countries that have developed high-yield H bombs.
Figure 2.3 The first synchrotron-based x-ray microprobe was developed by Horowitz and Howell at the Cambridge Electron Accelerator in 1972 [Horowitz 1972, Horowitz 1978, Horowitz 2015]. The microprobe used a deep pinhole fabricated by plating around a silicon whisker, and was able to take images of sulfur dust (bottom left) and silicon whiskers (bottom right). Unfortunately this promising instrument was short-lived, as the accelerator was shut down in 1973 due to the high-energy physics community moving on to bigger machines. Images from [Horowitz 1978].
source that an exploding nuclear weapon might provide (see e.g. [Broad 1986]; a possible connection between x-ray optics expertise and thermonuclear weapons design is noted in Box 2.1). Now it must be pointed out that several members of the weapons labs had given quite serious thought to the issues of radiation and hydrodynamic damage [Solem 1982b, Solem 1986], as will be discussed in Section 11.1.2, but reporters are not always bound by subtleties. Around this time, Stony Brook University hosted a workshop which included researchers from the Livermore and Los Alamos laboratories. One memorable talk began with a statement that went something like "We are planning on carrying out x-ray holographic microscopy experiments with an intense but low-shot-rate x-ray laser source that I cannot describe" while the speaker put a viewgraph of the nuclear test craters at the Nevada Test Site (now the Nevada National Security Site) on the overhead projector! During the 1980s several synchrotron light sources were built and commissioned. The brightness and tunability of these sources provided the opportunity to develop x-ray microscopes that went beyond the simple transmission image, based on differential absorption (Section 9.1.1) from different regions of the object [Engström 1946b, Engström 1946a, Engström 1947]. Modern practitioners of scanning x-ray microscopy may be amused to know that in 1983 it took nearly an hour to record a modest pixel count image (Fig. 2.4; [Rarback 1984]) at 300 nm resolution, and with an energy resolution no better than 1 eV! The Göttingen group did extensive work at the original BESSY synchrotron light source in Berlin on developing full-field imaging with zone plates within Wolter's soft x-ray "water window" (Fig. 2.5). These developments included pioneering the x-ray version of Zernike phase contrast microscopy [Schmahl 1988, Schmahl 1994] and magnetic circular dichroism imaging [Fischer 1996]. The Stony Brook group used the NSLS at Brookhaven to develop spectromicroscopy [Ade 1990, Ade 1992, Zhang 1994] (see also Box 9.1 and Section 9.1). The Göttingen and Stony Brook groups and their collaborators both pioneered nanotomography [Haddad 1994, Lehr 1997] (Section 8.4) and cryo x-ray microscopy [Schneider 1998a, Schneider 1998b, Maser 1998, Maser 2000] (Section 11.3), as well as their combination [Weiß 2000, Wang 2000]. Other pioneering efforts were carried out by the King's College group at Daresbury, by Tsukuba University at the Photon Factory in Japan, at the Center for X-ray Optics at Lawrence Berkeley National Laboratory, and elsewhere. In the mid-1990s, three large-scale synchrotron light sources began operation with higher electron beam energies: first 6 GeV at the European Synchrotron Radiation Facility (ESRF) in Grenoble, France; then 7 GeV at the Advanced Photon Source (APS) at Argonne Lab near Chicago, USA; and finally 8 GeV at SPring-8 near Himeji, Japan (there were technical reasons why each facility chose a particular electron beam energy, but it does look an awful lot like one-upmanship, doesn't it?).
These facilities have led the way in advancing x-ray microscopy at multi-keV energies, including with fluorescence (as will be described in Section 9.2) and Bragg coherent diffraction imaging (as will be discussed in Section 10.3.8). Experiments at higher energies have also stimulated the development of compound refractive lenses (as will be discussed in Section 5.1.1). This happened first in experiments at the ESRF, which was soon joined by a strong effort at PETRA III at DESY in Hamburg, a 6 GeV light source that began operation in 2009. Many of these light source facilities now host several x-ray microscopy beamlines: for example, the Advanced Light Source (ALS) in Berkeley and the APS at Argonne host about half a dozen each. The numbers are similarly increasing at many other sources worldwide, yet demand for using the instruments has outstripped the availability of microscope time. While there were several earlier demonstrations of laboratory-source x-ray microscopes (Section 6.3), around the turn of the century Wenbing Yun optimized a conventional microfocus x-ray source along with improved optics and detectors to deliver commercially available tabletop x-ray microscopes with tomographic capabilities [Wang 2002]. He founded the company Xradia (now Carl Zeiss X-ray Microscopy), which has sold a large number of laboratory x-ray microscopes and synchrotron microscopes on four continents. Bruker soon joined the list of vendors of commercial
Figure 2.4 The first STXM (scanning transmission x-ray microscope) using zone plate optics was constructed by Rarback, Kirz, and Kenney at bending magnet beamline U15 at the NSLS at Brookhaven [Rarback 1984]. It used drN = 300 nm zone plates and operated with E/ΔE ≈ 300; the images shown took nearly an hour each to acquire. The scanning stage used leaf spring flexures to guide linear orthogonal motions, with linear variable differential transformers (LVDTs) for position readout. Before tests with Fresnel zone plates, Kirz's group at Stony Brook undertook scanning microscope tests at the Stanford synchrotron using pinhole optics [Kirz 1980c, Rarback 1980].
synchrotron-based microscopes, with Axilon as a more recent spin-off (and Yun has now founded another company, Sigray).
2.6 Lensless microscopes

In parallel with the evolution of microscopes based on zone plate lenses, alternative lensless schemes have also been developed. The first techniques to be demonstrated were Gabor or in-line holography [Aoki 1974, Howells 1987, Lindaas 1996] and Fourier transform holography [Kikuta 1972, Reuter 1976, McNulty 1992, Eisebitt 2004], as will be discussed in Section 10.2. In-line holography depends on a plane reference wave to interfere with the wave diffracted by the object. The resolution in this scheme is limited by the detector. In Fourier transform holography a spherical wave diverging from a "point source" is employed. The resolution is limited by how small the source is (or, alternatively, the resolution at which the source properties are known). This involves a trade-off in that the intensity of the reference source tends to diminish with decreasing size unless it is produced by a lens, in which case one has to judge whether lens-based imaging is more appropriate. Normally the advantage of the holographic technique is
Figure 2.5 The "water window," which is an x-ray energy region between the K shell absorption edge energies of carbon at 290 eV and oxygen in water at 540 eV, has played a huge role in x-ray microscopy [Wolter 1952, Sayre 1977a]. In that energy range, water is relatively transparent while organic materials are not; for example, biological cells show especially favorable absorption contrast. The only other region in the electromagnetic spectrum with such favorable intrinsic contrast for high-resolution imaging of hydrated biological specimens is the visible light region to which our eyes are adapted (see Fig. 9.1). Also shown here are the mean free paths Λ for elastic and inelastic scattering of electrons, showing that intrinsic contrast is lower and that electron microscopy is better suited for studying specimens with a thickness of less than about 1 μm, as will be discussed in Section 4.10.
that it does not require optics, and that the reconstruction of the image is rather straightforward, since the reference wave encodes the phase of the diffraction pattern. During the 1980s, David Sayre drew from his earlier thoughts on data sampling in crystallography [Sayre 1952a] to advocate a form of lensless imaging that relies on the detection of the intensity of the diffracted wave without any reference wave [Sayre 1980]. He pointed out that if the diffraction pattern is sampled finely enough, there should be enough information available to reconstruct the object. The algorithm to perform the reconstruction in the case of an isolated object was developed independently by Fienup [Fienup 1978], and the first experimental demonstration of what became known as coherent diffraction imaging (CDI; see Section 10.3) was performed [Miao 1999]. In the past decade a powerful new variant of the CDI idea, inspired by the electron microscopist Walter Hoppe [Hoppe 1969a], has been developed theoretically by Rodenburg and Faulkner [Rodenburg 2004, Rodenburg 2008], and implemented with X rays first at the Swiss Light Source [Rodenburg 2007a, Thibault 2008] and subsequently at several other synchrotron light sources. (A similar experiment with a slightly different reconstruction algorithm – also put forward by Rodenburg [Rodenburg 1992] – was
carried out earlier by Chapman [Chapman 1996b].) The technique, referred to as ptychography (Section 10.4), does not require that the object be isolated. It depends on the recording of many diffraction patterns from overlapping areas of the object. In ptychography the resolution is limited only by the wavelength of the X rays and the angular range over which the diffraction patterns can be recorded. Resolution as fine as 3 nm has already been demonstrated, and further improvements are expected. We are entering an era where the resolution of x-ray focusing optics is no longer a limitation! Holography, CDI, and even scanning microscopes require coherent illumination, as will be discussed in Sections 4.4.6 and 10.3.2. Synchrotron radiation from real electron beam dimensions is generally not intrinsically fully coherent, and selecting out the coherent portion of the beam involves monochromators and spatial filters [Kondratenko 1977], resulting in a severe loss in intensity. The development of x-ray free-electron lasers (FELs) (Section 7.1.8) introduces a new dimension to x-ray microscopy with inherently coherent or nearly coherent beams. FELs operate with highly intense ultrashort pulses (generally on the order of 50 fs long). It was pointed out by Neutze et al. [Neutze 2000] that although a single pulse will vaporize the object, it may be possible to record the atomic resolution diffraction pattern before the parts of the object have a chance to move far enough to blur or distort the pattern. This "diffract before destruction" scheme (Section 10.6) was beautifully demonstrated by Chapman et al. [Chapman 2006a], and has led to considerable excitement. Radiation damage to the object is always a concern in high-resolution microscopy with ionizing radiation (see Chapter 11). Morphology and elemental content can often be preserved by keeping the sample near liquid nitrogen temperature.
It is ironic that "diffract before destruction" provides a different way out of the radiation damage problem, as long as a single exposure is sufficient to collect all necessary information from each of a large number of identical objects.
2.7 The dustbin of history

Several other perspectives on the history of x-ray microscopy are available [Engström 1980, Baez 1989, Baez 1997, Kirz 2009]. A more complete view of the history of x-ray microscopy can be found by consulting the original literature. Conference proceedings provide snapshots of the state of the field at particular moments. In the Cold War era, there were three conferences: one hosted by Cosslett in Cambridge in 1956 [Cosslett 1957], one by Engström in Stockholm in 1959 [Engström 1960], and one by Kirkpatrick in Stanford in 1962 [Pattee 1963]. This third conference led into the series International Congress on X-ray Optics and Microanalysis (ICXOM), which for some years concentrated only on electron-probe-stimulated x-ray fluorescence. The topic of x-ray imaging systems emerged again in two New York Academy of Sciences conferences run by Donald Parsons in 1978 [Parsons 1978] and 1980 [Parsons 1980], and in a Rank Prize Funds meeting on scanning microscopy organized in 1980 by Eric Ash [Ash 1980].
Table 2.1 X-ray microscopy conferences in the synchrotron radiation phase of history. This listing is only of the dedicated international x-ray microscopy conferences; there have been numerous x-ray microscopy sessions at other larger conferences, and smaller workshops.

Year  Location                    Proceedings
1983  Göttingen, Germany          [Schmahl 1984]
1986  Brookhaven, New York, USA   [Sayre 1988]
1990  London, England             [Michette 1992]
1993  Chernogolovka, Russia       [Aristov 1994]
1996  Würzburg, Germany           [Thieme 1998b]
1999  Berkeley, California, USA   [Meyer-Ilse 2000]
2002  Grenoble, France            [Susini 2003]
2005  Himeji, Japan               [Aoki 2006]
2008  Zürich, Switzerland         [Pfeiffer 2009]
2010  Chicago, Illinois, USA      [McNulty 2011]
2012  Shanghai, China             [Xu 2013]
2014  Melbourne, Australia        [de Jonge 2016]
2016  Oxford, England             [Rau 2017]
2018  Saskatoon, Canada           [Urquhart 2018]
2020  Hsinchu, Taiwan
2022  Lund, Sweden
It is also interesting to see how the spatial resolution of x-ray optics has improved over the years; this will be discussed in Section 5.5. With the rise of zone plate microscopy groups in Göttingen, Stony Brook, King's College London, and elsewhere, a conference series was begun that continues through the present (Table 2.1). Most of these conference proceedings are relatively easy to come by. The proceedings of the September 20–24, 1993 conference held in Chernogolovka, Russia are a bit harder to find [Aristov 1994], but the conference was memorable: the Congress of People's Deputies was dissolved by President Boris Yeltsin on September 21. Rumors were rampant, the ruble decreased in value by 30 percent during the conference week, and the authorities kept changing their minds on whether Lenin's Tomb was open for viewing; the author was unable to examine the quality of the sample's aldehyde fixation, but others at the conference were. Most foreign participants had returned home before street riots and battles took place over September 28 to October 5.
2.8 Concluding limerick

Like any scientific specialty, x-ray microscopy has an interesting history, which we summarize in a limerick:

Röntgen discovered some rays
Which left him somewhat in a daze
It was not hocus-pocus
When they came into focus
Leading now to our microscope days
3 X-ray physics

Janos Kirz and Malcolm Howells contributed to this chapter.

X-ray microscopes rely on the characteristics of X rays and how they interact with materials. There are a number of books that go into much greater detail on certain aspects of x-ray physics [Compton 1935, Dyson 1973, James 1982, Michette 1993, Als-Nielsen 2011, Attwood 2017]; we describe here what we consider to be particularly important for x-ray microscopy.
3.1 The Bohr model, energy levels, and x-ray shells

The early years of the twentieth century saw a revolution in our understanding of the physics behind the atom. J. J. Thomson identified the electron as a lightweight particle with a specific charge-to-mass ratio, and Max Planck postulated quantized energies for electromagnetic radiation proportional to its frequency. In 1905, Albert Einstein had his annus mirabilis, which included his papers on Brownian motion and special relativity, and his paper on the photoelectric effect (for which he received his Nobel Prize in Physics in 1921), which put forward his description of light in terms of photons with specific energy and momentum. Soon after, Ernest Rutherford showed via alpha particle scattering that the atom has a dense, positively charged nucleus, leading to a solar system analogy with electrons being the planets that orbit a dense nucleus (the Sun). However, it was obvious that this model was incomplete, since classical electrodynamics would suggest that the electrons in orbit would radiate away energy at a rate such that they would crash into the nucleus in around a nanosecond – not a situation that favors a universe with stable atoms! It was Niels Bohr (a young Dane who had made an extended visit to Rutherford's lab) who supplied in 1913 the first glimmers of an answer [Bohr 1913] by using a combination of scaling laws already noticed by others, Planck's quantization revolution, and intuition. Bohr arrived at a model of the atom in which electrons have discrete values of angular momentum nℏ, where n = 1, 2, 3, . . . is an integer (the principal quantum number) and

ℏ ≡ h/(2π)    (3.1)

is based on Planck's constant of

h = 6.626 × 10⁻³⁴ joule·seconds = 4.136 × 10⁻¹⁵ eV·seconds.    (3.2)
Figure 3.1 Bohr model of electron states indexed by n, and processes of transitions between them. For the very lightest atoms such as hydrogen, these processes take place at longer wavelengths; we show here the x-ray version appropriate for most atoms (a). Incident x-ray photons lead to photoelectric absorption (b; in this case, removing an n = 1 electron). Once an atom has a vacancy in an inner shell, an outer electron drops into that vacancy and the excess energy is emitted either via an x-ray fluorescence photon (c) or emission of an Auger (or Coster–Kronig) electron (d). The fraction of fluorescent photon events (as compared to electron emission events) is known as the fluorescence yield ω, such as ωK for the case shown here. The notation of x-ray shells K, L, and M is discussed in Section 3.1.2.
This immediately led Bohr to a calculation of discrete binding energies of these electrons of

En = −E0 Z²/n²    (3.3)

with E0 as the Bohr energy of

E0 = me e⁴/(8h²ε0²) = 13.6 eV.    (3.4)
Here Z is the atomic number of the atom (given by the number of protons in the nucleus), me is the mass of an electron (Eq. 3.29), e is its charge, and ε0 is the electric permittivity of free space. The Bohr energies are negative energies relative to the energy of a zero-velocity, unbound electron (i.e., the continuum state), so that electrons can be thought of as falling from a flat surface into an energy well when they become bound to an atom.
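To make Eq. 3.3 concrete, here is a minimal numerical sketch (the function name and the element choices are ours, for illustration only; the formula applies to one-electron systems, since screening is neglected):

```python
# Bohr model binding energy (Eq. 3.3): En = -E0 * Z^2 / n^2,
# valid for hydrogen-like (one-electron) systems.
E0 = 13.6  # Bohr energy in eV (Eq. 3.4)

def bohr_energy(Z, n):
    """Binding energy (in eV) of an electron with principal quantum
    number n around a nucleus of atomic number Z."""
    return -E0 * Z**2 / n**2

# Hydrogen (Z=1) ground state: -13.6 eV
print(bohr_energy(1, 1))
# Singly ionized helium (Z=2, one electron remaining), n=1: -54.4 eV
print(bohr_energy(2, 1))
```

The Z² scaling is why inner-shell binding energies climb from the ultraviolet into the x-ray range as one moves up the periodic table.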
Box 3.1 The Bohr model and de Broglie waves

Albert Einstein's special theory of relativity and explanation of the photoelectric effect led to a picture in which photons have a momentum p = E/c based on their energy E and the speed of light c (Eq. 3.55). This leads to λ = hc/E = h/p. In his 1924 PhD dissertation, Louis de Broglie proposed that this expression of

λ = h/p    (3.5)

might also apply to objects with a mass, so that matter might exhibit wave-like properties. This created a stir, for he showed that if you set the circumference of an electron's orbit in the Bohr model equal to an integer number of his postulated waves, or 2πr = nλ = nh/p, in the nonrelativistic limit of p = mv you arrive at mvr = nℏ. This is simply Bohr's postulate of quantization of the nonrelativistic angular momentum mvr for a mass m traveling at a velocity v in a circular orbit of radius r. Is it more surprising that it took more than a decade to arrive at such a simple physical picture for Bohr's model, or that Bohr was courageous enough to put forward his radical notion of angular momentum quantization without such a clear physical model? In any case it must be noted that de Broglie's proposal of wave-like properties for matter demanded that a wave equation be found, which Erwin Schrödinger delivered in 1926 [Schrödinger 1926a], where it serves as a cornerstone of quantum mechanics.
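As a quick numerical check of Eq. 3.5 (an illustrative sketch; the 100 eV electron energy is simply an example we chose), the nonrelativistic de Broglie wavelength λ = h/√(2meE) of a 100 eV electron comes out near 0.12 nm:

```python
import math

# de Broglie wavelength (Eq. 3.5): lambda = h / p, with the
# nonrelativistic momentum p = sqrt(2 * me * E).
h = 6.626e-34    # Planck's constant in J*s (Eq. 3.2)
me = 9.109e-31   # electron mass in kg
eV = 1.602e-19   # joules per electron volt

def de_broglie_nm(E_eV):
    """Nonrelativistic de Broglie wavelength (in nm) of an electron
    with kinetic energy E_eV (in eV)."""
    p = math.sqrt(2.0 * me * E_eV * eV)  # momentum in kg*m/s
    return (h / p) * 1e9                 # wavelength in nm

print(de_broglie_nm(100.0))  # about 0.123 nm
```

This sub-nanometer wavelength at modest electron energies is what makes atomic-resolution electron microscopy possible.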
In an era where Bohr's model of discrete electron energy states in atoms is presented as standard information to secondary school students, it is perhaps hard to appreciate how revolutionary it was. The simple physical model based on electron waves arrived more than a decade later (see Box 3.1), so Bohr's postulate originally had the flavor of an improvisation. But it was a successful improvisation! From the Bohr model, we find that it takes a discrete amount of energy

E(i→f) = Ef − Ei = −E0 Z² (1/nf² − 1/ni²) = E0 Z² (1/ni² − 1/nf²)    (3.6)

to raise an electron from a more tightly bound initial state ni to a less tightly bound final state nf; it follows that the same amount of energy is released when an electron drops from the upper state to the lower one. Using the relationship between the energy E and wavelength λ of a photon of

λ = hc/E = (1239.84 eV·nm)/E,    (3.7)

it turns out that Eq. 3.6 perfectly explained the spectral lines that had already been observed in visible and ultraviolet spectroscopy experiments on hydrogen gas discharge tubes, and it helped explain how atoms could be excited by, and emit, electrons at specific energies (see Fig. 3.1). The Bohr energy of Eq. 3.3 provides a good guide to estimating the binding energies of single-electron atoms, though charge screening effects modify it, as discussed in Section 3.1.2 and Eq. 3.12; the best way to find out actual transition energies is to consult tabulations of electron binding energies (see Appendix A and Fig. 3.2).

Figure 3.2 Electron binding energies for elements Z = 1–92, labeled by atomic shell using the notation of Table 3.1. The actual values for multi-electron atoms differ significantly from the Bohr energies of Eq. 3.3 due to screening (Eq. 3.12). The data shown here are from a recent tabulation [Elam 2002] that draws in part upon an older compilation [Bearden 1967]; however, more accurate tabulations now exist [Deslattes 2003], as discussed in Appendix A.
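Equations 3.6 and 3.7 combine in a few lines of code (a sketch under our own naming; real transition energies should come from the tabulations just mentioned, since screening is neglected here). For hydrogen's n = 2 → 1 (Lyman-α) transition, the Bohr model gives 10.2 eV, i.e. λ ≈ 121.6 nm:

```python
# Transition energy from the Bohr model (Eq. 3.6) and the
# photon energy-wavelength relation (Eq. 3.7).
E0 = 13.6        # Bohr energy in eV (Eq. 3.4)
HC = 1239.84     # hc in eV*nm (Eq. 3.7)

def transition_energy(Z, n_lo, n_hi):
    """Energy (in eV) released when an electron drops from n_hi to n_lo
    in a hydrogen-like atom of atomic number Z (screening neglected)."""
    return E0 * Z**2 * (1.0 / n_lo**2 - 1.0 / n_hi**2)

def wavelength_nm(E_eV):
    """Photon wavelength (in nm) for a photon energy E_eV in eV (Eq. 3.7)."""
    return HC / E_eV

E = transition_energy(1, 1, 2)   # hydrogen Lyman-alpha: 10.2 eV
print(E, wavelength_nm(E))       # 10.2 eV -> about 121.6 nm
```

Running the same two functions with larger Z hints at how K-shell transitions in heavier atoms move into the keV (x-ray) range, though screening makes the actual energies differ from these estimates.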
3.1.1 X-ray fluorescence and Auger emission

Let's begin with an atom in its ground state, or lowest energy configuration, and consider what can happen when an x-ray photon is incident upon it. In the Bohr model, the atom starts out in a state with electrons filling its orbitals, as shown in Fig. 3.1a. Consider the case in which the incoming photon has sufficient energy so that it can remove an n = 1 electron from the atom to the continuum, as shown in Fig. 3.1b; the threshold energy for this is referred to as the K shell ionization potential EK, which is the binding energy of the electron (Eq. 3.3, Fig. 3.2), and the ejected electron is referred to as a photoelectron. The absorption coefficient for the atom increases dramatically at this energy, leading to an x-ray absorption edge, as shown schematically in Fig. 3.3. Because atoms like to have all of their orbitals filled up in energy order (see Section 3.1.3), an atom with a core electron removed is thermodynamically unstable and an electron from a less strongly bound orbital will want to drop down and fill the hole. However, the atom then has an excess of energy that it will want to release. There are three competing processes for releasing this excess energy:
Figure 3.3 Schematic of a K shell x-ray absorption edge. When the incident photon energy reaches the ionization potential for a particular atom (equal to the binding energy of a particular electron orbital; labeled here as EK), absorption increases dramatically. The ratio of the absorption cross section σ just above the ionization potential to that just below is known as the jump ratio rK (Eq. 3.8), with values as shown in Fig. 3.6.
1. The atom can emit a photon with an energy equal to the diﬀerence between the “ﬁlling” electron’s initial state and the energy of the state from which the photoelectron was ejected (approximated by Eq. 3.6). This is the process of ﬂuorescence, and if the photon’s energy is above about 100 eV (as it is when corelevel electron photoabsorption takes place with all but the few lightest elements) then we refer to the process as xray ﬂuorescence. This process is illustrated in Fig. 3.1C, and its fractional probability or yield is designated with ωK in the case of a vacancy in the K shell. 2. The excess energy between the “ﬁlling” electron’s initial state and the energy of the state from which the photoelectron was ejected can alternatively be released through the ejection of another electron, which is called an Auger1 electron [Auger 1925] (see Fig. 3.1D). Its fractional probability or yield [Zschornack 2007, Eq. 2.78] is designated by a. 3. There can also be transitions between states with the same value of n, leading to transitions to higher subshells within n before the vacancy is ﬁlled by another transition and an electron is ejected (see [Zschornack 2007, Fig. 2.47]). These are known as Coster–Kronig transitions [Coster 1935, Bambynek 1972] with yield [Zschornack 2007, Eq. 2.79] f , and they become signiﬁcant for transitions from L1 , L2 , and L3 shells for elements with Z 70 [Zschornack 2007]. Because these heavier elements are less frequently studied using xray microscopy, and because many tabulations take into account their subsequent eﬀect on ﬂuorescence yields [Bambynek 1972, Hubbell 1994], we will not consider Coster–Kronig transitions further. Which process is more likely to occur? Lumping Coster–Kronig transitions together 1
¹ Note that Auger is pronounced "oh-zhay," though someone familiar with the hole-boring tools required for the fine Minnesota activity of ice fishing (CJ) might be tempted to say "aw-gerr."
Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 01:55:39, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.004
Xray physics
[Figure 3.4 here: log plot of fluorescence energy (0.1–100 keV) versus atomic number Z (0–100), with curves labeled for the K, L, and M emission series (e.g. KL3, KL2, KM3, L1M3, L1M2, L3M5, M4N6).]
Figure 3.4 Fluorescence emission energies as a function of the atomic number Z. We show here values from the stronger emission lines from one tabulation [Elam 2002]. The notation for the total fluorescence at each final atomic state is the IUPAC notation shown in Table 3.2.
with Auger electrons, this is characterized by the fluorescence yield ω, which is the fraction of the time that fluorescence occurs. As Figs. 3.5 and 3.7 show, the fluorescence yield ω is quite low for lighter atoms and low x-ray fluorescence energies. When calculating expected fluorescence signals following x-ray absorption, one must account for not only the fluorescence yield ω but also the absorption jump ratio [Martin 1927, Compton 1935] r of [Zschornack 2007, Eq. 2.146]

r = σ+/σ−,   (3.8)

as shown in Fig. 3.3. Consider the example of silicon at the K edge, where the ionization potential is E_K = 1839 eV. As one crosses this absorption edge, the x-ray absorption cross section σ_abs increases; this is in turn proportional to the linear absorption coefficient (LAC) μ, and by extension to the imaginary part of the complex number of oscillation modes per atom f2, as will be shown in Eq. 3.45. For silicon, f2 jumps from 0.367 to 4.16 as one crosses the K edge, giving a jump ratio of

r_K,Si = 4.16/0.367 = 11.3   (3.9)

according to the tabulation of Henke et al. [Henke 1993] (other tabulations [Elam 2002] give a jump ratio of r_K,Si = 10.37). That is, the silicon atom is still absorptive at energies below E_K as one excites electrons in the L and M shells, but as one crosses E_K the fraction of absorption events that go towards creating K shell vacancies is given by (r_K − 1)/r_K, or (11.3 − 1)/11.3 = 91.2 percent. As a result, the net intensity of K line
3.1 The Bohr model, energy levels, and xray shells
[Figure 3.5 here: log plot of fluorescence yield ω (0.001–1) versus atomic number Z (0–100) for the K, L1–L3, and M4–M5 shells.]
Figure 3.5 X-ray fluorescence yields ω as a function of the atomic number Z. The fluorescence yield measures the fraction of the time a vacancy in the specified shell leads to x-ray fluorescence emission, rather than to the emission of Auger or Coster–Kronig electrons. We show here values for the stronger emission lines from one tabulation [Elam 2002], though other tabulations are available [Bambynek 1972, Krause 1979a, Hubbell 1994, Zschornack 2007]. The notation for the total fluorescence at each final atomic state is as shown in Table 3.2.
x-ray fluorescence emission I_K,F will be given from the intensity of x-ray absorption I_abs by both the fluorescence yield ω_K and the edge jump ratio r_K [Martin 1927, Compton 1935, Sherman 1955]. One can also distinguish emission into specific fluorescence lines within a given shell (the K shell in this case) with a factor F, and account for the creation of orbital vacancies or holes by the additional processes of Auger, Coster–Kronig, and radiative transitions with an electron–hole transfer factor T, which is unity for the case of K shell fluorescence [Sparks 1980, Bambynek 1972]. With all of these factors, the net fluorescence rate into the Kα1 line (as one example) is given by

I_Kα1(E) = ω_K F_Kα1 T_Kα1 (1 − 1/r_K) (1 − e^(−μ_Z(E) t_Z)) I_0(E)
(3.10)
where T_Kα1 = 1 in this case, and μ_Z(E) represents the linear absorption coefficient and t_Z the thickness of the fluorescing element Z. In order to use the jump ratio r of Eq. 3.8 in this way, one should ignore absorption cross section changes due to near-edge effects (such as XANES peaks and EXAFS wiggles) since they involve transitions to lower-energy electronic states (XANES; Section 9.1.2) or ejected photoelectron self-interference (EXAFS; Section 9.1.7). Jump ratios r calculated for various absorption edges are included in some tabulations of x-ray optical constants [Elam 2002,
[Figure 3.6 here: log plot of jump ratio r (1–25) versus atomic number Z (0–100) for the K, L1, L2, and L3 edges.]
Figure 3.6 X-ray absorption edge jump ratio r for various x-ray shells. The jump ratio r (Eq. 3.8) is the ratio of absorption above and below an x-ray absorption edge, as illustrated in Fig. 3.3. The data shown here come from one recent tabulation [Elam 2002]. Note the very high value of r_L3,K.
Schoonjans 2011a], and are shown in Fig. 3.6. An example of calculating an x-ray fluorescence rate is given in Section 9.2.3.
Inner-shell electrons can be removed by X rays with sufficient photon energy as described above, or by electrons, protons, or other energetic particles. When charged particles are used, one must worry about continuum radiation or Bremsstrahlung backgrounds, as will be discussed in Section 4.10.1. Further details on electron-impact x-ray sources are provided in Section 7.1.2.
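As a numerical cross-check, the Kα1 rate of Eq. 3.10 can be evaluated in a few lines. The sketch below uses the silicon jump ratio r_K = 11.3 quoted above from the Henke tabulation; the fluorescence yield, line fraction, and μt values are illustrative placeholders, not tabulated numbers.

```python
import math

def k_alpha1_rate(omega_K, F_Ka1, T_Ka1, r_K, mu_t, I0=1.0):
    """Net K-alpha1 fluorescence rate following Eq. 3.10:
    I = omega_K * F_Ka1 * T_Ka1 * (1 - 1/r_K) * (1 - exp(-mu*t)) * I0."""
    vacancy_fraction = 1.0 - 1.0 / r_K          # fraction of absorptions creating K vacancies
    absorbed_fraction = 1.0 - math.exp(-mu_t)   # fraction of the beam absorbed in thickness t
    return omega_K * F_Ka1 * T_Ka1 * vacancy_fraction * absorbed_fraction * I0

# Silicon K edge: r_K = 11.3 from the Henke tabulation (text above).
frac_K = 1.0 - 1.0 / 11.3
print(f"K-vacancy fraction for Si: {100 * frac_K:.1f} percent")  # 91.2 percent, as in the text

# omega_K, F_Ka1, and mu_t here are placeholder values for illustration only.
rate = k_alpha1_rate(omega_K=0.05, F_Ka1=0.6, T_Ka1=1.0, r_K=11.3, mu_t=0.1)
print(f"Illustrative K-alpha1 rate per incident photon: {rate:.2e}")
```

The 91.2 percent figure reproduces the (r_K − 1)/r_K result worked out for silicon above.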
3.1.2 X-ray transitions: fluorescence nomenclature

The fact that different chemical elements produce x rays with different absorption properties was first characterized by Charles Barkla, as noted in Section 2.1; in the absence of a good physical model, he labeled the two series first A and B [Barkla 1909] and later K and L [Barkla 1911]. Within a few months of the publication of Bohr's model, Henry Moseley realized that it provided the way to explain Barkla's results, so he undertook a study to measure the x-ray fluorescence energies from a wide range of elements [Moseley 1913]. If electrons accelerated through a voltage have bombarded atoms in a target and removed some n = 1 electrons through inelastic collisions, then Eq. 3.6 suggests that some n = 2 electrons can subsequently drop down and release an energy
[Figure 3.7 here: log plot of fluorescence yield ω (0.001–1) versus photon energy (100–10,000 eV) for the K and L shells, with individual elements labeled (B through In) and curves marked h = 0.3, 1, 3, and 10 nm.]
Figure 3.7 X-ray fluorescence yield ω as a function of x-ray emission energy, labeled with the names of some of the elements. This is an alternative way of viewing the same data as is shown in Fig. 3.5.
of

E_K−L ≈ E0 Z² (1/1² − 1/2²) = (3/4) E0 Z²,
(3.11)
while electrons dropping down from n = 3 to n = 2 can release (5/36)E0 Z², and so on. If one writes Eq. 3.11 in terms of a photon's frequency f, one has hf = (3/4)E0 Z², or Z = 2 √(h/(3E0)) √f, which leads to a linear trend on a plot of Z versus √f. What Moseley found is that he had to use Z → (Z − Z_screen) with Z_screen ≈ 1 to get a good approximate fit for Barkla's first series of x-ray fluorescence lines, which he interpreted as a reflection of partial screening of the nuclear charge by another electron in the n = 1 shell; Moseley suggested an even larger (and, in hindsight, overly large) screening value of Z_screen ≈ 7.4 for Lα or L−M transitions. This would lead to a modification of the Bohr energy of Eq. 3.3 to

E = E0 (Z − Z_screen)²/n².
(3.12)
In fact, in Rydberg's paper [Rydberg 1890] on the pattern of spectral lines, which was written more than 20 years before Bohr's theory was developed, another empirical correction term to Eq. 3.3 was introduced which is now called a quantum defect term δ_ℓ, yielding

E = E0 (Z − Z_screen)²/(n − δ_ℓ)²
(3.13)
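The screened-Bohr estimate that follows from Eqs. 3.11 and 3.12 is easy to try out numerically. The sketch below uses E0 = 13.6 eV and Z_screen = 1 as in the text; the comparison value for the copper Kα1 line (about 8048 eV) is from standard tabulations.

```python
E0 = 13.6  # Bohr/Rydberg energy in eV (Eq. 3.3)

def k_alpha_estimate(Z, Z_screen=1.0):
    """Screened K-L transition energy: E = (3/4) * E0 * (Z - Z_screen)**2, in eV."""
    return 0.75 * E0 * (Z - Z_screen) ** 2

# Copper (Z = 29): the estimate lands within about 1 percent of the
# measured K-alpha1 energy of roughly 8048 eV.
e_cu = k_alpha_estimate(29)
print(f"Estimated Cu K-alpha energy: {e_cu:.0f} eV")
```

For Z = 29 this gives 7997 eV, illustrating why the Moseley picture is useful for estimates even though it is not quite proper.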
where in most cases the quantum defect δ_ℓ is quite small, but it can reach values of 0.40 (Li) to 4.06 (Cs) for the s orbitals of alkali atoms [Rau 1997]. For atoms other than the alkalis the quantum defect is small, so we can remain with Eq. 3.12 and arrive at a modified K−L transition energy using Eq. 3.11, which gives

E_K−L ≈ E0 (Z − Z_screen)² (1/1² − 1/2²) = (3/4) E0 (Z − Z_screen)²,   (3.14)

which is sufficient for most estimates. While Moseley's approximate treatment of nuclear charge screening is useful as a rough explanation and has therefore been reproduced in generations of physics textbooks, it is not quite proper [Whitaker 1999], so don't expect it to provide fully accurate results!
As noted in Box 3.1, in early 1926 Erwin Schrödinger was inspired by de Broglie's electron wave hypothesis to come up with a corresponding wave equation [Schrödinger 1926a], which he immediately used to find a full quantum mechanical solution for electronic states in the hydrogen atom [Schrödinger 1926b]. Schrödinger's eigenstate solution is nicely described in a myriad of textbooks on quantum mechanics; see for example [Griffiths 2004]. It describes electron states in a single-electron atom in terms of wavefunctions

ψ(r, θ, ϕ) = R_{n,ℓ}(r) Y_ℓ^{m_ℓ}(θ, ϕ),
(3.15)
where the radial part of the solution R_{n,ℓ}(r) depends on Bohr's principal quantum number n and a total angular momentum quantum number ℓ, while the angular part of the solution is described in terms of spherical harmonics Y_ℓ^{m_ℓ}(θ, ϕ) that depend on the total angular momentum indexed by ℓ and also on its ẑ-axis projection in the presence of an external field, indexed by m_ℓ. Wolfgang Pauli soon incorporated electron spin (postulated by Goudsmit and Uhlenbeck in 1925, and shown by Dirac to be required in a relativistic version of quantum mechanics) into the solution as an additional quantum number s [Pauli 1927], and this finally gave a solid theoretical footing for the notion of stable "closed shells" that Bohr had proposed in 1922 and for which Hund had articulated a pattern of orbital filling soon after [Hund 1925b, Hund 1925a, Kutzelnigg 1996]. When combined with the angular momentum quantum number ℓ, electron spin allows one to characterize the total spin of an atom with a quantum index

j = ℓ ± s.
(3.16)
We shall not recreate the full explanation here since it is described in quantum mechanics textbooks, but with these quantum numbers the set of allowed states for atoms can be found to begin with those shown in Table 3.1. In Tables 3.1 and 3.2, we confront the outcome of a series of unfortunate events, which is that the phenomenology of x-ray fluorescence lines and their shells was described [Barkla 1911, Siegbahn 1925] before a complete theory of quantum mechanics of the atom emerged. As a result, we are left with multiple conflicting notations, including another variant for x-ray fluorescence lines proposed by the International Union of Pure and Applied Chemistry (IUPAC) [Jenkins 1991]. We can wish it otherwise, but as
Table 3.1 Quantum states for the atom, corresponding to Schrödinger's solution for the hydrogen-like atom (Eq. 3.15), with n as the principal quantum number, ℓ as the orbital angular momentum quantum number and m_ℓ as its ẑ-axis projection, s as electron spin, and j as the total angular momentum. The occupancy of each state is indicated, along with its modern spectroscopic state name and the x-ray shell notation due to Siegbahn [Siegbahn 1925], since it predates the work of Schrödinger and Pauli. As one gets to higher quantum indices, the energy ordering is not as cleanly separated; for Z = 21 (Sc) through Z = 30 (Zn), there is interplay between the energies of the 3d and 4s states, or the M4, M5, and N1 shells. In this case, the Siegbahn notation for states coincides with the IUPAC convention [Jenkins 1991].

n   ℓ   m_ℓ              s      j     Occupancy   State    Siegbahn
1   0   0                ±1/2   1/2   2           1s       K
2   0   0                ±1/2   1/2   2           2s       L1
2   1   0                ±1/2   1/2   2           2p1/2    L2
2   1   −1, +1           ±1/2   3/2   4           2p3/2    L3
3   0   0                ±1/2   1/2   2           3s       M1
3   1   0                ±1/2   1/2   2           3p1/2    M2
3   1   −1, +1           ±1/2   3/2   4           3p3/2    M3
3   2   −2, +2           ±1/2   3/2   4           3d3/2    M4
3   2   −1, 0, +1        ±1/2   5/2   6           3d5/2    M5
4   0   0                ±1/2   1/2   2           4s       N1
4   1   0                ±1/2   1/2   2           4p1/2    N2
4   1   −1, +1           ±1/2   3/2   4           4p3/2    N3
4   2   −2, +2           ±1/2   3/2   4           4d3/2    N4
4   2   −1, 0, +1        ±1/2   5/2   6           4d5/2    N5
4   3   −2, 0, +2        ±1/2   5/2   6           4f5/2    N6
4   3   −3, −1, +1, +3   ±1/2   7/2   8           4f7/2    N7
5   0   0                ±1/2   1/2   2           5s       O1
5   1   0                ±1/2   1/2   2           5p1/2    O2
5   1   −1, +1           ±1/2   3/2   4           5p3/2    O3
the Danish philosopher Søren Kierkegaard famously wrote, "Life can only be understood backwards, but it must be lived forwards." We therefore show in Fig. 3.8 the x-ray transitions that can be expected from a zinc atom (Z = 30) in the Siegbahn notation. These transitions allow for the selection rules [Agarwal 1991, Eq. 2.83] of

Δn ≥ 0,   Δℓ = ±1,   Δj = 0, ±1
(3.17)
in quantum mechanics. These selection rules are based on the orthogonality of the spherical harmonics Y_ℓ^{m_ℓ}(θ, ϕ) of Eq. 3.15 when calculating transition rates using Fermi's
Table 3.2 X-ray fluorescence transitions written in several notations, including using initial and final quantum states, the x-ray fluorescence line identification of Siegbahn [Siegbahn 1925], and the IUPAC recommended notation [Jenkins 1991]. Transitions such as Kα3 and Lβ2 are forbidden by selection rules for single-electron wavefunctions, but they can be weakly present in multi-electron atoms (see for example [Agarwal 1991, Table 2.5]). Some x-ray databases [Elam 2002] provide x-ray fluorescence yields ω corresponding to specific final states such as ω_K, and within those final states they further indicate relative fluorescence intensity from various initial states.

Initial state   Final state   IUPAC   Siegbahn
2p3/2           1s            KL3     Kα1
2p1/2           1s            KL2     Kα2
2s              1s            KL1     Kα3 (forbidden)
3p3/2           1s            KM3     Kβ1
4p3/2           1s            KN3     Kβ2I
4p1/2           1s            KN2     Kβ2II
3p1/2           1s            KM2     Kβ3
3d5/2           2p3/2         L3M5    Lα1
3d3/2           2p3/2         L3M4    Lα2
3d3/2           2p1/2         L2M4    Lβ1
4d5/2           2p3/2         L3N5    Lβ2 (forbidden)
4d3/2           2p1/2         L2N4    Lγ1
4p1/2           2s            L1N2    Lγ2
4f7/2           3d5/2         M5N7    Mα1
4f5/2           3d5/2         M5N6    Mα2
Golden Rule of

Γ = (2π/ℏ) |⟨ψ_f | H | ψ_i⟩|² ρ_f
(3.18)
for the transition rate Γ between initial ψ_i and final ψ_f quantum states when coupled by a Hamiltonian H, where ρ_f gives the density of final states. Because multi-electron quantum states differ slightly from those given by Eq. 3.15, "disallowed" transitions can be weakly present (see for example [Agarwal 1991, Table 2.5]). In Table 3.2, we list several transitions in the various notations.
While a full and accurate calculation of x-ray fluorescence energies and yields corresponding to various x-ray transitions requires a quantum mechanical solution for multi-electron atoms, the experimental results are tabulated [Bambynek 1972, Krause 1979a, Elam 2002], including in computer-readable formats as described in Appendix A. Some of these tabulations include not only the overall fluorescence yield by final atom state (such as ω_K, ω_L2, ω_L3, and so on), but also the relative intensities from various initial states [Salem 1974, Elam 2002] as expressed by ratios like Kα2/Kα1 in Siegbahn notation (Table 3.2). Using the strongest of these transitions, we show fluorescence yields in Figs. 3.5 and 3.7, and fluorescence energies for the stronger emission lines in Fig. 3.4. The fluorescence energies show the general Z² trend as expected from Moseley's law, and it is also clear that fluorescence is small compared to Auger emission in lighter atoms.
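The dipole selection rules of Eq. 3.17 can be encoded directly. The sketch below checks two of the transitions of Table 3.2, with quantum numbers (n, ℓ, j) taken from Table 3.1.

```python
from fractions import Fraction

def dipole_allowed(initial, final):
    """Dipole selection rules of Eq. 3.17 for a transition between
    (n, l, j) states: delta-l = +/-1 and delta-j in {0, +/-1}."""
    n_i, l_i, j_i = initial
    n_f, l_f, j_f = final
    return abs(l_i - l_f) == 1 and abs(j_i - j_f) <= 1

half = Fraction(1, 2)
K  = (1, 0, half)              # 1s
L1 = (2, 0, half)              # 2s
L3 = (2, 1, 3 * half)          # 2p3/2

print(dipole_allowed(L3, K))   # K-alpha1 (2p3/2 -> 1s): True (allowed)
print(dipole_allowed(L1, K))   # K-alpha3 (2s -> 1s): False (delta-l = 0, forbidden)
```

This reproduces why Kα1 is strong while Kα3 appears only weakly, through multi-electron corrections to the single-electron wavefunctions.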
[Figure 3.8 here: diagram of Zn electron binding energies from the Fermi level down through the N, M, L (L3 = 1021, L2 = 1044, L1 = 1194 eV), and K (9659 eV) shells, with arrows marking the Lα1, Lα2, Lβ1, Kα1, Kα2, Kβ1, and Kβ3 fluorescence transitions and columns giving n, ℓ, j, quantum state, and x-ray shell labels.]
Figure 3.8 Electron energy levels and several x-ray fluorescence transitions for Zn, showing the correspondence of Barkla's notation for x-ray shells to electron state notation in quantum mechanics. Additional states are listed in Table 3.1, and additional transitions are listed in Table 3.2. In the transition metals from Z = 21 (Sc) to Z = 30 (Zn), there is interplay between the energies of the 3d and 4s states. We show here the energies of the various states as tabulated by Zschornack [Zschornack 2007], though they are similar to those of Bearden and Burr [Bearden 1967] and Deslattes et al. [Deslattes 2003].
3.1.3 Beyond the core: the Fermi energy, valence electrons, and plasmon modes

As much as an x-ray physicist might want to think otherwise, most of what happens in the world is driven not by the properties of core-level electrons in atoms and x-ray transitions with photon energies of 10²–10⁵ eV, but instead by the outermost electrons, which have binding and transition energies of a few eV. This is the realm of chemical bonds and visible light interactions (see Box 3.2), and these electronic states can be accessed via near-absorption-edge resonances in x-ray spectromicroscopy (as will be discussed in Section 9.1.2). As a result, it is important for x-ray microscopists to step back from >100 eV photon chauvinism and begrudgingly acknowledge the importance of electrons in the outer, weakly bound states of atoms. When considering an ensemble of atoms, the Fermi–Dirac distribution function f_FD(T)
of

f_FD(T) = 1/(exp[(E − E_F)/(k_B T)] + 1)
(3.19)
describes the probability that a quantum state of energy E will be occupied by an electron; it involves Boltzmann’s constant of kB = 8.62 × 10−5 eV/kelvin = 1.38 × 10−23 J/K,
(3.20)
the absolute temperature T in kelvin, and the Fermi energy E_F of the atom. At zero temperature, electrons fill up all available quantum states until the Fermi energy −E_F is reached, after which no higher-energy states are occupied; therefore, there is an energy gap between the last occupied state and the vacuum (the state of unbound electrons traveling with zero velocity). Electronic state occupancy distributions cease this all-or-nothing behavior at finite temperatures, but since Fermi energies are typically a few eV and room temperature corresponds to an energy k_B T ≈ 1/40 eV, the room-temperature Fermi–Dirac distribution function of Eq. 3.19 still has a fairly abrupt transition of occupancy going from 1 to 0.
What lies beyond the Fermi energy? In isolated atoms such as in a gas, the Bohr model tells us that there are available-but-unoccupied quantum states with higher values of the principal quantum number n, so there exist few-eV transitions that outer-shell electrons can make to these available states (energies that can be supplied or released via visible-light photons). In solids, symmetric and antisymmetric interactions between electron states in neighboring atoms lead to bands of allowed energy states rather than the discrete levels shown in Fig. 3.8, and the relationship between these bands and the Fermi level² determines the electron transport characteristics of the material:
• In conductors, the Fermi level lies within the conduction band so that there is essentially no energy cost (aside from losses associated with occasional electron inelastic scattering) to electron transport.
• In semiconductors, there are allowed electron states (bands) at energies only a few multiples of the thermal energy k_B T away, so that according to the Fermi–Dirac distribution of Eq. 3.19 there can be some population in these states and thus weak electrical conductivity. Dopants can shift the Fermi level relative to the band structure, thereby leading to large changes in conductivity.
• In insulators, the next available electron states (bands) might lie many multiples of k_B T away, so that unless the material is subjected to a very high electric field there is no opportunity for electrons to "jump" far enough above the Fermi level and become transported.
Fermi levels, band structure, and resulting material electronic properties are discussed further in introductory texts on solid state physics (see for example [Harrison 2011, Ashcroft 1976]). Chemical bonds between atoms involve electrons in the atom's last, most weakly
² The Fermi energy is properly defined only at zero temperature, while the Fermi level is the finite-temperature quantity that affects the distribution of Eq. 3.19.
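The abruptness of the room-temperature Fermi–Dirac distribution of Eq. 3.19 can be seen with a few lines of code; the Fermi energy of 5 eV used here is simply a representative few-eV value, not a tabulated one.

```python
import math

K_B = 8.62e-5  # Boltzmann's constant in eV/K (Eq. 3.20)

def f_FD(E, E_F, T):
    """Fermi-Dirac occupancy of Eq. 3.19: 1/(exp[(E - E_F)/(k_B T)] + 1)."""
    return 1.0 / (math.exp((E - E_F) / (K_B * T)) + 1.0)

# Occupancy at room temperature (T = 293 K) around an assumed E_F = 5 eV:
# with k_B T of about 1/40 eV, the occupancy falls from near 1 to near 0
# over only a few tenths of an eV.
for dE in (-0.2, -0.05, 0.0, 0.05, 0.2):
    print(f"E - E_F = {dE:+.2f} eV: occupancy {f_FD(5.0 + dE, 5.0, 293):.4f}")
```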
Box 3.2 Chemical bond dissociation energies
Consider the C–H bond involved in removing the first hydrogen atom when burning methane. The bond dissociation energy is 104 kilocalories/mole or 435 kilojoules/mole, so that the energy per molecule in terms of electron volts or eV is

(435 × 10³ J/mole) · (1 eV / (1.602 × 10⁻¹⁹ J)) · (1 mole / (N_A = 6.02 × 10²³ molecules)) = 4.5 eV/molecule.
This is the energy of an ultraviolet photon with a wavelength (Eq. 3.7) of λ = (1240 eV·nm)/(4.5 eV) = 275 nm. Chemical bond dissociation energies span a range that includes 142 kJ/mol for the O–O peroxide bond and 1072 kJ/mol for dissociation of carbon monoxideᵃ (CO), corresponding to 1.5–11 eV or 110–840 nm.
ᵃ One can only hope that the state of Colorado in the USA won't dissociate, in spite of its two-letter abbreviation!
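The unit conversion in Box 3.2 generalizes to any tabulated bond dissociation energy; the sketch below reproduces the 4.5 eV and 275 nm figures for the methane C–H bond.

```python
N_A = 6.02e23               # Avogadro's number, molecules per mole
EV_PER_J = 1.0 / 1.602e-19  # electron volts per joule
HC = 1240.0                 # h*c in eV*nm (Eq. 3.7)

def ev_per_molecule(kj_per_mol):
    """Convert a molar bond dissociation energy (kJ/mol) to eV per molecule."""
    return kj_per_mol * 1e3 * EV_PER_J / N_A

e_ch = ev_per_molecule(435.0)  # C-H bond in methane (Box 3.2)
print(f"C-H bond: {e_ch:.1f} eV, photon wavelength {HC / e_ch:.0f} nm")
```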
bound occupied shell (the states just below, or at, the Fermi level). These are referred to as the atom's valence electrons. Chemical bonds tend to involve energies in the range 1.5–11 eV (see Box 3.2). The effect of chemical bonding on near-edge x-ray absorption spectra is discussed further in Section 9.1.2.
When atoms are in the close proximity that a solid provides, their valence electron states are altered by coupling with their neighbors. This can give rise to a collective oscillation mode known as the plasmon resonance. In order to calculate this, we first need to consider the number density of atoms n_a of

n_a = ρ N_A / A,
(3.21)
where ρ is the material’s density (typically g/cm3 ), NA = 6.02 × 1023 is Avogadro’s number, and A is the atomic weight (typically g/mole) of the atom type (mixtures of atom types are considered in Section 3.3.5). The electron density ne in the material is then given by ne = Zna ,
(3.22)
where Z is the element number, and thus the number of electrons per neutral atom. For non-delocalized electrons such as those in insulating solids (as opposed to the case of semiconductors and conductors, where some electrons move as if they had a reduced mass m*), the plasmon frequency ω_p is given by

ω_p = √(n_e e²/(m_e ε_0)),   (3.23)

which leads to a plasmon excitation energy of E_p = ℏω_p. In Box 3.3, we estimate the energy of the plasmon resonance in glass to be E_p = 30 eV, and most solids have strong collective excitation modes in the 8–50 eV energy range (see Fig. 3.15 for an example of the plasmon resonance in amorphous ice). This becomes important when we consider
Box 3.3 Plasmon mode energy in fused silica
In Section 3.3.5, we show that the mean electron density n̄_e for fused silica (amorphous SiO2 with a density of 2.20 g/cm³) is n̄_e = 6.61 × 10²⁹ electrons/m³. This leads to a plasmon mode energy E_p using Eq. 3.23 of

E_p = ℏω_p = (h/2π) √(n_e e²/(m_e ε_0))
    = (6.63 × 10⁻³⁴ J·s / 2π) √((6.61 × 10²⁹ m⁻³) · (1.60 × 10⁻¹⁹ C)² / ((9.11 × 10⁻³¹ kg) · (8.85 × 10⁻¹² C²/(N·m²))))
    = (4.83 × 10⁻¹⁸ joules) · (1 eV / (1.60 × 10⁻¹⁹ joules)) = 30.2 eV.

This photon energy corresponds to a wavelength of λ = hc/E = 41 nm. Since the plasmon oscillations are quickly damped, their lifetime is short (with a standard deviation σ_t), so their standard deviation in energy σ_E is broad according to the Heisenberg uncertainty principle [Heisenberg 1927], which can be written as [Griffiths 2004, Eqs. 3.63 and 3.70]

σ_E σ_t ≥ ℏ/2.   (3.24)

Because of this broad energy distribution centered on something like 30 eV, photon absorption begins to increase significantly at photon energies above about 8 eV, or wavelengths below about λ = 160 nm (see Fig. 3.9).
the refractive index n of materials in Section 3.3; in particular, in Section 3.3.2 we will see that plasmon resonances set the great (strongly absorbing) divide between the low-frequency, visible-light form of the refractive index and the high-frequency, x-ray form.
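The plasmon energy of Eq. 3.23 is easy to evaluate; the sketch below reproduces the 30.2 eV Box 3.3 figure for fused silica from the electron density quoted there.

```python
import math

E_CH = 1.602e-19   # electron charge (C)
M_E  = 9.109e-31   # electron mass (kg)
EPS0 = 8.854e-12   # vacuum permittivity (C^2 / (N m^2))
HBAR = 1.055e-34   # reduced Planck constant (J s)

def plasmon_energy_eV(n_e):
    """Plasmon energy E_p = hbar * sqrt(n_e e^2 / (m_e eps0)) of Eq. 3.23, in eV."""
    omega_p = math.sqrt(n_e * E_CH**2 / (M_E * EPS0))  # plasmon angular frequency (rad/s)
    return HBAR * omega_p / E_CH                       # convert J to eV

# Fused silica (Box 3.3): mean electron density 6.61e29 electrons/m^3.
print(f"SiO2 plasmon energy: {plasmon_energy_eV(6.61e29):.1f} eV")
```

The square-root dependence on n_e means that most solids, whose electron densities are within a factor of a few of silica's, land in the 8–50 eV range quoted above.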
3.2 Atomic interactions, scattering, and absorption

Having discussed energy levels and transitions in atoms, we now wish to consider their more general interactions with photons. This will be done in terms of atomic cross sections σ, which are often written in units of barns, where 1 barn = 10⁻²⁴ cm² (legend has it that the young American physicists from farm country who worked on the Manhattan Project to develop nuclear weapons came to regard a cross section of this size as "big as a barn door"). The cross section σ is related to the mean free path Λ by³

Λ = 1/(σ n_a),
(3.25)
³ We have chosen to use Λ for mean free paths so as to reserve λ for describing the wavelength of X rays and other electromagnetic waves.
[Figure 3.9 here: log plot of the absorption of fused silica (linear absorption coefficient, 0.01–1 μm⁻¹) versus photon energy (5–40 eV), with the corresponding wavelengths λ = 30–200 nm indicated.]
Figure 3.9 Linear absorption length μ⁻¹ for fused silica (a common optical glass) in the ultraviolet (UV) and extended ultraviolet (XUV) wavelength range. The linear absorption coefficient (LAC) μ leads to internal absorption of a beam within a medium of thickness t according to exp[−μt] (Eq. 3.76). This shows how visible light optics become strongly absorptive at wavelengths shorter than about 160 nm, with the large absorption resonance arising due to plasmon modes (collective oscillations of the electrons in the glass; see Section 3.1.3 and Box 3.3). The plasmon modes set the great dividing line between low- and high-frequency forms of the refractive index n. The data shown here were compiled from several sources [Kitamura 2007], and are available via an internet search for "pilon silica optical properties xls".
where n_a is found from Eq. 3.21. The fraction of particles that are removed from the "unaffected" category for a beam of intensity I over a thickness of material x is given by

dI/dx = −I n_a σ = −I/Λ
(3.26)
so that the unaffected fraction of the beam declines as I = I_0 exp[−x/Λ]. Therefore the mean free path Λ represents the thickness over which the unaffected fraction of the beam declines to 1/e = 0.368 of its original value.
While photon cross sections with materials include phenomena such as pair production at much higher energies [Hubbell 1980], the main three interactions of interest to x-ray microscopists are:
Photoelectric absorption σ_abs: a photon is entirely absorbed by an atom. Following photoelectric absorption and emission of an electron, an atom can release its energy by either emission of a characteristic x ray, or by emission of an Auger electron (see Section 3.1.1).
Elastic (coherent) scattering σ_el: a photon is scattered by the atom with no transfer of
energy (well, almost – see Box 4.3). This process was described by Rayleigh as one wherein electrons oscillate in the electric field of the incident photon, and re-radiate a wave at the same frequency. The elastically scattered photon is locked in phase with the incident photon (with a phase shift described by Eq. 3.53), which is why elastic scattering is sometimes called coherent scattering.
Inelastic (incoherent, Compton) scattering σ_inel: a photon is scattered inelastically by imparting kinetic energy to an electron. Conservation of momentum and energy leads to the Compton relationship between the incident wavelength λ, the scattered photon of wavelength λ′ at an angle θ relative to the incident photon, and the electron mass m_e of

λ′ = λ + (h/(m_e c)) (1 − cos θ)
(3.27)
from which one can find an expression for the energy decrease ΔE_Compton of the inelastically scattered photon with energy E_0 = hc/λ of

ΔE_Compton = (E_0²/(m_e c²)) (1 − cos θ)
(3.28)
where me c2 = 511 keV
(3.29)
for an electron. For 180° backscattering of a 10 keV photon, Eq. 3.28 gives ΔE_Compton = 0.39 keV. The cross section for this process was calculated by Klein and Nishina [Klein 1928, Klein 1929]. There can also be inelastic energy transfers to electron energy states in the atom, but the cross section for Compton scattering from valence electrons usually dominates. An inelastically scattered photon loses its phase relationship relative to the incident photon, which is why the process is sometimes called incoherent scattering.
The relative strength of these interactions for carbon is shown in Fig. 3.10. As can be seen, interactions in the E ≲ 30 keV energy range can be well described with a total cross section σ_tot composed of photoelectric absorption σ_abs and elastic scattering σ_el as

σ_tot = σ_abs + σ_el.
(3.30)
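The Compton shift of Eq. 3.28 is a one-liner; the sketch below reproduces the 0.39 keV figure quoted above for 180° backscattering of a 10 keV photon.

```python
import math

ME_C2_KEV = 511.0  # electron rest energy m_e c^2 (Eq. 3.29), in keV

def compton_shift_keV(E0_keV, theta_deg):
    """Energy decrease of Eq. 3.28: dE = E0^2 (1 - cos theta) / (m_e c^2)."""
    return E0_keV**2 * (1.0 - math.cos(math.radians(theta_deg))) / ME_C2_KEV

print(f"10 keV photon, 180 deg: {compton_shift_keV(10.0, 180.0):.2f} keV")
```

Because the shift scales as E_0², it is negligible for soft x rays but becomes a measurable background consideration in hard x-ray fluorescence work.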
In transmission x-ray microscopy of lighter materials below about 20 keV, we can largely ignore the effects of inelastic or Compton scattering. However, Compton scattering adds a background signal that must be considered when examining weak elastic scattering at high angles, or the detection of x-ray fluorescence as discussed in Section 9.2.
Examination of Fig. 3.10 reveals something important about x-ray interactions: there are no cloudy days in x-ray microscopy! What is meant by that sunny statement? In a cloud, or in fog, you have reasonably good light transmission but very little ability to form sharp images of objects. Of course the reason for this is that visible photons are
3.2 Atomic interactions, scattering, and absorption

Figure 3.10 Photon cross sections in carbon as a function of energy, showing the contributions of different processes: photoelectric absorption σabs (with an absorption edge at ∼290 eV), elastic scattering σel, and Compton scattering as the dominant form of inelastic scattering σinel. This figure shows that, for x-ray microscopy of lighter materials, absorption dominates and plural scattering can be ignored at photon energies below ∼30 keV. Data from Hubbell [Hubbell 1980]; see also Fig. A.2.
multiply elastically scattered on their path from the object to your eye, so that the ray directions of the light are lost. With X rays, Fig. 3.10 shows that absorption dominates over scattering at energies up to about 20 keV (somewhat higher energies for higher-Z materials; see Fig. A.2). As a result, if a photon is scattered, it is far more likely that any subsequent interaction of that photon will be an absorption event. This in turn means that multiple scattering events are very unlikely. The situation in electron microscopy is very different (electrons are never simply absorbed, but instead undergo both elastic and inelastic scattering), as will be discussed in Section 4.10.2. In electron microscopy, this leads to difficulties in interpreting signals from samples that have a thickness of many scattering mean free paths.
3.2.1 Scattering by a single electron

In order to examine scattering processes in greater detail, we begin by considering the scattering of radiation by a single electron. Following conventional treatments of scattering processes, we suppose that an x-ray amplitude ψ(r) must have an asymptotic form comprising a plane wave incident along the z axis and a spherical scattered wave with a dependence on the polar angle θ and azimuthal angle ϕ (Fig. 3.11) of

ψ(r) = exp[−ik0 z] + (exp[−ik0 r]/r) F(θ, ϕ),   (3.31)

where

k0 = 2π/λ   (3.32)
Figure 3.11 Geometry for x-ray scattering, with θ as the deflection angle and ϕ as the azimuthal angle.
is the wave number k in vacuum, and λ the wavelength. (We have made here a particular choice of sign convention for forward-propagating waves; see Box 3.4). The form factor F(θ, ϕ) is related to the differential cross section dσ/dΩ by [Eisberg 1964]

dσ/dΩ = |F(θ, ϕ)|².   (3.33)

When the scatterer is a single free electron at the origin, the process is simple Thomson scattering and, for radiation linearly polarized along ϕ = 0, we have [Jackson 1962]

F(θ, ϕ) = −re √(1 − sin²θ cos²ϕ),   (3.34)

where

re = 2.818 × 10⁻¹⁵ meters   (3.35)

is the classical radius of the electron. Integrating Eq. 3.33 with Eq. 3.34 over θ and ϕ leads to the Thomson total cross section for one electron:

σThom = (8/3) π re².   (3.36)

Averaging Eq. 3.33 over all incident polarizations [Jackson 1999, Eq. 14.125] leads to a cross section for unpolarized radiation of

(dσ/dΩ)electron = re² (1 + cos²θ)/2.   (3.37)
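Plugging Eq. 3.35 into Eq. 3.36 shows just how small the single-electron cross section is; a quick check in Python:

```python
import math

RE_M = 2.818e-15   # classical electron radius in meters (Eq. 3.35)
BARN_M2 = 1e-28    # 1 barn = 1e-24 cm^2 = 1e-28 m^2

# Thomson total cross section for one free electron (Eq. 3.36)
sigma_thomson_m2 = (8.0 / 3.0) * math.pi * RE_M**2
print(sigma_thomson_m2 / BARN_M2)  # about 0.665 barn
```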
3.2.2 Scattering by an atom

For an assembly of electrons, it can be shown [Lipson 1958] that, within the first Born approximation (see Section 3.3.4), the x-ray amplitude G(q) diffracted in direction k by an electron density distribution ρ(r) relative to a single free electron at the origin is given by

G(q) = ∫₋∞^{+∞} ρ(r′) exp[−iq · r′] d³r′.   (3.38)
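Equation 3.38 can be checked against the one atomic case with a simple closed form: the hydrogen 1s density ρ(r) = e^(−2r/a0)/(π a0³), whose form factor is known to be [1 + (q a0/2)²]⁻². A sketch in plain Python, using the spherically symmetric reduction of the integral (a0 set to 1 for convenience):

```python
import math

A0 = 1.0  # Bohr radius in arbitrary units; only the product q*a0 matters

def f0_numeric(q, rmax=40.0, n=20000):
    """Numerically integrate the spherically symmetric form factor
    4*pi * integral of rho(r) * sin(qr)/(qr) * r^2 dr for hydrogen 1s."""
    dr = rmax / n
    total = 0.0
    for i in range(1, n + 1):
        r = i * dr
        rho = math.exp(-2.0 * r / A0) / (math.pi * A0**3)
        sinc = math.sin(q * r) / (q * r) if q > 0 else 1.0
        total += rho * sinc * r * r * dr
    return 4.0 * math.pi * total

def f0_exact(q):
    """Closed-form hydrogen 1s form factor."""
    return 1.0 / (1.0 + (q * A0 / 2.0)**2)**2

for q in (1e-6, 1.0, 3.0):
    assert abs(f0_numeric(q) - f0_exact(q)) < 1e-3
print(round(f0_numeric(1e-6), 3))  # approaches 1.0: f0(0) = Z for hydrogen
```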
Here, k0 and k represent incident and scattered wave vectors, and q = k0 − k is the momentum transfer so that q = |q| = 2k0 sin(θ/2) (see also Box 4.2). In the particular case that ρ(r) represents an atom, G(q) can be regarded as an atomic form factor f0(q). For the forward direction (θ = 0), the integral of Eq. 3.38 shows that f0(0) = Z. For a spherically symmetric atom, the integrals over the polar and azimuthal angles can be carried out so that the (in this case) real function f0(q) is given by

f0(q) = 4π ∫₀^∞ ρ(r) [sin(qr)/(qr)] r² dr.   (3.39)

Values of f0(q) can be calculated for particular wavefunctions, and are tabulated [Lonsdale 1962]. This procedure accounts for the spatial distribution of the atomic electrons but not for the fact that they are bound. That is, it applies for normal but not for anomalous dispersion (dispersion around an absorption edge, as discussed in Section 3.4). In order to represent the effects of anomalous dispersion, we replace f0(q) with a complex number f̃ representing the effective number of electrons per atom (more properly, the number of electron oscillation modes per atom, as will be described in Section 3.3) which is defined in view of Eqs. 3.33 and 3.34 by

F(θ, ϕ)|atom = −f̃ re √(1 − sin²θ cos²ϕ).   (3.40)

In the x-ray scattering and crystallography communities, it is conventional to write f̃ as

f̃ = f0(q) + Δf′ + iΔf″,   (3.41)

where f0(q) ≤ Z, the atomic number and thus total number of electrons in the atom, while Δf′ and Δf″ are small and, for light elements, are independent of θ and ϕ [Lonsdale 1962]. In the soft x-ray optics and x-ray microscopy communities, the standard notation (see Eq. 3.65, and Henke [Henke 1981]) is

f̃ = f1 + if2,   (3.42)

where values of f1 and f2 are well tabulated (see Appendix A, and Fig. 3.16). When the atom is much smaller than the x-ray wavelength, as it is in the soft x-ray region, the amplitudes scattered by the individual electrons add coherently for all values of the polar angle. On the other hand, for shorter wavelengths, the coherent superposition is applicable only near the forward direction. Thus the f1 and f2 tables apply for all q vectors in the soft x-ray range, but only for q → 0 in the hard x-ray region (this leads to problems in reconciling different tabulations of x-ray optical constants, as will be discussed in Appendix A). The expression for F(θ, ϕ) given in Eq. 3.40 allows us to apply the optical theorem [Jackson 1999, Eq. 10.139] of σ = (4π/k) Im[f(0)] to write the total atomic cross section σT as

σT = 2λ Im[F(θ = 0)] = 2λ re f2,   (3.43)
where σT is equal to the sum of the cross sections for absorption and scattering. Because in most cases in xray microscopy we do not consider angular variations in scattering,
Figure 3.12 The range of secondary electrons in polystyrene, silicon (Si), and gold (Au) as a function of the energy of the primary electron [Ashley 1976, Ashley 1978] (more recent compilations are available [Tanuma 1988, Tanuma 2011]). This calculation is in the continuous slowing down approximation, so it is known as the CSDA range of electrons. This figure illustrates how x-ray absorption by one atom can lead to ionizing radiation damage to many other molecules in the vicinity, an effect that is also shown in Fig. 11.4. Note that the mean free path for inelastic scattering of individual electrons is shown in Fig. 6.9.
we assume that f1 and f2 do not depend on q so that the elastic scattering cross section can be found by integrating Eq. 3.40 over θ and ϕ (as was done to obtain Eq. 3.36 from Eq. 3.34), giving

σel = (8/3) π re² |f̃|² = (8/3) π re² (f1² + f2²),   (3.44)
which holds roughly for λ ≳ 1 nm. The fact that F(θ, ϕ) is real in Eqs. 3.34 and 3.40 might suggest that the optical theorem will require that σT = 0 for a single free electron. This surely cannot be true; the explanation is that while the optical theorem is exact, Eqs. 3.34 and 3.40 are approximations which neglect the effect of the energy radiated on the motion of the driven electron. When this is taken into account [Heitler 1954], F(θ, ϕ) acquires an extra factor [1 − i4πre/(3λ)]. The value of the imaginary part of this factor is too small to be important for most purposes (although it was apparently known to Thomson), but it is exactly the factor needed to reproduce the correct Thomson total cross section σThom = (8/3)πre² by application of the optical theorem. Examination of Eqs. 3.43 and 3.44 allows some further conclusions (beyond the one that there are no cloudy days in x-ray microscopy!) concerning the magnitudes of σT and σel. The latter is much smaller because re ≪ λ in cases of interest to us. This is reflected in Fig. 3.10, which shows that σT is dominated by absorption at most energies of interest for x-ray microscopy. For free atoms, the dominant effect is photoelectric absorption, while the inelastic (Compton) cross section is negligible in the soft x-ray range. However, in spite of the relatively small size of the atomic elastic cross section,
it would be wrong to conclude that elastic scattering is unimportant in soft x-ray optical systems. If the amplitudes scattered by the atoms in a unit of matter are added coherently, the total scattered intensity scales with the square of the volume of the unit while the total absorbed intensity scales linearly with volume. Therefore, as the unit becomes larger, scattering is increasingly favored over absorption so long as the superposition continues to be coherent. This property is exploited in gratings and zone plates, and is considered further below. (Similar coherence arguments apply to other radiation, such as electrons, neutrons, or hard X rays, though the angular extent of the enhancement scales as the wavelength and is restricted to small angles in electron microscopy, for example). We will see in Section 3.3.3 that the linear absorption coefficient (LAC) [Jönsson 1928] for x-ray propagation in media can be written as

μ = 2λ re na f2 = na σabs,   (3.45)
which is consistent with Eq. 3.43 because, for uniform matter, the elastically scattered amplitudes add to zero in all directions except the forward. For observable scattering to take place there must therefore be some degree of nonuniformity – and a specimen without nonuniformity is a pretty boring specimen for x-ray microscopy! The consequence of photoelectric absorption is the generation of Auger and photoelectrons (along with x-ray fluorescence photons). These energetic electrons then undergo inelastic scattering to produce a cascade of lower-energy electrons, and the typical range of this electron shower as a function of the primary electron energy is shown in Fig. 3.12 (the inelastic mean free paths of low-energy electrons in several materials are shown in Fig. 6.9). Remember that chemical bonds have energies of only a few eV (Box 3.2), so one primary x-ray absorption event can lead to many electrons that damage many chemical bonds. One can use electron optics to collect the Auger and photoelectrons for sensitive surface microscopies with samples that are suitably vacuum compatible; this is what is done in scanning photoelectron emission microscopy (SPEM; Section 6.4) and photoemission electron microscopy (PEEM; Section 6.5).
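Eq. 3.45 turns a tabulated f2 directly into an attenuation length. A sketch for carbon at ρ = 2.2 g/cm³ (the wavelength and f2 below are illustrative placeholders, not values from the Henke tables):

```python
RE_M = 2.818e-15   # classical electron radius in meters (Eq. 3.35)
N_A = 6.022e23     # Avogadro's number per mole

def mu_linear(lambda_m, n_a_per_m3, f2):
    """Linear absorption coefficient of Eq. 3.45, in 1/m."""
    return 2.0 * lambda_m * RE_M * n_a_per_m3 * f2

# Carbon: rho = 2.2 g/cm^3, A = 12.011 g/mol -> atoms per m^3 (Eq. 3.21)
na_carbon = 2.2 / 12.011 * N_A * 1e6

lam = 1.0e-9   # wavelength in m; illustrative only
f2 = 1.0       # illustrative stand-in, NOT a tabulated value

mu = mu_linear(lam, na_carbon, f2)
print(1.0 / mu * 1e6)  # 1/e attenuation length in micrometers
```

Intensity then falls as exp(−μt) through a thickness t, which is the absorptive part of Eq. 3.68 below.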
3.3 The x-ray refractive index

We have described above the characteristics of how individual photons with specific energy and momentum interact with individual atoms. As the joke from quantum mechanics goes, we treat a photon as a particle on Mondays and Wednesdays, and as a wave on Tuesdays and Thursdays (since there are so few quanta in need of repair, apparently their mechanics are able to take three-day weekends – either that or they work in France). Since you're probably reading this section on a Tuesday or a Thursday, it's time to consider how electromagnetic waves interact with refractive media in order to understand more about x-ray physics. We will discuss the essence of the story here; further details are provided in Appendix B, available online at www.cambridge.org/Jacobsen.

Let's start with a short review of a simple physical system: the damped, driven harmonic oscillator. A typical experiment involves a cart with mass m on a low-friction
track, connected to an oscillating driving force through a spring; the cart might have a sail on it to provide a velocity-dependent damping force. With a periodic driving force of F0 cos(ωt), where ω = 2πf is the angular frequency associated with an oscillation period T = 1/f, we can write the sum of the forces F = ma leading to acceleration a = d²x/dt² for a mass m as

m d²x/dt² = −kx − b dx/dt + F0 cos(ωt),   (3.46)

with an opposing spring force −kx proportional to an offset from x = 0, and a velocity-dependent damping force −b dx/dt opposing the motion. This can be rearranged into a more convenient differential equation form as

d²x/dt² + γ dx/dt + ω0² x = (F0/m) e^{iωt},   (3.47)

involving a resonant frequency ω0 of

ω0 = √(k/m),   (3.48)

a damping coefficient γ = b/m, and notation based on complex numbers such that the real part Re[Ã e^{iθ}] indicates the observable quantity. The solution to the differential equation of Eq. 3.47 can be written in the form

x(t) = Re[A(ω) e^{i(ωt − δ(ω))}].   (3.49)

The driving-frequency-dependent magnitude response A(ω) is given by

A(ω) = (F0/m) / √[(ω0² − ω²)² + (γω)²] = (F0/m) / √[(ω0² − ω²)² + (ωω0/Q)²],   (3.50)

and the phase retardation δ(ω) can be found from

tan[δ(ω)] = γω / (ω0² − ω²),   (3.51)
with both quantities plotted together in Fig. 3.13. In the second form of Eq. 3.50, a quality factor Q ≡ ω0/γ was used, in which case the magnitude response at the resonance frequency becomes

A(ω0) = QF0/k,   (3.52)

while the phase becomes

δ(ω0) = π/2,   (3.53)

which describes the phase shift imparted to an elastically scattered photon. It is easy to remind oneself of the properties of a damped, driven mechanical oscillator by using your hand as the driving force for a string on which hangs a mass as a pendulum harmonic oscillator:
• At frequencies well below the resonance, the motion is very nearly in phase with the driving force.
Figure 3.13 Resonance in the classical damped, driven harmonic oscillator (plotted for Q = 10). As the driving frequency ω is increased towards the resonant frequency ω0, the amplitude A(ω) of the response increases (Eq. 3.50), as does the phase retardation δ(ω) found from Eq. 3.51. Above resonance, the amplitude response quickly decreases, while the phase retardation increases towards 180°.
• At the resonance frequency, the motion of the object is exactly 90◦ behind the phase of the driving force (Eq. 3.53) but the magnitude of oscillation is at a maximum. • Above the resonance frequency, the motion of the object approaches an opposite phase from the driving motion. • And here’s a bit more of a subtle point: the shape of the magnitude response curve A(ω) shown in Fig. 3.13 is a bit asymmetric, with a slightly higher response below the resonance than above. All of these points give us a helpful mechanical model for understanding the refractive properties of media for electromagnetic waves of diﬀerent wavelength.
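These resonance properties can be confirmed numerically from Eqs. 3.50–3.53; a small sketch with arbitrary illustrative values of m, k, F0, and Q:

```python
import math

m, k, F0, Q = 1.0, 1.0, 1.0, 10.0   # illustrative values in arbitrary units
w0 = math.sqrt(k / m)               # resonant frequency (Eq. 3.48)
gamma = w0 / Q                      # damping coefficient via Q = w0/gamma

def amplitude(w):
    """Magnitude response A(w) of Eq. 3.50."""
    return (F0 / m) / math.sqrt((w0**2 - w**2)**2 + (gamma * w)**2)

def phase(w):
    """Phase retardation delta(w) of Eq. 3.51, folded into [0, pi]."""
    return math.atan2(gamma * w, w0**2 - w**2)

# At resonance: A(w0) = Q*F0/k (Eq. 3.52) and delta(w0) = pi/2 (Eq. 3.53)
assert abs(amplitude(w0) - Q * F0 / k) < 1e-12
assert abs(phase(w0) - math.pi / 2) < 1e-12
# Well below resonance the motion is nearly in phase with the drive;
# well above, it approaches the opposite phase (delta -> pi)
print(round(phase(0.1 * w0), 3), round(phase(10.0 * w0), 3))
```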
3.3.1 Electromagnetic waves in media

One of the great triumphs of classical physics was the unification by Scottish physicist James Clerk Maxwell of several electromagnetic phenomena into a self-consistent set of equations, which Oliver Heaviside (who was self-taught) later simplified using vector calculus [Hunt 1991a] into the four equations we know collectively today as Maxwell's Equations (see Appendix B at www.cambridge.org/Jacobsen). Based on earlier work by Weber and Kohlrausch (see [Kirchner 1956, Kirchner 1957]), Maxwell found that electromagnetic waves in linear media should travel with a phase velocity of

vp = λ/T = ω/k = 1/√(μm ε),   (3.54)

where μm is the magnetic permeability and ε is the electric permittivity of the medium. In the case of electromagnetic waves in a vacuum, the resulting phase velocity is the speed of light

c ≡ 1/√(μ0 ε0) = 2.9979 × 10⁸ m/s,   (3.55)
where μ0 can be found by measuring (for example) the magnetic field produced by an electric current in a wire, and ε0 can be found by measuring the capacitance between two plates in a vacuum. Maxwell noted with some triumph [Maxwell 1861] that

The velocity of transverse undulations in our hypothetical medium, calculated from the electromagnetic experiments of M.M. Kohlrausch and Weber, agrees so exactly with the velocity of light calculated from the optical experiments of M. Fizeau, that we can scarcely avoid the inference that light consists in the transverse undulations of the same medium which is the cause of electric and magnetic phenomena. [Italics as in the original.]
With a linear medium other than vacuum, from Eq. 3.54 one can write the phase velocity as

vp = c/n,   (3.56)

where

n ≡ √[(μm ε)/(μ0 ε0)]   (3.57)

is the index of refraction n (equivalently, the refractive index n) of the medium. As is shown in detail in Appendix B at www.cambridge.org/Jacobsen, the magnetic component of electromagnetic waves is much weaker than the electric component, and the magnetic permeability

μm = μ0 (1 + χm)   (3.58)

of most media is much closer to the value μ0 in vacuum than is the case for the electric permittivity

ε = ε0 (1 + χe)   (3.59)

in media relative to the value ε0 in vacuum. That is, χm tends to be extremely small while χe tends to be somewhat small compared to unity (Appendix B), and furthermore with electromagnetic waves the magnetic fields tend to be small while the electric fields are appreciable (Appendix B.3). As a result, one can consider these electromagnetic waves in media as producing mainly a dielectric response due to the displacement of the atom's electron charge distribution from the nucleus, as shown in Fig. 3.14. This leads to the Drude model of the refractive index [Drude 1902] where one can write the refractive index n for waves of driving frequency ω propagating in a dielectric medium according to Eq. B.27 in Appendix B, or [Griffiths 1989, Eq. 9.170]

n ≡ k/k0 = 1 − [na e²/(2me ε0)] Σ_j g_j [(ω² − ω_j²) + iγ_j ω] / [(ω_j² − ω²)² + γ_j² ω²],   (3.60)

where g_j are the weights for each of j electron oscillation modes with associated resonant frequencies ω_j, na is given by Eq. 3.21, me is the mass of an electron (Eq. 3.29), and each electron oscillation mode has a damping coefficient γ_j.
Figure 3.14 A simple model of inducing a dipole moment on an atom. With no applied electric field (left), an idealized atom has a positively charged point nucleus and a symmetrically distributed negative charge cloud due to the electrons. When placed in an electric field, the fact that the positively charged protons are nearly 2000 times more massive than the electrons means that the nucleus stays pretty much in place, while the electron cloud is displaced, leading to a dipole moment p for the atom.
3.3.2 The great frequency divide and the refractive index

In the general expression for the refractive index of Eq. 3.60, we have in the denominator a term (ω_j² − ω²) involving a particular resonant frequency ω_j and the driving frequency ω. When ω matches a particular oscillator mode's resonant frequency ω_j, that mode will contribute a local maximum in energy transfer (absorption) as well as a phase resonance reminiscent of the mechanical resonator case shown in Fig. 3.13. However, because an atom has many quantum states for its electrons and thus many oscillator modes, the response of one particular mode may be a rather minor contributor to the overall frequency-dependent polarization of the material and thus its refractive index. In order to better characterize the overall refractive index and thereby see when we have the term (ω_j² − ω²) predominantly positive or negative, we have to ask: what are the dominant oscillator modes of electrons in atoms? The answer is that the plasmon modes are the dominant modes, setting forth a great frequency divide between the low- and high-frequency forms of the refractive index for materials. In Section 3.1.3 we showed the expression of Eq. 3.24 for the plasmon energy, and in Box 3.3 we estimated that ℏω_p ≈ 30 eV for fused silica, though we noted that the short lifetime of plasmon resonances can lead to a rather broad distribution about that energy due to the Heisenberg uncertainty principle (Eq. 3.24). In Fig. 3.9 we showed that this is indeed indicative of the broad optical absorption response of fused silica, which in fact has very good ultraviolet transmittance compared to many other glasses. The dominance of the plasmon response appears across a range of materials and measurement methods. As an example, we show in Fig. 3.15 a spectrum of inelastically scattered electrons in amorphous ice (Section 11.3.1), which shows that the relative probability for inelastic energy transfer is highest in the plasmon range, with a maximum at about 20 eV, which is what Eq. 3.24 gives using the electron density of ice.

Figure 3.15 The inelastic energy loss spectrum of 100 keV electrons in amorphous ice, showing the dominance of plasmon mode losses in the 10–40 eV energy range (not unlike the case for fused silica; see Fig. 3.9). Electron energy loss spectroscopy (EELS) data acquired in a 100 kV electron microscope by Richard Leapman of the National Institutes of Health, with the single-scatter spectrum calculated by the author using the Fourier-log deconvolution method [Johnson 1974, Wang 2009a]. Plural scattering effects in electron interactions will be discussed in Section 4.10; see also Fig. 4.78.

Since the plasmon frequency ω_p is the great divide in the dielectric response of materials to electromagnetic waves, the general expression of Eq. 3.60 has different reduced approximations in the case of ω ≪ ω_p, or visible light, compared to the x-ray case of ω ≫ ω_p. For visible light, the driving frequency ω is lower than most of the oscillator mode resonant frequencies ω_j, so one arrives (see Appendix B.1 at www.cambridge.org/Jacobsen) at an expression for the refractive index of [Griffiths 1989, Eq. 9.173]

Re[n] ≈ 1 + [na e²/(2me ε0)] Σ_j (g_j/ω_j²) + ω² [na e²/(2me ε0)] Σ_j (g_j/ω_j⁴).   (3.61)

Recognizing that the wavelength in vacuum is given by λ = 2πc/ω, we can also write this as

Re[n] ≈ 1 + A (1 + B/λ²),   (3.62)

which is known by visible-light optical system designers as Cauchy's equation (see also Eq. B.33 in Appendix B.1). For crown glass, the coefficient of refraction is A = 0.5320, and the coefficient of dispersion is B = 8107 nm².

X rays are on the high side of the great frequency divide, so one arrives at a different expansion of Eq. 3.60. Ignoring smaller terms discussed in Appendix B.2, one arrives at an expression for the refractive index for X rays in dielectric media of

n = 1 − [na e²/(2me ε0)] Σ_j g_j [(ω² − ω_j²) + iγ_j ω] / [(ω² − ω_j²)² + γ_j² ω²].   (3.64)
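With the crown-glass coefficients quoted above, Cauchy's equation (Eq. 3.62) can be evaluated directly; a quick check (wavelengths in nm):

```python
A = 0.5320     # coefficient of refraction for crown glass (quoted in the text)
B = 8107.0     # coefficient of dispersion in nm^2

def cauchy_n(lambda_nm):
    """Real part of the visible-light refractive index from Eq. 3.62."""
    return 1.0 + A * (1.0 + B / lambda_nm**2)

# At the sodium D line (589 nm) this gives a familiar glass-like index
print(round(cauchy_n(589.0), 3))  # 1.544
# Dispersion: shorter wavelengths see a larger index, so blue refracts more
assert cauchy_n(400.0) > cauchy_n(700.0)
```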
Box 3.4 Sign convention for ψ and n

It is an arbitrary choice to say that forward-propagating plane waves (Eq. 3.31) go as ψ = ψ0 exp[−i(kz − ωt)] instead of ψ = ψ0 exp[+i(kz − ωt)]. The space z and time t variations must have opposite signs to yield a positive phase velocity vp [French 1966, Chap. 7] given by Eq. 3.72, but both of the above sign conventions meet the condition of the wave equation [Born 1999, Eq. 1.3.5] of

∂²ψ/∂x² = (1/vp²) ∂²ψ/∂t².   (3.63)

Therefore there is no right or wrong choice for exp[−ikz] or exp[+ikz]. As pointed out by Attwood [Attwood 2017, Eq. 1.26 footnote], some of the early x-ray literature used exp[−ikz] [Compton 1927] while other x-ray literature used exp[+ikz] [James 1982, Als-Nielsen 2011, Attwood 2017], as does much of the optics literature [Born 1999, Goodman 2017]. One book [Cowley 1981, Cowley 1995] even switched sign conventions between editions! Our choice of ψ = ψ0 exp[−ikz] affects several expressions which would appear differently with the choice ψ = ψ0 exp[+ikz]:
• The refractive index of n = 1 − δ − iβ (Eq. 3.67) would become n = 1 − δ + iβ with exp[+ikz] (see for example [Attwood 2017, Eq. 1.26]).
• One would change Eq. 3.65 of n = 1 − αλ²(f1 + if2) to become n = 1 − αλ²(f1 − if2). (Some of this was noted some time ago [Ramaseshan 1975].)
That's all fine; physically the phase is still advanced in the medium, as demonstrated by the x-ray prism experiment shown in Fig. 3.19.
This is usually written in the much simpler form of

n = 1 − αλ²(f1 + if2),   (3.65)

with

α ≡ re na/(2π),   (3.66)
where once again re is the classical radius of the electron and na is given by Eq. 3.21. The refractive index is frequently written as

n = 1 − δ − iβ,   (3.67)

so that a wavefield that has propagated through a material of thickness t is modified according to

ψ = ψ0 e^{−kβt} e^{+ikδt}   (3.68)
relative to a wave that has propagated a distance t in vacuum (and k = 2π/λ as in
Figure 3.16 Complex number of oscillator modes (f1 + if2) for carbon and gold as a function of x-ray energy, as tabulated by Henke et al. [Henke 1993]. In the regions near x-ray absorption edges, this tabulation is generally not valid due to near-edge effects, as discussed in Section 9.1.2. Note that at high energies, f1 → Z (Eq. B.43 in Appendix B at www.cambridge.org/Jacobsen), while the absorptive term f2 declines roughly as λ² relative to the phase-shifting term f1. This makes phase contrast imaging especially favorable at higher x-ray energies, as will be discussed in Section 4.7.
Eq. 3.32). The expression of Eq. 3.67 uses the definitions of

δ ≡ αλ² f1,   (3.69)
β ≡ αλ² f2,   (3.70)
and for 3D imaging of weakly absorbing objects one can relate δ to the electron density, as will be shown in Eq. 10.72. In Eq. 3.65, (f1 + if2) is the frequency-dependent number of oscillation modes per atom (Eq. 3.42), and it is natural that the sum of these modes tends to approach the atomic number Z for neutral atoms. As discussed in Appendix B.2 for the case of high frequencies (or higher photon energies, and shorter wavelengths), f2 will decline as λ² relative to f1, while f1 should approach Z (Eq. B.43 in the online appendix). This is indeed what is observed in experimental values of f1 and f2, such as those shown in Fig. 3.16 (and see Eq. 3.77 below for a discussion of the wavelength or energy scaling of the linear absorption coefficient). Finally, one should note that if one makes a different choice for the sign convention of forward-propagating waves, one arrives at n = 1 − δ + iβ instead, as discussed in Box 3.4.
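Eqs. 3.66, 3.69, and 3.70 combine into a short recipe for the phase shift and attenuation a wave picks up crossing a thickness t of material; a sketch for carbon (the f1 and f2 values below are illustrative placeholders, not the Henke tabulation):

```python
import math

RE_M = 2.818e-15   # classical electron radius (Eq. 3.35)
N_A = 6.022e23     # Avogadro's number

# Carbon with rho = 2.2 g/cm^3 -> atoms per m^3 (Eq. 3.21)
na = 2.2 / 12.011 * N_A * 1e6
lam = 0.248e-9             # roughly a 5 keV wavelength, in m
f1, f2 = 6.0, 0.05         # illustrative stand-ins, NOT tabulated values

alpha = RE_M * na / (2.0 * math.pi)   # Eq. 3.66
delta = alpha * lam**2 * f1           # Eq. 3.69
beta = alpha * lam**2 * f2            # Eq. 3.70

k = 2.0 * math.pi / lam
t = 1.0e-6                            # 1 micrometer of material
phase_advance = k * delta * t                   # radians, from exp[+ik*delta*t]
transmission = math.exp(-2.0 * k * beta * t)    # intensity goes as |psi|^2

assert abs(delta / beta - f1 / f2) < 1e-9   # ratio is fixed by Eqs. 3.69-3.70
print(phase_advance, transmission)
```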
Figure 3.17 The complex modulation of a transmitted wave as given by Eq. 3.71 leads to an Argand spiral (named after the French mathematician Jean-Robert Argand). The magnitude of the transmitted wave is reduced by exp[−kβt], and the phase is advanced by exp[+ikδt] as given by Eq. 3.71. This is shown here in the real and imaginary plane for 500 eV and 5 keV X rays being transmitted through various thicknesses t of carbon with ρ = 2.2 g/cm³, using the data tabulation of Henke et al. [Henke 1993]. Phase contrast imaging methods may need to employ phase unwrapping algorithms [Goldstein 1988, Volkov 2003] to correctly interpret phases that go beyond π.
The x-ray refractive index tells us how an incident wavefield ψ0(x, y) with wave number k0 = 2π/λ (Eq. 3.32) traveling in the ẑ direction is modulated by an object with a thickness t and 2D refractive index distribution of n(x, y) = 1 − δ(x, y) − iβ(x, y). The wavefield transmitted by the object becomes

ψ(x, y) = ψ0(x, y) exp[−ikt] exp[kt(iδ(x, y) − β(x, y))]
        = ψ0(x, y) exp[−ikt] exp[ikδ(x, y)t] exp[−kβ(x, y)t],   (3.71)
where we have, for simplicity, written k0 as k because the non-vacuum wave propagation characteristics are captured in δ + iβ for the medium. There is first of all a geometric phase exp[−ikt] according to propagation through a distance t in vacuum. On top of that, a wave traveling through a uniform medium with thickness t undergoes a net amplitude reduction of exp[−kβt] and a phase advance of exp[+ikδt], as shown in Fig. 3.17. This pure projection through the specimen's thickness works for the case where the specimen is within the depth of field limit, as will be discussed in Section 4.4.9. For thicker objects, one should use the multislice method, as will be discussed in Section 4.3.9.

At this point, you should feel a bit disturbed. (Oops – we don't mean to pry into your
Figure 3.18 Schematic representation of an x-ray wave traveling in media, showing phase advance and attenuation. Figure due to Benjamin Hornberger [Hornberger 2007a].
psychological status). Why? Recall that the phase velocity for an electromagnetic wave in a refractive medium is shown in Eq. 3.56 to be v p = c/n. In other words, since the real part of the refractive index is 1 − δ for X rays in a medium, one can use the binomial approximation on (1 − δ)−1 to arrive at a phase velocity (Eq. B.44 in Appendix B at www.cambridge.org/Jacobsen) of v p c(1 + δ),
(3.72)
which is faster than the speed of light in a vacuum! Doesn't that violate special relativity, which is described as setting an absolute speed limit in the universe of c? What's even more curious is that, to our knowledge, the first person to point out this characteristic of x-ray propagation in media was Einstein himself [Einstein 1918], as we have discussed in Section 2.2 – and Einstein failed to comment on this apparent paradox in his short paper! We suspect that the reason for this is that Einstein had already calculated both the phase and group velocities, since he already had some understanding of the nature of the x-ray refractive index. In Eq. B.46 we show that the group velocity is well approximated by

vg ≈ c(1 − δ),
(3.73)
and the group velocity describes the speed at which the wave transmits energy. That is, the group velocity describes the speed of the main body of the wave, while wavefronts race ahead at the phase velocity (Fig. 3.18) until dispersion in the wave starts to reduce the energy it carries – in other words, wave attenuation! If wavefronts are faster in media than in vacuum, doesn't that imply that prisms refract X rays in the opposite direction from how visible light is refracted? Yes, it does! This was demonstrated already in 1924 [Larsson 1924] (see Fig. 3.19). There are additional curious consequences of the refractive index of X rays in media, such as the nature of x-ray reflectivity (Section 3.6) and the characteristics of refractive focusing lenses (Section 5.1).
3.3 The xray refractive index
Figure 3.19 X-ray refraction demonstration of Larsson et al. [Larsson 1924], showing that for X rays the refracted rays (Gebrochener Strahl, "refracted ray," of several fluorescence lines from the x-ray tube used) go in the direction one would expect from n ≈ 1 − δ, which is opposite from the case with visible light. Also shown are the direct and reflected rays.
3.3.3 X-ray linear absorption coefficient

We found in Eq. 3.43 that the optical theorem gives a cross section for beam loss in the forward direction (that is, ignoring how scattering can redistribute energy directionally) of σT = 2λ re f2, and we related interaction cross sections with beam intensity losses in Eq. 3.26 of dI/dx = −I na σ. Together these expressions imply an intensity decrease in the forward direction of

dI/dz = −I 2re na λ f2 = −μ I   (3.74)

with a linear absorption coefficient (LAC) μ of

μ = 2re na λ f2 = 4παλ f2,
(3.75)
where in the latter form we have used α from Eq. 3.66. Integration of Eq. 3.74 with an initial beam intensity of I0 leads to the wellknown Lambert–Beer law of xray absorption through a material of thickness t of I = I0 exp[−μt].
(3.76)
This is often simply referred to as Beer's law, and this law is frequently celebrated in liquid form by some x-ray microscopists. The inverse μ⁻¹ is the absorption or attenuation length, which is the distance over which a beam is reduced in intensity by a factor of exp[−1] ≈ 0.37 due to absorption. The linear absorption coefficient μ includes an explicit factor of λ, and (as shown in Eq. B.42 online at www.cambridge.org/Jacobsen) we expect f2 to scale as λ². Therefore we expect the linear absorption coefficient to scale as

μ ∝ λ³ ∝ E⁻³.
(3.77)
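A short sketch of Eqs. 3.75 and 3.77; the f2 value here is an assumed placeholder (real values come from tabulations such as [Henke 1993]), while the carbon density and atomic weight are standard values:

```python
import numpy as np

r_e = 2.818e-15   # classical electron radius (m)

def mu_linear(n_a, lam, f2):
    """Linear absorption coefficient of Eq. 3.75: mu = 2 r_e n_a lambda f2."""
    return 2 * r_e * n_a * lam * f2

# Carbon: rho = 2.26 g/cm^3, A = 12.011 g/mol -> atoms per m^3
n_a = 2.26 / 12.011 * 6.022e23 * 1e6
lam = 1239.842e-9 / 1000.0          # ~1.24 nm at 1 keV
mu = mu_linear(n_a, lam, f2=1.0)    # f2 = 1.0 is an assumed placeholder
print(mu, 1.0 / mu)                 # mu in 1/m, attenuation length in m

# With f2 scaling as lambda^2 (Eq. B.42), mu scales as E^-3: doubling the
# energy halves lambda and quarters f2, reducing mu by a factor of 8.
mu_2keV = mu_linear(n_a, lam / 2, f2=1.0 / 4)
print(mu / mu_2keV)                 # -> 8.0
```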
Finally, in Section 9.1 we'll also find it useful to consider the mass absorption coefficient μρ defined by

μρ ≡ μ/ρ,   (3.78)
Figure 3.20 Absorption length μ⁻¹ (which is the inverse of the linear absorption coefficient μ of Eq. 3.75) for X rays in carbon and gold as a function of x-ray energy (and wavelength λ), as tabulated by Henke et al. [Henke 1993]. This figure shows the general trend of μ⁻¹ to increase as E³ (that is, μ ∝ λ³), and the presence of x-ray absorption edges as shown in Fig. 3.3. The assumed densities were ρ = 2.26 g/cm³ for carbon, and 18.92 g/cm³ for gold.
which is typically expressed in units of cm²/g (see Eq. 9.3). We can arrive at the same result in another way. In Eq. B.30, we show that waves in media have their amplitude attenuated according to the imaginary part of the refractive index as exp[k0 Im[n] x], and in Eq. 3.65 we found that

Im[n] = −(re/2π) na λ² f2.   (3.79)

Since k0 = 2π/λ, this leads to a wave amplitude reduction of

ψ = ψ0 exp[k0 Im[n] x] = ψ0 exp[−(2π/λ)(re/2π) na λ² f2 x]   (3.80)

or an intensity reduction of

I = I0 exp[−2kβx] = I0 exp[−μx],   (3.81)

giving the relationship

μ = 2kβ   (3.82)

as another expression equivalent to Eq. 3.75 for the linear absorption coefficient, thus reproducing Beer's law of Eq. 3.76. This again confirms the consistency between the atomic scattering view of x-ray interactions described in Section 3.2, and the refractive index view described in the present section. Perhaps we don't have to use only particle views of X rays on some days, and wave views on other days! In many cases in x-ray imaging it is useful to work with an image representation
that is linear with specimen thickness (for example, in nanotomography as discussed in Chapter 8, or spectromicroscopy as discussed in Chapter 9). For a transmission image I(x, y) based on absorption contrast, this means working with the optical density D(x, y). This is calculated from knowledge of the incident flux I0 as

D(x, y) = − ln[I(x, y)/I0] = μ t(x, y),   (3.83)

which is obtained from the Lambert–Beer law of Eq. 3.76 or Eq. 3.81, except that we now assume a single material with one value of linear absorption coefficient μ, and use t(x, y) to represent the thickness as projected onto each image pixel location (x, y). This linear treatment assumes that the specimen is within the depth of field DOF = 2δz of the imaging system (as will be given in Eq. 4.215), and that the first Born approximation applies – a topic that we now turn to.
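A minimal sketch of Eq. 3.83; the μ value and thicknesses below are illustrative, not taken from a tabulation:

```python
import numpy as np

mu = 7.8e5                                   # assumed linear absorption coefficient (1/m)
t = np.array([0.5e-6, 1.0e-6, 2.0e-6])       # projected thicknesses (m)
I = np.exp(-mu * t)                          # Lambert-Beer transmission, with I0 = 1
D = -np.log(I)                               # optical density D = -ln(I/I0) = mu * t
print(D)                                     # linear in thickness: [0.39 0.78 1.56]
```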
3.3.4 The Born and Rytov approximations

In Box 3.4, it was noted that electromagnetic waves traveling in the x̂ direction must have the form ψ = ψ0 exp[−i(kx − ωt)] so as to have positive phase velocity and to meet the condition of the wave equation of Eq. 3.63. If we instead separate out the space- and time-varying parts of the wave to have ψ = ψ(x) exp[iωt], Eq. 3.63 yields the condition

∂²ψ/∂x² = (1/v²)(−ω²)ψ
∂²ψ/∂x² + (ω/v)²ψ = 0
∇²ψ + k²n²ψ = 0,
(3.84)
where in arriving at the ﬁnal form we have made use of v p = c/n (Eq. 3.56), c = λ f , and ω = 2π f while also generalizing the onedimensional derivative ∂2 /∂x2 as the Laplacian ∇2 . Thus we obtain in Eq. 3.84 the wellknown Helmholtz equation (see [Born 1999, Eqs. 8.3.2 and 13.1.4] and [Goodman 2017, Eq. 3.13]) for waves in a medium with refractive index n, or n(r) for an inhomogeneous medium. We now consider the case where the wave ψ becomes a combination of a wave ψ0 incident from vacuum onto a medium, and a scattered wave ψ s that is formed while traversing an object with n(r). We therefore make the substitution ψ → ψ0 + ψ s .
(3.85)
If we also make the substitution k²n² → k² + (n² − 1)k² in the Helmholtz equation (Eq. 3.84), and then use the result of ∇²ψ0 + k²ψ0 = 0 for the Helmholtz equation of the incident wave ψ0 before it hits the refractive medium, we arrive at

∇²ψs + k²ψs = −k²(n² − 1)ψ0 − [k²(n² − 1)ψs].   (3.86)

Now if we could neglect the term [k²(n² − 1)ψs], we would have a linear differential equation allowing us to calculate the inhomogeneous refractive index distribution n(r) (that is, the three-dimensional object) from the scattered wave ψs and the incident wave
ψ0. Neglecting the term [k²(n² − 1)ψs] is in fact what is done in the first Born approximation [Kaveh 1982], a simplification that was first applied for matter wave scattering in quantum mechanics [Born 1926] (one can use the first approximation solution to recursively yield higher-order approximations [Born 1999, Sec. 13.1.4]). In effect, one assumes that the incident wavefield ψ0 that reaches the downstream features in the object is the same as the wavefield ψ0 that illuminated the upstream features. It can be shown that the condition for satisfying this requirement is [Tatarski 1961, Eq. 7.5]

|ψs| ≪ |ψ0|,
(3.87)
and Eq. 3.71 tells us how strongly an incident wave ψ0 is modulated by a material of thickness t due to the xray refractive index of n = 1 − δ − iβ. Since δ and β are both small, the Born approximation is quite frequently satisﬁed when imaging thinner samples in xray microscopes, especially at high xray energies. Since Eq. 3.71 tells us that both the magnitude and phase modulations are exponential functions of the refractive index, an alternative approach due to Rytov [Rytov 1937, Chernov 1960] works in a logarithmic expansion of the waveﬁeld, so that χ = ln(ψ).
(3.88)
One can then substitute ψ = eχ into the Helmholtz equation (Eq. 3.84) and arrive at a diﬀerential equation of [Kaveh 1982] ∇2 χ + ∇χ · ∇χ + k2 n2 = 0.
(3.89)
Making a substitution equivalent to Eq. 3.85 of χ → χ0 + χs, as well as defining χ1 = χs ψ0, one obtains

∇²χ1 + k0²χ1 = −ψ0 k²(n² − 1) − [ψ0 ∇χs · ∇χs].   (3.90)

When comparing the Rytov expansion of Eq. 3.90 against the Born expansion of Eq. 3.86, one sees that the term [ψ0 ∇χs · ∇χs] that is neglected is slightly different. It can be shown that the Rytov approximation makes two less-restrictive demands [Tatarski 1961, Eq. 7.15]: the first is

(n² − 1) ≪ 1,
(3.91)
which is quite easily satisfied for the x-ray refractive index; and the second is

λ|∇ψ| ≪ 2π,
(3.92)
which effectively means that the wavefield ψ should have small changes over the distance of a wavelength λ. An additional limitation of the Born expansion in ψ is that its linear approximation runs into challenges with phase wrapping when the phase kδt approaches π. Because the Rytov approximation works with the logarithm of the wavefield, in which the phase kδt appears as a linear term, it does not suffer from the same phase wrapping problem [Kaveh 1982]. The Rytov approximation has certain advantages for single-step calculations of x-ray scattering from thicker objects [Sung 2013], although these advantages tend to disappear [Gureyev 2004] in the phase retrieval methods used in coherent diffraction imaging
(Chapter 10). If one instead models wave propagation through a thick object not in a single step but as applying to successive thin slabs of the object (the multislice method described in Section 4.3.9), the effects of upstream features on the illumination of downstream features are accounted for. In multislice methods one does not assume that the bracketed terms are zero in either the Born (Eq. 3.86) or the Rytov (Eq. 3.90) expansion of the Helmholtz equation; instead, the actual combination of ψ0 + ψs is calculated slice by slice and carried forward to illuminate the next slice.
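A toy 1D multislice sketch of the scheme just described: each slice applies the thin-object modulation of Eq. 3.71, and the field is then carried to the next slice with a paraxial Fresnel propagator. The slice thickness, sampling, and optical constants here are assumptions of this toy model, not values from the text:

```python
import numpy as np

def multislice(psi0, delta, beta, dz, lam, dx):
    """Toy 1D multislice propagation (Section 4.3.9): per-slice transmission
    via Eq. 3.71, alternating with Fresnel propagation over a distance dz.
    delta and beta have shape (n_slices, n_pixels)."""
    k = 2 * np.pi / lam
    fx = np.fft.fftfreq(psi0.size, d=dx)             # transverse frequencies
    # Paraxial Fresnel kernel; the sign convention must match that of Eq. 3.32.
    kernel = np.exp(-1j * np.pi * lam * dz * fx**2)
    psi = psi0.astype(complex)
    for d, b in zip(delta, beta):
        psi = psi * np.exp(1j * k * d * dz - k * b * dz)   # Eq. 3.71 per slice
        psi = np.fft.ifft(np.fft.fft(psi) * kernel)        # carry to next slice
    return psi

# Uniform-slab check: 10 identical slices reproduce the single projection
# result exp[i k delta T - k beta T] with total thickness T = 10 dz.
n, dz, lam, dx = 64, 1e-8, 1e-9, 1e-8
delta = np.full((10, n), 1e-3)
beta = np.full((10, n), 1e-4)
psi = multislice(np.ones(n), delta, beta, dz, lam, dx)
print(abs(psi[0]))   # exp[-k beta T]
```

For a structured (non-uniform) object the slice-by-slice propagation is what captures the upstream-to-downstream illumination effects that the single-step Born and Rytov treatments neglect.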
3.3.5 Oscillator density in molecules, compounds, and mixtures

Up until now we have discussed optical constants for single elements, including tabulations of (f1 + if2) for all of the elements at a variety of x-ray wavelengths (see for example [Henke 1993], and Appendix A). We now discuss the "mixture rule" [Deslattes 1969, McCullough 1975, Jackson 1981] for calculations involving collections of atoms, such as in a mixture, compound, or molecule. This mixture rule assumes that we can simply add up the net absorption and phase effects of all the atoms in proportion to their stoichiometric ratio, an assumption that holds well unless we discuss the finer details of x-ray absorption spectra (Sections 9.1.2 and 9.1.7). We start by considering an example for a mixture unit, which might be H2O for one water molecule, or (C5O2H8)n for the repeated monomer in poly(methyl methacrylate), or some other mixture. As a concrete example, let's consider a simplified borosilicate glass consisting of 80 percent SiO2 and 20 percent B2O3 with a density of 2.23 g/cm³. In this case the stoichiometric mixture unit can be written as

1 mixture unit = 0.8(SiO2) + 0.2(B2O3) = B0.4 O2.2 Si0.8,
(3.93)
where the subscript for each element refers to its stoichiometric weighting si in the mixture unit (that is, sB = 0.20 · 2 = 0.40 for boron, sO = 0.80 · 2 + 0.20 · 3 = 2.20 for oxygen, and sSi = 0.80 · 1 = 0.80 for silicon). We can then go on to calculate a number of properties of this mixture unit. Its total atomic weight A̅ is given by

A̅ = Σ_Z si · Ai,   (3.94)

where Σ_Z means Σ_{i=1,...,92} (for the 92 naturally occurring elements), and a mole counts up Avogadro's number NA of mixture units rather than of individual atoms. For a mixture unit of our example glass, we have

A̅ = 0.40 · (10.81 g/mole) + 2.20 · (15.999 g/mole) + 0.80 · (28.085 g/mole) = 61.99 g/mole.
The atom number density na for a single material type is given by Eq. 3.21. Therefore the number density of mixture units is

n_m.u. = ρNA (1/A̅) = ρNA/(Σ_Z si Ai),   (3.95)

which for our example mixture unit of borosilicate glass is

n_m.u. = (2.23 g/cm³) · (6.02 × 10²³ mixture units/mole) · (1/(61.99 g/mole)) · (100 cm/1 m)³

or n_m.u. = 2.17 × 10²⁸ (mixture units)/m³. Turning instead to atoms, the number density na,i for atom type i in the mixture is given by

na,i = ρNA (si/A̅) = ρNA si/(Σ_Z si Ai),   (3.96)

so for boron in our example glass we obtain

na,B = (2.23 g/cm³) · (6.02 × 10²³ atoms/mole) · (0.4/(61.99 g/mole)) · (100 cm/1 m)³

or na,B = 8.66 × 10²⁷ (boron atoms)/m³. If we add up all the individual element atom densities, we obtain a total atom number density of

n̄a = ρNA (Σ_Z si)/A̅ = ρNA (Σ_Z si)/(Σ_Z si Ai),   (3.97)

which is n̄a = 7.36 × 10²⁸ atoms/m³. The electron density ne,i for atom type i in the mixture is given by

ne,i = ρNA (si Zi/A̅) = ρNA si Zi/(Σ_Z si Ai),   (3.98)

which for boron's electrons in our example glass is

ne,B = (2.23 g/cm³) · (6.02 × 10²³ atoms/mole) · ((0.4 · 5)/(61.99 g/mole)) · (100 cm/1 m)³,

giving ne,B = 4.33 × 10²⁸ (boron atom electrons)/m³, with an overall electron density n̄e given by

n̄e = ρNA (Σ_Z si Zi)/(Σ_Z si Ai),   (3.99)

which is n̄e = 6.67 × 10²⁹ electrons/m³ for our borosilicate glass. Finally, the fractional density ρi for atom type i is given by

ρi = ρ (si Ai/A̅) = ρ si Ai/(Σ_Z si Ai),   (3.100)
which gives ρB = 0.156 g/cm³ for boron in our glass. To calculate the oscillator density f̄1 + if̄2 for a mixture unit, we need to know the oscillator strengths f1 + if2 for each of the atom types. Let's do this at a photon energy of 999.7 eV (very close to 1 keV) using values tabulated by the Center for X-ray Optics at Lawrence Berkeley Lab (henke.lbl.gov/optical_constants/):

Element   f1 + if2 at 999.7 eV
B          5.245 + i0.316
O          8.225 + i1.756
Si        14.142 + i1.456
The mean oscillator strength for a mixture unit is then (f̄1 + if̄2) with

f̄1 = Σ_Z si f1,i   (3.101)
f̄2 = Σ_Z si f2,i   (3.102)
or in our example

f̄1 = 0.40 · 5.245 + 2.20 · 8.225 + 0.80 · 14.142 = 31.507 oscillator modes
f̄2 = 0.40 · 0.316 + 2.20 · 1.756 + 0.80 · 1.456 = 5.154 oscillator modes.

The net real and imaginary parts of the refractive index n = 1 − δ̄ − iβ̄ for the mixture unit draw upon Eqs. 3.65 and 3.66, leading to

δ̄ + iβ̄ = (re/2π) n_m.u. λ² (f̄1 + if̄2)
        = (re/2π) (ρNA/A̅) λ² (Σ_Z si f1,i + i Σ_Z si f2,i)
        = (re/2π) ρNA λ² (Σ_Z si f1,i + i Σ_Z si f2,i)/(Σ_Z si Ai),   (3.103)

which for our glass at 1 keV (or λ = hc/E = (1240 eV·nm)/(1000 eV) = 1.24 nm using Eq. 3.7) gives

δ̄ = ((2.82 × 10⁻¹⁵ m)/(2π)) · (2.17 × 10²⁸ m⁻³) · (1.24 × 10⁻⁹ m)² · (31.507) = 4.71 × 10⁻⁴
β̄ = ((2.82 × 10⁻¹⁵ m)/(2π)) · (2.17 × 10²⁸ m⁻³) · (1.24 × 10⁻⁹ m)² · (5.154) = 7.70 × 10⁻⁵.

The net linear absorption coefficient μ̄ is found from Eq. 3.82 as

μ̄ = 2kβ̄   (3.104)

or from Eq. 3.75 as

μ̄ = 2re n_m.u. λ f̄2 = 2re ρNA λ (Σ_Z si f2,i)/(Σ_Z si Ai),   (3.105)

or in our example

μ̄ = 2 · (2.82 × 10⁻¹⁵ m) · (2.17 × 10²⁸ m⁻³) · (1.24 × 10⁻⁹ m) · 5.154 = 7.81 × 10⁵ m⁻¹,

giving an absorption length μ̄⁻¹ of

μ̄⁻¹ = (1/(7.81 × 10⁵ m⁻¹)) · (10⁶ μm/m) = 1.28 μm.
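The whole worked example above can be verified end to end with a short script (a sketch; the re and NA constants are rounded):

```python
import numpy as np

# Borosilicate-glass mixture example: B0.4 O2.2 Si0.8, rho = 2.23 g/cm^3,
# photon energy near 1 keV.
r_e, N_A, rho = 2.818e-15, 6.022e23, 2.23
s  = {'B': 0.4,   'O': 2.2,    'Si': 0.8}      # stoichiometric weights
A  = {'B': 10.81, 'O': 15.999, 'Si': 28.085}   # atomic weights (g/mol)
f1 = {'B': 5.245, 'O': 8.225,  'Si': 14.142}   # CXRO values at 999.7 eV
f2 = {'B': 0.316, 'O': 1.756,  'Si': 1.456}

A_bar  = sum(s[i] * A[i]  for i in s)          # Eq. 3.94 -> 61.99 g/mol
n_mu   = rho * N_A / A_bar * 1e6               # mixture units per m^3 (Eq. 3.95)
f1_bar = sum(s[i] * f1[i] for i in s)          # Eq. 3.101 -> 31.507
f2_bar = sum(s[i] * f2[i] for i in s)          # Eq. 3.102 -> 5.154

lam = 1239.842e-9 / 1000.0                     # ~1.24 nm at 1 keV
delta_bar = r_e / (2 * np.pi) * n_mu * lam**2 * f1_bar   # Eq. 3.103
beta_bar  = r_e / (2 * np.pi) * n_mu * lam**2 * f2_bar
mu_bar    = 2 * (2 * np.pi / lam) * beta_bar             # Eq. 3.104

print(delta_bar, beta_bar, 1 / mu_bar)   # ~4.7e-4, ~7.7e-5, ~1.28e-6 m
```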
For calculations at energies in between those where tabulated values are available, examination of Fig. 3.16 makes it clear that one should interpolate f1 on a linear scale, and f2 on a logarithmic scale. The "mixture rule" appears in older texts as well as more recent papers [Jackson 1981] in terms of a weighted sum of mass absorption coefficients (see Eqs. 3.78 and 9.3) as

μ̄/ρ̄ = Σ_Z wi (μ/ρ)i   (3.106)

with

wi = si Ai/A̅.   (3.107)

From Eqs. 3.21 and 3.75 we have

μ/ρ = 2re (NA/A) λ f2.   (3.108)

We can therefore rewrite Eq. 3.106 as

2re (NA/A̅) λ f̄2 = Σ_Z (2re (NA/Ai) λ f2,i) (si Ai/A̅),   (3.109)

which, when cancelling the terms 2re NA λ/A̅ between the right- and left-hand sides and Ai within the right-hand side, reduces to

f̄2 = Σ_Z si f2,i,   (3.110)
which is really just a restatement of Eq. 3.102. Thus all is well with the universe.
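The interpolation rule noted above (f1 on a linear scale, f2 on a logarithmic scale) can be sketched as follows; the two-point table here is illustrative, not tabulated data:

```python
import numpy as np

def interp_f2(E, E_tab, f2_tab):
    """Between absorption edges f2 follows power-law trends, so it is
    best interpolated as log(f2) versus log(E)."""
    return np.exp(np.interp(np.log(E), np.log(E_tab), np.log(f2_tab)))

# A pure power law f2 ~ E^-2 is recovered exactly by log-log interpolation:
E_tab  = np.array([1000.0, 4000.0])            # eV
f2_tab = 1e6 / E_tab**2                        # 1.0 and 0.0625
print(interp_f2(2000.0, E_tab, f2_tab))        # -> 0.25
# f1, by contrast, would use plain linear interpolation:
# np.interp(E, E_tab, f1_tab)
```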
3.4 Anomalous dispersion: life on the edge

We began our discussion of refractive indices by invoking the experience of the damped, driven mechanical oscillator. This showed that one can expect strong phase shifts around an absorption resonance (Fig. 3.13). With the mechanical oscillator there is relatively little energy transfer when the driving frequency goes above the resonance frequency; with X rays on atoms the story is somewhat different, because enhanced absorption remains "turned on" at energies above the threshold needed to remove an electron from a specific atomic state via photoelectric absorption (Fig. 3.3). Because of this, and the fact that in atoms we have not a single resonance but a set of available oscillation modes, we can expect phase resonances around absorption edges to differ considerably from the single mechanical oscillator case. We therefore briefly consider these anomalous characteristics of the x-ray refractive index, or anomalous dispersion.
Figure 3.21 Comparison of the oscillator strength (f1 + if2) for graphite at the carbon K absorption edge. The crosses and boxes represent experimental measurements by Dambach et al. [Dambach 1998], while the light red and grey curves are from the tabulation of Henke et al. [Henke 1993] for f1 and f2, respectively. The solid black curve shows a smoothed version of the experimental near-edge f2 data spliced into the longer-spectral-range tabulation of f2 of Henke et al., while the solid red curve shows a local calculation of f1 using the Kramers–Kronig expression of Eq. 3.111 based on the combined f2 curve [Jacobsen 2004].
3.4.1 The Kramers–Kronig relations

Examination of the high-frequency refractive index expression of Eq. 3.60 reveals that the same parameters factor into both the phase-shifting and absorptive components of the x-ray refractive index: the resonance frequencies ωj and damping coefficients γj appear in both the real and imaginary parts. This suggests that if one has made a complete measurement of the imaginary part of the refractive index (i.e., the absorption spectrum), one can calculate the phase-shifting response. It turns out that with a few basic assumptions (such as that the electric susceptibility χe goes to zero as the driving frequency ω goes towards infinity, and a requirement for causality in that charge displacements can lag but not lead the application of an electric field), one can relate the real and imaginary parts of the permittivity using the Kramers–Kronig relations (see e.g., [Nussenzveig 1972, Burge 1993], or [Attwood 2017, Sec. 3.8], or [Jackson 1999, Sec. 7.10]). For the purposes of x-ray optical interactions, these relations can be written [Henke 1981] in terms of the oscillator strength (f1 + if2) to give

f1(E) = Z + (2/π) P∫₀^∞ [ε f2(ε)/(E² − ε²)] dε − Δfr,   (3.111)

where P denotes the principal value of the integral and Δfr is a relativistic correction term that is negligible for soft X rays. Since one can find f2(E) from the absorption spectrum μ(E) = −ln[I(E)/I0(E)]/t in a material of
thickness t using Eqs. 3.75 and 3.76, one can then calculate the phase shifting spectrum using Eq. 3.111. This is in fact how various tabulations of the oscillator strength f1 (E) + i f2 (E) have been arrived at (see Appendix A). Because Eq. 3.111 involves an integration over all frequencies, one must have absorption spectra covering a very wide range of energies to obtain a reasonable approximation for f1 (E), though one can also splice a nearedge absorption spectrum into a largerrange absorption spectrum tabulation to obtain nearedge phaseshifting spectra [Palmer 1998, Jacobsen 2004, Yan 2013, Watts 2014]. Xray spectra of atoms in solids, and of molecules, have ﬁne structure near their absorption edges, as will be discussed in Section 9.1.2. Because this is dependent on the details of the chemical bonding of an atom, any tabulation of elementbyelement oscillator strength ( f1 +i f2 ) will not accurately reﬂect the exact values exhibited by a particular material near an absorption edge. As an example, we show in Fig. 3.21 experimental values for ( f1 + i f2 ) for graphite near the carbon K edge obtained via interferometry experiments by Dambach et al. [Dambach 1998], along with the tabulated values of ( f1 + i f2 ) of Henke et al. [Henke 1993], and ﬁnally a calculation [Jacobsen 2004] of f1 from the Dambach f2 nearedge data spliced into the full Henke f2 tabulation. This illustrates the type of ﬁne detail in the response of ( f1 +i f2 ) near absorption edges which is lost in elementbyelement tabulations. A discussion of the possibilities of using nearedge structure in phase contrast ( f1 ) spectromicroscopy is given in Section 9.1.5.
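A coarse numerical sketch of the Kramers–Kronig transform of Eq. 3.111; the energy grid, the toy resonance, and the crude exclusion of the singular point are all assumptions of this sketch, and Δfr is neglected:

```python
import numpy as np

def f1_from_f2(E_grid, f2_grid, Z):
    """Discretized principal-value sum for Eq. 3.111 (Delta f_r neglected).
    The singular point eps = E is simply skipped - a crude principal value."""
    E_grid = np.asarray(E_grid, float)
    d_eps = np.gradient(E_grid)
    f1 = np.empty_like(E_grid)
    for j, E in enumerate(E_grid):
        m = np.arange(E_grid.size) != j
        f1[j] = Z + (2 / np.pi) * np.sum(
            E_grid[m] * f2_grid[m] / (E**2 - E_grid[m]**2) * d_eps[m])
    return f1

# Toy f2: a narrow resonance near 100 eV on a zero background.
E = np.linspace(10.0, 5000.0, 2000)
f2 = np.exp(-((E - 100.0) / 5.0) ** 2)
f1 = f1_from_f2(E, f2, Z=6)
# Far above the resonance f1 returns to Z; just below it, f1 dips below Z.
print(f1[-1], f1[0])
```

Note that because the integral runs over all energies, a real calculation needs f2 over a very wide spectral range, which is exactly why near-edge data are spliced into full-range tabulations as described above.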
3.5 X-ray refraction

When electromagnetic waves cross a boundary where there is a change of refractive index from n1 to n2, waves can be refracted according to Snell's law of

n1 sin θ1 = n2 sin θ2,   (3.112)

where θ is relative to the surface normal (Snell's law can be found from Fermat's principle, which is in turn described in Section 4.1.3). Rays exiting a higher refractive index medium 2 and encountering medium 1 are refracted exactly along the surface when the condition n1 sin(θ1 = 90°) = n2 sin(θ2 = θc) is met; that is, when

sin(θc) = n1/n2,   (3.113)

where θc is referred to as the critical angle (see Fig. 3.23). For visible light, n1 = 1 in air and n2 = 1.33 in water, so the critical angle becomes θc = sin⁻¹(1/1.33) = 49° (beyond the critical angle one has total internal reflection). As a result, if one is sitting in a lake looking up, one sees the above-water world refracted within a circle of light known as Fresnel's window, and the underwater world reflected from the water surface at angles beyond 49° (Fig. 3.22). Now consider the case of X rays, where the refractive index is less than 1. This means that the region outside of the mirror has a higher refractive index (n ≡ 1) than the material of the mirror itself (n = 1 − δ). Therefore, while Fig. 3.23 showed the blue-shaded higher-index medium as representing water at the air–water visible-light
Figure 3.22 Illustration of external refraction, and total internal reflection, at the air–water interface. Within Fresnel's window (as indicated by the dashed yellow line), one can see a cloudless blue sky and the scuba diving boat Double Trouble on the surface of Lake Huron. Beyond the critical angle of Eq. 3.113, one sees a reflection from the depths below, with little light present due to weak scattering of light from the water column back to the surface. Within the water, one also sees diver Tom Jones on his decompression safety stop about 5 meters below the water surface. The slight departures of Fresnel's window from the indicated critical angle line are due to very weak waves on the water's surface. Photo by the author, while diving the wreck of the Cedarville near Mackinaw City, Michigan.
Figure 3.23 Total internal reflection for visible light and X rays. This figure illustrates the critical angle (purple) for total internal reflection (Eq. 3.113) for the case where medium 2 (n2; shaded in blue) has a higher refractive index than medium 1 does (n1). With visible light, if medium 2 is water, a viewer looking up from under water can see a view such as that shown in Fig. 3.22.
interface, the blue area should represent vacuum at the vacuum–material interface for X rays. That is, external reflection by x-ray mirrors is really a manifestation of total internal reflection in the vacuum! Now with large refractive indices we express the critical angle θc of Eq. 3.113 relative to the surface normal; however, because the x-ray refractive index differs very little from 1, for X rays we shall instead refer to the complementary grazing angle θ′ as shown in Fig. 3.24. With this complementary angle, and with n2 = 1, the expression of Eq. 3.113 for the critical angle becomes in the x-ray
Figure 3.24 Grazing incidence critical angle θ′c for x-ray reflectivity, where n2 is the vacuum and n1 is the medium of the mirror material (see Fig. 3.23). The grazing angle θ′c is complementary to the referenced-to-normal critical angle θc.
case

cos(θ′c) = (1 − δ)/1   (3.114)
1 − (θ′c)²/2 ≈ 1 − δ,

giving

θ′c ≈ √(2δ).   (3.115)

This can also be written as θ′c = λ√(2α f1) using Eq. 3.65. Now α ∝ na (Eq. 3.66) and na ∝ ρ/A (Eq. 3.21), while f1 ∝ Z (see Fig. 3.16, as well as Eq. B.43 in Appendix B at www.cambridge.org/Jacobsen). Therefore the scaling of the critical angle with material type and x-ray wavelength can be written as

θ′c ∝ λ√(ρZ/A),   (3.116)

so the improvements gained by going to higher atomic number Z and higher density ρ materials are helpful, but are offset somewhat by the 1/A atomic weight scaling and the square root dependence of these terms together.
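Using the borosilicate-glass δ̄ computed earlier in this chapter, Eq. 3.115 gives the grazing critical angle directly:

```python
import numpy as np

delta = 4.71e-4                    # borosilicate glass near 1 keV (Section 3.3.5)
theta_c = np.sqrt(2 * delta)       # Eq. 3.115, in radians
print(1e3 * theta_c)               # ~30.7 mrad
print(np.degrees(theta_c))         # ~1.76 degrees
```

Even for a soft x-ray energy and a moderate-Z material, the critical angle is well under two degrees, which is why x-ray mirrors work only at shallow grazing incidence.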
3.6 X-ray reflectivity

The existence of a grazing-incidence critical angle θ′c for total internal reflection implies that X rays incident at grazing angles below this value will be externally reflected, and indeed this effect can be seen in Fig. 3.19, with a more detailed description arriving a few years later [Jentzsch 1929]. In general, waves are partially reflected from the boundaries between two materials with different refractive indices, as described by the Fresnel equations [Griffiths 1989, Sec. 9.3.2]. At normal incidence, the Fresnel reflectivity R⊥ for rays going from vacuum to a refractive medium is given by

R⊥ = |(1 − n)/(1 + n)|²,   (3.117)

which for X rays (where Eq. 3.67 describes the refractive index) leads to

R⊥ = |(1 − (1 − δ))/(1 + (1 − δ))|² = (δ/(2 − δ))² ≈ δ²/4.
(3.118)
Figure 3.25 Grazing incidence reflectivity Rσ of x-ray mirrors as a function of θ′/θ′c, for several values of the ratio β/δ of the absorptive to phase-shifting parts of the x-ray refractive index n = 1 − δ − iβ. Absorption leads to a softening of the reflectivity cutoff around the critical angle θ′c = √(2δ). See also [Attwood 2017, Fig. 3.8].
Since δ is in the range of 10⁻³–10⁻⁶ for X rays in media, normal incidence reflectivity from a single refractive interface is very weak. Crystals and layered synthetic multilayers can produce high reflectivity by using a coherent superposition of many weak individual reflected amplitudes (Section 4.2.3), but otherwise it is not practical to work with normal incidence reflective optics for X rays. The expression for grazing incidence reflectivity is considerably more complicated [Parratt 1954, Henke 1972, Henke 1981], though one can obtain similar results using finite difference methods [Fuhse 2006] or multislice propagation methods [Li 2017a] with n = 1 − δ − iβ (multislice methods are described in Section 4.3.9). X-ray reflectivity involves a factor a² of

a² ≡ (1/2)[sin²θ′ − δ + √((sin²θ′ − δ)² + β²)],   (3.119)

where δ and β are from n = 1 − δ − iβ (Eq. 3.67) and θ′ is the grazing angle of incidence. The reflectivity Rσ(θ′) for X rays with the electric field vector oscillating parallel to the plane of reflection is then given by

Rσ(θ′) = [4a²(sin θ′ − a)² + β²] / [4a²(sin θ′ + a)² + β²].   (3.120)

In this expression, absorption in the mirror material leads to a "softening" of the reflectivity cutoff around the critical angle θ′c, as shown in Fig. 3.25.

Figure 3.26 Reflectivity of an iridium-coated mirror at 10 keV, on both a linear and a logarithmic scale. The critical angle for iridium at 10 keV is about 8.3 mrad, as given by θ′c = √(2δ) (Eq. 3.115).

The ratio of the reflectivity Rπ(θ′), for when the electric field oscillates perpendicular to the plane of reflection, divided by Rσ(θ′) is

Rπ(θ′)/Rσ(θ′) = [4a²(a − cos θ′ cot θ′)² + β²] / [4a²(a + cos θ′ cot θ′)² + β²],   (3.121)

so of course the total reflectivity for an unpolarized beam is

R(θ′) = Rσ(θ′) · (1/2)[1 + Rπ(θ′)/Rσ(θ′)].   (3.122)
Since bending magnet sources and most linear undulator sources at synchrotrons deliver radiation with the electric field in the horizontal direction, a vertically deflecting mirror involves Rσ(θ) while a horizontally deflecting mirror involves Rπ(θ); in any case the polarization dependence is small, and observed mainly below 100 eV. As Eq. 3.115 shows, x-ray mirrors show strong reflectivity only at very shallow grazing angles, and reflectivity becomes quite low at angles beyond the critical angle, as shown in Fig. 3.26, or alternatively at energies above the energy at which the critical angle is approximately equal to the grazing angle. Therefore x-ray mirrors can be used as low-pass filters: X rays below a certain energy will be reflected, while those above will not. This is illustrated in Fig. 3.27, which shows how a mirror can be used to block high diffraction orders from a grating monochromator (Section 7.2.1) and thus improve the spectral purity of the beam used in an experiment. Grazing incidence mirrors are often used as first optics in higher-energy synchrotron light sources, where only a fraction of the beam power and essentially none of the harder X rays of the source (or gamma rays from bremsstrahlung produced by electron scattering from residual gas in the storage ring) get deflected into the beamline where an x-ray microscope is located. Grazing incidence mirrors require incredibly smooth surfaces, as will be discussed in Section 5.2. To understand why, lay a flashlight on a hard surface in a darkened room and notice how prominent any dust and debris appear, or look out an airplane window
Figure 3.27 Calculated soft x-ray reflectivity of a fused silica mirror at grazing angles of incidence of 40 and 60 mrad. For carbon edge spectromicroscopy, it is helpful to remove second-order light from grating monochromators (such as 580 eV light when acquiring data at the carbon K edge around 290 eV). One way to do this is to use a fused silica mirror, which has a change in its reflectivity around the oxygen K edge near 540 eV, so that the ratio of monochromator second- to first-order light is (5.4/61.5) = 8.8 percent at 60 mrad grazing angle versus (33.8/75.5) = 44.8 percent at 40 mrad grazing angle. Mirrors at a fixed grazing angle near the critical angle for a certain x-ray energy can thus act as low-pass spectral filters, removing light at photon energies above a range of interest from an x-ray beam.
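The order-suppression ratios quoted in the caption follow directly from the reflectivity values read off the two curves; a two-line check (using only the percentages quoted above):

```python
# Second-order (580 eV) to first-order (290 eV) throughput after one bounce
# from the fused silica mirror, using the reflectivities quoted in Fig. 3.27.
R_first = {"40 mrad": 0.755, "60 mrad": 0.615}   # at 290 eV
R_second = {"40 mrad": 0.338, "60 mrad": 0.054}  # at 580 eV
for angle in R_first:
    ratio = R_second[angle] / R_first[angle]
    print(f"{angle}: second/first order = {100 * ratio:.1f}%")
```

This makes the design tradeoff explicit: steepening the grazing angle from 40 to 60 mrad costs some first-order throughput but improves second-order rejection by a factor of about five.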
at the long shadows cast by mountains at sunrise or sunset. To quantify the reduction in mirror reflectivity that this leads to, we will travel ahead to grab two results:

1. Bragg's law, which tells us that the optical path length difference for waves reflected from two partially reflecting surfaces separated by d is 2d sin θ, where θ is the grazing angle of incidence (see Fig. 4.9 and Eq. 4.33).
2. When a wavefield is subject to random phase errors characterized by a Gaussian deviation with square root variance Θ, the mean amplitude is reduced by a factor exp[−Θ²/2] (Eq. 4.20), so the intensity is reduced by exp[−Θ²].

That is, for Gaussian-distributed random surface height errors characterized by a root mean square (RMS) roughness of σ, the mirror reflectivity will be reduced [Davies 1954] by a factor ησ of

ησ = exp[−(2π · 2σ sin θ/λ)²] = exp[−(4πσ sin θ/λ)²]    (3.123)

so that to achieve 90 percent of the theoretical efficiency limit one must keep the RMS surface height errors down to a value of

σ ≤ √(ln(1/0.9)) · λ/(4π sin θ).    (3.124)

For an iridium mirror at 6 mrad grazing incidence with 10 keV X rays, this means that one must have σ ≤ 0.53 nm. This emphasizes that x-ray mirrors must be quite smooth to have high reflectivity (see Fig. 5.5).
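The tolerance of Eq. 3.124 is easy to evaluate; a minimal sketch (the 0.124 nm wavelength for 10 keV photons comes from λ = hc/E with hc ≈ 1239.8 eV·nm):

```python
import math

def roughness_tolerance(wavelength_nm, theta_rad, efficiency=0.9):
    """RMS surface roughness sigma (Eq. 3.124) that keeps the reflectivity
    reduction factor of Eq. 3.123 at the given fraction of the ideal value."""
    return (math.sqrt(math.log(1.0 / efficiency)) * wavelength_nm
            / (4.0 * math.pi * math.sin(theta_rad)))

# 10 keV X rays (lambda ~ 0.124 nm) on a mirror at 6 mrad grazing incidence:
sigma_max = roughness_tolerance(0.124, 6e-3)
print(f"sigma <= {sigma_max:.2f} nm for 90% of the smooth-mirror reflectivity")
```

This reproduces the 0.53 nm figure quoted above, a fraction of a nanometer of allowed roughness over the full mirror surface.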
3.7 Concluding limerick

We conclude our discussion of the physics of x-ray interactions with materials with a limerick:

In atoms each electron will wait
Sitting tight in its own quantized state
They will all interact
With a wave; that's a fact
Letting amplitudes find their own fate
4 Imaging physics

4.1 Waves and rays

In Chapter 3 we talked about the ways that X rays interact with individual atoms, and then with amorphous media as described by the refractive index. But x-ray microscopes are not able (yet?) to image individual atoms, and images of uniform amorphous media are not very interesting. We therefore turn to how x-ray wavefields interact with everything in between.

In 1690, Christiaan Huygens' Traité de la Lumière put forward his wave theory of light, which included a simple picture for the collective effect of a series of point emitters of spherical waves of light. As one goes some distance from the emitters, one sees a smooth wavefront from their collective effect, as shown in Fig. 4.1. As a result, by rearranging the set of emitters, or by altering the time delay with which they emit, one can generate wavefronts that can be thought of as the line of peak electric field at a moment in time for an electromagnetic wave. The local wave direction is then perpendicular to the wavefront. We can think of each Huygens point source as emitting a spherical wave with amplitude

ψ = ψ0 (λ/r) e^(−i(kr − ωt) + iϕ),    (4.1)

where k = 2π/λ is the wave number, ω = 2πf is the angular frequency, and ϕ is a phase advance. (Again, we have made a particular choice of sign convention for forward-propagating waves, as discussed in Box 3.4.) With many Huygens emitters arranged in a row, we can add them up (Eq. 4.64) to arrive at plane waves with a vector wave number k indicating the propagation direction, and three-dimensional positions x, so that a plane wave propagates as

ψ = ψ0 e^(−i(k · x − ωt) + iϕ).    (4.2)
4.1.1 Adding up waves

While the Huygens construction provides a great conceptual picture of how waves superimpose, let's dive a bit more into the mathematics. The expressions of Eqs. 4.1 and 4.2 both involve complex exponentials of the form ψ = A e^(iϕ), as shown in Fig. 4.2; here, A is the magnitude of the vector, and ϕ the phase (see Box 4.1 regarding the terminology we use here). From the complex amplitude ψ = A e^(iϕ) representing the wavefield, the real
Figure 4.1 The Huygens construction for producing wavefronts from a series of point source
emitters of waves. A plane wave is shown at left, and a converging spherical wave is shown at right, along with waveﬁelds created by summing up the contributions of Huygens point sources (see Eq. 4.64).
Box 4.1 Amplitude, magnitude, and phase Unfortunately there is some variation in how the words “amplitude” and “magnitude” are used. We prefer to say that magnitude A and phase ϕ are combined in the complex amplitude ψ = Aeiϕ , which when multiplied by its conjugate gives the intensity I = ψ† ψ = A2 . (We also refer to image contrast modes as absorption contrast and phase contrast in Section 4.7, rather than amplitude and phase contrast.) Some use “amplitude” to refer to A, but when they do so we like to quote Inigo Montoya from the movie The Princess Bride: “You keep using that word. I do not think it means what you think it means.”
part Re[Aeiϕ ] = A cos ϕ often represents some measurable quality, like the electric ﬁeld when referring to electromagnetic waves. The waveﬁeld ψ times its complex conjugate ψ† gives the wave intensity or I = ψ† · ψ.
(4.3)
One can think of the imaginary part Im[A e^(iϕ)] = A sin ϕ as holding onto the momentarily "invisible" property of the wave (after all, though ϕ = π/2 gives a real part Re[A e^(iϕ)] = 0, the wave is still in existence with its magnitude A unchanged). A time-dependent wave e^(iωt) goes through a 2π phase change over a time period T (with frequency f in cycles per second, or angular frequency ω = 2πf in radians per second). Let us then think about the addition of several waves that oscillate at the same frequency ω. We can freeze a moment in time, and know that the sum at that moment is going to be the same as the sum at a later time t except for the common rotation of all waves in the complex plane by e^(iωt). The addition can be done graphically by placing the head of one wave's vector at the toe of another. If we add N waves each described by
Figure 4.2 Complex circle representation of the wave amplitude Aeiϕ as a phasor (represented by
an arrow in the complex plane). Here A is the magnitude and ϕ the phase, and real and imaginary parts are indicated by Re[Aeiϕ ] = A cos ϕ and Im[Aeiϕ ] = A sin ϕ respectively.
ψj = Aj e^(iϕj), a little bit of trigonometry lets us find the intensity result R² to be

R² = (Σj Aj sin ϕj)² + (Σj Aj cos ϕj)²    (4.4)
   = Σ(j=1,N) Aj² + 2 Σ(k>j) Σ(j=1,N) Aj Ak cos(ϕj − ϕk).
Now in the case where all waves have the same phase ϕ, the cosine term will always be 1 and one can then show that

R²_coherent = (Σ(j=1,N) Aj)²    (4.5)

so if all waves have the same magnitude A, one arrives at

R² = N² A².    (4.6)
However, if we have completely uncorrelated phases, uniformly distributed around the circle of Fig. 4.2, then the cos(ϕj − ϕk) terms will tend to be negative as often as positive and that part of the sum will drop out, leaving a net magnitude of

|R_incoherent| = √( Σ(j=1,N) Aj² )    (4.7)

so if all waves have the same magnitude A, one arrives at

R = √N A    (4.8)

and

R² = N A².    (4.9)
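The N²A² versus NA² scaling of Eqs. 4.6 and 4.9 can be checked with a quick Monte Carlo phasor sum; the sample sizes below are arbitrary choices for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
N, A, trials = 1000, 1.0, 2000

# Coherent case: identical phases, so |R|^2 = N^2 A^2 exactly (Eq. 4.6).
coherent = np.abs(np.sum(A * np.ones(N))) ** 2

# Incoherent case: phases uniform on [0, 2*pi); the average of |R|^2
# over many trials tends to N A^2 (Eq. 4.9).
phases = rng.uniform(0.0, 2.0 * np.pi, size=(trials, N))
incoherent = np.mean(np.abs(np.sum(A * np.exp(1j * phases), axis=1)) ** 2)

print(f"coherent  |R|^2 / (N^2 A^2) = {coherent / (N**2 * A**2):.3f}")
print(f"incoherent <|R|^2> / (N A^2) = {incoherent / (N * A**2):.3f}")
```

Both ratios come out close to 1, confirming that fully coherent addition gains a factor of N in intensity over incoherent addition of the same N waves.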
That is, the net result of adding random phases with a uniform distribution over all phase angles is to produce a vector with some nonzero length √N A, but with a phase that
Figure 4.3 Adding phasors with phases uniformly distributed over a restricted range of −θu to +θu . The resultant has a phase of zero, and a magnitude reduction ηu given by Eq. 4.13.
cannot be predicted. This is sometimes referred to as the drunken sailor problem: if a sailor takes N steps of equal length but each in a completely random direction, the sailor is likely to travel a distance of √N steps, but in a direction that neither we (nor the sailor!) can predict.¹ Let's consider something in between fully coherent, and fully incoherent, wave addition. Now the expectation value ⟨f⟩ of a function f(x) modulated by a probability distribution P(x) is given by

⟨f⟩ = ∫(−∞,∞) f(x) P(x) dx.    (4.10)
Let's apply this to the case of adding up waves when each one has an individual phase difference θ, or A e^(i(ϕ+θ)) = ψ e^(iθ). Let's consider first the case where the phases are distributed uniformly over a range from −θu to +θu (see Fig. 4.3). The probability of having one particular value of θ is then P(θ) = 1/(2θu), so the expectation value for the wave amplitude ⟨ψu⟩ can be found from Eq. 4.10 to be

⟨ψu⟩ = (1/(2θu)) ∫(−θu,θu) ψ e^(iθ) dθ = (ψ/(2θu)) [ ∫(−θu,θu) cos(θ) dθ + i ∫(−θu,θu) sin(θ) dθ ].    (4.11)

Now because sin(−θ) = −sin(θ), the integral of the imaginary part from −θu to 0 will cancel out the integral from 0 to +θu, so we are just left with

⟨ψu⟩ = (ψ/(2θu)) ∫(−θu,θu) cos(θ) dθ = (ψ/(2θu)) [sin(θu) − sin(−θu)] = ψ sin(θu)/θu.    (4.12)

That is, the net amplitude for phases uniformly distributed between −θu and +θu is reduced by a factor ηu = ⟨ψu⟩/ψ of

ηu(θu) = sin(θu)/θu,    (4.13)
which has a value of 1 when θu = 0, as can be shown using L'Hôpital's rule. The

¹ It makes one want to cry out in song: What shall we do with a drunken sailor?
Figure 4.4 The Gaussian or normal distribution function with zero mean, exp[−x²/(2σ²)], shown without its integral normalization factor of 1/(σ√2π). About two-thirds of the events (68.3 percent) occur within one standard deviation (−σ to +σ) of the mean, 95.5 percent fall within ±2σ, and 99.7 percent of the events occur within ±3σ. In addition, the full width at half-maximum is related to the standard deviation by FWHM ≈ 2.35σ.
absolute value of the function ηu(θu) is shown in Fig. 4.5; the absolute value is shown because when θu goes beyond π one begins to have a preponderance of phasors on the left, or negative, half of the complex plane. We will encounter this function again in Eq. 4.106, where we will give it the name of a "sinc" function sinc(θ) in Eq. 4.105. Now let's consider phases which are distributed according to a Gaussian or normal probability distribution, which is usually written as

P(x, u) = (1/(σ√2π)) exp[−(x − u)²/(2σ²)].    (4.14)

The shape of this function is shown without the normalizing factor in Fig. 4.4. In this case, the net wave amplitude ⟨ψn⟩ is found from

⟨ψn⟩ = ψ (1/(θσ√2π)) ∫(−∞,∞) e^(iθ) exp[−θ²/(2θσ²)] dθ    (4.15)

which can be expanded to

⟨ψn⟩ = ψ (1/(θσ√2π)) [ ∫(−∞,∞) cos(θ) exp[−θ²/(2θσ²)] dθ + i ∫(−∞,∞) sin(θ) exp[−θ²/(2θσ²)] dθ ].    (4.16)

Once again, the imaginary part is antisymmetric about θ = 0, so the negative-θ integrand cancels out the positive-θ integrand, leaving

⟨ψn⟩ = ψ (1/(θσ√2π)) ∫(−∞,∞) cos(θ) exp[−θ²/(2θσ²)] dθ.    (4.17)
Using the standard result of

∫(−∞,∞) cos(kx) e^(−ax²) dx = √(π/a) e^(−k²/(4a)),    (4.18)

with a = 1/(2θσ²), we arrive at

⟨ψn⟩ = ψ e^(−θσ²/2)    (4.19)

or an amplitude reduction factor ηn = ⟨ψn⟩/ψ of

ηn(θσ) = e^(−θσ²/2) = e^(−σθ²/2)    (4.20)

for Gaussian-distributed phases characterized by a standard deviation σθ in radians; this function is shown in Fig. 4.5. The intensity goes as the square of the amplitude, or

ηI = e^(−σθ²),    (4.21)

which is a result well known in the literature [Maréchal 1947a, Maréchal 1947b, Ruze 1952, Mahajan 1983]. The resulting reduction in the peak intensity versus the peak intensity with no errors present is referred to as the Strehl ratio [Strehl 1895, Strehl 1902]. Of course, when the errors are limited to σθ ≪ π/2, one can approximate Eq. 4.21 with

ηI ≈ 1 − σθ²,    (4.22)

which, when applied to aberrations across a lens, is known as the Maréchal approximation [Maréchal 1947a, Maréchal 1947b].
Rayleigh quarter wave criterion The amplitude reduction factors ηu (θr ) of Eq. 4.13 and ηn (θσ ) of Eq. 4.20 are the basis behind an understanding articulated by John William Strutt (1842–1919), the British physicist whom we usually refer to as Lord Rayleigh due to his inheritance of a barony. Consider a distribution of errors in an optical system (such as index of refraction gradients due to thermal gradients in the atmosphere, or surface imperfections on mirrors or lenses). If those errors are kept below about λ/4, then the Rayleigh quarter wave criterion tells us that the performance reduction of the optical system will be modest. Errors within a total range of λ/4, or λ/8 on either side of the correct value, lead to phase variations in the sense of Fig. 4.3 of 1/8 of 2π or π/4. As can be seen from Fig. 4.5, ηu (π/4) 0.90 while ηn (π/4) = 0.73, so with either distribution one indeed has a relatively modest modiﬁcation to the net amplitude (of course, these numbers should be squared when considering intensity reductions).
4.1.3
Connecting waves and rays The Rayleigh quarter wave criterion provides a good way to think about the connection between waves and straightline ray paths. Fermat’s principle says that light travels along the path of least time, or lowest accumulated optical path length of optical path length = n
(4.23)
Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005
4.1 Waves and rays
77
1.00 0.90
Magnitude reduction
0.75
0.73 0.63 Uniform:
0.50 Gaussian: 0.29 0.25
0.00 0.00
0.25
0.50
0.75
1.00
1.25
1.50
1.75
2.00
Figure 4.5 Illustration of the Rayleigh quarter wave criterion. When adding up the complex
amplitudes of many waves with some degree of randomness in their individual phases, the net amplitude is reduced by a factor η = ψ /ψ, which may be acceptable if close to unity. Shown here is ηu  (Eq. 4.13) for a uniform phase distribution over a range −θu to +θu as shown in Fig. 4.3. Also shown is ηn (Eq. 4.20) for a Gaussian or normal distribution of phases characterized by a standard deviation θσ . For the uniform case, the net waveﬁeld is reduced only slightly (ηu = 0.90) at θu = π/4, or a full path length range of λ/4. This is known as the Rayleigh quarter wave criterion.
where n is the index of refraction of the medium, and is the geometric distance along the path; the accumulated optical phase is then optical phase = 2π
n . λ
(4.24)
Let’s consider the possible paths involved in light traveling from Point A to Point B as shown in Fig. 4.6. Path 1 with length 1 = z will be the straightline path that Fermat’s principle favors, while Path 2 involves light traveling through a point oﬀset by a distance y at the midpoint of the straight line path with a distance of
2 z
2y z 2 2 1+ z 1 + 2y2 /z2 , 2 = 2 ( ) + y = 2 2 2 z where we have made use of the binomial approximation (1 + a)m 1 + ma for a 1.
(4.25)
The optical phase diﬀerence θ between Path 2 and Path 1 is then θ = 2π
2 − 1 z(1 + 2y2 /z2 ) − z 4πy2 = 2π = . λ λ λz
(4.26)
Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005
78
Imaging physics
Path 3
Path 2 Point A
z/2
y Path 1
z/2
Point B
Figure 4.6 Illustration of Fermat’s principle for optical paths.
If we were to consider the addition of all paths with oﬀsets of −y to +y, we really have a set of phases bound from −θu to +θu , which is precisely what we solved for in our expression for ηu of Eq. 4.13. In other words, if we limit θu to ±π/4 as being within the Rayleigh quarter wave criterion, we have a limit yλ/4 of √ λz (4.27) yλ/4 = 4 within which all optical paths contribute coherently. This means that optical rays can be rather fat; for visible light with λ = 500 nm traveling across a room with dimension z = 10 m, we have yλ/4 = 560 μm while on a typical synchrotron light source beamline with λ = 1 nm and z = 70 m we have yλ/4 = 66 μm. As we’ll see in Section 4.11, even one photon explores all optical paths within the Rayleigh quarter wave criterion, so this example really does show us how light rays are not inﬁnitesimally skinny (like a fashion model), but with some substantial width (like real people).
4.2
Gratings and diffraction The picture of the addition of many Huygens point sources provides a convenient way to understand diﬀraction and interference, which we will now explore.
4.2.1
Slits and plane gratings If we place a slit of width b next to an even row of inphase Huygens point sources, a downstream plane will see only those point sources within the slit aperture. If we pair up each point source with one that is exactly half an aperture distance away, as shown in Fig. 4.7, there will be a certain angle θ with a λ/2 optical path length diﬀerence to these paired point sources. At that angle in the far ﬁeld, each and every point source will perfectly cancel the wave amplitude contribution of its partner and there will be no light intensity; that is, the ﬁrst minimum of the single slit diﬀraction pattern is at an angle given by b sin θ = λ.
(4.28)
With slits that are large compared to the wavelength (or b λ), the angles are small so sin θ θ (the small angle approximation). In this case we can write θ
λ b
(4.29)
Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005
4.2 Gratings and diffraction
79
Ƨ b/2 Ƨ b/2 sinƧ
Figure 4.7 Schematic explaining the ﬁrst minimum in the diﬀraction pattern of a slit. Each Huygens point source has a matching partner (in this case, with a matching color) which is exactly π out of phase, due to an optical path length diﬀerence of λ/2 when reaching to a distant measurement plane. We thus have the condition (b/2) sin θ = λ/2, or b sin θ = λ as the condition for the ﬁrst minimum in intensity in the diﬀraction pattern.
as the angle to the ﬁrst minimum of the single slit diﬀraction pattern. This will be important when we consider diﬀraction from lens apertures, and slits in beamlines. We will look in more detail at slit diﬀraction when we discuss Fraunhofer diﬀraction in Section 4.3.5. Let’s now consider a thin plane grating with period d, as shown in Fig. 4.8. If we now pair Huygens point sources from one grating aperture with the equivalent source in the next aperture, we will have the mth integer order of constructive interference (and a maximum in the diﬀraction pattern) when we meet the condition d [sin(θ1 ) + sin(θ2 )] = mλ.
(4.30)
Note that we have not yet said anything about the phase diﬀerence between one pairing of point sources within the open holes of the grating versus another pairing; one can even have single slit diﬀraction minima at the same angle of higher orders of plane grating diﬀraction maxima, thus canceling out interference maxima from the grating. When the incident wave is perpendicular to the grating and we consider the m = 1 or ﬁrst order of diﬀraction only, we have θ1 = 0 so we can drop sin(θ1 ) from Eq. 4.30 and refer to θ2 simply as θ. In this case we can rewrite Eq. 4.30 as 1 sin(θ) θ = , d λ λ
(4.31)
where the latter version is in the small angle approximation. When we discuss Fourier transforms in Section 4.3.3 and Fraunhofer diﬀraction in Section 4.3.5, we will introduce the concept of a spatial frequency u of u=
1 θ d λ
(4.32)
Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005
80
Imaging physics
d Ƨ1
Ƨ2 dsin(Ƨ1)
dsin(Ƨ2)
Figure 4.8 Constructive interference in a plane grating. When plane waves are incident at an
angle θ1 , one has constructive interference between Huygens point sources at a speciﬁed position within the open bars of the grating at angle θ2 when the condition d[sin(θ1 ) + sin(θ2 )] = mλ is met.
for a speciﬁed periodicity d. This allows us to equate a property of the grating (its inverse period 1/d) with a wavelengthscaled diﬀraction angle of u θ/λ.
4.2.2
Volume gratings and Bragg’s law Now let’s consider the case of having two thin partially reﬂecting mirrors located a distance d apart from each other, as shown in Fig. 4.9. If a plane wave is incident on this structure at an angle θ relative to the surface plane, part of the wave amplitude will reﬂect oﬀ of the ﬁrst surface while part will continue to the surface below, where a fraction of the wave amplitude will reﬂect again. Now when waves are incident at an angle from the surface greater than the critical angle θc of Eq. 3.114, we know that the xray reﬂectivity of a single mirror surface will be very low. However, when many weakly reﬂected wave amplitudes are added in perfect superposition, the net amplitude can become quite large and high reﬂectivity can result. For this to occur, the optical path length diﬀerence between waves reﬂected by subsequent surfaces must be an integer number m of wavelengths, or 2d sin(θ) = mλ,
(4.33)
which is known as Bragg’s law, after William Lawrence Bragg who worked with his father William Henry Bragg. A slight correction [Compton 1923, Eq. 2] to Bragg’s law can be made for refraction in the crystal, leading to δ = mλ, (4.34) 2d sin(θ) 1 − sin2 θ and indeed this is how one of the ﬁrst measurements of the phase shifting part δ of the xray refractive index n = 1 − δ − iβ was made [Stenstr¨om 1919]. In Fig. 4.10, we compare a plane grating of period dG where the constructive interDownloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005
4.2 Gratings and diffraction
81
Wavefronts
ƧB
ƧB Ƨ
d
d sin(ƧB)
d sin(ƧB)
2ƧB
Partially reflecting planes
Figure 4.9 Bragg’s law for strong reﬂectivity from a series of weak partially reﬂecting surfaces
separated by a distance d. When the path length diﬀerence between a wavefront reﬂecting from a subsequent plane is longer by a distance 2d sin(θB ) = mλ where m is an integer, one has a coherent superposition of the partial waves reﬂected from each surface. The net deﬂection angle of the wave is 2θB .
ference condition of Eq. 4.30 becomes dG sin(θ) = mλ,
(4.35)
and a volume grating of period dB where Bragg’s law (Eq. 4.33) becomes 2dB sin(θB ) = mλ.
(4.36)
For the Bragg grating, the net deﬂection angle of the beam is θ = 2θB .
(4.37)
Since the Bragg grating spacing dB has a component dG = dB / cos(θB )
(4.38)
perpendicular to the direction of the incident wave, we can rewrite Eq. 4.36 as θ θ (4.39) 2dB sin(θB ) = 2dG cos(θB ) sin(θB ) = 2dG cos( ) sin( ) = mλ, 2 2 where in the last step we have used Eq. 4.37. From the trigonometric identity sin(2ϕ) = 2 cos(ϕ) sin(ϕ) and the substitution ϕ ≡ θ/2, we can rewrite Eq. 4.39 as dG sin(θ) = mλ,
(4.40)
which reproduces Eq. 4.35 provided dG = dG . That is, we see that a Bragg grating reproduces the diﬀraction condition of a plane grating perpendicular to the beam when viewing the Bragg grating period projected along that same perpendiculartothebeam direction (dG ). Of course the Bragg grating has a zˆ axis component to its periodicity, as is shown in Fig. 4.10 (something that we’ll return to in Section 4.2.5 when we discuss the Ewald sphere). Bragg’s law is applicable to a variety of situations. With visible light, one can create Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005
82
Imaging physics
ƧB ƧB Ƨ
Ƨ h
h h
Figure 4.10 Schematic of diﬀraction from plane and volume (Bragg) gratings. Diﬀraction from a plane grating obeys dG sin(θ) = mλ (Eq. 4.30), while diﬀraction from a volume grating obey’s Bragg’s law of 2dB sin(θB ) = mλ (Eq. 4.33). For a Bragg grating, the periodicity dG of the edges of the grating planes viewed perpendicular to the incident beam direction is dG = dB / cos(θB ), allowing one to equate the plane and volume grating diﬀraction cases as shown in Eq. 4.40. The transition from plane to volume grating diﬀraction is characterized by the Klein–Cook parameter QK–C in Eq. 4.140.
volume gratings from layers of materials with alternating refractive indices (in which case one must modify Eq. 4.33 to incorporate the eﬀect of the refractive index n of the alternating materials into the optical path length). Color holograms are made in this fashion, because for a ﬁxed angle of incidence there is only one wavelength λ which will meet Bragg’s law from a particular periodicity. As a result, when the hologram is illuminated by broadspectrum spatially coherent light, one can reﬂect diﬀerent waveﬁelds from diﬀerent wavelengths of light, thus producing the desired real or virtual color image for viewing. The thickness at which one transitions from planar to volume diﬀraction can be described by the Klein–Cook parameter QK–C , given by Eq. 4.140.
4.2.3
Bragg’s law and crystals The Bragg son–father team was not thinking of color holograms in 1913, of course – holography would not be invented for another 35 years! Instead, they were trying to understand the diﬀraction characteristics of xray beams on crystals (more on this in Section 10.1). As we saw in Fig. 4.1, the Huygens construction allows one to create parallel wavefronts from the combination of coherent spherical waves from an array of closely spaced point sources. What the Braggs realized was that weak xray scattering from individual atoms in a crystalline lattice would behave in exactly the same way. In fact, in a regular arrangement of atoms in a crystal, one can have many diﬀerent “planes” of atoms; these are denoted by their Miller index hkl , which gives the number of atomic units between planes in three dimensions as shown (in 2D) in Fig. 4.11. The number of planes that participate in a coherent superposition of reﬂected waves is determined by how far the xray beam penetrates into the crystal, and thus the narrowness of the angular range over which this superposition is maintained. In xray microscopy, one of the most common uses of crystal diﬀraction is for x
Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005
4.2 Gratings and diffraction
83
1
2
1 1
1
Figure 4.11 Bragg planes in crystals, and their Miller indices. Since this book is not printed on 3D paper, we show a 2D array of atoms and example Bragg planes with their corresponding 2D Miller indices hk .
ray monochromators at synchrotron light sources (see Section 7.2.1). Thanks to the semiconductor industry, one can obtain silicon crystals of amazingly high quality at quite reasonable cost. Silicon is a relatively light element, which leads to a reasonably favorable ratio of f1 to f2 (scattering depends on both f1 and f2 , as shown in Eq. 3.44, while absorption depends on f2 only, as shown in Eq. 3.75; see Fig. 3.16). For diﬀraction from the 111 planes, the d spacing is 0.31356 nm, so for a 10 keV xray beam Bragg’s law gives θ = 11.4◦ . Now the 1/e intensity absorption length μ−1 (see Eq. 3.75) of 10 keV X rays in Si is μ−1 = 135 μm, so the number n p of Bragg planes that end up being illuminated along the intoandout set of ray paths is given by np
134 μm μ−1 = 2.2 × 105 . d 2 · 0.314 × 10−3 μm
(4.41)
As a result, one would expect to be able to maintain Bragg diﬀraction over an angular range dθ of about 1/(2.2×105 ) radians on either side of the incident beam angle, or a full width of about dθ = 9.3 μradians. The real story [Batterman 1964, James 1982, AlsNielsen 2011] is given by the Darwin width ωD for crystal diﬀraction, which for Si 111 is ωD = 26.6 μrad (see Fig. 4.12). Once one has determined the angular spread dθ of the beam, diﬀerentiation of Bragg’s law (Eq. 4.33) can be used to show that E2 dθ (4.42) hc so that the Darwin width ωB for Si 111 at 10 keV gives an energy resolution of 1.41 eV. dE = 2d cos θ
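The numbers above can be checked with a few lines of code. This is an illustrative sketch (not from the book); the d spacing, absorption length, and Darwin width are the values quoted in the text.

```python
import math

# Worked numbers for Si (111) diffraction of 10 keV x rays.
hc_eV_nm = 1239.84          # hc in eV·nm (Eq. 3.7)
E_eV = 10000.0
lam_nm = hc_eV_nm / E_eV    # wavelength, ~0.124 nm
d_nm = 0.31356              # Si (111) d spacing
mu_inv_um = 135.0           # 1/e absorption length of 10 keV x rays in Si

theta_B = math.asin(lam_nm / (2 * d_nm))     # Bragg's law, first order
n_p = mu_inv_um / (2 * d_nm * 1e-3)          # Eq. 4.41: planes along the into-and-out path
d_theta = 2.0 / n_p                          # full angular width estimate, radians

omega_D = 28.4e-6                            # Darwin width for Si (111), rad
dE = 2 * d_nm * math.cos(theta_B) * E_eV**2 / hc_eV_nm * omega_D   # Eq. 4.42

print(f"theta_B = {math.degrees(theta_B):.4f} deg")   # ~11.40 deg
print(f"n_p = {n_p:.2e}, d_theta = {d_theta*1e6:.1f} murad")
print(f"dE = {dE:.2f} eV")                            # ~1.4 eV
```

Running this reproduces the Bragg angle, the ~9 μrad angular-width estimate, and the ~1.4 eV energy resolution quoted above.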
4.2.4
Synthetic multilayer mirrors Perfect crystals represent nature's method of generating beautiful periodic, partially reflecting planes. However, as one goes to longer wavelengths, it becomes harder and harder to find crystals with an appropriate d spacing; one of the largest values known is for YB66, which has d = 2.344 nm [Wong 1990]. How might one achieve larger periodicities without crystals? Starting back in 1935, DuMond and Youtz had the idea that one might use vacuum
Figure 4.12 The rocking curve, or reflectivity versus incidence angle, for diffraction of 10 keV x rays from Si (111). The Bragg angle calculated using Eq. 4.34 is 11.4042°. The Darwin width of ωD = 28.4 μrad (the angular width of the rocking curve) is found from a calculation with dynamical diffraction effects taken into account. The rocking curve shown here was calculated using Sergey Stepanov's X0h program (http://sergey.gmca.aps.anl.gov/x0h.html). (Plot axes: reflectivity versus dθ in μrad, with θ in degrees along the top.)
evaporation of successive Au and Cu thin films to create synthetic multilayers as a longer-period diffractive structure to measure the wavelength of x rays [DuMond 1935], and in a later paper [DuMond 1940] they referred to even earlier attempts at this goal by Deubner and by Koeppe. A later study using alternating layers of Pb and Mg (giving a greater difference in the x-ray refractive index between the two thin-film materials) was made by Dinklage and Frerichs [Dinklage 1963], but the multilayers had their x-ray reflectivity fade within days due to interlayer diffusion. Multilayers for x-ray spectroscopy [Fischer 1966, Henke 1966, Mattson 1966] produced using self-assembly of organic films (known as Langmuir–Blodgett films [Blodgett 1935, Blodgett 1937]) suffered from radiation damage. Subsequent work by Dinklage using Fe or Au and Mg yielded somewhat better results [Dinklage 1967]. The breakthrough came in the 1970s, when Eberhard Spiller at IBM Research Labs came up with a key conceptual understanding, and began to produce multilayers that eventually led to high-performance x-ray mirrors [Spiller 1972, Spiller 1974]. His idea was this: since one can think of the interference between the incident and reflected waves as a standing wave pattern with nodes and antinodes, one can enforce this pattern by placing high-density, absorptive layers where the nodes must be located. Soon Spiller and collaborators were producing multilayers of Re/W and C with d = 9.2 nm and with near-normal-incidence reflectivities approaching 10 percent at about 65 eV [Haelbich 1979], and using them as the basis for designs for x-ray microscopes with normal-incidence mirrors [Lovas 1982]. As noted in Chapter 2, Underwood and Barbee
at Stanford and CalTech soon produced very successful multilayers using sputtering [Underwood 1979, Underwood 1981a, Underwood 1981b]. Synthetic multilayers obey Bragg's law with the refractive correction included, except that for two layers with respective thicknesses d1 and d2 one uses an overall periodicity

\[ d = d_1 + d_2 \qquad (4.43) \]

and a thickness-weighted [Spiller 1994, Eq. 7.8] phase-shifting part of the refractive index δ̄ of

\[ \bar{\delta} = \frac{d_1\delta_1 + d_2\delta_2}{d_1 + d_2}, \qquad (4.44) \]

in which case one can use δ̄ in Bragg's law with refraction included (Eq. 4.34). Detailed expressions for finding the optimum values of d1 and d2 for a given material pair are available [Yamamoto 1992]. Another way to understand multilayer mirrors is to go back to the expression for normal-incidence reflectivity in Eq. 3.118. You get an amplitude contribution √R⊥ from each interface, and with properly spaced interfaces each weak reflected amplitude adds up coherently over N interfaces to yield a net intensity reflectivity of N²R⊥. As with a true crystal, absorption limits the number of layers that can contribute to the reflectivity. In addition, the portion of the wave that is reflected by the upper layers does not contribute to the reflectivity of the deeper-lying layers. Still, in the extreme ultraviolet (EUV) and soft x-ray region, incident waves will penetrate tens to hundreds of layers with the proper choice of material combinations, so that near-normal-incidence reflectivities of up to 69.5 percent have been achieved at 92 eV [Yulin 2006], 71 percent at 98 eV [Bajt 2002], 20 percent at 395 eV with d = 1.59 nm [Eriksson 2008], and 2.5 percent at 511 eV with d = 1.22 nm [Eriksson 2008]. As the photon energy is increased and the wavelength shortens, random phase errors introduced by interface roughness at a length scale approaching the x-ray wavelength λ become increasingly detrimental, much in the way that grazing-incidence mirror reflectivity is reduced due to surface roughness (Eq. 3.123). Larger d spacings can be used for shorter wavelengths λ at grazing incidence angles; for example, d = 1.97 nm WSi2/Si multilayers have delivered reflectivities of 54 percent at 10 keV and 66 percent at 25 keV [Liu 2004].
If one adjusts the layer spacing across the length of a curved mirror (known as a graded multilayer), one can also use multilayer reflective coatings on nanofocusing mirrors, as will be discussed in Section 5.2.4. The number of layers, N, determines both the wavelength range (bandwidth) that is reflected, and the angular acceptance. The fractional bandwidth Δλ/λ and the angular range Δθ are both approximately equal to 1/N. Procedures for detailed calculations on the performance of multilayer mirrors are described elsewhere [Vinogradov 1977], and websites such as the one provided by the Center for X-Ray Optics (use an internet search for "henke.lbl.gov multilayer reflectivity") provide interfaces for numerical calculations of multilayer mirror performance. Modern multilayer mirrors can have a wide range of d spacings – down to the thickness of just a few atoms – as long as the material properties are appropriate. To avoid strain buildup, it is important to use materials with good matching of atomic lattice
Box 4.2 Momentum transfer and 2π
There is, unfortunately, some discrepancy in the literature on the vector form of Bragg's law of q = k − k0 (Eq. 4.49). In order to be consistent with the wave number k0 of Eq. 3.32, we use here a convention for the magnitude of the incident wave vector based on

\[ |\mathbf{k}_0| = \frac{2\pi}{\lambda}, \qquad (4.45) \]

which is also in use in much of the literature [Williams 2003, Als-Nielsen 2011, Attwood 2017]. However, another convention (see for example [Ewald 1969, Chapman 2006b]) uses the definition

\[ |\mathbf{k}_0| = \frac{1}{\lambda}. \qquad (4.46) \]

To some extent, physicists use Eq. 4.46 while materials scientists use Eq. 4.45 – but to borrow a saying attributed to Mark Twain, "all generalizations are false, including this one," and one book [Cowley 1981, Cowley 1995] has even switched conventions between editions! The choice of convention obviously affects relationships involving wave vectors k and the crystal momentum transfer vector q, such as Eq. 4.57 of q ≈ 2πu. We also use the term "Fourier space" to refer to spatial frequencies u and "reciprocal space" to refer to momentum transfer q, and refer to the Ewald sphere in reciprocal space.
spacing, and low diﬀusion of atoms across the interface; popular choices today include alternating layers of W and B4 C, Mo and Si [Barbee Jr 1985], or WSi2 and Si [Liu 2005, Liu 2006]. Multilayers can be deposited on curved surfaces using vacuum evaporation or (preferably) ioninduced sputtering. While the Bragg condition needs to be satisﬁed, the range of angles (or the range of wavelengths) accepted can be quite sizable. Basically, it is the number of layers contributing to the constructive interference that determines the range of acceptance, as discussed above.
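As a numerical sketch of the multilayer relations above, the snippet below evaluates the overall period of Eq. 4.43, the thickness-weighted δ̄ of Eq. 4.44, and the Δλ/λ ≈ 1/N bandwidth estimate. The δ values are illustrative placeholders, not tabulated optical constants.

```python
# Sketch of Eqs. 4.43-4.44 and the bandwidth estimate for a two-material
# multilayer. The delta values are assumed numbers for illustration;
# real designs use tabulated optical constants for the chosen material pair.
d1_nm, d2_nm = 2.0, 2.0          # layer thicknesses (e.g., a Mo/Si-like pair)
delta1, delta2 = 1e-5, 4e-6      # phase-shifting refractive index parts (assumed)
N = 100                          # number of bilayer periods

d_nm = d1_nm + d2_nm                                  # Eq. 4.43: overall periodicity
delta_bar = (d1_nm * delta1 + d2_nm * delta2) / d_nm  # Eq. 4.44: thickness-weighted delta
bandwidth = 1.0 / N                                   # fractional bandwidth and angular range

print(d_nm, delta_bar, bandwidth)
```

The weighted δ̄ is what one would then substitute into the refraction-corrected Bragg law (Eq. 4.34).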
4.2.5
Momentum transfer and the Ewald sphere In Fig. 4.10, we showed the difference between diffraction from plane and volume gratings, and in Section 4.2.1 we described how the periodicity d of a plane grating can be shown (within the Fraunhofer approximation; Section 4.3.2) to diffract light to a spatial frequency u = 1/d, as given by Eq. 4.32. With volume gratings, the equivalent construction is the Ewald sphere [Ewald 1913]. This construction, illustrated in Fig. 4.13, involves vectors in a 3D far-field diffraction space called reciprocal space (for planar gratings, the equivalent is Fourier space; see also Box 4.2). In reciprocal space, an incident wave of wavelength λ and wave number |k0| = 2π/λ is incident on the crystal as a vector k0, and the Bragg-diffracted ray has a direction k with identical wavelength (see Box 4.2 for a note on the factor of 2π in k, and Box 4.3 for a note on why |k0| = |k| in practice). The diffraction is done by a crystal
Figure 4.13 Illustration of the Bragg diffraction condition and the Ewald sphere (shown here in two dimensions for the azimuthal angle ϕ = π/2 and thus qx = 0, as can be seen from Fig. 3.11). In (a), a volume grating with vector periodicity d has a momentum transfer q with |q| = 2π/|d|, which causes an incoming wave k0 to undergo Bragg diffraction to a direction k. The Bragg angles are represented by θB, and the total scattering angle is θ. A graphical representation of q = k − k0 (Eq. 4.49) is shown in (b). For a fixed incoming beam direction k0, as one rotates the volume grating through angles θB one samples a set of accessible momentum transfer vectors q that trace out positions on the Ewald sphere (c; shown here in 2D with {ky, kz} rather than 3D with {kx, ky, kz}). At higher Bragg angles, one can reach larger values of q corresponding to smaller unit cell periods d.
of periodicity d, which has a momentum transfer q of magnitude

\[ |\mathbf{q}| = \frac{2\pi}{d}, \qquad (4.47) \]

so that the Fourier transform (Section 4.3.3) into reciprocal space is written as [Cowley 1981, Eq. 1.21]

\[ A(\mathbf{q}) = \frac{C}{2\pi}\int_{-\infty}^{\infty} a(\mathbf{r})\, e^{-i\mathbf{q}\cdot\mathbf{r}}\, d\mathbf{r}. \qquad (4.48) \]

(Here C is a weighting constant for the interaction strength.) In reciprocal space, the relationship between the incident k0 and scattered k waves, and the crystal periodicity q, is given by

\[ \mathbf{q} = \mathbf{k} - \mathbf{k}_0, \qquad (4.49) \]

as shown graphically in Fig. 4.13(b). The momentum transfer q of the volume grating has a length that can be found from Fig. 4.13(b) as

\[ \frac{|\mathbf{q}|}{2} = |\mathbf{k}_0|\sin(\theta_B). \qquad (4.50) \]
The vector components of the momentum transfer q can be found from either Eq. 4.50 or Eq. 4.47. We also include the azimuthal angle ϕ as shown in Fig. 3.11 (with ϕ = π/2 representing the case of the crystal's momentum transfer q being aligned to the ŷ axis as shown in Fig. 4.13). This gives the following relationships between the scattering angles and the vector components of the corresponding momentum transfer vector within the
crystal:

\[ q_x = |\mathbf{q}|\cos(\theta_B)\cos(\varphi) = \frac{2\pi}{d}\cos(\theta_B)\cos(\varphi) = \frac{4\pi}{\lambda}\sin(\theta_B)\cos(\theta_B)\cos(\varphi) = \frac{2\pi}{\lambda}\sin(2\theta_B)\cos(\varphi) \qquad (4.51) \]

\[ q_y = |\mathbf{q}|\cos(\theta_B)\sin(\varphi) = \frac{2\pi}{d}\cos(\theta_B)\sin(\varphi) = \frac{4\pi}{\lambda}\sin(\theta_B)\cos(\theta_B)\sin(\varphi) = \frac{2\pi}{\lambda}\sin(2\theta_B)\sin(\varphi) \qquad (4.52) \]

\[ q_z = -|\mathbf{q}|\sin(\theta_B) = -\frac{2\pi}{d}\sin(\theta_B) = -\frac{4\pi}{\lambda}\sin^2(\theta_B), \qquad (4.53) \]

so that

\[ |\mathbf{q}|^2 = q_x^2 + q_y^2 + q_z^2. \qquad (4.54) \]

The relationship between the Bragg angle θB and the total scattering angle θ is

\[ \theta = 2\theta_B, \qquad (4.55) \]

allowing us to use θ in the usual sense for optics. When the scattering angle θ is small (and with ϕ = π/2), we can approximate qy as

\[ q_y = \frac{2\pi}{d}\cos\!\left(\frac{\theta}{2}\right) \simeq \frac{2\pi}{d}\left(1 - \frac{\theta^2}{8}\right) \simeq \frac{2\pi}{d}, \qquad (4.56) \]

where in the last expression we have assumed that we can neglect θ²/8 relative to 1. We can therefore use the relationship u = 1/d of Eq. 4.32 to relate the spatial frequency uy of a 2D grating (Eq. 4.32) to the ŷ momentum transfer of a crystal qy (Eq. 4.52) as

\[ q_y \simeq 2\pi u_y \qquad (4.57) \]

in the notation convention described in Box 4.2.
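The component relations above can be checked numerically. The sketch below (an illustration, not code from this book) verifies that the components of Eqs. 4.51–4.53 satisfy Eq. 4.54, using Si (111) numbers from earlier in the chapter.

```python
import math

# Numerical check of the momentum-transfer components (Eqs. 4.51-4.53):
# they should satisfy |q|^2 = qx^2 + qy^2 + qz^2 (Eq. 4.54) with |q| = 2*pi/d,
# where Bragg's law lambda = 2 d sin(theta_B) ties d to theta_B.
lam = 0.124e-9            # 10 keV wavelength, m
d = 0.31356e-9            # Si (111) d spacing, m
theta_B = math.asin(lam / (2 * d))
phi = math.pi / 2         # q aligned with the y axis, as in Fig. 4.13

q = 2 * math.pi / d
qx = q * math.cos(theta_B) * math.cos(phi)
qy = q * math.cos(theta_B) * math.sin(phi)
qz = -q * math.sin(theta_B)

assert abs(math.sqrt(qx**2 + qy**2 + qz**2) - q) < 1e-6 * q   # Eq. 4.54
# Equivalent form in terms of the total scattering angle theta = 2*theta_B:
assert abs(qy - (2 * math.pi / lam) * math.sin(2 * theta_B) * math.sin(phi)) < 1e-6 * q
print("components consistent")
```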
in the notation convention described in Box 4.2. The reason that q is referred to as a momentum transfer is shown in Fig. 4.13A: it leads to a change in the vector momentum of a photon (in reality, the magnitude of the momentum removed from the photon and transferred to the crystal is very small; see Box 4.3). For crystals, Bragg diﬀraction spots arise when there is the happy intersection of the Ewald sphere (deﬁned by the direction and wavelength of the illuminating and scattered beams k0 and k which in turn give the momentum transfer q according to Eq. 4.49), and the reciprocal lattice points of the regular array of atom positions. This is shown in Fig. 4.14. It is for this reason that only a few Bragg diﬀraction spots show up from a crystal in a given given illuminating beam direction, so that crystals are often rocked during crystallographic data collection to allow more reciprocal lattice points to coincide with the Ewald sphere. But let’s return from crystals to talk about microscopy! Whereas the structural periodicity in crystals really does concentrate the scattering signal into a few reciprocal lattice points as shown in Fig. 4.14, the 3D Fourier decomposition of a more general object (introduced in 2D in Section 4.4.7) is composed of a more continuous distribution Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005
Box 4.3 Momentum transfer in x-ray scattering
The momentum transfer q of Eq. 4.49 that is imparted to a crystal during Bragg scattering is quite small; after all, it's an elastic rather than inelastic process (see Section 3.2). Consider the case of a 10 keV x-ray photon that is fully backscattered, so that the Bragg angle is θB = 90° (yielding θ = 2θB = 180°). The change in photon momentum Δpλ = 2h/λ gives rise to a change in the crystal momentum Δpy = m Δv of

\[ \Delta p_\lambda = \Delta p_y = m\,\Delta v \]
\[ \frac{2h}{\lambda} = mc\,\Delta\beta \]
\[ \Delta\beta = \frac{2}{\lambda}\cdot\frac{hc}{mc^2}, \qquad (4.58) \]

where we have used the classical result for crystal momentum p = mv, the normalized velocity β = v/c, and the relativistic expression for photon momentum p = E/c. Even if our crystal were as light as a pair of carbon atoms, each of 12 atomic mass units, the change in velocity of the two-atom "crystal" from backscattering of a 10 keV photon would be

\[ \Delta\beta = \frac{2\cdot(1240\ \text{eV·nm})}{(0.124\ \text{nm})\cdot(24\cdot 931.5\times 10^{6}\ \text{eV})} = 8.9\times 10^{-7}, \]

where we have used Eq. 3.7. For this two-atom "crystal" initially at rest, the kinetic energy imparted would be

\[ \frac{1}{2}mc^2(\Delta\beta)^2 = \frac{1}{2}\cdot 24\cdot(931.5\times 10^{6}\ \text{eV})\cdot(8.9\times 10^{-7})^2 = 0.009\ \text{eV}, \]

so indeed the change in the energy of the 10 keV incident photon would be difficult to measure – and of course the mass of a real crystal rather than a pair of atoms would lead to an even smaller energy change. Finally, one can turn things around and use a momentum transfer argument to arrive at Bragg's law [Duane 1923].
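The back-of-envelope numbers in Box 4.3 can be reproduced directly; this snippet is an illustrative check, using the same eV·nm units as the text.

```python
# Check of the Box 4.3 estimate: momentum and energy transferred to a
# two-carbon-atom "crystal" by a fully backscattered 10 keV photon.
hc = 1240.0               # eV·nm (Eq. 3.7)
lam = 0.124               # nm, the 10 keV photon wavelength
mc2 = 24 * 931.5e6        # rest energy of two 12C atoms, eV

delta_beta = 2 * hc / (lam * mc2)          # Eq. 4.58
kinetic = 0.5 * mc2 * delta_beta**2        # recoil kinetic energy, eV

print(f"delta_beta = {delta_beta:.2e}")    # ~8.9e-7
print(f"recoil energy = {kinetic:.4f} eV") # ~0.009 eV, tiny compared to 10 keV
```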
rather than discrete Bragg lattice points. Therefore the 3D diffraction pattern is also more continuous, with the characteristics of a speckle pattern, as will be described in Section 10.3.1. We have shown in Eq. 4.57 that there is a direct connection between diffraction by a plane grating and diffraction by a volume grating. In fact, it's worth looking a bit more at the Ewald sphere representation of several circumstances relevant to microscopy, as shown in Fig. 4.15. The details depend on the imaging case discussed:
• For 2D imaging, in Section 4.4.7 we will use an optical transfer function, or OTF(ux, uy), to describe how certain plane grating spatial frequencies u in Fourier space are preserved by the imaging system.
• For 3D imaging with conventional tomography, in Chapter 8 we will make the assumption of obtaining pure projection images with no diffraction effects involved (that is, that the object fits within the depth of field as discussed in Section 4.4.9).
Figure 4.14 Crystal “hits” and “misses” on the Ewald sphere. It is only when reciprocal lattice points of a crystalline material intersect with the Ewald sphere that the Bragg condition is met and a diﬀraction spot appears. For noncrystalline objects, one does not have the same concentration of scattering into reciprocal lattice points, so the 3D diﬀraction pattern is more continuous (with the characteristics of a speckle pattern as will be discussed in Section 10.3.1).
In that case, each projection image produces in Fourier space a flat plane with information in the {ux, uy} spatial frequency directions, but with no extent in the uz direction, which is along that projection's viewing direction. This is shown in (f) and (g) in Fig. 4.15.
• For 3D imaging with coherent, monochromatic beams as in holography and coherent diffraction imaging (Chapter 10), data acquired from one illumination direction will fill in information on the surface of an Ewald sphere in reciprocal space (case (b) in Fig. 4.15). While this in principle means that there is some ẑ information in a coherent image obtained from one viewing direction, the information is far too little to use to reconstruct anything but the simplest, highly constrained 3D objects, as will be discussed in Section 10.2.3. Instead, one needs to fill in more of 3D reciprocal space with information, as shown in Fig. 4.15 and also in Fig. 10.10. A more in-depth² discussion of coherent imaging of thick objects and the limitations of the pure projection approximation is provided in Section 10.5.
In fact, there are connections between these different pictures, as we shall now see. Let us first consider the question of when the Ewald sphere "lifts off" in the ẑ direction from the {x̂, ŷ} plane, as shown in Fig. 4.16. For a specimen enclosed within a depth s, the smallest momentum transfer in the ẑ direction where one will start to see differences from the specimen's depth extent compared to a pure projection image is qs = 2π/s. If we collect scattering information out to an angle θ, the maximum extent of the Ewald sphere in the ẑ direction will be given by qz, which we find from Eqs. 4.53 and 4.55 as qz = (4π/λ) sin²(θ/2). Thus the Ewald sphere will "lift off" of the object when qz =
² Yeah, I couldn't resist…
Figure 4.15 The Ewald sphere describes the region in reciprocal space over which one obtains information during scattering using a single wavelength λ. Panels: (a) full Ewald sphere, monochromatic; (b) collimated monochromatic illumination, single image; (c) collimated polychromatic illumination, single image; (d) non-collimated monochromatic illumination, single image; (e) collimated monochromatic illumination, several illumination directions; (f) pure projection, single image; (g) pure projection, several illumination directions. With collimated, monochromatic illumination, one obtains information in reciprocal space along the surface of the Ewald sphere (a, b, and e). Limiting the angular extent over which scattering is obtained (thus limiting the effective numerical aperture (N.A.) of the experiment) limits the extent of the Ewald sphere, as shown in (b)–(e). With polychromatic illumination, one obtains information in the volume between the two spheres corresponding to the two bounding wavelengths (c). With non-collimated illumination (such as one obtains with a condenser lens bringing convergent illumination onto the specimen; see Section 4.4.7) one obtains information between Ewald spheres corresponding to the limits of the illumination angles (d). If instead one obtains pure projection images (the usual assumption in standard tomography; see Chapter 8), one has the situation in which no volume diffraction effects are observed (f and g). Finally, information obtained over several illumination directions delivers either a set of Ewald spheres (e; see also Fig. 10.10), or, in the case of pure projections without volume diffraction, a set of tomographic projections (g; see also Fig. 8.2).
qs/2, which we can rewrite using Eq. 4.53 and qs = 2π/s as

\[ s = \frac{\lambda}{\theta^2}. \qquad (4.59) \]
If we represent the maximum detected scattering angle θ with the numerical aperture, or N.A., and the largest permitted object depth extent s with the depth of field, or DOF, Eq. 4.59 becomes

\[ \text{DOF}_{\text{Ewald}} = \frac{\lambda}{\text{N.A.}^2}, \qquad (4.60) \]

which matches the depth resolution δz of a lens (Eq. 4.213 with cz = 1), and which
Figure 4.16 The limit of a pure projection image from the perspective of the Ewald sphere occurs when the Ewald sphere "lifts off" of the specimen depth extent in reciprocal space, due to the values of qy and qz at its outer edge. This leads to an Ewald-sphere-based depth of field DOF_Ewald given by Eq. 4.60 (see also Section 4.4.9).
is half of the depth of field (Eq. 4.214) of a lens (see Box 4.7 for a short discussion of depth resolution versus depth of field). If we use the Ewald sphere construction for transverse resolution, we have from Eqs. 4.52 and 4.55 the result

\[ \frac{4\pi}{\lambda}\sin\!\left(\frac{\theta}{2}\right) = \frac{2\pi}{d}, \qquad (4.61) \]

which leads to

\[ \text{N.A.} = \frac{\lambda}{2\,\Delta y}, \qquad (4.62) \]

where we have again used θ = N.A. and furthermore used the grating half-period Δy = d/2 as an expression for the minimally resolved feature size. Substituting Eq. 4.62 into Eq. 4.60 gives a relationship between depth of field and transverse resolution of

\[ \text{DOF}_{\text{Ewald}} = 4\,\frac{(\Delta y)^2}{\lambda} = 4\,\frac{\Delta_r^2}{\lambda}, \qquad (4.63) \]

where in the second case we use the symbol Δr used to represent the real-space pixel size throughout this book. The expression of Eq. 4.63 is close to the lens-based estimate of 5.4 cz (δr)²/λ of Eq. 4.215, as will be discussed in Section 4.4.9. Other estimates in the coherent imaging literature include DOF_Ewald ≈ 2(Δr)²/λ with N.A. ≈ λ/Δr [Rodenburg 1992, Eq. 15], DOF_Ewald < λ/(2 N.A.²) [Chapman 2006b, Eq. 21], and DOF_Ewald ≤ 5.2(Δr)²/λ in numerical studies of multislice ptychography [Tsai 2016]. An extended discussion³ on the coherent imaging of thick specimens and the limitations of the pure projection approximation is given in Section 10.5.
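As a concrete illustration of Eqs. 4.60, 4.62, and 4.63 (the example values are ours, not from the book), consider 10 keV x rays and a 50 nm half-period transverse resolution:

```python
# Sketch relating transverse resolution to the Ewald-sphere depth of field
# (Eqs. 4.60, 4.62, 4.63) for 10 keV x rays and a 50 nm half-period resolution.
lam = 0.124e-9        # wavelength, m
dy = 50e-9            # minimally resolved half-period Delta_y, m

NA = lam / (2 * dy)                 # Eq. 4.62
DOF = lam / NA**2                   # Eq. 4.60
DOF_check = 4 * dy**2 / lam         # Eq. 4.63: the same number, by substitution

print(f"N.A. = {NA:.2e}")           # ~1.2e-3 radians
print(f"DOF = {DOF*1e6:.1f} um")    # ~80 um depth of field
```

The agreement between `DOF` and `DOF_check` is exact, since Eq. 4.63 is just Eq. 4.60 with Eq. 4.62 substituted in.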
4.3
Wavefield propagation As electromagnetic waves encounter apertures, refractive media, and the like, a proper handling of their propagation to downstream planes would include discussions of boundary conditions, the Helmholtz equation, and more, and there are many excellent treatments of this problem (see for example [Goodman 2017, Chap. 3] or [Schmidt 2010]). The treatment we present below is conceptually simple, and sufficient for solving most problems.
³ Again with the puns…
Figure 4.17 Coordinate system used for wave propagation.
4.3.1
The Huygens construction Our discussion of the Huygens construction began with the idea of constructing wavefronts from a set of points, each of which emits a spherical wave (Fig. 4.1). This provides a good framework for tackling the problem of wavefield propagation, using the coordinate system of Fig. 4.17. Consider a plane wave ψ = ψ0 e^{−i(kz−ωt)} that is incident on a plane (x0, y0, 0), where it is modulated by a complex function g̃(x0, y0). This modulation could include an aperture outside of which g̃(x0, y0) = 0, or it could include a biological cell which modulates the magnitude and phase of the incident wave. We'll assume that the net effect of the modulation g̃(x0, y0) is to produce a modified wavefield ψ0(x0, y0) immediately after the object. To calculate the light amplitude at a downstream point (x, y, z), we simply add up the contributions of the modulated Huygens point sources as shown in Fig. 4.1 to find

\[ \psi_z(x,y) = \frac{\lambda}{A}\int_{-\infty}^{\infty} \psi_0(x_0,y_0)\,\frac{e^{-ikr}}{r}\,\cos\theta\; dx_0\, dy_0, \qquad (4.64) \]
where we have dropped the e^{iωt} time-dependent phase since it applies to both ψz(x, y) and ψ0(x0, y0). The prefactor λ/A provides a scaling to cancel both the dimensionality of 1/r via the λ term, and the dx0 dy0 area via the 1/A term. The cos θ term accounts for the obliquity of waves so as to correctly reduce their energy per area when reaching a downstream plane at non-normal angles of incidence. The radius r from the point (x0, y0, 0) to (x, y, z) is given by

\[ r = \sqrt{z^2 + (x-x_0)^2 + (y-y_0)^2} = z\,\sqrt{1 + \frac{(x-x_0)^2}{z^2} + \frac{(y-y_0)^2}{z^2}}. \qquad (4.65) \]
Box 4.4 The Fresnel approximation in x-ray microscopy
Equation 4.68 gave the Fresnel approximation as

\[ \frac{2\pi z}{\lambda}\,\frac{\rho^4}{8z^4} \ll \frac{\pi}{2} \]

with ρ as the transverse distance. In Section 4.4.3 we will see that the Rayleigh resolution for a lens is δr = 0.61 λ/N.A. The numerical aperture of a lens is N.A. ≡ n sin(θ) (Eq. 4.172), with n = 1 − δ ≈ 1 for the case of x-ray focusing. Therefore we can write N.A. = 0.61 λ/δr, and since the resolution for x-ray microscopes is not comparable to the wavelength, we can assume N.A. ≈ ρ/z. This lets us write the Fresnel approximation condition of Eq. 4.68 as

\[ \frac{2\pi z}{\lambda}\,\frac{0.61^4\lambda^4}{8\,\delta_r^4} \ll \frac{\pi}{2} \]
\[ 0.61^4\,\frac{\lambda^3 z}{\delta_r^4} \ll 2 \]
\[ \delta_r \gg 0.61\left(\frac{\lambda^3 z}{2}\right)^{1/4}. \qquad (4.69) \]

For propagation distances of z = 1 μm at 500 eV, this gives a resolution limit to the Fresnel approximation of 6 nm, while at 10 keV it gives a resolution limit of 0.6 nm. For distances of z = 1 mm, the limits are 32 nm at 500 eV and 3 nm at 10 keV. That is, the Fresnel approximation is very well satisfied in hard x-ray microscopes, and usually satisfied in soft x-ray microscopes (especially for short propagation distances such as are used in multislice methods, as discussed in Section 4.3.9).
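The limits quoted in Box 4.4 follow directly from Eq. 4.69; the helper function below (our own illustrative sketch, not from the book) evaluates them.

```python
# Evaluating the Fresnel-validity limit of Eq. 4.69,
# delta_r >> 0.61 * (lambda^3 * z / 2)^(1/4), for the cases quoted in Box 4.4.
def fresnel_limit_nm(E_eV, z_m):
    """Resolution limit (nm) below which the Fresnel approximation degrades,
    for photon energy E_eV (eV) and propagation distance z_m (m)."""
    lam_m = (1239.84 / E_eV) * 1e-9      # wavelength from E = hc/lambda
    return 0.61 * (lam_m**3 * z_m / 2) ** 0.25 * 1e9

for E_eV, z_m in [(500, 1e-6), (10000, 1e-6), (500, 1e-3), (10000, 1e-3)]:
    print(f"E = {E_eV:>5} eV, z = {z_m:.0e} m: "
          f"limit = {fresnel_limit_nm(E_eV, z_m):.1f} nm")
```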
In the limit of [(x − x0)² + (y − y0)²] ≪ z², we can expand this as

\[ r = z\left[1 + \frac{(x-x_0)^2 + (y-y_0)^2}{2z^2} - \frac{\left[(x-x_0)^2 + (y-y_0)^2\right]^2}{8z^4} + \ldots\right] \qquad (4.66) \]
\[ \simeq z\left[1 + \frac{(x-x_0)^2 + (y-y_0)^2}{2z^2}\right], \qquad (4.67) \]

where the truncated series version of Eq. 4.67 involves the Fresnel approximation, in which we ignore the x⁴/z⁴ and higher-order terms (for more on Augustin-Jean Fresnel, see the beginning of Section 5.3). Let ρ = √((x − x0)² + (y − y0)²) represent transverse distances; the Fresnel approximation assumes

\[ \frac{2\pi z}{\lambda}\,\frac{\rho^4}{8z^4} \ll \frac{\pi}{2}, \qquad (4.68) \]

where the phase error is the maximum allowed by the Rayleigh quarter-wave criterion (Section 4.1.2). This condition is satisfied in most cases (see Box 4.4). If we apply the Fresnel approximation expansion of Eq. 4.67 to exp[−ikr] but simply use r ≈ z for 1/r, and assume cos(θ) ≈ 1, we can write Eq. 4.64 as [Goodman 2017,
cf. Eq. (4-14)]

\[ \psi_z(x,y) = B \int_{-\infty}^{\infty} \psi_0(x_0,y_0)\,\exp\left[-i\pi\,\frac{(x-x_0)^2 + (y-y_0)^2}{\lambda z}\right] dx_0\, dy_0, \qquad (4.70) \]

with

\[ B \equiv \frac{\lambda}{Az}\,\exp\left[-i\,\frac{2\pi z}{\lambda}\right]. \qquad (4.71) \]

The exp[−i2πz/λ] phase factor simply incorporates the fact that any plane wave has a phase that oscillates per period λ, while the λ/(Az) term is there to provide unit cancelation for the area of integration dx0 dy0 as well as to incorporate the 1/r drop-off of the amplitude from spherical waves – but since we are considering plane waves ψ0(x0, y0) which have been modulated by a complex object transmittance g̃(x0, y0), we have no 1/z drop-off in the amplitude of a perfect plane wave. Thus we shall drop the factor B here, and in what follows. If we expand the (x − x0)² and (y − y0)² terms, we can also write this as [Goodman 2017, cf. Eq. (4-17)]

\[ \psi_z(x,y) = \exp\left[-i\pi\,\frac{x^2+y^2}{\lambda z}\right] \int_{-\infty}^{\infty} \psi_0(x_0,y_0)\,\exp\left[-i\pi\,\frac{x_0^2+y_0^2}{\lambda z}\right]\exp\left[i\,2\pi\,\frac{x x_0 + y y_0}{\lambda z}\right] dx_0\, dy_0. \qquad (4.72) \]

That is, since the term exp[−iπ(x² + y²)/(λz)] does not depend on the integration variables (x0, y0), it can be pulled outside the integral. These two equivalent expressions of Eqs. 4.70 and 4.72 are known as the Fresnel diffraction integral.
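To make the Fresnel diffraction integral concrete: Eq. 4.70 is a convolution of ψ0 with the quadratic-phase kernel exp[−iπρ²/(λz)], so it can be evaluated with FFTs. The sketch below is an illustrative implementation, not code from this book; the function name, grid parameters, and the transfer-function sign (chosen to match the e^{−i(kz−ωt)} convention used here) are our assumptions.

```python
import numpy as np

def fresnel_propagate(psi0, dx, lam, z):
    """Propagate a field psi0 (N x N grid, pixel size dx) a distance z using
    the convolution form of Eq. 4.70 evaluated in Fourier space; the constant
    factor B of Eq. 4.71 is dropped, as in the text."""
    n = psi0.shape[0]
    u = np.fft.fftfreq(n, d=dx)                 # spatial frequencies, 1/m
    ux, uy = np.meshgrid(u, u, indexing="ij")
    # Transfer function: Fourier transform of the quadratic-phase kernel,
    # exp[+i*pi*lam*z*(ux^2 + uy^2)] for the e^{-i(kz - wt)} convention.
    H = np.exp(1j * np.pi * lam * z * (ux**2 + uy**2))
    return np.fft.ifft2(np.fft.fft2(psi0) * H)

# Example: a 5 um square aperture in a 512-pixel field, 10 keV x rays, z = 10 mm.
n, dx, lam, z = 512, 50e-9, 0.124e-9, 10e-3
x = (np.arange(n) - n // 2) * dx
xx, yy = np.meshgrid(x, x, indexing="ij")
psi0 = ((np.abs(xx) < 2.5e-6) & (np.abs(yy) < 2.5e-6)).astype(complex)
psi_z = fresnel_propagate(psi0, dx, lam, z)
print(np.abs(psi_z).max())   # near-field fringes appear at the aperture edges
```

Because the transfer function is unimodular, this propagator conserves total power, which makes a convenient sanity check.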
4.3.2
Fraunhofer approximation The expression of Eq. 4.72 has already made use of the Fresnel approximation of Eq. 4.68. Before proceeding further, let's consider an additional approximation of

\[ \pi\,\frac{x_0^2 + y_0^2}{\lambda z} \ll \frac{\pi}{2} \qquad (4.73) \]

or

\[ z \gg 4\,\frac{x_0^2 + y_0^2}{\lambda}. \qquad (4.74) \]

This is the well-known Fraunhofer far-field approximation. For a 10 μm diameter aperture, Eq. 4.74 requires z ≫ 0.16 m with 500 eV soft x rays, or z ≫ 3.2 m with 10 keV hard x rays, so the Fraunhofer approximation is considerably more restrictive. However, the payoff it provides is to considerably simplify the Fresnel diffraction integral of Eq. 4.72 to the Fraunhofer diffraction integral of

\[ \psi_z(x,y) \simeq \int_{-\infty}^{\infty} \psi_0(x_0,y_0)\,\exp\left[i\,2\pi\left(\frac{x x_0}{\lambda z} + \frac{y y_0}{\lambda z}\right)\right] dx_0\, dy_0 \qquad (4.75) \]
\[ \simeq \int_{-\infty}^{\infty} \psi_0(x_0,y_0)\,\exp[i\,2\pi\,(u_x x_0 + u_y y_0)]\, dx_0\, dy_0, \qquad (4.76) \]
where we have also dropped the factor outside the integral of exp[−iπ(x² + y²)/(λz)], which we will not notice if we are looking only at intensities I = ψ†·ψ. In the second expression, we have made use of the spatial frequencies ux = x/(λz) and uy = y/(λz), which we introduced in Eq. 4.32 as wavelength-normalized diffraction angles.
4.3.3
Fourier transforms: analytical and discrete The Fraunhofer approximation has led us to Eq. 4.76, and most readers will recognize that it shows that the farﬁeld waveﬁeld is simply a Fourier transform of the input waveﬁeld ψ0 (x, y). Therefore it is appropriate to take a short detour to discuss Fourier transforms. The Fourier transform of a function a(t) in the time domain leads to a function A( f ) in the frequency domain of ∞ a(t) ei 2π f t dt. (4.77) A( f ) = −∞
For example, the sound from a musical instrument captured on a microphone will lead to a voltage as a function of time, a(t), while the Fourier transform will show the frequency representation of that signal, A(f). The equivalent for functions a(x) in real space going to their Fourier space representation A(u) in spatial frequencies is

$$A(u) = \int_{-\infty}^{\infty} a(x)\, e^{i\,2\pi u x}\, dx \qquad (4.78)$$

and the Fourier transform is invertible as

$$a(x) = \int_{-\infty}^{\infty} A(u)\, e^{-i\,2\pi u x}\, du. \qquad (4.79)$$
Now there are entire books written on Fourier transforms and their properties (see for example [Bracewell 1986]), but the requirements for validity include that the function a(t) must be integrable and without infinite discontinuities. Fourier transforms are so integral to imaging (forgive the pun) that we will represent them by the more compact notation of

$$A(u) = \mathcal{F}\{a(x)\} \qquad (4.80)$$

and

$$a(x) = \mathcal{F}^{-1}\{A(u)\}. \qquad (4.81)$$
That is, we'll use a lowercase letter for the function in real space a(x), and an uppercase letter for the function in Fourier space A(u). We'll also make use of the convolution theorem of Fourier transforms, which states

$$a(x) * b(x) = \int_{-\infty}^{\infty} a(s)\, b(x - s)\, ds \qquad (4.82)$$

$$= \mathcal{F}^{-1}\{A(u) \cdot B(u)\}. \qquad (4.83)$$
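The discrete version of this theorem is easy to check numerically; note that with the discrete Fourier transform, the product of spectra corresponds to a circular (periodic) convolution, a point that matters again below Eq. 4.95. A short numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 16
a = rng.standard_normal(N)
b = rng.standard_normal(N)

# Direct circular convolution: shift, multiply, add (periodic form of Eq. 4.82)
direct = np.array([sum(a[m] * b[(n - m) % N] for m in range(N))
                   for n in range(N)])

# Convolution theorem (Eq. 4.83): inverse FFT of the product of FFTs
via_fft = np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

print(np.allclose(direct, via_fft))  # True
```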
The convolution between two functions of Eq. 4.82 involves a sequence of shift (x − s), multiply, and add (integrate) operations; for example, the convolution of a broad square
4.3 Waveﬁeld propagation
Figure 4.18 Illustration of the convolution of two functions a(x) and b(x), as deﬁned in Eq. 4.82.
function b(x) with a narrow but rounded function a(x) will lead to a broad function with rounded edges, as shown in Fig. 4.18. We need to consider two other important properties of Fourier transforms, proofs of which we leave to other sources [Bracewell 1986]. One is the Dirac delta function δ(x − x0) or impulse, which is a function with the properties

$$\delta(x - x_0) = \begin{cases} +\infty & x = x_0 \\ 0 & x \neq x_0 \end{cases} \qquad \int_{-\infty}^{\infty} \delta(x - x_0)\, dx = 1. \qquad (4.84)$$
In other words, δ(x − x0) is nonzero only at x = x0 and zero elsewhere, with an integral equal to 1. The Fourier transform of this function is simple:

$$\mathcal{F}\{\delta(x - x_0)\} = 1, \qquad (4.85)$$
or a flat function at all spatial frequencies. One can think of this in musical terms as a sharp strike on a cymbal, which produces sound at a wide range of frequencies. Another important property of Fourier transforms is the shift theorem of

$$\mathcal{F}\{a(x - x_0)\} = A(u)\, e^{-i\,2\pi u x_0}. \qquad (4.86)$$
In optics terms, as one translates an aperture sideways by a distance x0, the transmitted far-field wave acquires a linear phase ramp, with one full phase cycle per change in diffraction angle of θ = λ/x0. For numerical calculations, the analytical Fourier transform expression of Eq. 4.78 is replaced by the discrete Fourier transform (DFT) using N sampling points over an even spacing or pixel size of Δr of

$$A(u_m) = \sum_{n=0}^{N-1} a(n\cdot\Delta_r)\, e^{i\,2\pi\,(n\cdot\Delta_r)\,u_m}\, \Delta_r. \qquad (4.87)$$

The Nyquist sampling theorem states that a sampling interval of Δr is sufficient for
Figure 4.19 Example of fast Fourier transform or FFT-based image processing. Panels (a) and (c) show images of Wilhelm Conrad Röntgen, who discovered X rays; in panel (c), the image has been digitally vignetted as described below Eq. 4.95. Panels (b) and (d) show the power spectrum images from images (a) and (c); the power spectrum image is the square of the discrete Fourier transform of the image (effectively the intensity of the diffraction pattern of the image), shown here on a logarithmic intensity scale. Note how the lack of digital vignetting in image (a) leads to streaks on the horizontal and vertical axes in the power spectrum (b), due to the periodicity of the discrete Fourier transform (see Eq. 4.95). Panel (e) at right shows the azimuthally averaged image power from panel (d) as a function of radial spatial frequency ur. The pixel size of the image is 1 unit, so the maximum spatial frequency at the left–right and bottom–top edges of the Fourier transform is given by the Nyquist limit (Eq. 4.88) as u_{x,y} = 1/(2·1) = 0.5 inverse pixel units, while the diagonal lines to the corners lead to spatial frequencies up to √2 higher, or ur(max) = 0.71 inverse pixel units. It is very common to find that image power declines with spatial frequency ur as approximately ur^−a; in this image, a = 2.95 (this is discussed further in Section 4.9.1). Image of Röntgen from the public domain via Wikipedia.
representing a function A(u) only if it is bandwidth-limited up to a maximum frequency umax of

$$u_{max} = \frac{1}{2\Delta_r}. \qquad (4.88)$$
Looking back at the expression of Eq. 4.32 that gave a spatial frequency of a grating with period d of u = 1/d, one can see that the Nyquist sampling theorem corresponds to a cycle of one open bar and one closed bar on a grating with a period of 2Δr. The discrete Fourier transform is discussed in more detail elsewhere (see for example [Press 2007, Chap. 12] or [Bracewell 1986]), but it is worthwhile making a few brief comments. The well-known fast Fourier transform (FFT) algorithm [Cooley 1965] uses symmetries in the sine and cosine representation of the Fourier transform to carry out the
computation using only about

$$\text{FFT computational steps} \approx N \log(N) \qquad (4.89)$$
computational operations. The discrete Fourier transform of Eq. 4.87 would otherwise require a sum over N spatial positions at each of N spatial frequencies, or N² computational operations. FFTs work best with N being some power of 2 (so-called "radix-2" FFTs), like N = 256 or N = 1024, though some FFT routines also work well with other integers > 2 factored into N. One can always "zero-pad" a smaller array into a radix-2 array, such as by embedding a 240 × 240 image in a 256 × 256 array, with no effect on image information but much faster FFT processing times. In fast Fourier transforms, the number of sampling points N is preserved between real and Fourier space. The discrete positions cover a range of

$$\{x_n\} = \{0, 1, \ldots, (N-1)\}\cdot\Delta_r, \qquad (4.90)$$

while the discrete frequencies cover a range of

$$\{u_m\} = \left\{-1, -\frac{N/2-1}{N/2}, \ldots, 0, \ldots, \frac{N/2-1}{N/2}\right\}\cdot\frac{1}{2\Delta_r}. \qquad (4.91)$$
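These sampling relations can be verified with numpy's FFT helpers, whose unshifted output ordering matches Eq. 4.94 below and whose fftshift rearrangement matches Eq. 4.91 (a sketch with arbitrary example values N = 8, Δr = 0.5):

```python
import numpy as np

N, dr = 8, 0.5                       # grid points and pixel size
u = np.fft.fftfreq(N, d=dr)          # unshifted frequency samples (Eq. 4.94 order)
u_shifted = np.fft.fftshift(u)       # monotonic order, as in Eq. 4.91

print(u_shifted)          # -1, -0.75, ..., 0.75: from -1/(2 dr) upward
print(u[1] - u[0])        # frequency pixel size 1/(N dr) = 0.25, Eq. 4.92
print(u.min())            # most negative frequency -1/(2 dr) = -1.0
```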
The size of a pixel Δu in the Fourier array is therefore found from (N/2)·Δu = umax, for which we can use Eq. 4.88 to obtain

$$\Delta_u = \frac{1}{N\,\Delta_r}. \qquad (4.92)$$
For a detector with physical pixel size Δdet located a distance Zdet away in the far field of a real space object, one can also write

$$\Delta_u = \frac{\Delta_{det}}{\lambda Z_{det}} \qquad (4.93)$$
as explained in Eq. 10.16. Most FFT routines deliver their output in the order of zero to the maximum positive spatial frequency, followed by the most negative to least negative frequencies, or

$$\{u_m\} = \left\{0, \frac{1}{N/2}, \ldots, \frac{N/2-1}{N/2}, -1, -\frac{N/2-1}{N/2}, \ldots, -\frac{1}{N/2}\right\}\cdot\frac{1}{2\Delta_r}, \qquad (4.94)$$

where 1/(2Δr) represents the Nyquist limit of Eq. 4.88. Thus one must shift the array by half of its width to rearrange the output into the order of Eq. 4.91. The FFT assumes that the input sequence a(xn) is periodic; that is, it assumes that

$$a(x_{n=N}) = a(x_{n=0}). \qquad (4.95)$$
In other words, the array is assumed to be repeated like tiles in a floor. This can lead to edge-ringing artifacts if there is a discontinuity from the left edge to the right edge of the array, or bottom to top; this will produce streaks along the horizontal and vertical axes, respectively, in the power spectrum image as shown in Fig. 4.19. As a result, it
is often useful to digitally vignette the image prior to carrying out FFT-based analysis. One way to do this is as follows:
1. Subtract the average or "D.C." value from the image (D.C. is used here to mean "direct current" as opposed to "alternating current," as a carryover from electrical engineering terminology).
2. Roll off the edges to zero with a smooth function such as a Gaussian with a standard deviation of 4–6 pixels (other functions like the Hamming or Chebyshev windows can be used, but they vignette more of the image).
3. Add the D.C. value back.
An example is shown in Fig. 4.19(c). Multidimensional FFTs are done as a series of 1D FFTs. That is, in 2D, one first takes a set of row-by-row FFTs followed by a set of column-by-column FFTs. Also, most FFT algorithms work "in place," meaning the FFT output is overwritten onto the same array memory as supplied the FFT input. The speed of FFTs on modern computers must be appreciated in the context of some history: back in 1987, it took the author about five minutes to do a 512² complex FFT on a MicroVAX II computer which cost something like US$25,000 and occupied about the same space as a two-drawer file cabinet. As of 2015, many mobile phones could perform the same calculation in about 10 μs! The discrete Fourier transform can also be thought of as a matrix operation F between an input vector a = a_in and an output vector A = a_out, so that one can write

$$\mathbf{a}_{out} = \mathbf{F}\,\mathbf{a}_{in}. \qquad (4.96)$$
This concept can be helpful when using numerical optimization methods to solve inverse problems [Gilles 2018], as discussed in Section 10.3.6.
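Equation 4.96 can be made concrete by building the N × N matrix F explicitly and comparing it with a library FFT; here we use numpy's e^{−i2πmn/N} sign convention, which is the complex conjugate of the convention in Eq. 4.87 (a sketch):

```python
import numpy as np

N = 32
n = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(n, n) / N)   # DFT matrix, entry F[m, n]

a_in = np.random.default_rng(2).standard_normal(N) + 0j
a_out = F @ a_in                                # Eq. 4.96: a_out = F a_in

print(np.allclose(a_out, np.fft.fft(a_in)))    # True
```

The matrix form costs N² operations versus the FFT's N log(N), but it makes the linear-operator viewpoint used in inverse problems explicit.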
4.3.4
Power spectra of images
With the FFT now being part of our toolkit, it is often very informative to look at the power spectrum of an image (again, this terminology is a carryover from the electrical engineering literature). By digitally vignetting an image, taking its discrete Fourier transform using an FFT, and squaring the result to look at Fourier space intensities, one can see how information in the image is distributed across spatial frequencies (see Fig. 4.19; it is nearly always better to view the logarithm of the Fourier space intensities, since on a linear scale one will see little beyond the power at the very lowest spatial frequencies). It is also useful to carry out an azimuthal average, and examine the power spectrum, which is the Fourier space intensity as a function of radial spatial frequency ur [usually viewed as log10(Power) versus log10(ur)]. In our experience with images across a broad range of length scales and imaging modalities, it is very common to find that the power spectrum declines as a constant power law factor a, or

$$I(u_r) \propto u_r^{-a}. \qquad (4.97)$$
In Fig. 4.19 we found a = 2.95, while in the x-ray fluorescence images shown in Fig. 4.49 we have a ranging from 2.8 to 3.0. (The scaling of image signal with spatial frequency will be discussed further in Section 4.9.1.) If limited photon statistics are involved, the power spectrum will often roll off at the high-frequency end to a flat noise floor, as discussed below Eq. 4.207; this is seen in Fig. 4.49. Indeed, the spatial frequency position of the "knee" between the ur^−a image signal decline and the flat noise floor provides a rapid estimate of image resolution, as given by Eq. 4.251. If one has multiple images of the same object, an even better resolution estimate is provided by Fourier shell correlation, as given by Eq. 4.255.
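A sketch combining the vignetting recipe above with an azimuthally averaged power spectrum (the function names, the Gaussian σ, and the integer radial binning are our own choices); the test pattern contains a single known spatial frequency so the location of the spectral peak can be checked:

```python
import numpy as np

def vignette(img, sigma=5.0):
    """Digitally vignette: subtract the D.C. value, roll the edges to zero
    with a Gaussian of the given standard deviation (pixels), add it back."""
    dc = img.mean()
    win = 1.0
    for axis, n in enumerate(img.shape):
        d = np.minimum(np.arange(n), np.arange(n)[::-1])   # distance to edge
        w1d = 1.0 - np.exp(-0.5 * (d / sigma) ** 2)        # 0 at edge, ~1 inside
        shape = [1] * img.ndim
        shape[axis] = n
        win = win * w1d.reshape(shape)
    return (img - dc) * win + dc

def radial_power_spectrum(img):
    """Azimuthally averaged |FFT|^2 versus integer radial frequency bin."""
    p = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    ny, nx = img.shape
    yy, xx = np.indices(img.shape)
    rbin = np.hypot(yy - ny // 2, xx - nx // 2).astype(int).ravel()
    counts = np.bincount(rbin)
    return np.bincount(rbin, weights=p.ravel()) / np.maximum(counts, 1)

# Test pattern: a cosine with 8 cycles across a 64-pixel array
N = 64
img = np.cos(2 * np.pi * 8 * np.arange(N) / N)[None, :] * np.ones((N, 1))
v = vignette(img)
print(abs(v[0, 0] - img.mean()) < 1e-12)   # True: edges pinned to the mean
prof = radial_power_spectrum(img)
print(np.argmax(prof))                     # 8: power peaks at radial frequency bin 8
```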
4.3.5
Fraunhofer diffraction
With the compact notation of Fourier transforms in our toolkit, we see that the Fraunhofer diffraction integral of Eq. 4.76 can be written in a rather simple way:

$$\psi_z(x, y) = \mathcal{F}\{\psi_0(x_0, y_0)\}. \qquad (4.98)$$
As stated immediately after Eq. 4.76, the far-field diffraction intensity I = ψ†·ψ from an input wavefield ψ0(x0, y0) is just the square of the Fourier transform of ψ0(x0, y0). Let's consider the simple example of 1D diffraction from a slit of width b as shown in Fig. 4.7. The wavefield modulation g̃(x0) is 1 inside the range −b/2 to b/2. This is often written as a rectangle or "rect" function: rect(x0/b) = 1 for −b/2 ≤ x0 ≤ b/2, and 0 otherwise. The far-field wavefield (ignoring the outside-the-integral exp[−iπx²/(λz)] phase term shown in Eq. 4.72) using the Fraunhofer diffraction integral of Eq. 4.76 becomes

$$\psi(u) = \psi_0\int_{-b/2}^{b/2} 1\cdot e^{i\,2\pi u x_0}\, dx_0 = \psi_0\int_{-b/2}^{b/2} \cos(2\pi u x_0)\, dx_0 + i\,\psi_0\int_{-b/2}^{b/2} \sin(2\pi u x_0)\, dx_0. \qquad (4.99)$$
Now because sin(−θ) = −sin(θ), the sine integral from −b/2 to 0 cancels the integral from 0 to b/2; and because cos(−θ) = cos(θ), we can just double the cosine integral from 0 to b/2. As a result, the integral can be simplified as

$$\psi(u) = 2\psi_0\int_0^{b/2} \cos(2\pi u x_0)\, dx_0. \qquad (4.100)$$
If we make the substitution

$$\theta \equiv 2\pi u x_0 \qquad (4.101)$$

we have dθ = 2πu dx0, or dx0 = dθ/(2πu), and the upper integration limit becomes
θmax = πub, so the integral becomes

$$\psi(u) = \frac{2\psi_0}{2\pi u}\int_0^{\theta_{max}} \cos(\theta)\, d\theta = \frac{1}{\pi u}\,\psi_0\,[\sin(\theta_{max}) - \sin(0)] = \frac{1}{\pi u}\,\psi_0\sin(\pi u b) = b\,\psi_0\,\frac{\sin(\pi u b)}{\pi u b} = b\,\psi_0\,\frac{\sin(\nu)}{\nu} \qquad (4.102)$$

with

$$\nu \equiv \pi u b \qquad (4.103)$$

and the intensity I = ψ†·ψ is

$$I = \psi_0^2\left[\frac{\sin(\nu)}{\nu}\right]^2 \qquad (4.104)$$
where in Eq. 4.104 we have left out the multiplying factor b² (just as we left out the simple multiplying factors of Eq. 4.71). The astute reader⁴ will notice that we have already arrived at this result when adding up phases over a restricted range to arrive at an expression for ηu in Eq. 4.13, with |ηu| plotted in Fig. 4.5; that integration is indeed what we are doing when calculating the Fourier transform of a slit. The expression sin(ν)/ν appears so often that it is given a special name, the "sinc" function

$$\mathrm{sinc}(\nu) = \frac{\sin(\nu)}{\nu}, \qquad (4.105)$$

which has a value of sinc(0) = 1, as can be found from L'Hôpital's rule. The rect and sinc functions are paired via the Fourier transform as

$$\mathcal{F}\left\{\mathrm{rect}\left(\frac{x_0}{b}\right)\right\} = \mathrm{sinc}(\pi u b). \qquad (4.106)$$

Returning to the solution of Eq. 4.102, the sinc function has its first minimum at ν = π, which translates into a first minimum at the spatial frequency u of
$$\pi = \pi u b = \pi\,\frac{\theta}{\lambda}\,b.$$

Therefore the angle θ of the first minimum in the diffraction pattern is

$$\theta = \frac{\lambda}{b},$$

which is exactly what we anticipated in Eq. 4.29.
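This first-minimum condition can be checked with a DFT: a slit of width b = 64 samples, centered on the origin of a periodic array, has its first spectral zero at u = 1/b (a minimal sketch; the array sizes are arbitrary choices):

```python
import numpy as np

N, b = 4096, 64
aperture = np.zeros(N)
aperture[:b // 2] = 1.0      # slit centered on index 0 of the periodic array
aperture[-b // 2:] = 1.0

F = np.fft.fft(aperture)
u = np.fft.fftfreq(N)        # spatial frequencies in cycles per sample

k = N // b                   # index where u = 1/b, the first sinc zero
print(abs(F[0]))             # 64.0: the full open width, at u = 0
print(abs(F[k]) < 1e-8)      # True: first diffraction minimum at u = 1/b
```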
4.3.6
Fresnel propagation by integration, and by convolution
Now that we have written Fraunhofer diffraction using the compact notation of ψz(x, y) = F{ψ0(x0, y0)}, let us return to the Fresnel diffraction integral. The first form of Eq. 4.70 was

$$\psi_z(x,y) = \iint_{-\infty}^{\infty} \psi_0(x_0,y_0)\, \exp\left[-i\pi\,\frac{(x-x_0)^2 + (y-y_0)^2}{\lambda z}\right] dx_0\, dy_0.$$

⁴ And you're astute, right?
Box 4.5 Propagators: proper, and improper
The derivation of the Fresnel diffraction integral shown here is based on a simple picture of adding up Huygens point sources. A more proper approach is given by the Rayleigh–Sommerfeld theory involving Green's functions for electromagnetic wave solutions within aperture boundaries. This can be shown [Sherman 1967] to give a different form of the Fourier space propagator (Eq. 4.108) of [Goodman 2017, Eq. 3.78]

$$H(u_x, u_y) = \exp\left[-i\,\frac{2\pi z}{\lambda}\sqrt{1 - (\lambda u_x)^2 - (\lambda u_y)^2}\right] \qquad (4.110)$$

while for the real space propagator (Eq. 4.107) a more accurate expression is

$$h(x, y) = \exp\left[-i\,\frac{2\pi}{\lambda}\sqrt{x^2 + y^2 + z^2}\right]. \qquad (4.111)$$

However, the Fresnel approximation effectively reduces Eq. 4.110 to the form shown in Eq. 4.108, which is sufficient for most calculations of interest in x-ray microscopy.
Now look at this expression while also considering the convolution theorem of Fourier transforms of Eqs. 4.82 and 4.83:

$$a(x) * b(x) = \int_{-\infty}^{\infty} a(s)\, b(x - s)\, ds = \mathcal{F}^{-1}\{A(u) \cdot B(u)\}.$$
We therefore see that the first form of the Fresnel diffraction integral simply involves a convolution of the input plane wavefield ψ0(x, y) with a Fresnel propagator h(x, y) of

$$h(x,y) = \exp\left[-i\,\frac{\pi}{\lambda z}(x^2 + y^2)\right] \qquad (4.107)$$

with the feature that the Fourier transform of the propagator has much the same form [Goodman 2017, Eq. (4-20)] of

$$H(u_x, u_y) = \exp[-i\pi\lambda z\,(u_x^2 + u_y^2)]. \qquad (4.108)$$
(More exact expressions for the propagator functions are shown in Box 4.5.) As a result, we can rewrite the first form of the Fresnel diffraction integral of Eq. 4.70 as

$$\psi_z(x,y) = \psi_0(x_0,y_0) * h(x_0,y_0) = \mathcal{F}^{-1}\big\{\mathcal{F}\{\psi_0(x_0,y_0)\}\cdot H(u_x,u_y)\big\}. \qquad (4.109)$$
This convolution approach to the Fresnel diffraction integral involves taking the Fourier transform of the input plane wavefield ψ0(x0, y0), multiplying it by a propagator in Fourier space H(ux, uy), and taking the inverse Fourier transform of the result. The second form in which we wrote the Fresnel diffraction integral was shown in
Eq. 4.72 as

$$\psi_z(x,y) = \exp\left[-i\pi\,\frac{x^2+y^2}{\lambda z}\right] \iint_{-\infty}^{\infty} \psi_0(x_0,y_0)\, \exp\left[-i\pi\,\frac{x_0^2+y_0^2}{\lambda z}\right] \exp\left[i2\pi\left(\frac{x}{\lambda z}x_0 + \frac{y}{\lambda z}y_0\right)\right] dx_0\, dy_0.$$

We see that this includes a Fourier transform of the product of ψ0(x0, y0) and an input plane propagator h(x0, y0), which is then multiplied by an output plane propagator h(x, y) to yield

$$\psi_z(x,y) = h(x,y)\,\mathcal{F}\{\psi_0(x_0,y_0)\,h(x_0,y_0)\} \qquad (4.112)$$
for an equivalent expression of the Fresnel diﬀraction integral.
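The convolution form of Eq. 4.109 maps directly onto a few FFT calls. A minimal numpy sketch follows (function and variable names are our own; numpy's FFT uses the sign convention conjugate to Eq. 4.78, which leaves intensities unaffected). Since |H| = 1, the propagation is unitary, and the example checks that the integrated intensity is preserved:

```python
import numpy as np

def fresnel_propagate(psi, wavelength, z, dr):
    """Propagate a sampled wavefield a distance z via Eq. 4.109:
    inverse FFT of (FFT of the field, times the propagator H of Eq. 4.108)."""
    n_y, n_x = psi.shape
    u_x = np.fft.fftfreq(n_x, d=dr)
    u_y = np.fft.fftfreq(n_y, d=dr)
    uxx, uyy = np.meshgrid(u_x, u_y)
    H = np.exp(-1j * np.pi * wavelength * z * (uxx**2 + uyy**2))
    return np.fft.ifft2(np.fft.fft2(psi) * H)

rng = np.random.default_rng(3)
psi0 = rng.standard_normal((128, 128)) + 1j * rng.standard_normal((128, 128))
psi_z = fresnel_propagate(psi0, wavelength=2.48e-9, z=1e-3, dr=20e-9)

print(np.allclose(np.sum(np.abs(psi_z)**2), np.sum(np.abs(psi0)**2)))  # True
```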
4.3.7
Fresnel propagation, distances, and sampling
We have two equivalent approaches for propagating wavefields: the convolution approach of Eq. 4.109, and the single Fourier transform approach of Eq. 4.112. What are the conditions for using one approach versus the other? The difference between the two lies in the version of the propagator function that the input wavefield is multiplied with: in the convolution approach of Eq. 4.109, one multiplies the Fourier transform of the input wavefield F{ψ0(x0, y0)} with the Fourier space propagator H(ux, uy), whereas in the single Fourier transform approach of Eq. 4.112 the input wavefield ψ0(x0, y0) is multiplied by the real space propagator h(x0, y0). To see why one approach might be favored over the other [Voelz 2009, Li 2015a], consider Fig. 4.20, which shows h(x0, y0) and H(ux, uy) for two different example distances z at a given wavelength λ. At short propagation distances, the Fourier space propagator H(ux, uy) varies more slowly, while the real space propagator h(x0, y0) undergoes rapid oscillations. Especially in the case of numerical wavefield propagation, these rapid oscillations can require a very fine spacing of sampling points, and thus very large array sizes. Let's consider the case [Li 2015a] of a wavefield propagation calculation where we want to know the output wavefield ψz(x, y) over the same maximum radius R in which we know the input wavefield ψ0(x0, y0), and let's use N sampling points in both cases with a spacing Δr = R/N. The total number Nr of π phase half-cycles for the real space propagator (Eq. 4.107) of exp[−iπr²/(λz)] is

$$N_r = \frac{R^2}{\lambda z} = \frac{N^2\Delta_r^2}{\lambda z}. \qquad (4.113)$$
In Fourier space, Nyquist sampling (Eq. 4.88) gives a corresponding maximum radial spatial frequency of ρmax = 1/(2Δr), so the number Nρ of π phase half-cycles for the Fourier space propagator (Eq. 4.108) is

$$N_\rho = \lambda z\,\rho_{max}^2 = \frac{\lambda z}{4\Delta_r^2}. \qquad (4.114)$$
The matching distance z0 at which we arrive at an identical number of π phase half-cycles in real and reciprocal space, or Nr = Nρ, is found from Eqs. 4.113 and 4.114
Figure 4.20 Real space h(x0, y0) and Fourier space H(ux, uy) propagators for two example distances z at a given wavelength λ, plotted in terms of a radius r = √(x² + y²), which corresponds to radial spatial frequencies ρ = r/(λz). In each case, the real part is shown in blue and the imaginary part in red. The Fourier space propagator is more slowly varying at short propagation distances, while the real space propagator is more slowly varying at longer distances. The propagator functions are defined in Eqs. 4.107 and 4.108. Figure adapted from one made by Kenan Li [Li 2015a].
to be

$$z_0 = \frac{2R\,\Delta_r}{\lambda} = \frac{2R^2}{\lambda N}. \qquad (4.115)$$
This leads to the conclusions of Box 4.6 for the approach that should be used for wavefield propagation as a function of distance z [Li 2015a]. Finally, we note that the classical Fresnel number F0 for propagation from an aperture of radius a over a distance L is given by

$$F_0 = \frac{a^2}{\lambda L}, \qquad (4.116)$$
which we will later see matches the radius of the central zone of a Fresnel zone plate (Eq. 5.20). If we solve Eq. 4.115 for N, we obtain

$$N = 2\,\frac{R^2}{\lambda z_0} = 2F_0. \qquad (4.117)$$
Box 4.6 Wavefield propagation methods and distances
The matching distance z0 of

$$z_0 = \frac{2R\,\Delta_r}{\lambda} = \frac{2R^2}{\lambda N}$$

of Eq. 4.115 lets us decide between two methods for Fresnel wavefield propagation, based on the pixel size Δr, calculation grid radius R, number of grid points N, and wavelength λ.

• If the propagation distance is z < z0, it is preferable to use the convolution-based approach (Eq. 4.109) of

$$\psi_z(x,y) = \psi_0(x_0,y_0) * h(x_0,y_0) = \mathcal{F}^{-1}\big\{\mathcal{F}\{\psi_0(x_0,y_0)\}\cdot H(u_x,u_y)\big\}$$

with H(ux, uy) = exp[−iπλz(ux² + uy²)], because at shorter distances H(ux, uy) varies more slowly.

• If the propagation distance is z > z0, it is preferable to use the single-Fourier-transform approach (Eq. 4.112) of

$$\psi_z(x,y) = h(x,y)\,\mathcal{F}\{\psi_0(x_0,y_0)\,h(x_0,y_0)\}$$

with h(x, y) = exp[−iπ(x² + y²)/(λz)], because at longer distances h(x0, y0) varies more slowly.

In both cases, the more exact propagator functions h(x, y) and H(ux, uy) are given in Box 4.5.
Therefore we see that the number of sampling points N required at the matching distance z0 is equal to twice the Fresnel number F0 if the aperture a spans the whole space R. Waveﬁeld propagation can also be computed using other methods such as the ﬁnitediﬀerence method, and these methods can oﬀer faster computation speed and better accuracy for larger values of the index of refraction n [Van Roey 1981, Scarmozzino 1991]. While these methods have been used in a few cases for simulating xray waveﬁeld propagation [Fuhse 2006, Melchior 2017], we have emphasized the Fourier transform based approach here both for conceptual simplicity and because thus far it has been used by most xray microscopy researchers.
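The decision rule of Box 4.6 reduces to computing z0 from Eq. 4.115 and comparing it with z; a sketch with hypothetical example numbers (function name is our own):

```python
def propagation_method(R, N, wavelength, z):
    """Choose a Fresnel propagation method per Box 4.6, given the grid
    radius R, number of grid points N, wavelength, and propagation
    distance z (all lengths in the same units)."""
    z0 = 2.0 * R**2 / (wavelength * N)   # matching distance, Eq. 4.115
    return "convolution (Eq. 4.109)" if z < z0 else "single-FT (Eq. 4.112)"

# e.g. a 10 um radius grid with 1000 points at lambda = 0.1 nm gives z0 = 2 mm
print(propagation_method(10e-6, 1000, 1e-10, 1e-3))  # convolution (Eq. 4.109)
print(propagation_method(10e-6, 1000, 1e-10, 1e-2))  # single-FT (Eq. 4.112)
```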
4.3.8
Propagation and diffraction in circular coordinates
Lenses are often round; in such cases, it is best to work in circular coordinates with

$$x = r\cos(\theta) \qquad y = r\sin(\theta) \qquad r = \sqrt{x^2 + y^2}$$
$$u_x = \rho\cos(\theta^*) \qquad u_y = \rho\sin(\theta^*) \qquad \rho = \sqrt{u_x^2 + u_y^2} \qquad (4.118)$$

so that r is the radius in real space and ρ in reciprocal space. Extension of the Fresnel diffraction integrals of Eqs. 4.70 and 4.72, and the Fraunhofer diffraction integral of
Figure 4.21 The Bessel functions J0 (r) and J1 (r). These Bessel functions of the ﬁrst kind are
used in calculating the Hankel transform of Eq. 4.123 as well as the diﬀraction pattern from a pinhole as given by Eq. 4.134.
Eq. 4.76, to circular coordinates is straightforward in terms of dealing with propagator functions of

$$h(r) = \exp\big[-i\pi r^2/(\lambda z)\big] \qquad (4.119)$$

in real space and
$$H(\rho) = \exp\big[-i\pi\lambda z\,\rho^2\big] \qquad (4.120)$$

in reciprocal space. What is less straightforward is the equivalent of the Fourier transform, which we now consider. The 1D Fourier transform expression of Eq. 4.78 of

$$A(f) = \int_{-\infty}^{\infty} a(x)\, e^{i\,2\pi f x}\, dx$$

becomes

$$A(\rho, \theta^*) = \int_{r=0}^{\infty}\int_{\theta=0}^{2\pi} a(r)\, \exp\big[i\,2\pi\,(\rho\cos(\theta^*)\,r\cos(\theta) + \rho\sin(\theta^*)\,r\sin(\theta))\big]\, r\, dr\, d\theta$$
$$= \int_{r=0}^{\infty} a(r)\, r\, dr \int_{\theta=0}^{2\pi} \exp\big[i\,2\pi\rho r\cos(\theta^* - \theta)\big]\, d\theta \qquad (4.121)$$
in circular coordinates. The integral over θ is known as a Bessel function of the first kind with zero order, J0, of

$$J_0(w) = \frac{1}{2\pi}\int_0^{2\pi} \exp[i w\cos(\theta)]\, d\theta, \qquad (4.122)$$

which is a pure real function (see Fig. 4.21), apart from an arbitrary uniform phase set by the choice of θ*. If we choose θ* = 0 and use the Bessel function result of Eq. 4.122
Figure 4.22 The Airy function with no central stop (b = 0) shown both as the amplitude 2J1 (ν)/ν of Eq. 4.132, and the intensity [2J1 (ν)/ν]2 of Eq. 4.134. The Airy function describes diﬀraction by a circular aperture, or the focal spot of a lens. Note that the sign of the amplitude ﬂips in successive side lobes.
in Eq. 4.121, we have

$$A(\rho) = 2\pi\int_0^{\infty} a(r)\, J_0(2\pi\rho r)\, r\, dr = \mathcal{H}\{a(r)\}, \qquad (4.123)$$
which is known as the Fourier–Bessel or zeroth-order Hankel transform H{} of the function a(r). Like the Fourier transform, the Hankel transform is invertible, so

$$a(r) = \mathcal{H}^{-1}\{A(\rho)\}. \qquad (4.124)$$
We have arrived at circular coordinate equivalents of the Fresnel diffraction integrals. The convolution form of Eq. 4.109 can be written as

$$\psi_z(r) = \mathcal{H}^{-1}\big\{\mathcal{H}\{\psi_0(r_0)\}\cdot H(\rho)\big\}, \qquad (4.125)$$

while the single transform approach of Eq. 4.112 can be written as

$$\psi_z(r) = h(r)\,\mathcal{H}\{\psi_0(r_0)\,h(r_0)\}. \qquad (4.126)$$

The Fraunhofer diffraction integral of Eq. 4.98 becomes

$$\psi_z(\rho) = \mathcal{H}\{\psi_0(r_0)\}. \qquad (4.127)$$
Several papers discuss numerical implementation of wavefield propagation using Hankel transforms [Yu 1998, Guizar-Sicairos 2004, Norfolk 2010, Li 2015a]. In Section 4.3.5, we calculated the Fraunhofer diffraction pattern of a slit of width b, describing it as a rectangle function, rect(x/b). Let us consider here the Fraunhofer diffraction pattern of a pinhole of radius a, which we'll describe in terms of a circle function, circ(r/a). From Eq. 4.123, we see that the far-field or Fraunhofer diffraction
Figure 4.23 The sinc²(ν) = [sin(ν)/ν]² function describing the intensity of light diffracted by a slit of width b, with ν = πub (Eq. 4.104), and the Airy intensity function [2J1(ν)/ν]² describing the intensity of light diffracted from a circular aperture of radius a, with ν = 2πρa (Eq. 4.134). The first minimum of the slit diffraction pattern is at νfirst min = π, while for the pinhole it is at νfirst min = 1.22π. The Airy intensity function also describes the focus of a lens, as will be discussed in Section 4.4.3.
amplitude of such a pinhole is

$$\psi_z(\rho) = 2\pi\int_0^{\infty} \psi_0\,\mathrm{circ}\Big(\frac{r}{a}\Big)\, J_0(2\pi\rho r)\, r\, dr = \psi_0\, 2\pi\int_0^{a} J_0(2\pi\rho r)\, r\, dr. \qquad (4.128)$$
If we make the substitution r′ = 2πρr, we obtain

$$\psi_z(\rho) = \psi_0\,\frac{2\pi}{(2\pi\rho)^2}\int_0^{2\pi\rho a} J_0(r')\, r'\, dr'. \qquad (4.129)$$
Now a recurrence relationship of Bessel functions of the first kind is

$$\frac{d}{dx}\big[x^{n+1}\,J_{n+1}(x)\big] = x^{n+1}\,J_n(x), \qquad (4.130)$$

which leads to

$$\int r'\,J_0(r')\, dr' = r'\,J_1(r'), \qquad (4.131)$$
which, when evaluated at the integration limits of 0 to 2πρa, allows one to write the solution
of Eq. 4.129 as

$$\psi_z(\rho) = \psi_0\,\frac{2\pi}{(2\pi\rho)^2}\,2\pi\rho a\,J_1(2\pi\rho a) = \psi_0\,(\pi a^2)\,\frac{2J_1(2\pi\rho a)}{2\pi\rho a} = \psi_0\,(\pi a^2)\,\frac{2J_1(\nu)}{\nu} \qquad (4.132)$$

with

$$\nu \equiv 2\pi\rho a. \qquad (4.133)$$
Thus one has the area of the pinhole πa² as a scaling factor, and an intensity distribution I = ψ†·ψ of

$$I(\rho) = \psi_0^2\,(\pi a^2)^2\left[\frac{2J_1(\nu)}{\nu}\right]^2. \qquad (4.134)$$

This expression for the far-field diffraction pattern of a pinhole makes use of the Airy function 2J1(ν)/ν (named after the British mathematician George Biddell Airy), which is shown in both amplitude and intensity form in Fig. 4.22. The first minimum of the intensity function is at

$$\nu_{\mathrm{first\ min}} = 1.220\,\pi, \qquad (4.135)$$
whereas the diffraction pattern of a rectangular aperture of [sinc(ν)]² = [sin(ν)/ν]² has a first minimum at νfirst min = π. From Eqs. 4.133 and 4.135, the divergence semi-angle θfirst min = λρfirst min from the optical axis to the first minimum of the Airy pattern is given by

$$\theta_{\mathrm{first\ min}} = 0.61\,\frac{\lambda}{a}. \qquad (4.136)$$
The Airy (circular aperture) and sinc (square aperture, or slit) function intensities are shown together in Fig. 4.23.
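Both the closed form of Eq. 4.132 and the 1.22π first minimum of Eq. 4.135 can be checked numerically from the integral representation J_n(w) = (1/π)∫₀^π cos(nθ − w sin θ) dθ, without any special-function library (a sketch; the sampling densities and the bisection bracket [3, 4] are our own choices, the bracket because J1 changes sign there):

```python
import numpy as np

def bessel_j(order, w, samples=4001):
    """J_n(w) from the integral representation
    J_n(w) = (1/pi) * integral_0^pi cos(n*theta - w*sin(theta)) dtheta,
    evaluated with a simple trapezoid rule."""
    theta = np.linspace(0.0, np.pi, samples)
    f = np.cos(order * theta - w * np.sin(theta))
    dtheta = theta[1] - theta[0]
    return dtheta * (f.sum() - 0.5 * (f[0] + f[-1])) / np.pi

# 1. Check the closed form of Eq. 4.132 against direct quadrature of Eq. 4.128
a, rho = 1.0, 0.5                     # pinhole radius, radial spatial frequency
nu = 2 * np.pi * rho * a
r = np.linspace(0.0, a, 2001)
g = np.array([bessel_j(0, 2 * np.pi * rho * ri) * ri for ri in r])
lhs = 2 * np.pi * (r[1] - r[0]) * (g.sum() - 0.5 * (g[0] + g[-1]))
rhs = np.pi * a**2 * 2 * bessel_j(1, nu) / nu
print(abs(lhs - rhs) < 1e-4)          # True

# 2. Bisect for the first zero of 2 J1(nu)/nu, bracketed in [3, 4]
lo, hi = 3.0, 4.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if bessel_j(1, lo) * bessel_j(1, mid) <= 0:
        hi = mid
    else:
        lo = mid
print(round(mid / np.pi, 2))          # 1.22, the factor of Eq. 4.135
```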
4.3.9
Multislice propagation So far we have discussed the propagation of waves over some distance through free space. If “empty” samples were all we could simulate for xray microscopy, we would be in a sorry state! In the case of suﬃciently thin specimens with thickness t, we found in Eq. 3.71 that a waveﬁeld is modulated by exp[kt(iδ(x, y) − β(x, y))] to yield an exit wave within the pure projection approximation. For thicker specimens, the multislice method [Cowley 1957, Ishizuka 1977] (also called the beam propagation method [Van Roey 1981]) provides a way to simulate waveﬁeld propagation through real objects with a refractive index distribution n(x, y, z) = 1 − δ(x, y, z) − iβ(x, y, z) (Eq. 3.67), as shown in Fig. 4.24. The sample is considered as if it were sliced into a set of thin slabs of material along the beam propagation direction. For each slab, two separate steps are carried out: 1. Within the slab of thickness Δz, the incoming waveﬁeld ψ j is modulated by the net
Figure 4.24 Schematic representation of the method of multislice propagation, used to simulate a wavefield propagating through a non-homogeneous refractive medium [Cowley 1957]. Along the beam direction, the object is represented by a series of slices. At the entrance of a slice, the incident wavefield ψj is first modulated by the refractive effects of the slab of material, leading to a modified wavefield at the same plane of ψ′j. This wavefield is then propagated to the next slab entrance, yielding the next slice's wavefield of ψj+1.
optical effect of the slab integrated along the beam direction, giving a modulated wavefield ψ′j of

ψ′j(x, y) = ψj(x, y) exp[−ik ∫₀^Δz [n(x, y, z) − 1] dz]
          = ψj(x, y) exp[∫₀^Δz (2π/λ)(iδ(x, y, z) − β(x, y, z)) dz]
          ≈ ψj(x, y) exp[(2πΔz/λ)(iδ(x, y, zj) − β(x, y, zj))],   (4.137)
where the latter expression is appropriate for a sample that is defined on a regular grid of longitudinal positions zj.

2. This material-modulated wave ψ′j is then brought to the next plane ψj+1 by free-space propagation using the convolution propagation method of Eq. 4.109. (As noted at the end of Section 4.3.7, one can instead use finite-difference wavefield propagation methods with potentially faster computation speed and higher accuracy.)

Further details on its formulation for problems including spherical wave propagation are available [Munro 2019]. This approach allows one to simulate the exit wave (see Box 10.2) that would emerge from the illuminated object without invoking either the Born or Rytov approximations (Section 3.3.4). Once one has exited the object and is in free space, this exit wave can be propagated some further distance, including into the far-field condition for obtaining the diffraction pattern of the extended-in-depth object [Thibault 2006].

The multislice method is easy to implement, and it applies to a surprisingly wide range of x-ray optical phenomena (including grazing incidence reflection, and diffraction by thick, high aspect ratio optics where coupled wave equation methods have been used previously [Li 2017a]). How many slices must one use? Since the transverse distance from an edge to the first Fresnel fringe in propagation scales as √(λz) (see Eq. 4.217), we wish to have the transverse pixel size Δr be a small fraction ε₁ of this distance, or

Δr = ε₁ √(λz).   (4.138)

Nyquist sampling would suggest ε₁ ≤ 0.5 and the Rayleigh quarter wave criterion would suggest ε₁ ≤ 0.25. The transverse sampling condition implies that the longitudinal sampling be some small fraction ε₂ of z = Δr²/(ε₁² λ), giving

Δz = ε₂ Δr²/(ε₁² λ).   (4.139)
Values of ε₁ = 0.1 and ε₂ = 0.1 give good agreement with a test of the convergence of the multislice method as the slice thickness Δz is decreased [Li 2017a], though the way to be sure is to decrease ε₁ and ε₂ and see that one approaches an asymptotic limit for small step size. Another way to answer the question is to consider the transition from planar to volume diffraction gratings for a grating of thickness Δz and period d as given by the Klein–Cook parameter Q_K–C of [Klein 1967]

Q_K–C = 2πλ(Δz)/(n d²),   (4.140)

where n is the mean refractive index (which of course is n ≈ 1 for X rays based on Eq. 3.67). Values of Q_K–C ≪ 1 are adequately described using plane grating diffraction, while the condition Q_K–C ≳ 1 means that volume diffraction effects begin to come into play (Section 4.2.2). If the grating half-period is bΔr, one can rearrange Eq. 4.140 in terms of the slice thickness to find

Δz_K–C = [2nb² Q_K–C/(πλ)] (Δr)².   (4.141)

The condition of having ε₁ = 0.1 and ε₂ = 0.1 corresponds to Q_K–C = 5π/(nb²). As an example, if the pixel size is one-fifth the finest zone width dr_N in a Fresnel zone plate, one has b = 5 and Q_K–C ≈ π/5, which indeed satisfies Q_K–C < 1.
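The two-step slab recipe above (modulate, then propagate) can be sketched in a few lines of NumPy. This is a minimal illustration rather than production code; the function name, grid shapes, and parameter values are assumptions made for the example:

```python
import numpy as np

def multislice(psi, delta, beta, dx, dz, wavelength):
    """Propagate a wavefield through an object defined on an (nz, ny, nx)
    grid of refractive index decrements delta and beta (Eq. 3.67), using
    the two-step multislice scheme: per-slab modulation (Eq. 4.137)
    followed by free-space propagation over dz (Eq. 4.109)."""
    k = 2 * np.pi / wavelength
    ny, nx = psi.shape
    ux = np.fft.fftfreq(nx, d=dx)
    uy = np.fft.fftfreq(ny, d=dx)
    # Fresnel free-space propagator in the spatial frequency domain
    H = np.exp(-1j * np.pi * wavelength * dz
               * (ux[None, :] ** 2 + uy[:, None] ** 2))
    for j in range(delta.shape[0]):
        # Step 1: optical modulation by slab j (Eq. 4.137)
        psi = psi * np.exp(k * dz * (1j * delta[j] - beta[j]))
        # Step 2: free-space propagation to the next slab entrance
        psi = np.fft.ifft2(np.fft.fft2(psi) * H)
    return psi
```

A quick sanity check: with δ = β = 0 the object is "empty" and the wavefield passes through unchanged, while a uniform absorbing slab of total thickness t attenuates the intensity by exp(−2ktβ), as expected from the pure projection approximation.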
4.4

Imaging systems

Now that we have the basics in hand for how wavefields propagate, we can go on to discuss imaging systems. We will connect wavefield propagation with lens imaging in Section 4.4.2, but we begin with a short review of the basics of lens-based imaging. A lens with focal length f serves to image an object from a longitudinal position s to an image position s′ according to

1/s + 1/s′ = 1/f.   (4.142)
Figure 4.25 Thin lens imaging. An object at a distance s is imaged to a point at distance s′ by a lens of focal length f according to Eq. 4.142. The magnification M is given by M = h′/h, which is negative and with a magnitude greater than 1 in the case shown here. Parallel rays are directed through the focal points f of the lens, and rays that go through the focal point emerge parallel to the optical axis.
The lateral magnification M of the image is given by

M = h′/h = −s′/s,   (4.143)
where negative magnifications indicate an inverted image (the case shown in Fig. 4.25). Given the fact that the size times angular divergence of a beam is a constant at any focus (Eq. 4.190), one can also relate the object divergence θ to the image convergence θ′ via

θ′ = −θ/M.   (4.144)
For large magnification, s′ is much larger than s, which is near a focal length f, so we can write s = f + Δz with Δz ≪ f, giving

1/s + 1/s′ = 1/(f + Δz) + 1/s′ = [1/f]·1/(1 + Δz/f) + 1/s′ ≈ (1/f)(1 − Δz/f) + 1/s′ ≈ 1/f − Δz/f² + 1/s′,

which when substituted into Eq. 4.142 yields

1/s′ = Δz/f².   (4.145)

The lateral image magnification can then be expressed as

M = −s′/s = −[f²/(Δz)]/(f + Δz) ≈ −f/Δz,   (4.146)

which means we can write approximate expressions for the imaging distances of

s = f + Δz = f(1 + Δz/f) ≈ f − f/M   (4.147)
s′ = f²/Δz ≈ −M f.   (4.148)
That is, if we have a lens with a focal length of f = 1 mm and image with a magnification of M = −1000, the object will be at a distance Δz = −(1 mm)/(−1000) = 1 μm more than a focal length f away from the lens, and the image will be at a distance of s′ = −(−1000) · (1 mm) = 1 m. For large demagnification of an x-ray source to a small focus (for example, with M = −1/100) we have the equivalent approximate expressions of

s ≈ −f/M   (4.149)
s′ ≈ f − M f,   (4.150)

which would place the source at 100 f away and the image at f + (f/100) away from the lens in this example.

Now let us consider the longitudinal magnification (the magnification in the ẑ direction). If we move the object a distance Δs farther away from the lens, the image will be moved by a corresponding distance Δs′. That is, we have

1/(s + Δs) + 1/(s′ + Δs′) = 1/f   (4.151)
so we wish to approximate the left-hand side:

1/(s + Δs) + 1/(s′ + Δs′) = 1/[s(1 + Δs/s)] + 1/[s′(1 + Δs′/s′)] ≈ (1/s)(1 − Δs/s) + (1/s′)(1 − Δs′/s′) ≈ 1/s + 1/s′ − Δs/s² − Δs′/s′² = 1/f,   (4.152)

which then allows us to subtract 1/s + 1/s′ from the left-hand side of Eq. 4.152, and 1/f from the right-hand side, by using Eq. 4.142. This leaves us with the condition

Δs′ = −Δs (s′²/s²) = −Δs M²,   (4.153)

or the statement that the longitudinal or z magnification Mz is related to the lateral magnification M by

Mz = M².   (4.154)
Let us consider a practical example of an x-ray microscope with a depth of field (Section 4.4.9) of DOF = 5 μm, and an image magnification of M = −200 with an optic with a focal length of f = 2 mm. The image-recording detector will be at a distance of s′ = −(−200)(2 mm) = 0.4 meters from the lens, while the DOF will be magnified by Mz = 200² to (5 × 10⁻⁶) · (200²) = 0.2 meters. That is, the entire DOF region will show up in the image at the same magnification (the detector is much thinner than 0.2 meters), so one can treat image features within the DOF as being presented as a pure projection to the detector, with no depth-coupled lateral magnification changes.
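The exact thin-lens relations are easy to evaluate numerically; the following sketch (the function names are ours, not the text's) reproduces the M = −200, f = 2 mm example above:

```python
def image_distance(s, f):
    """Solve the thin-lens equation 1/s + 1/s' = 1/f (Eq. 4.142) for s'."""
    return 1.0 / (1.0 / f - 1.0 / s)

def magnifications(s, f):
    """Lateral magnification M = -s'/s (Eq. 4.143) and longitudinal
    magnification Mz = M**2 (Eq. 4.154)."""
    s_image = image_distance(s, f)
    M = -s_image / s
    return M, M * M

# Example from the text: f = 2 mm and M = -200, so the object sits
# Delta_z = -f/M = 10 um beyond the focal length (Eq. 4.147).
f = 2e-3                      # focal length [m]
s = f - f / (-200.0)          # object distance = f + 10 um
M, Mz = magnifications(s, f)  # M = -200 exactly, Mz = 4e4
```

With these values the exact image distance is s′ = 0.402 m, consistent with the approximation s′ ≈ −Mf = 0.4 m, and the 5 μm depth of field maps to Mz · 5 μm = 0.2 m along the detector direction.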
4.4.1

Field of view

In thin-lens imaging theory, rays that go through the center of a lens are undeviated no matter what their incident angle is (the lens surfaces are always flat right on the optical
Figure 4.26 Schematic used for calculating the field of view of an imaging system. An object at distance s and height h is imaged at a distance s′ at a height h′ if a lens were able to truly image a plane to a plane; in fact, imaging happens between the spherical surfaces shown as dashed lines, so that the longitudinal offsets a and b should be less than the depth of field DOF for imaging.
axis). When the source is moved off-axis by a distance h, the image is therefore moved off-axis by a distance h′ as given by the magnification M of Eq. 4.143. However, in setting up the geometrical optics conditions of either a thin refractive lens or a Fresnel zone plate, the distances s and s′ turn out to be distances from the object to the lens center, rather than offset distances from the lens along the optical axis; that is, the thin-lens imaging equation of Eq. 4.142 is for imaging a spherical surface to a spherical surface (the dashed lines shown in Fig. 4.26). When imaging a planar object to a planar detector, we must consider the departures from the conditions of spherical object and detector planes.

Considering the geometry of Fig. 4.26, we see that the modified object distance is s + a, and the modified image distance is s′ + b. Projecting this to a distance along the optical axis gives s = (s + a) cos(θ) and s′ = (s′ + b) cos(θ), from which we find

a = s (1 − cos θ)/cos θ   (4.155)
b = s′ (1 − cos θ)/cos θ.   (4.156)

Now because the object radius from the lens is increased from s to s + a, the image radius from the lens is decreased to (s′ − c) where c is a positive number, leading to a thin-lens imaging condition along the imaging axis of

1/(s + a) + 1/(s′ − c) = 1/f.   (4.157)
By taking steps like those used in Eq. 4.152, we obtain the result

c = a (s′²/s²) = a M² = M² s (1 − cos θ)/cos θ,   (4.158)

which shows the M² dependence on changes in the object's longitudinal position, as expected from Eq. 4.154. Given that we now know the image point to be shifted by a distance c inside the normal image radius on the imaging axis, we can project the net shift Δz′ along the optical axis as

Δz′ = (c + b) cos θ = [M² s (1 − cos θ)/cos θ + s′ (1 − cos θ)/cos θ] cos θ
    = (M² s + s′)(1 − cos θ) ≈ (M² s + s′) h²/(2s²) = (M² − M) h²/(2s),   (4.159)

where we have used 1 − cos θ ≈ θ²/2, θ ≈ h/s, s ≈ f as given by Eq. 4.147, and M = −s′/s as per Fig. 4.26. In order for this position to be imaged at full resolution, it should be within the microscope's DOF as will be discussed in Section 4.4.9. That is, we require Δz′ ≤ DOF, or (using Eq. 4.215)

−M h²/s ≤ 2 cz δr²/(0.61² λ).

Renaming the height h to be hDOF, the radial field of view as set by DOF limits is

hDOF ≤ (δr/0.61) √(2 cz f/(−M λ)).   (4.160)

Consider the case of δr = 40 nm resolution imaging with a Fresnel zone plate with a focal length of f = 0.84 mm at λ = 2.42 nm, used in an imaging system with a magnification of M = −500 (and assume cz = 1 as will be discussed in Section 4.4.9). In this case we have hDOF = 6.0 μm, or a full-diameter field of view of 2hDOF = 12.0 μm with full image sharpness.
4.4.2

Optical system via propagators

A lens acts as a Fourier transform device, since a light source placed at the focal point will generate a plane wave emerging from the lens, and an incident plane wave will produce a point focus. The point source or focus is a Dirac delta function δ(x − x₀) centered at the transverse position x₀, and the Fourier transform of a delta function is 1 at all frequencies (Eq. 4.85). Moving the point source sideways to some other transverse position leads to a plane wave at a different angle relative to the optical axis, as shown in Fig. 4.27. This is the shift theorem of Fourier transforms (Eq. 4.86) in action!

With this understanding, let's briefly consider a Fourier optics approach to an imaging system [Goodman 2017, Chap. 5]. Let's start with a plano-convex lens with curvature
Figure 4.27 Lenses act as Fourier transform devices. Point sources of light placed a focal length away from the lens lead to plane waves emerging from the lens, and an oﬀset of the point source leads to rays at an angle (red) as expected from the shift theorem of Fourier transforms (Eq. 4.86).
R₂ as shown in Fig. 4.28. The thickness t(r) of the lens as a function of radius r can be found from

t(r) = |R₂| cos(θ) − (|R₂| − tmax) = tmax − |R₂|[1 − cos(θ)] ≈ tmax − |R₂| θ²/2 = tmax − r²/(2|R₂|),   (4.161)

where we have used the approximation cos(θ) ≈ 1 − θ²/2 for a thin lens with small aperture. For visible light where (n − 1) > 0, the phase retardation produced by this radially dependent glass thickness is

ϕ(r) = −k (n − 1) t(r),   (4.162)

where k = 2π/λ as usual. Ignoring the constant [−nktmax] term, and using the fact that the sign convention for lens radii has R₂ = −|R₂|, the phase shift exp[iϕ(r)] relative to there being no lens at all is given by

ϕ(r) = k (n − 1) r²/(2|R₂|).   (4.163)

If we repeat the same calculation for a convex-plano lens with |R₁| = +R₁, and add the results together to represent a double-convex lens, we have

ϕ(r) = (πr²/λ)(n − 1)(1/R₁ − 1/R₂) = πr²/(λf),   (4.164)

where in the last step we have used

1/f = (n − 1)(1/R₁ − 1/R₂),   (4.165)
which is known as the lensmaker's equation.

Let's now describe an optical system using propagators and lens phase functions. We start with a point source which we'll represent with a Dirac delta function δ(x − 0, y − 0).
Figure 4.28 Geometry for calculating the glass thickness of a plano-convex lens, leading to Eq. 4.161.
If we use the convolution approach to propagation of Eq. 4.109, the wavefield ψ₁(x, y) entering the lens is given by

ψ₁(x, y) = ψ₀ F⁻¹{F{δ(x − 0, y − 0)} · H(ux, uy)}
         = ψ₀ F⁻¹{1 · H(ux, uy)}
         = ψ₀ exp[−iπ(x² + y²)/(λs)],   (4.166)

where we have used the fact that the Fourier transform of a Dirac delta function is 1 everywhere (Eq. 4.85), and where we have written out the real space propagator h(x, y) – which is, after all, the Fourier transform of H(ux, uy) – explicitly for the propagation distance s. We then multiply by the phase function of the lens e^{iϕ(r)} (using Eq. 4.164). The wavefield ψ′₁(x, y) exiting the lens is then given by

ψ′₁ = ψ₁ exp[iπ(x² + y²)/(λf)] = ψ₀ exp[−iπ(x² + y²)/(λZ)],   (4.167)

where

1/Z ≡ 1/s − 1/f.   (4.168)

If f = s, then 1/Z = 0 and the entire quadratic phase term becomes exp[−i · 0] = 1. In that case, the lens has taken a point source and turned it into a plane wave, as shown in Fig. 4.27. If we now propagate ψ′₁ by a distance s′ to obtain a wavefield ψ₂, we have

ψ₂ = F⁻¹{F{ψ′₁} exp[−iπλs′(ux² + uy²)]}
   = ψ₀ F⁻¹{exp[−iπλZ(ux² + uy²)] exp[−iπλs′(ux² + uy²)]}.   (4.169)
Figure 4.29 The Rayleigh resolution criterion is met when the Airy intensity pattern of one diﬀractionlimited focal spot is centered at the ﬁrst minimum of the Airy intensity pattern of a second focal spot, or at a separation of ν = 1.22π. This leads to the Rayleigh resolution of δr = 0.61λ/N.A. as given in Eq. 4.173. The summed intensity in between the two images (the “dip”) drops to 73.5 percent of the singlesource intensity maximum. When the lens has a halfdiameter central stop or b = 0.5 in Eq. 4.176, the “dip” drops to 52.2 percent of the maximum. Other resolution criteria include the Sparrow criterion shown in Fig. 4.35.
Consider the case when s′ = −Z: the quadratic phase factors then cancel each other out, so we are left with

ψ₂ = ψ₀ F⁻¹{1} = ψ₀ δ(x − 0, y − 0).   (4.170)

That is, we have imaged from a point to a point! Going back to Eq. 4.168, the condition of s′ = −Z can be written as

1/s′ = −1/Z = 1/f − 1/s,  or  1/s + 1/s′ = 1/f,   (4.171)

which simply reproduces Eq. 4.142. We therefore see that a lens works to counteract the Fresnel propagation of a wavefield from the object point to the image point.
4.4.3

Diffraction and lens resolution

In the above analysis, we have neglected to include an important factor: lenses have a limiting aperture, which we will refer to as a pupil function p(a) where a is the radius of the lens. Since we have seen that a plane wave incident on a lens is imaged to a point, or F{1} = δ(x − 0, y − 0), the focus of a finite-aperture lens will involve a Fourier transform of the pupil function p(a). We have already solved this problem when we calculated the far-field diffraction pattern of a pinhole: the amplitude is given by Eq. 4.132 as the Airy function [2J₁(ν)/ν]. In the present case we wish to calculate the light amplitude
as a function of radius r at the focal plane using ν = 2πρr. Now the spatial frequency ρ = θ/λ is a wavelength-normalized diffraction angle, which in this case is determined by the limiting angle of a lens known as its numerical aperture of

N.A. ≡ n sin(θ) ≈ a/f.   (4.172)

The exact expression allows for there to be a refractive medium n between the lens and the focus, while the approximate form assumes n = 1 (entirely appropriate for X rays) and small angles. We then have ν = 2π(N.A./λ)r in the Airy function, and in Eq. 4.135 we found that the first zero in the Airy amplitude (and first minimum in the Airy intensity) is at ν_first min = 1.22π, from which we find a spatial resolution δr of

δr = r_first min = 0.61 λ/N.A.   (4.173)
This is the celebrated Rayleigh resolution for a lens, with an intensity profile as shown in Fig. 4.23. This intensity profile, which represents the image of an infinitely small illumination source as produced by an aberration-free circular lens, is known as the intensity point spread function psf(ν) of the lens of

psf(ν) = [2J₁(ν)/ν]²   (4.174)

with

ν = 1.22π r/δr.   (4.175)
This is also called the diffraction-limited focus of the lens. Departures from this result due to finite illumination sources are discussed in Section 4.4.6. Rayleigh arrived at the expression of Eq. 4.173 by considering the image of closely separated incoherent sources (the light from stars as imaged in a telescope, in his case); he declared them to be resolvable when the center of the image of one star was located at the position of the first Airy minimum of the image of the second star, as shown in Fig. 4.29. In Fig. 4.30, we show 2D images of the sum of two Airy intensity spots as a function of their separation.

The analysis above was for a normal optic that is continuous from the outer aperture to the center. Fresnel zone plates used in scanning x-ray microscopes usually have a central stop which is used in conjunction with an order-sorting aperture as shown in Fig. 5.17 (zone plate monochromators also require central stops, as shown in Fig. 6.4). We therefore consider the properties of a circularly symmetric optic with a central stop of fractional diameter b (the case of non-circularly symmetric optics, such as Kirkpatrick–Baez mirrors and multilayer Laue lenses, is considered in Section 4.4.5). With a central stop fraction b, the intensity point spread function is modified to an obstructed Airy function [Linfoot 1953, Tschunko 1974] of

psf(ν, b) = [1/(1 − b²)]² [2J₁(ν)/ν − b² · 2J₁(bν)/(bν)]²   (4.176)

with ν = 1.22π r/δr as in Eq. 4.175. In Fig. 4.31, we show the modifications to the
Figure 4.30 The Rayleigh resolution criterion is illustrated by showing the sum of two Airy intensity spots separated by the indicated fraction of the Rayleigh resolution criterion of δr = 0.61λ/N.A. as given in Eq. 4.173. The Sparrow resolution criterion [Sparrow 1916], where the “dip” between the two spots just disappears, corresponds to a fraction 0.77 of the Rayleigh resolution (see Eq. 4.177) for the case of an unobstructed circular optic.
Airy function that this causes. While increasing values of b lead to a slight narrowing of the central Airy disk that can be interpreted as an improvement in spatial resolution [Rarback 1981] and as a narrowing of the full-width at half-maximum (FWHM) probe diameter (Fig. 4.32), the fraction of energy that goes into the central Airy disk decreases as b is increased (Figs. 4.33 and 4.34), and we will see below that the optical transfer function is also affected (Fig. 4.48). It is not easy to remove these side lobes without reducing the resolution of the focal spot; for example, if one were to try to modify the optic to produce a circ-function-like cutoff of intensities beyond the first Airy minimum, the Fourier transform of such a function would be an Airy amplitude pattern with which one would have to modulate the optic, thereby losing flux while also requiring higher numerical aperture without a corresponding improvement in spatial resolution [Lu 2006, Sec. 2.7].

The Rayleigh resolution criterion is of course somewhat arbitrary; it is based on the specific properties of an unobstructed circular lens and the position of its first diffraction minimum. Another criterion that is sometimes used is the Sparrow resolution criterion [Sparrow 1916], which is when the "dip" between the two point source images just disappears (that is, the second derivative of the intensity profile becomes zero at the midpoint between the two point sources; see Fig. 4.35). In Sparrow's original paper, he pointed out that the human visual system's propensity for edge detection means that an observer will detect two lines in a spectrum (rather than one) when this condition is met, though this is perhaps optimistic for imaging of more "crowded" specimens.
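Equation 4.176 is straightforward to evaluate numerically. The sketch below uses only the Python standard library, computing J₁ from its integral representation J₁(x) = (1/π)∫₀^π cos(t − x sin t) dt rather than a special-function library; the helper names are ours. Scanning for the first dark ring reproduces ν ≈ 1.22π for b = 0 and ν ≈ 1.001π for b = 0.5, the narrowed value quoted in the Fig. 4.31 caption:

```python
import math

def J1(x, n=500):
    """Bessel function J1 via J1(x) = (1/pi) * int_0^pi cos(t - x sin t) dt,
    evaluated with the trapezoid rule (adequate for modest x)."""
    h = math.pi / n
    total = 0.5 * (math.cos(0.0) + math.cos(math.pi - x * math.sin(math.pi)))
    for i in range(1, n):
        t = i * h
        total += math.cos(t - x * math.sin(t))
    return total * h / math.pi

def psf(nu, b=0.0):
    """Obstructed-Airy intensity point spread function of Eq. 4.176;
    b is the fractional diameter of the central stop (b = 0: plain Airy)."""
    if nu == 0.0:
        return 1.0
    amp = 2.0 * J1(nu) / nu
    if b > 0.0:
        amp -= b**2 * (2.0 * J1(b * nu) / (b * nu))
    return (amp / (1.0 - b**2)) ** 2

def first_minimum(b, lo=2.0, hi=4.5, steps=1200):
    """Scan for the first dark ring (minimum of the psf) in [lo, hi]."""
    nus = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(nus, key=lambda nu: psf(nu, b))
```

At ν = 0 the bracketed amplitude tends to 1 − b², so the normalization in Eq. 4.176 keeps psf(0, b) = 1 for any stop fraction.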
Figure 4.31 Focused intensity profile for a full optic (with an Airy intensity profile as given by Eq. 4.134), and for one with a half-diameter central stop (or b = 0.5 in Eq. 4.176). The optic with a central stop would produce a lower peak intensity by a factor of 1 − b², which is not shown in this plot. While the central spot is narrower (the first minimum is at ν = 1.001π for b = 0.5, versus ν = 1.22π for b = 0), the first Airy ring carries a larger fraction of the focused energy, as is shown in Fig. 4.33, and the optical transfer function is also significantly affected, as is shown in Fig. 4.48.
For an optic with no central stop or b = 0, the Sparrow resolution criterion is met when ν = 0.941π, while with a half-diameter central stop or b = 0.5 it is met when ν = 0.862π, giving spatial resolution values of

δr[Sparrow, b = 0] = 0.47 λ/N.A.,  I[Sparrow] = 1.126   (4.177)
δr[Sparrow, b = 0.5] = 0.43 λ/N.A.,  I[Sparrow] = 1.083,   (4.178)
which can be compared with the standard Rayleigh result of Eq. 4.173. For an unobstructed lens, the Sparrow criterion gives a resolution that is (0.941/1.22) = 0.77 the value of the Rayleigh resolution, which one can see in Fig. 4.30.
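The two-spot numbers quoted above can be checked directly by summing a pair of Airy intensity patterns (again with J₁ evaluated from its integral representation; the helper names are ours): at the Rayleigh separation of 1.22π the midpoint "dip" is about 73.5 percent of the peak (Fig. 4.29), while at the Sparrow separation of 0.941π the flat midpoint intensity is the I[Sparrow] ≈ 1.126 of Eq. 4.177:

```python
import math

def J1(x, n=500):
    """Bessel J1 from its integral representation (trapezoid rule)."""
    h = math.pi / n
    total = 0.5 * (math.cos(0.0) + math.cos(math.pi - x * math.sin(math.pi)))
    for i in range(1, n):
        t = i * h
        total += math.cos(t - x * math.sin(t))
    return total * h / math.pi

def airy(nu):
    """Airy intensity [2 J1(nu)/nu]^2 (Eq. 4.174)."""
    return 1.0 if nu == 0.0 else (2.0 * J1(nu) / nu) ** 2

def two_spots(separation, nu):
    """Sum of two Airy intensity patterns centered at +/- separation/2."""
    return airy(nu - separation / 2.0) + airy(nu + separation / 2.0)

# Midpoint ("dip") intensity relative to the peak of the summed pattern:
rayleigh = 1.22 * math.pi
profile = [two_spots(rayleigh, -6.0 + 12.0 * i / 1200) for i in range(1201)]
dip_ratio = two_spots(rayleigh, 0.0) / max(profile)   # ~0.735, as in Fig. 4.29

# At the Sparrow separation of 0.941*pi the dip has just vanished, and the
# midpoint intensity is ~1.126 (the I[Sparrow] value of Eq. 4.177).
sparrow_mid = two_spots(0.941 * math.pi, 0.0)
```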
4.4.4

Beating the diffraction limit in light microscopy

Returning to the Rayleigh resolution criterion given in Eq. 4.173, it does exactly what it was asked to do: it gives us a simple estimate for how easily one can distinguish two nearby incoherent sources. In the following section, we will see that image transfer functions provide some nuance to judging resolution beyond a single hard-and-fast number. In visible light microscopy, there are a wide range of approaches that have enabled imaging well beyond the Rayleigh resolution limit, including the following:

• Structured illumination involves imaging a grating onto the illuminated field of view
Figure 4.32 The full-width at half-maximum (FWHM) probe size versus Rayleigh resolution for optics with central stop fractions b of the total optic diameter. As the central stop fraction b increases, the FWHM probe diameter decreases but more energy is thrown into the outer rings of the Airy function (Figs. 4.33 and 4.34).
Table 4.1 Integral of the two-dimensional Airy and sinc² intensity distributions at the radius ν_first min = 1.22π corresponding to the Rayleigh resolution for an optic with no central stop (Eq. 4.135). This integral result, referred to in Fig. 4.34 as I_R, is listed below for circular optics with various fractional center stop diameters b, along with the actual radius ν_dark ring of the first minimum or "dark ring" of the modified Airy function (Eq. 4.176). The sinc² distribution for crossed cylindrical optics is addressed further in Section 4.4.5.

Optic type   b     I_R     ν_dark ring
sinc²        —     0.826   —
circular     0.0   0.843   3.832
circular     0.1   0.824   3.786
circular     0.2   0.770   3.664
circular     0.3   0.690   3.502
circular     0.4   0.596   3.322
circular     0.5   0.499   3.144
circular     0.6   0.405   2.974
circular     0.7   0.314   2.814
circular     0.8   0.221   2.666
circular     0.9   0.120   2.530
of an object, essentially allowing for the doubling of the maximum spatial frequency from which structural information is collected and thus a twofold improvement in spatial resolution [Gustafsson 2000]. Nonlinear fluorescence response can improve this further [Gustafsson 2005].

• Near-field scanning optical microscopes (NSOMs) use a subwavelength aperture
Figure 4.33 Properties of an optic as a function of increasing the fractional diameter b of a central stop. The left-hand figure shows the energy in the central Airy disk and the first Airy ring as a function of b, while the right-hand figure shows the value of ν of the location of the first and second Airy dark rings. Values of the fraction of energy for the central Airy disk, and the positions ν_dark ring of the first dark ring, are shown in Table 4.1, while values for the energy in the first Airy ring and the position of the second dark ring are indicated on the plot. Central stops are often a required feature of x-ray optics (see Figs. 5.17 and 6.4), and while the central Airy disk is narrowed as b is increased (Fig. 4.31), a greater fraction of energy goes into subsidiary Airy rings (Fig. 4.34) and the optical transfer function is also affected (Fig. 4.48).
to generate a small spot of light. The fraction of light that makes it through this aperture is small, and the beam quickly diverges over a distance of about 100 nm from the tip, but images with a resolution of tens of nanometers can be obtained [Betzig 1991].

• Stimulated emission depletion (STED) microscopy works by using a phase spiral to produce a "hollow" focus spot of one wavelength which suppresses the excitation of visible-light fluorescence transitions in a molecule, and an overlapping focal spot of an excitation beam. The net effect is that visible-light fluorescence is limited only to the center of the "hollow" beam so that a spatial resolution of better than 50 nm can be obtained, as proposed and first demonstrated by Hell et al. [Hell 1994, Klar 2000].

• One can measure the position of widely separated objects with a precision far greater than the resolution. While two objects are hard to distinguish at separations finer than the Rayleigh resolution, as shown in Figs. 4.29 and 4.30, one can fit a function such as a Gaussian to the focal spot of an isolated object such as the point spread functions shown in Fig. 4.31. Even if the point spread function is not exactly Gaussian in profile, and even if it is "noisy" due to a limited number of photons being collected in the measurement, one can still find the position of the center of the Gaussian to a precision much finer than the width of the Gaussian. In visible-light microscopy, Eric Betzig proposed such a trick if one could somehow control the turning on and off of fluorescence from nearby emitters [Betzig 1995]. It was soon observed by Dickson, Moerner, and collaborators that some individual fluorophores will spontaneously switch into a long-lasting dark state yet they
Figure 4.34 Fraction of energy versus ν, where ν_first min = 1.22π = 3.83 is the probe radius corresponding to the Rayleigh resolution (Eqs. 4.135 and 4.175). This is shown for circular optics with central stops with various fractions b of the optic diameter (b = 0.5 is typical for order sorting in scanning x-ray microscopes, as shown in Figs. 5.17 and 6.4). The position of the first dark ring in the Airy distribution is also shown (this is also shown at right in Fig. 4.33). Finally, this figure also shows the radial integral of the sinc² intensity distribution that applies to orthogonal pairs of cylindrical optics, as will be discussed in Section 4.4.5. The numerical values of I_R, the integrated intensity at the Rayleigh resolution, are given in Table 4.1, along with the radii ν_dark ring of the first dark ring in the Airy distribution. This figure is inspired by [Michette 1986, Fig. 8.17].
can be subsequently triggered with light into going back to a ﬂuorescing state [Dickson 1997]. This eventually led to the development of the superresolution techniques of PALM (for photoactivated localization microscopy) [Betzig 2006], STORM (stochastic optical reconstruction microscopy) [Rust 2006], FPALM (ﬂuorescence PALM) [Hess 2006], and their variants.
These and other related advances are summarized in recent review papers [Hell 2007, Yamanaka 2014]. Eric Betzig, Stefan Hell, and William Moerner received the 2014 Nobel Prize in Chemistry for the development of super-resolution methods in light microscopy. They have changed the emphasis on high spatial resolution alone in x-ray microscopy of biological specimens; instead, x-ray microscopy provides important complementary capabilities such as the ability to obtain isotropic resolution 3D images of the full native, unlabeled content of thicker specimens (Chapter 8 and ptychographic tomography in Chapter 10), and intrinsic chemical and elemental information separate from the use of specific visible-light fluorophores (Chapter 9).
Imaging physics
Figure 4.35 In the Sparrow resolution criterion, two nearby incoherent sources are considered to be resolved when the net intensity profile flattens out in between the sources (its second derivative is zero). For two Airy intensity patterns from unobstructed (b = 0) circular optics, this condition occurs at ν = 0.941π rather than the ν = 1.22π separation corresponding to the Rayleigh resolution criterion shown in Fig. 4.29. The corresponding Sparrow resolution formula for an unobstructed lens is given in Eq. 4.177, with Eq. 4.178 giving the equivalent for a lens with a half-diameter central stop (b = 0.5).
4.4.5 Cylindrical (1D by 1D) optics
With visible-light optics, and when using Fresnel zone plates (Section 5.3) and some compound refractive lenses (Section 5.1.1) for x-ray focusing, circularly symmetric lenses are employed so the discussions given above are directly applicable. With circular optics, the focus amplitude is the Fourier transform of a circ(r/a) pupil function, which yields an Airy²(r/a) intensity pattern as shown in Fig. 4.22. However, not all x-ray optics are circularly symmetric! Kirkpatrick–Baez mirrors (Section 5.2.2), multilayer Laue lenses (Section 5.3.6), and some compound refractive lenses (Section 5.1.1) use separate optics to focus in each of the two directions orthogonal to the beam propagation direction, as shown in Fig. 4.36. The equivalent in visible-light optics is the use of cylindrical (rather than spherical) lenses that produce line foci, so that 2D focusing is achieved with two cylindrical lenses arranged orthogonal to each other and to the beam direction as with Kirkpatrick–Baez optics (Fig. 2.1). With such a pair of line-focusing optics, the focus amplitude is the Fourier transform of a rect(x/ax) · rect(y/ay) pupil function which yields a sinc²(x/ax) sinc²(y/ay) intensity pattern (a comparison between full-aperture Airy and sinc functions was shown in 1D in Fig. 4.23). If a square central stop with fractional width b is employed, the focus spot
Figure 4.36 The pupil functions of different optics along with the intensity distribution in focus. In circular optics (left), one obtains an obstructed Airy² focus corresponding to an optic with diameter d and a central stop with fractional diameter b (Section 4.4.3). When using orthogonally crossed one-dimensional focusing optics such as grazing incidence optics or multilayer Laue lenses, the direct equivalent is to have separate optics (each with diameters d as indicated, and central stops of fractional width b) for focusing in the horizontal and vertical directions, respectively, as shown in the middle. For this case (which is almost never encountered with x-ray optics), the focus intensity distribution is given by Eq. 4.179. With crossed orthogonal compound refractive lenses (CRLs; see Section 5.1.1), the case at middle is what applies except the fractional central stop diameter is usually b = 0, so the intensity profile is as shown at right. Another common arrangement (right) is to have single off-axis optics for each focusing direction; this represents the case for Kirkpatrick–Baez mirror optics (Fig. 2.1, and Section 5.2.2) as well as for most multilayer Laue lenses (MLLs; Section 5.3.6). In this latter case, because there is no central stop, one does not have the interference effect between two focused beams converging from different directions (the case of b > 0 in the middle figure) so there are no enhanced sidelobes off of the central focus. Instead, the lenses in each direction (horizontal, and vertical) act like full-aperture 1D optics with a full diameter of d as indicated at right. One must still pay careful attention to the alignment of the two 1D off-axis optics [Yan 2017].
instead becomes [Yan 2012]

I(X, Y) = (I0/(1 − b²)²) [sin(X) sin(Y)/(XY) − sin(bX) sin(bY)/(XY)]²   (4.179)
with X = 2πRx/(λf) and Y = 2πRy/(λf), where R is the distance from the optical axis to the outer aperture of the optic and f is the focal length. (This result is essentially the sinc² version of the obstructed Airy function of Eq. 4.176.) Crossed cylindrical optics that are symmetric about the optical axis and have central stops therefore show strong side lobes surrounding the central focus, which one can think of as arising from interference between the two light beams converging on the optical axis from opposite directions (Fig. 4.36). However, in x-ray optics, this situation almost never arises: compound refractive lenses (CRLs) are symmetric about the optical axis but do not need central stops, while Kirkpatrick–Baez mirrors (KB mirrors) and multilayer Laue lenses (MLLs) are almost always located only on one side of the optical axis, as shown at right in Fig. 4.36. With circular optics, the Rayleigh resolution δr = 0.61λ/N.A. (Eq. 4.173) is defined based on the radial position of the first minimum in the diffraction pattern of an optic
with no central stop (b = 0). As was noted in connection with comparisons of diffraction from slits and circular apertures (Fig. 4.23), the position of the first diffraction minimum for a cylindrical optic is a bit narrower, giving a value for the spatial resolution of

δr,cyl ≈ 0.5 λ/N.A.   (4.180)

However, because the focus profile is not circularly symmetric, Eq. 4.180 cannot be interpreted as providing a precise measure of the spatial resolution in all directions. An alternative therefore is to consider the fraction of focused light intensity contained within various radii. This is shown in Fig. 4.34 and Table 4.1, where one can see that 83 percent of the intensity is contained within a radius corresponding to the central Airy disk for an unobstructed (b = 0) circular optic. It is worthwhile to explicitly review the situation that applies to each of several types of x-ray optics:
• Fresnel zone plates are circular optics as shown at left in Fig. 4.36. They have circular optic properties as described in Section 4.4.3, with further detail provided in Section 5.3.1.
• Kirkpatrick–Baez and Montel nanofocusing systems (Section 5.2.2) are usually cylindrical half-optics, and furthermore, they usually do not reach to the center of the optical axis. That means they are represented by the case shown at right in Fig. 4.36. We showed in Section 4.4.2 that Fresnel propagation of the wavefield transmitted through a lens to the focal plane leads to a focal spot characterized by the Fourier transform of the pupil function of the lens. Therefore, with an off-axis half-optic, the pupil function in each direction is just a simple shifted rect() function so the resulting focal spot is simply a sinc²() function with no modification to account for a central stop and thus no accentuation of the side lobes. However, the N.A. of the optic is now set not by the maximum reflection angle (see for example Eq. 5.10), but by the total aperture d of the optic, or (referring to Fig. 5.7) the meridional angle αm from the optic to the image plane. Therefore N.A. = αm/2 or N.A.x,y = dx,y/(2 fx,y) where fx,y is the focal length in each direction, and the diffraction limit to spatial resolution is approximately given by using that N.A. in Eq. 4.180.
• Multilayer Laue lenses (Section 5.3.6) are also usually half-optics located off of the optical axis, with the same properties of producing a sinc²() intensity distribution in each direction. The numerical aperture of the optic is again given by N.A.x,y = dx,y/(2 fx,y) rather than by N.A. = λ/(2drN) as would have been the case for a Fresnel zone plate (Eq. 5.27). This means the diffraction-limited spatial resolution for an MLL will not be given by δr = 1.22 drN as one would have expected from Eq. 5.28; instead, one again has a spatial resolution more like that given by using the optic's N.A. in Eq. 4.180, which is reduced from what one would have expected based on the width of the thinnest layer in the multilayer Laue lens.
• Compound refractive lenses (Section 5.1.1) can be fabricated either as circularly symmetric optics [Lengeler 1999b] in which case the usual considerations of circular optics apply, or as orthogonal pairs of cylindrical optics made using either macroscopic [Snigirev 1996] or nanofabrication [Aristov 2000b] techniques. There is no reason to use central stops with compound refractive lenses, so their focal spot profile is the sinc²(X) sinc²(Y) distribution that one obtains from Eq. 4.179 with b = 0. However, absorption limits the transmission at the outer ends of the aperture, so it is difficult to directly apply the results of either Eq. 4.173 (for circularly symmetric optics) or Eq. 4.180 (for orthogonal 1D or cylindrically symmetric optics) to estimate the spatial resolution of compound refractive lenses.
Thus we see that it is a bit difficult to apply a single, precise number for the spatial resolution of many x-ray optics. This will be seen further in Section 4.4.7 and Fig. 4.49. Finally, one must use care in aligning two 1D optics relative to each other [Yan 2017].

Figure 4.37 Coherence lengths and widths refer to the distances over which there is good phase correlation between wavefronts. Shown here are the directions of coherence lengths and widths, respectively, for a quasi-monochromatic plane wave.
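The crossed-cylindrical focus profile of Eq. 4.179 can be explored numerically. The following sketch (the grid spacing and the function name `crossed_1d_focus` are choices made here for illustration, not from the text) compares the on-axis side-lobe strength with and without a b = 0.5 central stop:

```python
import numpy as np

def crossed_1d_focus(X, Y, b=0.0, I0=1.0):
    """Focal intensity for crossed 1D optics with a square central stop of
    fractional width b (Eq. 4.179); b = 0 reduces to sinc^2(X) sinc^2(Y)."""
    sinc = lambda u: np.sinc(u / np.pi)  # numpy's sinc is sin(pi x)/(pi x)
    amp = sinc(X) * sinc(Y) - b**2 * sinc(b * X) * sinc(b * Y)
    return I0 * (amp / (1.0 - b**2))**2

X = np.linspace(-10, 10, 2001)
profile_open = crossed_1d_focus(X, 0.0, b=0.0)  # cut along Y = 0
profile_stop = crossed_1d_focus(X, 0.0, b=0.5)

def first_sidelobe(profile):
    # largest intensity outside the central lobe region (|X| > pi)
    return profile[np.abs(X) > np.pi].max()

print(first_sidelobe(profile_open))  # ~0.047 of the central peak for b = 0
print(first_sidelobe(profile_stop))  # noticeably larger for b = 0.5
assert first_sidelobe(profile_stop) > first_sidelobe(profile_open)
```

With the stop, the first side lobe grows from about 5 percent to roughly 17 percent of the central peak, which is the enhanced-side-lobe effect described above.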
4.4.6 Coherence, phase space, and focal spots
In our discussion of the diffraction-limited point spread function of an optic, we made assumptions of point source wave emitters, and perfect plane waves. We now consider the consequences of imperfection in these assumptions, leading us into a discussion of coherence. We do this in a simplified treatment, rather than follow the more sophisticated approaches of the Gaussian–Schell model [Coisson 1997] or the Wigner distribution approach [Kim 1986] as applied to synchrotron radiation experiments. Coherence refers to the degree of phase correlation that exists across a wavefield. For a plane wave, we can consider two separate aspects of coherence, as shown in Fig. 4.37:
• Coherence length ℓc refers to wavefront positions separated in time, or separated longitudinal positions, over which there is good phase correlation. This is related to the degree of monochromaticity of a wave, so that the coherence length ℓc of an illuminating beam is given by the number λ/Δλ of nicely overlapping waves times the wavelength λ, or

ℓc = λ²/Δλ = λ (E/ΔE)   (4.181)

where Δλ represents the spread of wavelengths in a quasi-monochromatic beam
Figure 4.38 The van Cittert–Zernike theorem can be used to calculate the degree of mutual coherence between two points downstream of an aperture illuminated by an incoherent quasi-monochromatic source. Shown here is the example of Eq. 4.183 of calculating the degree of mutual coherence μ1,2 between points 1 and 2 separated by an angle θ in the far field, when the aperture is a pinhole with diameter 2r.
(see also Eq. 7.4). The coherence length is sometimes referred to as longitudinal coherence.
• Coherence width wc refers to wavefront positions separated transversely to the wave propagation direction over which there is good phase correlation. The coherence width is sometimes referred to as transverse coherence.
Of course the details of calculating coherence lengths and widths depend both on the degree of mutual coherence that one wishes to achieve, as well as on the statistical distribution describing the departures from monochromatic plane waves. We already gained some feel for these issues in Section 4.1.1. One approach for considering spatial coherence is given by the van Cittert–Zernike theorem [Zernike 1938, van Cittert 1939, van Cittert 1958], which involves integrating wavefront sources over an arbitrary aperture to gauge the degree of partial coherence between two points downstream, as shown in Fig. 4.38. It allows one to calculate the degree of mutual coherence μ1,2 that exists between wavefronts at location 1 versus location 2, and this degree of mutual coherence is equal to the fringe visibility

V = (Imax − Imin)/(Imax + Imin)   (4.182)
in a simple interferometry experiment.
If a pinhole aperture of diameter 2r is illuminated with a spatially incoherent but monochromatic source as shown in Fig. 4.38, the van Cittert–Zernike theorem gives a degree of mutual coherence μ1,2 between two points separated by an angle θ of [Born 1999, Eq. 10.4.28]

μ1,2 = 2J1(νμ)/νμ ,   (4.183)

with

νμ = 2πrθ/λ .   (4.184)
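As a numerical sketch of Eqs. 4.183 and 4.184 (the pinhole radius, separation angle, and wavelength below are assumed values chosen only for illustration), one can evaluate the degree of mutual coherence directly, using an integral representation of J1 so that only numpy is required:

```python
import numpy as np

def J1(x, n=2000):
    """First-order Bessel function from its integral representation,
    J1(x) = (1/pi) * integral_0^pi cos(tau - x sin tau) d tau,
    evaluated with a midpoint rule (numpy-only, no scipy needed)."""
    dtau = np.pi / n
    tau = (np.arange(n) + 0.5) * dtau
    return np.sum(np.cos(tau - x * np.sin(tau))) * dtau / np.pi

def mutual_coherence(r, theta, wavelength):
    """Degree of mutual coherence 2 J1(nu)/nu of Eqs. 4.183-4.184 for a
    pinhole of radius r and points separated by angle theta."""
    nu = 2.0 * np.pi * r * theta / wavelength
    return 1.0 if nu == 0 else 2.0 * J1(nu) / nu

# Assumed illustrative numbers: a 10 um pinhole radius, points separated
# by 10 urad, and a 1 nm wavelength.
mu = mutual_coherence(r=10e-6, theta=10e-6, wavelength=1e-9)
print(f"mu_12 = {mu:.3f}")  # close to 1: high mutual coherence
assert mutual_coherence(10e-6, 0.0, 1e-9) == 1.0
```

Since μ1,2 equals the fringe visibility V of Eq. 4.182, this value is what an interferometric measurement across these two points would record.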
The alert reader should sense something familiar here.⁵ The result of Eq. 4.183 is just
⁵ To quote the legendary New York Yankees baseball catcher Yogi Berra, “It’s like déjà vu all over again.”
the same Airy function result that we saw in Eq. 4.132 and Section 4.4.3, and we plotted [2J1(ν)/ν]² in Fig. 4.23.
To put the van Cittert–Zernike expression for mutual coherence into perspective, let's consider first its relationship to the Heisenberg uncertainty principle in quantum mechanics. This was originally stated [Heisenberg 1927] in terms of the commutation of operators for position x̂ and momentum p̂ as [x̂, p̂] = iℏ, where ℏ = h/(2π) (Eq. 3.1). If it is instead stated as the product of the standard deviation in position σx and in momentum σp, one arrives at [Griffiths 2004, Eq. 3.63]

σx σp ≥ ℏ/2   (4.185)

or, as more commonly written,

(Δx) · (Δp) ≥ ℏ/2.   (4.186)

We saw the energy–time version of Eq. 4.185 in Eq. 3.24. Now let's consider a nonrelativistic particle with momentum pz = mv, so that it has a de Broglie wavelength from Eq. 3.5 of λ = h/pz. If this particle encounters a slit of width of 2Δx, its position distribution can be measured at a downstream plane, as shown in Fig. 4.39. We would expect from the single-slit diffraction result of Eq. 4.29 that the first zero in its probability distribution is at an angle of θ = λ/(2Δx). Taking this as a measure of uncertainty in momentum in the x̂ direction Δpx, we have

Δpx = pz sin(θ) = (h/λ) · λ/(2Δx) = h/(2Δx)   (4.187)

or

(Δx) · (Δpx) = h/2.   (4.188)

In fact in Eq. 4.104 we found that the intensity distribution for single-slit diffraction is given by I(ν) = sinc²(ν) with a first minimum at νfirst min = π. Thus if (Δx) · (Δpx) = h/2 corresponds to ν = π, then the Heisenberg uncertainty relationship of (Δx) · (Δp) ≥ ℏ/2 corresponds to ν ≤ 1/2. Therefore, if one were to treat the Heisenberg uncertainty relationship as providing a degree of mutual coherence described by the van Cittert–Zernike theorem but for a 1D slit described by sin(ν)/ν = sinc(ν) rather than a 2D pinhole described by 2J1(ν)/ν, one would have

μ12 = sinc(ν = 1/2) = sin(1/2)/(1/2) = 0.96,

which is a very strong phase correlation. Of course it is not quite right to treat a slit half-width of Δx as being equivalent to σx, which is the standard deviation of a Gaussian, nor is it quite right to treat the position of the first minimum sinc(νfirst min) as being equivalent to σp (and one can almost hear Wolfgang Pauli's ghost saying "it is not even wrong!" [Peierls 1960]). But hopefully it is informative.
This Heisenberg-inspired connection to particle mechanics brings to mind Liouville's theorem in classical mechanics, which states [Goldstein 2002, Sec. 9.9] that a system
Figure 4.39 The Heisenberg uncertainty principle can be thought of as a single-slit diffraction experiment, where a particle with momentum pz is restricted to a position range 2Δx with a corresponding intensity distribution in angles Δpx/pz.
with a constant Hamiltonian H = T + V (where T is the kinetic energy and V is the potential energy) has a constant volume in phase space, or

(Δp) · (Δq) = constant.   (4.189)
If we relate Δp to trajectory angles as we have done in Eq. 4.187, then Liouville's theorem can be thought of as saying that the size times the divergence of a beam is a constant at any focus, or

s θ = s′ θ′   (4.190)
in the notation of simple imaging systems discussed in connection with Eq. 4.142. Therefore one can use optics (which involve no change in the Hamiltonian) to image a beam of particles to a smaller width with larger divergence, or a larger width with smaller divergence.
So let's consider the focus of a light beam in light of the Rayleigh resolution result of Eq. 4.173 of δr = 0.61λ/N.A., and work not with the radius but the diameter contained within the central Airy probe, and not the half-angle N.A. but the full opening angle 2N.A. of the lens. This leads to a full-width, full-angle phase space area product p0 of

p0 = (2 · 0.61 λ/N.A.) · (2 N.A.) = 2.44λ   (4.191)

for the diffraction-limited focus of an aberration-free lens.
We started out this section by wondering about departures due to illumination sources that were not pointlike, and how they would affect the focus of a lens. The situation with very small and very large sources is obvious, as illustrated in Fig. 4.40. But what about the in-between cases? In scanning microscopes using absorption or fluorescence contrast, the net intensity distribution of the focus spot is the relevant characteristic, and this is given by a convolution (Fig. 4.18) of the diffraction-limited focus with the geometrical image of the source. Referring to the geometry of Fig. 4.41, we wish to consider the illumination source size–angle product p of

p = h(2θ) = h′(2θ′)   (4.192)
Figure 4.40 Small versus large illumination sources and their effect on the focus of a lens. When the geometric image of the source is small compared to the diffraction-limited focus, the focus is almost unaffected, whereas with a large source the lens focus will strongly resemble the geometric image of the source.
and how it affects the focus of a lens. Now p0 = 2.44λ describes a geometrical image h′ with a radius extending out to the position of the first Airy minimum, or ν = 1.22π for a circular optic with no central stop. Therefore,

p = 1λ corresponds to ν = π/2,   (4.193)
so convolution of the point spread function of a lens with a disk extending to a radius of ν = π/2 will produce the p = 1λ result. The results of such a calculation are shown in Figs. 4.42 and 4.47, where one can see that the focus is nearly unaffected [Jacobsen 1992b, Winn 2000] for values of the phase space parameter p less than about 1λ.
Another way to describe spatial coherence from a radiation source is to think of the case of multimode lasers. While there are details of optical cavities that we shall gloss over here, some lasers emit pure, single-coherence-mode beams (such as a TEM00 mode) while others will emit into multiple modes. In the latter case, one can use a spatial filter to "clean up the beam" by removing most of the contribution from other modes. These spatial filters often consist of a microscope objective used to image the incoming laser beam to a small spot with a convergence semi-angle θ, and then a pinhole is placed at that spot to limit the beam height h and thus control the phase space area of the beam that passes beyond the spatial filter. (In synchrotron beamlines, beamline optics and slits can perform the same function; see Section 7.2.2). The number Msource
Figure 4.41 Geometry for convolution of an illumination source with the diffraction-limited focal spot (with diameter 2δr) of a lens. The source size h is imaged down to a smaller spot h′, with the beam divergence/convergence angles changing in opposite proportion; that is, hθ = h′θ′.
of incoming spatially coherent modes can be characterized by

Msource = p/λ = h(2θ)/λ   (4.194)
in each transverse direction, so that Msource = 1 corresponds to p = λ. One can then adjust the size h of the pinhole in the spatial filter to control the number of modes that are transmitted, with a flux-versus-coherence tradeoff as shown in Figs. 4.42 and 4.47.
We now bring this p = 1λ phase space criterion back into the framework of the van Cittert–Zernike theorem result for a pinhole of radius r (Eq. 4.183), which was a function of νμ = 2πrθ′/λ with θ′ as the angular separation between two measurement points. In our case, the pinhole radius is r = h/2 so the value of νμ corresponding to one edge of the optic versus its center (giving θ′ = θ from Fig. 4.41) is given by

νμ = 2π rθ′/λ = 2π (h/2)θ/λ = π hθ/λ.   (4.195)
Substituting p = h(2θ) in the above gives

νμ = πp/(2λ)   (4.196)
as the argument to give the degree of mutual coherence between the center and edge of the illuminated lens. Therefore, for p = 1λ (for which the focus of a lens maintains near-diffraction-limited performance as shown in Fig. 4.42), the degree of mutual coherence between the lens center and edge is

μcenter,edge = 2J1(νμ)/νμ = 2J1(π/2)/(π/2) = 0.72,   (4.197)
which happens to be similar in magnitude to the value of ηn = 0.73 that we found for the Rayleigh quarter wave criterion with Gaussian or normally distributed phases in Section 4.1.2.
We summarize our discussion of coherence, phase space, and lens focal spots with the following comments:
Figure 4.42 Intensity profile of the focus of a lens as a function of illumination source modes Msource as given in Eq. 4.194. The calculation at left is normalized to constant total energy, thus emphasizing the preservation of the sharpness of the focus at Msource ≲ 1, or p ≲ 1λ (Eq. 4.192). The calculation at right is normalized to constant areal photon density at the source, showing how the total focused flux Φ increases as one opens up an optic's illumination aperture h. A circular lens with a half-diameter central stop was used, with a diffraction-limited focal profile I(ν) as given by Eq. 4.176 and shown in Fig. 4.31. An earlier version of the figure at left was shown by Winn et al. [Winn 2000]; see also Fig. 4.47.
• To reach near-diffraction-limited focusing with a lens in a scanning microscope, one should restrict the illumination phase space to a product of p = h(2θ) ≲ 1λ as shown in Fig. 4.41. This must be done in each orthogonal direction (that is, both in x̂ and ŷ for an illumination beam propagating in the ẑ direction). This is equivalent to saying

Msource ≲ 1   (4.198)

in each transverse direction.
• If one has a source with a phase space area significantly larger than p ≈ 1λ, one good strategy is to form an intermediate image of the source, and place an aperture at that location to limit the source size h imaged by the scanning microscope's objective lens. This is what a spatial filter does.
• As will be discussed in Section 7.1.1, x-ray sources are often characterized in terms of their spectral brightness Bs (Eq. 7.3), which describes the flux per source size per solid angle per spectral bandwidth. The spatially coherent flux Φc within a given spectral bandwidth is given by

Φc = Bs · λ²   (4.199)

based on p = 1λ in each transverse direction, which to our knowledge was first pointed out independently by Green [Green 1976] and Kondratenko and Skrinsky [Kondratenko 1977]. The "per spectral bandwidth" characteristics of source brightness describe the coherence length ℓc of Eq. 4.181.
Figure 4.43 One-stage and two-stage source demagnification beamline design schemes. These represent two alternative beamline choices for selecting a single coherent mode for nanofocusing experiments. One-stage schemes offer minimal flux loss due to imperfect beamline optics, while two-stage schemes offer shorter beamlines and greater control over flux versus resolution tradeoffs. Figure adapted from [de Jonge 2014].
• In the latest synchrotron light sources (Section 7.1.4), the electron beam emittance (size·angle product at an electron beam focus) is now decreasing toward or even dipping below the intrinsic photon wavelength λ (these two quantities are combined to yield the net source emittance, as will be shown in Eq. 7.12). These facilities are then referred to as diffraction-limited storage rings [Eriksson 2014]. In fact, many synchrotron light sources have already approached or exceeded diffraction-limited status with their vertical emittance. However, the horizontal emittance is typically about a hundred times larger than in the vertical due to dispersion of the beam in its near-circular orbit, and it is only now that the horizontal emittance is being reduced to give Msource ≈ 1 in the latest machines.
• Another way to refer to a source's extent in phase space is through the classical optics term of étendue (or étendue géométrique).
• Full-field imaging systems do not require single-mode illumination. Instead, they can accept about as many spatially coherent modes in each direction as the illumination field divided by twice the spatial resolution, as discussed in Section 4.5. They still benefit from source spectral brightness Bs, since it determines the photon flux per resolution element in the image.
• The requirement of Msource ≲ 1 for high-resolution scanning microscopy is similar to the requirement for coherent diffraction imaging and ptychography, as will be discussed in Section 10.3.2.
One can of course bring much more powerful methods into play to discuss the effects of partial coherence [Goodman 2015, Kim 1986, Coisson 1997, Vartanyants 2016], but the above summary is often sufficient for practical experiments.
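A quick worked example of this phase-space bookkeeping, with hypothetical source numbers chosen only for illustration:

```python
# All source numbers here are hypothetical, chosen only for illustration.
wavelength = 1e-10        # 0.1 nm x rays (about 12 keV)
h = 40e-6                 # source size in one transverse direction, m
two_theta = 10e-6         # full divergence angle, rad

p = h * two_theta         # phase space product p = h(2*theta), Eq. 4.192
M_source = p / wavelength # number of coherent modes, Eq. 4.194
print(f"M_source = {M_source:.0f}")  # 4 modes in this direction

# Spatially coherent flux from spectral brightness (Eq. 4.199). B_s is an
# assumed brightness in photons/s/mm^2/mrad^2/0.1% bandwidth, so lambda
# must be expressed in the matching mm*mrad units (1 m*rad = 1e6 mm*mrad).
B_s = 1e20
lam_mm_mrad = wavelength * 1e6
flux_coherent = B_s * lam_mm_mrad**2  # photons/s in 0.1% bandwidth
print(f"coherent flux = {flux_coherent:.1e} photons/s/0.1%bw")  # 1.0e+12
```

In this hypothetical case, only about one of the four modes in this direction would survive the p ≲ 1λ filtering needed for diffraction-limited focusing, while the coherent flux depends only on the brightness and the wavelength.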
When the radiation source has many spatial modes Msource that must be thrown away in order to achieve a diffraction-limited focus, one can be very flexible with the optical design of a beamline; for example, one does not have to be concerned with the tilting of phase space ellipses with propagation distance (as discussed in Section 7.2.2, and shown in Fig. 7.10), which can otherwise lead to a loss of useful flux and an undesired correlation of angle with position in the illumination arriving at apertures or beamline
optics. With a diffraction-limited storage ring, one must be much more careful to preserve the acceptance and coherence of the selected mode; this involves both increased precision of the beamline optics, and also proper optical design. In Fig. 4.43, we show two alternative schemes [de Jonge 2014] for selecting a single coherent mode from a nearly diffraction-limited source:
• One option (Fig. 4.43A) is to place the nanofocusing optic at some distance from the source, so that it accepts only a single coherent mode and demagnifies the source directly. Because no beamline focusing optics or apertures are used between the source and the nanofocusing optic, one does not lose flux or degrade the coherence of the central mode due to imperfections in beamline optics. However, one must then choose the diameter of the nanofocusing optic based on properties of the source; this sets conditions on the optic that may not be optimal due to other considerations, such as minimum focal length due to working distance constraints, or maximum diameter due to thickness limits in multilayer Laue lenses (Section 5.3.6) or field diameter limits in electron beam lithography of Fresnel zone plates (Section 5.3). One may also require different desired optic diameters in the horizontal and vertical directions, complicating the use of circular zone plates or refractive lenses. This approach tends to lead to long beamlines, with attendant conventional construction costs. For example, to demagnify a 40 μm source (representative of the horizontal source size expected for the next generation of high-brightness storage rings) to a 20 nm spot while using a nanofocusing optic with a convenient focal length of 0.1 meters, one would need to have the distance L1 in Fig. 4.43A be about 200 meters.
Finally, oscillations in source position will lead to oscillations in the probe position or, equivalently, pixel position errors in scanning microscopy; oscillations in the source angle will lead to oscillations in focused beam intensity only to the degree that the source has one or a few modes.
• Another option (Fig. 4.43B) is to use beamline optics to image the source onto a secondary source aperture, which can be adjusted to pass one or several coherent modes (see Section 7.2.2). This allows the experimenter to make a flux-versus-resolution tradeoff if the source has multiple coherence modes, and it also allows one to reject beamline-optics-caused degradations of the coherence of a single mode (spatial filters in visible-light laser laboratories work on the same principle by placing a pinhole at a lens focus) at the cost of a further loss of flux. Oscillations in either source position or angle would lead to oscillations in the focused beam intensity, which can in principle be corrected using a mostly transparent beam flux monitor (such as a thin diamond film) after the secondary source aperture. One can also adjust the secondary source aperture's diameter, and distance L3 to the nanofocusing optic (Fig. 4.43B), so as to accommodate a desired nanofocusing optic diameter. By using two-stage demagnification, one can design a shorter beamline with lower conventional construction costs; for example, if one chose L1 = 30 m due to the distance at which a first optic can be placed after an accelerator shield wall and L2 = 3 m, one can demagnify a 40 μm source to 4 μm at the point of the secondary source aperture, and then if L3 = 20 m to work
138
Imaging physics
with a nanofocusing optic focal length of L4 =0.1 m, one will have demagniﬁed a 40 μm source to a 20 nm focus in a total beamline length of only 53.1 m. One of the engineering challenges in this approach is to make highquality, controllable apertures of just a few micrometers in width and with the ability to handle high power densities (Section 7.2.3), while another is to not add “phase noise” via imperfections in the refocusing optic. The high conventional construction costs associated with long beamlines can be balanced against the cost of optics and slits for the twostage scheme. As will be seen in Section 7.1.6, undulators in synchrotron light sources produce radiation with a spectral bandwidth of ΔE/E 1/Nu where Nu , is the number of magnetic periods in the undulator (Nu 100 in most cases). This bandwidth can be broadened by angular divergence of the source, so that diﬀractionlimited storage rings will provide improved spectral purity which can also be exploited to yield further gains in the ﬂux of a nanofocused beam. Rather than doing spectral ﬁltering with a crystal monochromator with a bandpass of order of 0.1–1 eV, one could use a multilayercoated nanofocusing mirror (Section 5.2.4), or a multilayercoated deﬂecting mirror (Section 4.2.4), or a double multilayer monochromator (Section 7.2.1) to select the entire approximately 1 percent spectral bandwidth of the undulator’s harmonic output. Smallerdiameter Fresnel zone plates have only 100–200 zones, and again they could use the entire spectral output of the source, as could a conventional nonmultilayercoated Kirkpatrick–Baez reﬂective nanofocusing system (Section 5.2.2).
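The two-stage demagnification arithmetic above can be checked directly; the following is a sketch (the function and variable names are ours, while the distances are those quoted for Fig. 4.43B):

```python
# Check of the two-stage demagnification example for Fig. 4.43B.

def demagnified_size(source_size, stages):
    """Apply successive demagnifications M_i = L_image/L_object in sequence.

    `stages` is a list of (object_distance, image_distance) pairs,
    one per focusing stage.
    """
    size = source_size
    for L_obj, L_img in stages:
        size *= L_img / L_obj   # each stage demagnifies by L_obj/L_img
    return size

source = 40e-6   # 40 um horizontal source size
# Stage 1: beamline optic, L1 = 30 m to L2 = 3 m (10x demagnification)
# Stage 2: nanofocusing optic, L3 = 20 m to L4 = 0.1 m (200x demagnification)
stages = [(30.0, 3.0), (20.0, 0.1)]
focus = demagnified_size(source, stages)
total_length = 30.0 + 3.0 + 20.0 + 0.1
print(focus * 1e9, total_length)   # 20 nm focus in a 53.1 m beamline
```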
4.4.7 Transfer functions

In Fig. 4.19 we introduced the idea that one can take the Fourier transform of an image and learn about its distribution of information in reciprocal space. Now consider the fact that the Fourier transform takes a signal and represents it as a linear combination of sine waves with different frequencies, complex amplitudes, and phase offsets. With this picture in mind, we can represent the object being imaged by its Fourier decomposition: the object is represented by a set of diffraction gratings of various periodicities, orientations, and strengths. We can then consider how an imaging system transfers information from these gratings of different spatial frequency. This is precisely the approach put forward by the great German optical physicist Ernst Abbe (1840–1905). More detailed treatments are provided elsewhere (see for example [Goodman 2017]); we provide here a more conceptual summary.

Let us start by considering a coherent plane wave incident on a grating of period d, which is then imaged by an objective lens, as shown in Fig. 4.44. As noted in the discussion of Eqs. 4.149 and 4.150, x-ray microscopes are often used at large demagnification, so we will consider this grating to be located at a small distance Δz from the focal plane of the lens, so that the semi-angle subtended by the lens is very nearly the numerical aperture N.A. As one decreases the grating period d, the +1 order diffraction angle θ ≈ λ/d increases until it reaches the numerical aperture N.A. of the lens, at which point the diffracted ray is no longer collected by the lens and imaged (the same happens
Figure 4.44 Optical transfer function for coherent imaging as a function of spatial frequency u normalized to u_d of Eq. 4.200. If a purely parallel wave is incident on a grating with period d, diffracted rays are captured by the lens and imaged as long as the grating spatial frequency does not exceed u_d = N.A./λ (Eq. 4.200), or equivalently as long as the grating half-period is larger than or equal to Δx_min,coherent = λ/(2 N.A.) (Eq. 4.201).
with the −1 diffraction order). Thus the maximum spatial frequency u_d from the grating that is captured by the coherent imaging system is

u_d = 1/d_min = N.A./λ,    (4.200)
and the finest half-period feature that can be resolved has a width of

Δx_min,coherent = d_min/2 = λ/(2 N.A.).    (4.201)

All spatial frequencies within the range −u_d to +u_d are collected with unit efficiency, so we speak of this coherent imaging system as having an optical transfer function (OTF) of 1 within the range −u_d to +u_d, as shown in Fig. 4.44. When considering an overall imaging system with imperfections in detectors or other components, one can also speak of an overall system contrast transfer function (CTF), or of its unsigned version, the modulation transfer function (MTF) (for reasons that will become clear when we discuss phase contrast in Section 4.7).

Now let us consider an imaging system as shown in Fig. 4.45, where a condenser lens is used to image a source of illumination onto a grating, after which an objective lens again collects light to deliver an image (see also Fig. 1.1). Consider a ray from the upper edge of the illuminating lens that travels at a negative angle (and thus negative spatial frequency u = θ/λ). It can be diffracted by a grating of period d′ to the maximum positive spatial frequency that can be collected by the objective lens. From Eq. 4.30, which becomes 2d′θ = λ with θ = N.A., one can see that the highest spatial frequency
Figure 4.45 Optical transfer function for incoherent imaging as a function of spatial frequency u normalized to u_d of Eq. 4.200. The finest grating period d′ that can be detected is one where a ray comes from one extreme angle of the condenser (illuminating) lens and is diffracted to the opposite extreme angle, where it is just collected by the objective (imaging) lens. As shown at right, the range of collection angles in the orthogonal direction is reduced in this case, and the optical transfer function (OTF; Eq. 4.203) is determined by the degree of overlap between these apertures in reciprocal space. With an incoherent imaging system, there is some (decreasing) transfer for spatial frequencies up to twice the limit u_d (Eq. 4.200) of coherent imaging systems.
u′_d that can be transferred from the object through to the image is

u′_d = 1/d′ = 2 N.A./λ = 2u_d.    (4.202)
That is, one can see features of half the size with incoherent bright-field imaging with critical illumination as compared to coherent illumination. However, when collecting the most extreme rays in one direction, the lens apertures become very narrow in the orthogonal direction; the degree of overlap between the illuminating and collecting apertures (the condenser and the objective lens, respectively) becomes very small. As a result, the OTF for the finest detectable periodicities becomes small. One can in fact calculate the OTF for incoherent bright-field imaging from the degree of overlap of these apertures, leading to a result [Goodman 2017, Eq. 733] of

OTF_incoherent(u) = (2/π) [ arccos(u/(2u_d)) − (u/(2u_d)) √(1 − (u/(2u_d))²) ],    (4.203)

which is shown in Fig. 4.46. One can arrive at the OTF of an imaging system from another approach: by calculating the normalized autocorrelation function of the amplitude transfer function [Goodman 2017, Eq. 729]. For bright-field incoherent imaging, which is the case that describes absorption contrast in a full-field microscope with critical illumination, a scanning microscope with a large-area transmitted flux detector, or a scanning microscope with fluorescence detection, this means that the OTF can be calculated from a Fourier transform
Figure 4.46 Optical transfer function for coherent and incoherent imaging with optics with no central stop, and also the incoherent transfer function for an optic with a half-diameter (b = 1/2) central stop.
of the intensity point spread function (Eq. 4.174), or

OTF_incoherent(u_x, u_y) = F{psf(x, y)}.    (4.204)
One might not think this so useful, since Eq. 4.203 is easy to calculate and plot, but the real virtue of this approach comes when one considers modifications to a standard point spread function, such as those due to partial coherence of the illumination (Fig. 4.47), optics with central stops (Fig. 4.48), or defocus effects, as will be discussed in the next section. Our discussion here is aimed at providing just a taste of OTFs. The full meal includes servings of bilinear transfer functions [Saleh 1979, Courjon 1987], partial coherence [Hopkins 1951, Hopkins 1957], and treatments using the Wigner distribution function [Bastiaans 1986]. In Section 4.5, we will see how the N.A. of the condenser lens affects the OTF.
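As a concrete illustration, the coherent cutoff of Eqs. 4.200–4.201 and the incoherent OTF of Eq. 4.203 are easy to evaluate numerically. This sketch uses an illustrative wavelength and numerical aperture of our own choosing, not values from the text:

```python
import numpy as np

wavelength = 2.4e-9   # 2.4 nm, in the soft x-ray "water window" (illustrative)
NA = 0.05             # objective numerical aperture (illustrative)

u_d = NA / wavelength                 # Eq. 4.200: coherent cutoff frequency
dx_min = wavelength / (2 * NA)        # Eq. 4.201: finest resolvable half-period
print(dx_min)                         # 24 nm for these values

def otf_incoherent(u):
    """Diffraction-limited incoherent OTF of Eq. 4.203 (full circular pupil)."""
    s = np.clip(np.abs(u) / (2 * u_d), 0.0, 1.0)   # zero beyond the 2*u_d cutoff
    return (2 / np.pi) * (np.arccos(s) - s * np.sqrt(1 - s**2))

u = np.linspace(0, 2 * u_d, 5)
print(otf_incoherent(u))   # falls from 1 at u = 0 to 0 at the incoherent cutoff 2*u_d
```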
4.4.8 Deconvolution: correcting for the transfer function

For incoherent bright-field imaging, the image delivered by a microscope can be described by a convolution of the microscope's point spread function psf(x, y) with the transmittance o(x, y) of the object being imaged, yielding an image i(x, y) of

i(x, y) = o(x, y) ∗ psf(x, y) = F⁻¹{O(u_x, u_y) · OTF(u_x, u_y)},    (4.205)
where we have made use of the properties of convolution given in Eq. 4.83 and the result of Eq. 4.204 that the OTF is a Fourier transform of the point spread function. The expression of Eq. 4.205 can be rearranged to give

o(x, y) = F⁻¹{ F{i(x, y)} / OTF(u_x, u_y) },    (4.206)
Figure 4.47 Optical transfer function (OTF) for incoherent bright-field imaging as a function of illumination source modes M_source (Eq. 4.194), or illumination phase space p as given in Eq. 4.192, and spatial frequency u normalized to u_d of Eq. 4.200. The OTF is shown for a full-diameter optic (left), and one with a half-diameter central stop (right). In this combination of grayscale image and contour lines, the OTF for a fully coherent source (M_source = 0) and full-diameter optic as plotted in Fig. 4.46 is shown here as a height out of the paper along the abscissa. These contour images are the OTF representation of the same information shown in Fig. 4.42.

Figure 4.48 Optical transfer function of a lens versus the fractional diameter b of the central stop. With large central stops, the midrange spatial frequencies u are increasingly lost due to the large side lobes in the point spread function. See also Figs. 4.31 and 4.33.
which makes it seem deceptively easy to recover the true object without the blurring effects of the lens. One would indeed be deceived to apply Eq. 4.206 directly! To understand why, consider two results shown earlier: the fact that image signals tend to drop off with spatial frequency as ∝ u^(−a), as shown in Fig. 4.19, and that OTFs decrease towards very small values at high spatial frequencies, as shown for example in Fig. 4.46. As a
Figure 4.49 The most basic way to deconvolve the blurring due to an optical system's point spread function from an image is to apply a Fourier-plane filter which is the inverse of the optical transfer function (OTF), as shown in Eq. 4.206. However, because the OTF reaches low values at high spatial frequencies u, this can lead to a magnification of noise. In this example, power spectra (a) (Section 4.3.4) were obtained from x-ray fluorescence images (Section 9.2) with different signal levels due to intrinsic concentrations of the elements sulfur, potassium, and calcium within a frozen hydrated alga (see Fig. 12.2 for a related image). Since each element's power spectrum had a separate signal dependence S ∝ u_r^(−a) and "noise floor" N, one can find for each element a value of the spatial frequency u_r,S=N where the signal trend reaches the noise floor (at a value of u_r,S=N = 5.2 μm⁻¹ for Ca in this example). This gives a reasonably good estimate of the spatial resolution δ_r,S=N of that element's image by using Eq. 4.251, giving δ_r,S=N = 96 nm in this example. One can also use each element's signal trend and noise floor to calculate a separate Wiener filter W(u_r) (Eq. 4.207) for each element Z, which can be combined with the modulation transfer function (MTF) to calculate an element-specific deconvolution filter D(u_x, u_y) (Eq. 4.209), of which azimuthal averages D(u_r) are shown in the upper-right plot (b). To obtain the MTF, the intensity probe function psf(x, y) shown in the lower-left image (c) was reconstructed along with a transmission image using ptychography (Section 10.4). The lower-center image (d) shows a calcium x-ray fluorescence image from the alga, while the lower-right image (e) shows this image with the calcium deconvolution filter D(u_x, u_y) applied. Figure modified from [Deng 2017b], which shows deconvolved images for other elements as well.
result, the unthinking application of Eq. 4.206 to object deconvolution in the presence of any noise in the measured image would lead to a rather unsatisfactory situation: the weak, high-frequency parts of the image signal (which would likely include significant noise) would be divided by a very small number, thus multiplying the noise relative to the "good" signal at lower spatial frequencies. A good way to avoid this problem is to incorporate a Wiener filter [Wiener 1949] W(u), which is given by

W(u) = S(u)² / [S(u)² + N(u)²],    (4.207)
based on an estimate of the trend of signal S(u)² and noise N(u)² powers as a function of spatial frequency u, as discussed in Section 4.3.4 [Press 2007, Eq. 13.3.6]. In images where the main source of noise is photon statistics (as discussed in Section 4.8), the noise will be uncorrelated from one pixel to the next, so that the noise will have the characteristics of a Dirac delta function, which has a Fourier transform that is uniform across all spatial frequencies (Eq. 4.85). As a result, one can usually construct a Wiener filter W(u_r) rather easily from the trend of S(u_r)² ∝ u_r^(−a) and a flat "noise floor," as shown in Fig. 4.49. This leads to a modification of Eq. 4.206 for recovering the object o(x, y) to

o(x, y) = F⁻¹{ F{i(x, y)} · D(u_x, u_y) },    (4.208)

where D(u_x, u_y) is a Wiener deconvolution filter function of

D(u_x, u_y) = [1 / OTF(u_x, u_y)] · S(u_r)² / [S(u_r)² + N(u_r)²],    (4.209)

where of course

u_r = √(u_x² + u_y²).    (4.210)
Since x-ray microscope images are often photon-statistics-limited, image signal considerations limit the resolution gain that deconvolution can provide [Jacobsen 1991, Deng 2015c], as shown in the example of Fig. 4.49. Alternative approaches to deconvolution involve methods to recover the "true" object that seek to incorporate constraints known, or assumed, to apply to its characteristics. These approaches include computer optimization methods as discussed in Section 8.2.1, including those that explicitly account for Poisson noise models, such as the maximum-likelihood/expectation-maximization (MLEM) algorithm discussed in Section 8.2.2.
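A minimal one-dimensional sketch of Eqs. 4.207–4.209 follows. The object, the stand-in Gaussian point spread function, the noise level, and the power-law signal model are all illustrative choices of ours; Eq. 4.209 is written in the algebraically equivalent form D = OTF·S²/(OTF²·S² + N²) (with S² modeling the object's signal power, so that the measured image trend is OTF²·S²), which rolls off smoothly instead of dividing by a vanishing OTF:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
x = np.arange(n) - n // 2

obj = np.zeros(n)
obj[100:110] = 1.0                     # a narrow absorptive bar
obj[140:160] = 0.5                     # a wider, weaker bar

psf = np.exp(-0.5 * (x / 3.0) ** 2)    # stand-in Gaussian point spread function
psf /= psf.sum()
otf = np.fft.fft(np.fft.ifftshift(psf)).real     # OTF = F{psf} (Eq. 4.204)

image = np.fft.ifft(np.fft.fft(obj) * otf).real  # blurred image (Eq. 4.205)
sigma = 0.01
image = image + sigma * rng.standard_normal(n)   # uncorrelated noise stand-in

# Power-law signal trend S(u)^2 and flat noise floor N^2, in the spirit of Fig. 4.49:
fr = np.abs(np.fft.fftfreq(n))
S2 = 0.05 / np.maximum(fr, 1.0 / n) ** 2
N2 = sigma**2 * n

# Regularized deconvolution filter (Eq. 4.209 in its numerically safe form):
D = otf * S2 / (otf**2 * S2 + N2)
recovered = np.fft.ifft(np.fft.fft(image) * D).real   # Eq. 4.208

print(np.linalg.norm(image - obj), np.linalg.norm(recovered - obj))
# the Wiener-filtered estimate lies closer to the true object than the blurred image
```

Note how the filter restores contrast only at frequencies where the blurred signal still rises above the noise floor, which is exactly the resolution limit discussed above.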
4.4.9 Depth resolution and depth of field

With the exception of the brief discussion of multislice propagation in Section 4.3.9 and the Ewald sphere sampling conditions discussed in Section 4.2.5, up until now we have restricted our discussion to two-dimensional objects imaged in focus. For a circular lens, the intensity distribution I(z) along the focal distance goes like [Linfoot 1953,
Figure 4.50 Point spread function (PSF; top) and modulation transfer function (MTF; bottom) versus defocus for a circular lens with no central stop (b = 0; left) and with a half-diameter central stop (b = 0.5; right). These figures show how the depth of field for absorption or fluorescence contrast x-ray imaging is 2δ_z. For the point spread function, the transverse positions are shown in the normalized coordinate ν of Eq. 4.175, where the Rayleigh resolution is at ν = 1.22π = 3.83, while the defocus distances Δz are shown as a fraction of the longitudinal resolution δ_z of Eq. 4.213, with c_z = 1. The MTF is shown as a grayscale image with contour lines superimposed, and the transverse spatial frequency coordinate is shown as a fraction of the coherent imaging cutoff frequency u_d of Eq. 4.200. The side lobes of the Airy pattern lead to defocus-based contrast reversals at certain midrange spatial frequencies, as described in Section 4.4.9. These simulations recreate earlier published results [Wang 2000].
Eq. 9]

I(z) ∝ { sin[u(1 − b²)] / u }²    (4.211)

with

u ≡ (π/2) (N.A.²/λ) z    (4.212)
and with b as the central stop fraction as in Eq. 4.176. The Rayleigh criterion (Section 4.4.3) used the position of the first minimum of the Airy intensity [2J₁(ν)/ν]² as the measure of the transverse resolution. The first minimum of the longitudinal intensity distribution of Eq. 4.211 occurs when u = π, giving a suggested longitudinal resolution of 2λ/N.A.². In fact, a more realistic criterion is to define the depth resolution δ_z as half
this value, or

δ_z = c_z λ/N.A.²,    (4.213)

with c_z = 1. This choice is illustrated first in Fig. 4.50, which shows at top the intensity point spread function for a circular lens as a function of the radial parameter ν and the longitudinal or depth parameter δ_z as defined in Eq. 4.213. These point spread functions show a first minimum in the axial intensity distribution at 2λ/N.A.² as expected (and furthermore they show that the longitudinal resolution is decreased as one introduces a central stop to the optic, or b > 0). At the bottom is shown the modulus of the OTF (the MTF) for the various defocus distances, as calculated using the following procedure:
1. The Airy amplitude of the lens focus was calculated using the non-squared version of Eq. 4.174.
2. This amplitude was propagated by the specified defocus distance using the near-distance approach of Eq. 4.109.
3. The defocused probe was multiplied by its complex conjugate to obtain the intensity I(x, y, Δz), since that is the relevant probe function for absorption or fluorescence contrast.
4. The optical transfer function OTF(u_x, u_y) was obtained by Fourier transform of this defocused probe intensity, as described in Section 4.4.7.
5. The center row of this 2D pattern was extracted to fill in this defocus distance's row in the 2D array OTF(u_x, Δz).

The spatial frequencies were then scaled according to the coherent transfer function cutoff frequency u_d of Eq. 4.200, with a maximum incoherent frequency of u/u_d = 2 as was shown in Eq. 4.202, and the defocus distances were scaled according to the longitudinal resolution δ_z of Eq. 4.213 with c_z = 1. With the PSF and MTF functions thus visualized, it is clear that the criterion of δ_z = c_z λ/N.A.² of Eq. 4.213 with c_z ≈ 1 indeed describes the loss of image contrast versus defocus distance better than 2λ/N.A.² does. The depth of field is twice the depth resolution, as discussed in Box 4.7. Therefore the longitudinal dependence of the point spread function of Eq. 4.211 leads to a DOF of

DOF = 2δ_z = 2c_z λ/N.A.²,    (4.214)

with c_z ≈ 1. It has been suggested that optics with a central stop fraction b would have an extended depth of field [Welford 1960b]. However, the optical transfer function with b = 0.5 does not show an increase in depth of field in Fig. 4.50, because of the defocus effects on the side lobes of the point spread function.
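The five-step procedure above can be sketched numerically. In this minimal version the wavelength, numerical aperture, and grid size are our own illustrative choices, and the explicit propagation of steps 1–2 is replaced by the equivalent quadratic defocus phase applied in the pupil plane:

```python
import numpy as np

lam = 2.4e-9      # wavelength (illustrative)
NA = 0.05         # numerical aperture (illustrative)
n = 256
du = 4 * NA / lam / n                       # frequency step; grid spans +/- 2 NA/lam
u1 = (np.arange(n) - n // 2) * du
ux, uy = np.meshgrid(u1, u1)
ur2 = ux**2 + uy**2
pupil = (ur2 <= (NA / lam) ** 2).astype(complex)   # circular pupil, no central stop

def mtf_center_row(dz):
    """Steps 1-5: defocused amplitude -> intensity PSF -> |OTF| center row."""
    aberr = np.exp(-1j * np.pi * lam * dz * ur2)            # quadratic defocus phase
    amp_psf = np.fft.ifft2(np.fft.ifftshift(pupil * aberr)) # steps 1-2
    int_psf = np.abs(amp_psf) ** 2                          # step 3: intensity PSF
    otf = np.fft.fft2(int_psf)                              # step 4: OTF = F{psf}
    otf = np.fft.fftshift(otf / otf[0, 0])                  # normalize so OTF(0) = 1
    return np.abs(otf[n // 2])                              # step 5: center row (MTF)

delta_z = lam / NA**2                       # delta_z of Eq. 4.213 with c_z = 1
in_focus = mtf_center_row(0.0)
defocused = mtf_center_row(2 * delta_z)
i_ud = n // 2 + n // 4                      # index of u = u_d on this grid
print(in_focus[i_ud], defocused[i_ud])
# contrast at u = u_d drops sharply when the defocus reaches 2*delta_z
```

In focus, the value at u = u_d should be close to the analytic Eq. 4.203 result of about 0.39, while at a defocus of 2δ_z the midrange contrast largely collapses, as in Fig. 4.50.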
As the defocus is increased to a point where the side lobes of the Airy pattern approach the central lobe in integrated intensity, one can have a reversal of image contrast for spatial frequencies that roughly match the inverse length scale between these side lobes (recall that the side lobes of the Airy amplitude have alternating positive and negative signs; see Fig. 4.22). This leads to contrast reversals at midrange spatial frequencies for certain values of defocus; these regions are indicated in Fig. 4.50.

Box 4.7 Depth resolution and depth of field
We use depth resolution δ_z in the spirit of the Rayleigh criterion for transverse resolution δ_r, as shown in Fig. 4.29: the minimum displacement of an object B relative to an object A at which one can see that there are two objects, not one. With the Rayleigh resolution in the transverse direction, one could also set an object C to be on the other side of object A; if one pulls objects B and C in just closer than the Rayleigh resolution δ_r, one has a depth of field 2δ_r in which one cannot (by strict application of the Rayleigh resolution criterion) distinguish the image as being of one object versus three. It is for this reason that we define the depth of field as twice the depth resolution, or DOF = 2δ_z as in Eq. 4.214. In our thinking, depth of focus and depth of field are two phrases for the same thing, except that the term "depth of field" is more applicable to lensless imaging (Chapter 10, and in particular Section 10.5).

The effect is perhaps easier to visualize from an image rather than from a transfer function, as shown in Fig. 4.51, for which the simulations were done in the following manner:
1. The Airy amplitude of the lens focus was calculated as described above.
2. This amplitude was propagated by the specified defocus distance as described above.
3. The defocused intensity point spread function was calculated for absorption or fluorescence contrast imaging as described above.
4. This defocused intensity point spread function was convolved with an object consisting of several absorptive bar structures, shown at top in Fig. 4.51, to yield a defocused 2D image I(x, y, Δz).
5. The center row of this defocused image was extracted to fill in this defocus distance's row in a sort of "longitudinal image" I(x, Δz), with distances x scaled to the Rayleigh resolution δ_r of Eq. 4.173, and defocus distances Δz scaled to δ_z of Eq. 4.213, with c_z = 1.

Figure 4.51 reinforces the idea that the DOF is 2δ_z = 2c_z λ/N.A.² with c_z = 1, as given in Eq. 4.214. The value of DOF = 2δ_z, as given in Eq. 4.214, is an important consideration when carrying out nanotomography in x-ray microscopes (Chapter 8). Conventional tomography assumes that each image taken from a particular viewing angle of an object represents a pure projection through the object, leading to the Radon transform of Eq. 8.2. If an object extends beyond the DOF, one violates this assumption, as some parts of the object will no longer be viewed in focus in a particular projection image. From the definitions of transverse resolution of Eq. 4.173 and DOF of Eq. 4.214, one can rewrite the depth of field as

DOF = 2δ_z = (2c_z/0.61²) (δ_r²/λ) ≈ 5.4 c_z δ_r²/λ,    (4.215)

where we suggest that c_z = 1 (a suggestion in good agreement with experiments [Wang 2000]). That is, as the transverse resolution approaches the x-ray wavelength, the DOF approaches the transverse resolution, as can be seen in Fig. 4.52. This is the
Figure 4.51 Images of various feature sizes versus defocus. An object (top) consisting of absorptive bars with widths of 0.5, 1.0, and 2.0 times the Rayleigh resolution of Eq. 4.173 was convolved with a defocused point spread function, and the center row of the image was extracted to form the image I(x, Δz), shown with no central stop (b = 0.0; left) and with a half-diameter central stop (b = 0.5; right), with regions of contrast reversal indicated. The transverse positions across the image are scaled in terms of the Rayleigh resolution δ_r, and the defocus positions are scaled according to δ_z of Eq. 4.213. These simulations recreate earlier results [Wang 2000].
reason that some soft x-ray tomography experiments use zone plates not with the highest possible spatial resolution, but with a slightly reduced value (or Fresnel zone plates with larger zone width, like δ_rN = 45 nm, even if δ_rN = 20 nm zone plates might be otherwise available). Otherwise, if Eq. 4.215 indicates that the maximum allowable sample thickness is about 1 μm or less, one might consider the relative merits of soft x-ray tomography relative to tomography in the electron microscope, as will be discussed in Section 4.10. If one is able to take a series of images through the depth extent of the specimen, there are computational methods to combine the sharpest information from each image to yield a projection with depth-of-field effects reduced [Jochum 1988, Liu 2012b, Späth 2014, Selin 2015, Otón 2016].

The discussion above applies to lens-based imaging. What about imaging methods where it is not the lens, but the maximum recorded coherent scattering angle from the specimen, that sets the resolution δ_r? This was illustrated in Fig. 4.16 in Section 4.2.5, and we arrived at the expression DOF_Ewald = λ/N.A.² in Eq. 4.60, as well as DOF_Ewald = 4Δ_r²/λ in Eq. 4.63. We also noted other expressions in the literature that differ by a factor of 2 in either direction. More detailed numerical simulations carried out in the context of multislice ptychography [Tsai 2016] have arrived at an empirical result of 5.2(δ_r)²/λ, in very close agreement with Eq. 4.215 with c_z = 1. Coherent imaging of samples that are thicker than the DOF (thus violating the pure projection approximation) is discussed further in Section 10.5.
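Eq. 4.215 makes this thickness budget easy to evaluate; the following sketch uses illustrative energies and resolutions of our own choosing, in the spirit of Fig. 4.52:

```python
# Depth of field from Eq. 4.215: DOF = 2*c_z*delta_r^2/(0.61^2 * lam) ~ 5.4*c_z*delta_r^2/lam.
def depth_of_field(delta_r, lam, c_z=1.0):
    return 2.0 * c_z * delta_r**2 / (0.61**2 * lam)

lam = 2.4e-9   # ~0.52 keV, near the top of the "water window" (illustrative)
for delta_r in (20e-9, 45e-9):
    print(delta_r, depth_of_field(delta_r, lam))
# delta_r = 20 nm gives DOF ~ 0.9 um, while delta_r = 45 nm gives ~ 4.5 um:
# one reason soft x-ray tomography often uses zone plates with coarser zones.
```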
Figure 4.52 Depth of field (DOF) versus transverse resolution for different x-ray energies. The DOF of Eq. 4.215 represents a thickness limit for simple 3D imaging using an x-ray lens to deliver a pure projection image onto a lower-resolution pixelated detector, and this limit is more challenging to work within at lower x-ray energies. The energy of 0.29 keV corresponds to the carbon K absorption edge, which is popular for spectromicroscopy studies of speciation in organic materials (Section 9.1), while the 0.54 keV x-ray energy is at the high-energy end of the "water window" for high-contrast imaging of organic materials in water, between the carbon and oxygen K absorption edges.
4.5 Full-field imaging

The discussion of illumination phase space and partial coherence of Section 4.4.6 was appropriate for scanning microscopes, where an optic is used to produce a small focus spot. With full-field imaging microscopes, the situation is somewhat different:
• Every resolved region in the object involves a coherent phase space of the Rayleigh resolution (the resolved region radius, rather than diameter) times the full opening angle of the lens, or 1.22λ rather than 2.44λ as given by Eq. 4.191.
• A Rayleigh resolution region only needs to have phase correlation with its immediate n neighbors, where there is overlap of the Airy pattern (usually n = 2–3). However, there does not need to be any phase correlation with regions several multiples of n away. Thus each grouping of n pixels can make use of a different coherent mode, so that a transmission x-ray microscope (TXM) can accept a number of coherence modes M_TXM given by the image field of view w divided by n times the Rayleigh resolution δ_r, or

M_TXM = w/(n δ_r)    (4.216)
coherent modes in each transverse direction {x̂, ŷ}. This is easier to achieve with
Figure 4.53 Image resolution versus the ratio m = N.A._cond/N.A._obj of condenser to objective numerical apertures. Shown at left is the factor k corresponding to the usual 0.61 in the Rayleigh resolution δ_r = 0.61λ/N.A. of Eq. 4.173. The factor k is plotted as a function of m for condensers with various central stop fractions b (the same parameter was used to characterize objective lens central stops in Eq. 4.176). The dots in the figure at left are at the best value of k, and the condenser ratio m at which it was achieved, for each condenser stop size b; those values are plotted at right as a function of b. As the condenser aperture is increased, the objective aperture can accept diffraction from slightly finer grating periods (see Fig. 4.45). However, partial coherence considerations lead to a maximum improvement at a ratio m of about 1.5. Calculation based on an approach described by Hopkins and Barham [Hopkins 1950], as extended for condensers with central stops by McKechnie [McKechnie 1972].
laboratory sources, which generally have large areas and emit into a solid angle of 2π (thus filling a large number M of modes). With synchrotron light sources, most beamline optics (Section 7.2) do not deliver such a large source phase space or étendue to an endstation, and one must often use a phase diffuser [Uesugi 2006] (such as a rotating piece of paper with thickness t variations to produce random phase variations ϕ = kδt, as given by Eq. 3.69), or a wobbling condenser lens [Rudati 2011], to spread the illumination out into a sufficiently large phase space area. This is especially true with undulator sources (Section 7.1.6), which are intrinsically small phase space sources at modern synchrotron light sources.
• Full-field imaging systems thus depend both on the total flux delivered by the source into a large phase space area (large field of view), and also on the spectral brightness of the source (short exposure time per pixel).
It is for the above reasons that some full-field imaging microscopes at synchrotron light sources are operated at bending magnets for ease in obtaining illumination over large fields of view, while others use undulator sources for the fastest possible exposure time in smaller fields of view.
As Fig. 4.45 indicated, the numerical aperture of the condenser lens N.A.cond can play a role in the finest periodicity that one can see in a full-field imaging system. This would suggest that one can image even higher spatial frequencies than the cutoff of u = 2ud of Eq. 4.202 by using a condenser with a higher N.A.cond, as more extreme illumination angles are diffracted into the acceptance aperture N.A.obj of the objective
[Figure 4.54 plot: apparent transfer function T(f) versus normalized spatial frequency u/ud, for ratios m = N.A.cond/N.A.obj of 0.2, 0.5, 0.8, 1.0, and 1.5.]
Figure 4.54 Optical transfer function as a function of the ratio m = N.A.cond/N.A.obj of the condenser-to-objective numerical aperture. Shown here is the "apparent" transfer function for a sine wave [Becherer 1967, Eqs. 58, 59, 61]. Because the response of a partially coherent imaging system is not linear, the assumption of carrying out a simple Fourier decomposition of the object and applying a transfer function to the various spatial frequency components is not fully valid.
lens. What that simple picture neglects is that there can start to be offsetting contributions of bright-field and dark-field (Section 4.6) signals in this situation, leading to a reduction of contrast and a nonlinear response. In Fig. 4.53, we show one measure of this effect as calculated by Hopkins and Barham [Hopkins 1950] and as extended for condensers with central stops by McKechnie [McKechnie 1972] (this effect is also shown via numerical calculations in a later paper by Jochum and Meyer-Ilse [Jochum 1995]). This calculation shows the equivalent of the 0.61 factor, k, in the Rayleigh resolution formula δr = 0.61λ/N.A. of Eq. 4.173, as one changes both the condenser-to-objective aperture ratio m = N.A.cond/N.A.obj and the condenser central stop fraction b. To understand the calculation, first recall that Fig. 4.29 shows that the intensity at the image position between two point objects drops to 73.5 percent of its maximum when the objects are separated by a distance equal to the Rayleigh resolution. One can calculate the intensity distribution between two point objects for different values of b and m [McKechnie 1972], and find the separation where the "dip" in the center drops to 73.5 percent of its maximum; this gives the Rayleigh-like resolution factor k which is plotted in Fig. 4.53. As can be seen, it is advantageous to use a condenser numerical aperture that is about 1.5 times larger than the objective numerical aperture, or m = N.A.cond/N.A.obj ≃ 1.5.
Another way to think of the resolution of a full-field microscope as a function of the condenser-to-objective aperture ratio m = N.A.cond/N.A.obj is in terms of the OTF. However, the nonlinearity of image response noted above means that the basic assumption of applying a transfer function to the Fourier decomposition of an object is not directly applicable. Even so, one can calculate an "apparent transfer function" T(f) for a sine-wave object [Becherer 1967], as is shown in Fig.
4.54, and from it one can
[Figure 4.55 diagrams: direct (critical) illumination, with the illumination source imaged by the condenser onto the object plane ahead of the objective, with N.A.cond and N.A.obj indicated; and Köhler illumination, with a condenser aperture added.]
Figure 4.55 Critical and Köhler illumination in full-field imaging. When imaging the source directly onto the object in the critical illumination method, any imperfections in the uniformity of light output from the source are imaged directly onto the specimen. In Köhler illumination [Köhler 1893, Köhler 1894] the source is imaged onto the back focal plane of the condenser; positions at this plane become plane wave illumination directions at the object, so that the source is not directly imaged onto the object. This comes, however, at the cost of a more complicated optical system. The relative numerical apertures of the condenser lens N.A.cond and objective lens N.A.obj are also shown.
see that the response at low frequencies is enhanced when one uses a smaller condenser numerical aperture. This has been exploited in modern zone plate x-ray microscopes [Schneider 1998a, Schneider 2010].
Besides considering just the numerical aperture of the condenser, one can also make choices in how the illumination source is transferred to the object, as is shown in Fig. 4.55. With critical illumination, the illumination source is imaged directly onto the object. With Köhler illumination, positions on the source are transferred to incidence angles on the object [Köhler 1893, Köhler 1894]; this adds to the complexity of the optical system, but it has the advantage that any spatial structure in the source is not imaged directly onto the object. Finally, it should be noted that aberrations of the condenser lens have no influence on the resolution of the objective lens and thus the imaging system [Zernike 1938] (see also [Born 1999, Sec. 10.5.2]).
4.5.1
TXM condensers, STXM detectors, and reciprocity
In Fig. 4.45, we considered the transfer function for incoherent imaging with equal condenser and objective lens N.A. In that simple description, there is in fact no difference
[Figure 4.56 diagrams: a full-field layout (illumination source, condenser, object plane, objective, detector) and a scanning layout with the roles of the optics reversed.]
Figure 4.56 Reciprocity between full-field and scanning microscopy. For small illumination sources, the imaging characteristics produced by a given condenser aperture in full-field imaging are the same as those for a given detector aperture in scanning microscopy [Welford 1960a, Zeitler 1970b, Zeitler 1970a, Barnett 1973].
in the transfer function whether the light "rays" are going from left to right as shown, or from right to left. And what if they were going from right to left? Then the objective lens would be producing the illumination of the object, and the condenser lens would be collecting the light and delivering it to a detector. Now consider the case of a scanning microscope: the illumination comes from a source with small phase space extent at some distance to the right, so only a Rayleigh-resolution-sized region is illuminated on the object. As a result, there is no need to image the object onto the detector, but the angular extent of the detector will still affect which spatial frequencies one can collect information from. This leads to an important concept for comparing the optical performance of full-field versus scanning microscopes: the principle of reciprocity, as shown in Fig. 4.56. This was first put forward by Welford in 1960, who stated
We can show in fact that if the scanning objective A has NA α and the collector B has NA β then the scanning imagery is, within the approximation of scalar diﬀraction theory, exactly equivalent to conventional microscopy with an objective having NA α [and] condenser of NA β . . . [Welford 1960a].
Welford's original paper did not in fact show how this comes about, nor did a later paper outlining the same principle [Crewe 1970], but a more detailed discussion has been provided by Zeitler and Thomson [Zeitler 1970b, Zeitler 1970a] and by Barnett [Barnett 1973]. The notion holds true for dark-field imaging [Engel 1974], as will be described in the following section, and it holds for the Zernike method for phase contrast imaging as discussed in Section 4.7.3. Just as Fig. 4.53 shows that there is an optimum condenser aperture for full-field imaging of about 1.5 times the objective aperture, for scanning microscopy there is an optimum detector collection angle from the optical axis which is about 1.5 times the objective N.A.; this becomes especially important in dark-field imaging, as discussed in the next section and shown in Fig. 4.59.
[Figure 4.57 diagrams: a full-field layout with extended source, annular aperture, condenser, object, objective, dark-field (DF) stop, and position-sensitive detector; and a scanning layout with coherent source, objective, DF stop, object, and area-integrating detector.]
Figure 4.57 Schematic of optical configurations for dark-field imaging in full-field and scanning x-ray microscopes. In the full-field case, an annular aperture is placed at the back focal plane of the condenser (about a focal length behind it), and the direct illumination is blocked by a complementary ring-shaped stop in the back focal plane of the objective. In the scanning case, a circular stop is used to block the direct illumination. Figure adapted from [Vogt 2001].
4.6
Dark-field imaging
Consider a coherent wavefield incident upon a circular aperture of radius a in an opaque screen. We know that the far-field diffraction pattern will have an amplitude distribution given by Eq. 4.132 and an intensity distribution given by Eq. 4.134. Babinet's principle [Babinet 1837] (see also [Born 1999, Sections 8.3.2 and 11.3]) says that the optical amplitude downstream of a disk of radius a will be the exact complement of that from the pinhole; that is, when adding the two diffraction amplitudes together one will simply have the original incident wavefield.
What does this mean for imaging? Consider the example optical layouts shown in Fig. 4.57. In both the full-field and scanning microscope examples, a dark-field stop is used to block the direct illumination, while light scattered by the sample is deflected out of the beam path and into the detector (one can also separate the bright-field and dark-field signals by using a pixelated detector [Thibault 2009b, Menzel 2010]). The resulting effect on an image is shown in the simulation of Fig. 4.58. As can be seen, the dark-field image is a complement to the bright-field image (as expected from Babinet's principle), so that small features and edges are enhanced in the dark-field image. However, dark-field images of periodic objects must be examined with a knowing eye, as fine periodic features can appear to be "doubled" [Morrison 1992a], as shown at lower left in the simulated dark-field image of Fig. 4.58. Dark-field imaging is especially useful for highlighting small, dense features in a
[Figure 4.58 panels: the object, the bright-field image, and the dark-field image (100 nm scale bar), plus a plot of the bright-field and dark-field transfer functions versus normalized spatial frequency u/ud.]
Figure 4.58 Simulation example of bright-field and dark-field imaging, demonstrating how the dark-field image is a complement to the bright-field image as expected from Babinet's principle. The object (shown at left) consists of an opaque 400 nm diameter circle, opaque bars starting from 10 nm width at left with a width and spacing that increase with the square root of the bar number, and two 25 nm diameter gold spheres with the complex transmittance expected for 520 eV X rays. The image at middle is a simulation using the bright-field transfer function OTFincoherent of Eq. 4.203 for a Fresnel zone plate with drN = 30 nm, or a cutoff in its spatial frequency response at 2ud = 1/drN = 33.3 μm⁻¹. The image at right is a simulation using the dark-field transfer function shown, which is the complement of OTFincoherent with a rolloff at 1.5 times 2ud, in accordance with the ideal condenser-to-objective aperture ratio m = N.A.cond/N.A.obj as shown in Fig. 4.53 (or, in scanning microscopes, the detector aperture as expected from reciprocity and as shown in Fig. 4.59). Notice the apparent doubling of the finest bars in the dark-field image, due to the image's edge-enhanced nature. The dark-field image is shown on an intensity scaling of I^0.1 to highlight the lower-intensity features, since in dark field these require detection of only a few photons above a low or negligible background.
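The Babinet complementarity invoked here is a direct consequence of the linearity of free-space propagation, and can be checked numerically. The FFT-based Fresnel propagator and all grid parameters below are illustrative choices, not values from the text:

```python
import numpy as np

def fresnel_propagate(u0, wavelength, z, dx):
    """Fresnel propagation via the angular-spectrum transfer function."""
    n = u0.shape[0]
    f = np.fft.fftfreq(n, d=dx)
    fx, fy = np.meshgrid(f, f)
    H = np.exp(-1j * np.pi * wavelength * z * (fx**2 + fy**2))
    return np.fft.ifft2(np.fft.fft2(u0) * H)

n, dx, wavelength, z = 256, 50e-9, 2.4e-9, 50e-6
x = (np.arange(n) - n // 2) * dx
xx, yy = np.meshgrid(x, x)
disk = (xx**2 + yy**2) < (2e-6) ** 2          # opaque disk, 2 um radius
u_inc = np.ones((n, n), dtype=complex)        # unit-amplitude plane wave
u_disk = fresnel_propagate(u_inc * (~disk), wavelength, z, dx)
u_hole = fresnel_propagate(u_inc * disk, wavelength, z, dx)
u_free = fresnel_propagate(u_inc, wavelength, z, dx)
# Babinet: the amplitudes downstream of the disk and of the complementary
# pinhole add up to the unobstructed wavefield
assert np.allclose(u_disk + u_hole, u_free)
```

Since propagation is linear, the identity holds at any distance z; dark-field imaging exploits it by blocking one of the two complementary contributions.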
larger object. One example in which this is useful is the identification of gold labels that can be attached to specific proteins in cells using immunolabeling techniques [Chapman 1996c, Meyer-Ilse 2001]. While there is no intrinsic photon exposure advantage to using dark field rather than bright field for imaging immunogold labels (as will be discussed in Section 4.8.4), in practice it is often more convenient to see such labels clearly isolated in dark-field images [Chapman 1996d, Chapman 1996c], as shown in Fig. 4.60.
[Figure 4.59 panels: simulated dark-field line scans versus image plane distance (nm) as the object separation is varied, for detector-to-objective ratios m = N.A.det/N.A.obj → ∞ and m = 1.5, and, at a fixed 24 nm separation, for m ranging from 1.07 to 5.]
Figure 4.59 Effect of varying the detector aperture on dark-field image contrast in scanning microscopy. For the left and middle figures, two cylinders of protein (each 20 nm in diameter) were placed with a center-to-center distance indicated by "Object separation." A 2D dark-field image was then calculated for an objective zone plate with an outermost zone width of drN = 30 nm (and a Rayleigh resolution of δr = 1.22drN = 36.6 nm) at 517 eV photon energy, and a line was extracted from each 2D image and shown here as the row at the specified "Object separation" value. This was done for a detector-to-objective aperture ratio m = N.A.det/N.A.obj of m → ∞ (left) and the approximate optimum value of m = 1.5 (middle), showing better distinguishability at smaller separations at m = 1.5. At right is shown a simulation with a fixed 24 nm center-to-center separation for the two cylinders, with a row extracted per 2D image as the value of m was changed between each simulation. Because of reciprocity (Section 4.5.1) between scanning and full-field imaging systems, it is not surprising to find that m = N.A.det/N.A.obj ≃ 1.5 is preferred for scanning microscopy, just as m = N.A.cond/N.A.obj ≃ 1.5 is preferred for full-field imaging as shown in Fig. 4.53. Figure adapted from Figs. 5 and 6 of [Vogt 2001].
4.7
Phase contrast
Imaging in absorption contrast mode is conceptually straightforward. However, when we discussed the x-ray refractive index in Section 3.3.2 and obtained the expression n = 1 − αλ²(f1 + if2) = 1 − δ − iβ (Eq. 3.65), we saw in Fig. 3.16 that the phase-shifting part of the refractive index (f1, or δ) becomes significantly larger than the absorptive part (f2, or β) at x-ray energies above about 1 keV. In hindsight it is obvious that one should exploit the phase-shifting part δ of the x-ray refractive index for high-contrast imaging, but the first clear statement of this came somewhat late in the history of the field, via a conference presentation in August 1986 by Schmahl and Rudolph [Schmahl 1987], who discussed soft x-ray microscopy but also pointed towards the potential for using higher-energy X rays. (An earlier paper by Bonse and Hart on an x-ray crystal interferometer [Bonse 1965] mentioned the possibility of phase contrast x-ray imaging, with further brief comments appearing in subsequent reviews [Hart 1970b, Hart 1975], but Schmahl and Rudolph were the first to directly point out the potential for reduced radiation dose.) In fact, it is truly remarkable that absorption contrast x-ray radiography has been used for over a century in medical imaging with nobody thinking of the potential of using phase contrast for lower radiation exposure – even though Einstein speculated on how n = 1 − δ might produce grazing incidence reflection effects in medical imaging (Section 2.2) way back in 1918. Contemplate for a moment the collective blindness of so many
Figure 4.60 Combined bright-field and dark-field imaging of immunogold labels. This image shows a fibroblast with silver-enhanced 10 nm gold nanoparticles immunolabeled for actin, imaged after air-drying. The grayscale image is a bright-field image showing absorption within the cell, while the red image overlay is a dark-field image that highlights the small silver-enhanced gold particles. Both images were acquired using a scanning x-ray microscope with a drN = 45 nm outermost zone width zone plate operated at 496 eV [Chapman 1996c].
x-ray scientists (myself included, in spite of work in x-ray holography around this time [Howells 1987]) for so long! As we now know, phase contrast is immensely important for transmission imaging in hard x-ray microscopy. As an example, consider the absorption and differential phase contrast images of a diatom taken at 10 keV, as shown in Fig. 4.61: the diatom is essentially invisible in absorption contrast, and easily identifiable in phase contrast. Since lighter elements also have very poor x-ray fluorescence yield (Figs. 3.5 and 3.7), phase contrast imaging provides the best way to study light material samples in hard x-ray microscopes. When studying biological specimens using x-ray fluorescence excited by multi-keV X rays (Section 9.2), phase contrast lets one see overall cellular structure and even measure the local mass, so that one can make maps not just of elemental content but of concentration [Hornberger 2008, Holzner 2010, Kosior 2012a, Gramaccioni 2018].
Once one gets to frequencies above the microwave range, one cannot measure phase directly. Instead, the phase of a wavefield is measured by mixing it with some other
[Figure 4.61 panels: Absorption (left) and Differential phase (right), each with a μm-scale bar.]
Figure 4.61 Phase contrast provides the best way to image lighter materials when using multi-keV x-ray microscopes (Fig. 3.16 explains why). The image at left shows the x-ray absorption signal of a diatom (phytoplankton cell) obtained from the sum of all segments of a transmitted flux detector used for scanning x-ray microscopy at 10 keV; the light elements that dominate the diatom's composition produce so little absorption that the diatom is almost invisible. The image at right shows a differential phase contrast image (Section 4.7.4) of the same diatom, using the same segmented detector signals, but this time looking at the signal difference between the indicated segments divided by the sum. Figure adapted from [Hornberger 2008].
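The difference-over-sum arithmetic described in this caption can be sketched in a few lines; the segment signals below are made-up numbers, not data from the figure:

```python
import numpy as np

def dpc_difference_over_sum(seg_a, seg_b):
    """Differential phase contrast signal from two opposing detector
    segments: (A - B) / (A + B), computed pixel by pixel."""
    seg_a = np.asarray(seg_a, dtype=float)
    seg_b = np.asarray(seg_b, dtype=float)
    total = seg_a + seg_b
    out = np.zeros_like(total)
    np.divide(seg_a - seg_b, total, out=out, where=total > 0)
    return out

# Made-up 2x2 example: the absorption image is the segment sum, while
# the DPC image is the normalized difference
a = np.array([[100.0, 90.0], [80.0, 100.0]])
b = np.array([[100.0, 110.0], [120.0, 100.0]])
absorption = a + b
dpc = dpc_difference_over_sum(a, b)   # [[0, -0.1], [-0.2, 0]]
```

Normalizing by the sum makes the signal insensitive to overall flux variations from pixel to pixel.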
known waveﬁeld so that the phase is transferred into a measurable intensity, using constructive or destructive interference. One of the simplest ways to achieve this mixing is to use a propagation distance z so that nearby Huygens point sources provide the known or reference wave. As shown in Eq. 4.70, those nearby positions contribute to a measurement point (x, y) while being modulated by a propagator phase term; any diﬀerences in magnitude or phase of these nearby points can contribute to the net optical amplitude at the measurement point. One example wellknown in classical optics is diﬀraction from a halfslit. If an illumination source is placed at a very large distance away compared to the downstream propagation distance z, the intensity distribution versus the transverse distance x is found using the Cornu spiral [Jenkins 1976, Eq. 18k] to give Iedge (w), as shown in Fig. 4.62, where 2 (4.217) w=x λz is a dimensionless parameter. From the values of wfbf = 1.217 for the ﬁrst bright fringe (fbf), and wfdf = 1.872 for the ﬁrst dark fringe (fdf), we can write the transverse positions of the ﬁrst bright and dark fringes, respectively, as √ λz 0.861 λz (4.218) xfbf = 1.217 2 √ λz = 1.324. λz (4.219) xfdf = 1.872 2 Consider another example as shown in Fig. 4.63. In this case, Fresnel propagation was carried out using the convolution approach of Eq. 4.109 as appropriate for nearﬁeld distances. As one can see, while a phase object is invisible in terms of an intensity distribution at a plane immediately downstream of the object (or in the focus of a microscope), it becomes visible when one defocuses the microscope to produce Fresnel √ fringes at a transverse position of about λz and beyond. This eﬀect is commonly used Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005
4.7 Phase contrast
159
in transmission electron microscopy of light materials, which show mainly phase contrast; the proper focus position can be determined from when the object "disappears" (when the image contrast approaches zero), after which a known defocus position (usually the Scherzer defocus [Reimer 1993, Eq. 3.69]) is chosen to provide phase contrast.
Phase contrast imaging with sub-100 nm spatial resolution was first demonstrated in soft x-ray microscopy [Schmahl 1994], but even more activity has taken place in micrometer-scale spatial resolution imaging using hard X rays [Davis 1995] (for recent reviews, see [Momose 2005, Wilkins 2014]). For coarser spatial resolution with hard X rays, a number of approaches have been used for phase contrast imaging:
• One can place an object within a Bonse–Hart interferometer constructed using Bragg diffraction from two crystals cut within a single crystalline silicon block for stability [Bonse 1965, Momose 1996], or use a second analyzing crystal [Davis 1995].
• One can also use just the analyzing crystal in an approach called diffraction-enhanced imaging [Chapman 1996a, Chapman 1997a].
• Another popular approach is to use one grating (which can be a phase grating) to produce self-interference fringes via the Talbot effect [Talbot 1836, Rayleigh 1881] at a downstream plane, and a second absorptive grating to measure deviations in the interference pattern produced by an object placed between the two gratings [Weitkamp 2004, Weitkamp 2005]. This approach works even with sources of low coherence when a third grating is used [Pfeiffer 2006]. The grating method can also be used for dark-field imaging [Pfeiffer 2008], and it is growing in importance for phase contrast in medical imaging.
• As a wavefield propagates downstream from a phase object, Fresnel fringes begin to appear at the object's boundaries, from which one can obtain a phase contrast image, as shown in Fig. 4.63.
Approaches of this type are discussed in Section 4.7.2. These methods all work at the micrometer-scale spatial resolution of an x-ray image detector system, which typically consists of a scintillator, microscope objective, and visible-light camera (Section 7.4.7). Together, the lower-resolution approaches are seeing considerable activity and research impact, though they lie beyond the scope of this book on submicrometer x-ray microscopy.
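For the grating approach mentioned above, the relevant length scale is the Talbot self-imaging distance. A minimal sketch follows; the grating period and wavelength are hypothetical, and π-shifting phase gratings revive at different fractional distances:

```python
def talbot_distance(period, wavelength):
    """Full Talbot self-imaging distance z_T = 2 * p**2 / wavelength for
    an amplitude grating of period p under plane-wave illumination."""
    return 2 * period**2 / wavelength

# Hypothetical example: a 4 um period grating, 0.5 angstrom (~25 keV) X rays
z_T = talbot_distance(4e-6, 0.5e-10)   # -> 0.64 m
```

The quadratic dependence on the period is why grating interferometers for hard X rays use micrometer-period gratings: a coarser grating would require an impractically long beamline.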
4.7.1
Phase contrast in coherent imaging methods
One does not require a fully spatially coherent beam in order to obtain phase contrast images; this will be made clear in the discussions of propagation-based phase contrast (Section 4.7.2) and Zernike phase contrast (Section 4.7.3). However, when one does have a nearly fully coherent beam, one can use methods like x-ray holography (Section 10.2) or especially x-ray ptychography (Section 10.4) to obtain high-quality phase contrast images. These methods are discussed in Chapter 10.
[Figure 4.62 plot: Iedge(w) versus w, with marked values I(1.217) = 1.37, I(1.872) = 0.78, I(2.344) = 1.20, and I(2.739) = 0.84.]
Figure 4.62 Fresnel diffraction from a half-plane aperture. Shown here is the intensity distribution Iedge(w), with w = x√(2/(λz)) being a dimensionless parameter. The position of the first bright fringe is at wfbf = 1.217, where one has an intensity of 1.37 I0, as indicated in the figure, while the position of the first dark fringe is at wfdf = 1.872. The intensity is calculated using the Cornu spiral [Jenkins 1976, Eq. 18k].
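The curve in Fig. 4.62 can be reproduced directly from the Fresnel integrals; this sketch uses SciPy's `fresnel` routine, with an arbitrary sampling grid:

```python
import numpy as np
from scipy.special import fresnel

def edge_intensity(w):
    """Fresnel straight-edge diffraction intensity I_edge(w)/I0, with
    w = x * sqrt(2/(lambda*z)), from the Cornu-spiral result
    I/I0 = 0.5 * [(C(w) + 1/2)**2 + (S(w) + 1/2)**2]."""
    S, C = fresnel(w)   # SciPy returns the pair (S, C) in that order
    return 0.5 * ((C + 0.5) ** 2 + (S + 0.5) ** 2)

w = np.linspace(0.0, 3.0, 30001)
I = edge_intensity(w)
i_fbf = int(np.argmax(I))                  # first bright fringe
w_fbf, I_fbf = w[i_fbf], I[i_fbf]          # near 1.217 and 1.37
i_fdf = i_fbf + int(np.argmin(I[i_fbf:]))  # first dark fringe
w_fdf, I_fdf = w[i_fdf], I[i_fdf]          # near 1.872 and 0.78
```

The extracted fringe positions and intensities reproduce the values annotated in Fig. 4.62 and used in Eqs. 4.218 and 4.219.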
4.7.2
Propagation-based phase contrast
Wave propagation over short distances (or small Fresnel numbers; see Eq. 4.116) involves just a few Fresnel fringes around the edges of objects, as shown in Fig. 4.63. In this case the standard inline holographic treatment for image reconstruction that will be discussed in Section 10.2 is not terribly successful: the twin image is at a very close-by longitudinal or z distance from the desired image, leading to severe problems in image interpretation. However, the presence of fringes is still revealing of phase information about the specimen, so it is not surprising that there can be other approaches for reconstructing phase contrast images using these near-field fringes. We outline two of these approaches here.
Because the object is convolved with a propagator function appropriate for a particular distance, one can use a deconvolution process as described in Section 4.4.8, except that one now wishes to use the known, complex optical transfer function OTF(ux, uy), with the square root of the intensity I(x, y, z) at the downstream plane z providing i(x, y) = √I(x, y, z) in the expression (Eq. 4.206)
o(x, y) = F⁻¹{F{i(x, y)}/OTF(ux, uy)}.
The transfer function OTF(ux, uy) is simply a propagator function H(ux, uy) as shown in Fig. 4.20. Unfortunately, the propagator function has zero-crossing points which present difficulties when dividing by OTF(ux, uy), and the absence of phase in i(x, y, z) also leads to errors in the recovered object. One approach is to record near-field intensity distributions (which are effectively holograms) at multiple, carefully selected [Zabler 2005] values of the propagation distance z, backpropagate each wavefield to the object plane, and use them in a combined fit to reconstruct the complex object [Cloetens 1999a]. When combined with specimen rotations, this provides for a very
[Figure 4.63 panels: an opaque object (left) and a 90° phase object (right), with the first dark fringe indicated; the propagation distance z (μm) runs from 0.02 to 50, with Fresnel numbers from 1000 down to 1, annotations of phase advance and energy flow, and a 100 nm scale bar.]
Figure 4.63 Illustration of propagation phase contrast. A wavefield of 520 eV soft X rays descends from above to illuminate a 2a = 500 nm wide opaque bar at left, and a 2a = 500 nm wide phase-shifting bar (which advances the phase by 90°) at right. The first dark fringe appears as shown in Fig. 4.62, at a transverse position of xfdf = 1.87√(λz/2) from the edge (Eq. 4.219). The Fresnel number F0 = a²/(λz) (Eq. 4.116) changes with the size of the object, while the first fringe width does not. Because the phase object advances the phase of the wavefield, it leads to an energy flow out to the sides as the propagation distance increases. The presence of fringes in the downstream intensity distribution can be used to calculate the phase object that produced them, as discussed in Section 4.7.2. Note that the periodicity of discrete Fourier transforms (Eq. 4.95) means that one sees fringes that seem to appear from other bars to the left and right of this array at the larger propagation distances.
successful approach for phase contrast tomography called "holotomography" [Cloetens 1999b].
An alternative way to represent the intensity distribution recorded downstream of an object is to use the transport of intensity (TOI), or equivalently the transport of intensity equation (TIE). This equation considers the intensity I(x, y, z) and phase ϕ(x, y, z) distributions at one plane, and describes the near-field evolution of these distributions as
∇x,y · [I(x, y, z) ∇x,y ϕ(x, y, z)] = −(2π/λ) ∂I(x, y, z)/∂z.    (4.220)
The object is assumed to be composed of a single material with a net thickness projected onto a plane of t(x, y), so that I(x, y, z = 0) = I0 exp[−μt(x, y)] and ϕ(x, y, z = 0) = −(2π/λ)δt(x, y), where μ = (4π/λ)β and δ and β are as in Eq. 3.67. With these assumptions,
one can reconstruct [Paganin 2002] the projected thickness t(x, y) of the object using
t(x, y) = −(1/μ) log( F⁻¹{ μ F{M² I(Mx, My, zd)/I0} / [zd δ (4π²ux² + 4π²uy²) + μ] } ),    (4.221)
with M = (zs + zd)/zs being the geometrical magnification (Eq. 6.1) produced by a point source at a distance zs upstream of the specimen and a detector recording the intensity I(Mx, My, zd) at a distance zd downstream of the specimen (see Fig. 6.1). This expression uses forward F and inverse F⁻¹ Fourier transforms, and spatial frequencies ux and uy (Eq. 4.32). This approach does not require a very high degree of spatial coherence, since it makes use of only the first few Fresnel fringes from the edge of an object (that is, it requires good mutual coherence over a width of about √(λzd), as given by Eq. 4.217), so it finds widespread use.
One can also consider a hybrid of these two approaches, where one uses the transport-of-intensity reconstruction of Eq. 4.221 to provide a first guess of the complex object. Since free-space propagation of a wavefield from the object's exit field to any downstream plane can be calculated in a straightforward manner (Section 4.3.7), one can calculate the complex amplitude at any downstream plane. One can then develop a procedure where the magnitudes i(x, y, zi) at several downstream planes zi indexed by i are squared and compared with the recorded intensities, and iteratively "nudge" these magnitudes towards the recorded values using iterative algorithms as discussed in Chapter 10. This can yield an estimate of the complex image i(x, y, 0) with very high fidelity [Mayo 2003, Krenkel 2013], provided one has a high degree of mutual coherence over the lateral distance corresponding to the number of Fresnel fringes recorded at the largest of the distances zi.
The approaches outlined above are among the ones most commonly employed for propagation-based phase contrast imaging, but additional approaches exist, as described in a recent comparison [Burvall 2011].
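Equation 4.221 reduces to a single Fourier-space filter. The sketch below simulates a single-material object, propagates it with an FFT-based Fresnel propagator, and applies the filter; all numerical values (wavelength, δ, β, distances, grid) are illustrative assumptions, and plane-wave illumination (M = 1, I0 = 1) is used for simplicity:

```python
import numpy as np

# Illustrative parameters (assumptions, not values from the text)
wavelength = 1.0e-10            # ~12 keV X rays (m)
delta, beta = 1e-6, 1e-9        # single-material refractive index parts
mu = 4 * np.pi * beta / wavelength
zd = 2e-3                       # specimen-to-detector distance (m)
n, dx = 256, 1e-7               # grid size and pixel size (m)

# A smooth projected-thickness map t(x, y): a Gaussian blob
x = (np.arange(n) - n // 2) * dx
xx, yy = np.meshgrid(x, x)
t_true = 1e-6 * np.exp(-(xx**2 + yy**2) / (2 * (3e-6) ** 2))

# Exit wave for a single-material object, then Fresnel propagation to zd
phi = -(2 * np.pi / wavelength) * delta * t_true
u0 = np.exp(-mu * t_true / 2) * np.exp(1j * phi)
f = np.fft.fftfreq(n, d=dx)
fx, fy = np.meshgrid(f, f)
H = np.exp(-1j * np.pi * wavelength * zd * (fx**2 + fy**2))
I_z = np.abs(np.fft.ifft2(np.fft.fft2(u0) * H)) ** 2

# Paganin filter of Eq. 4.221, with M = 1 and I0 = 1 (plane-wave case)
den = zd * delta * 4 * np.pi**2 * (fx**2 + fy**2) + mu
t_rec = -np.log(np.real(np.fft.ifft2(np.fft.fft2(I_z) * mu / den))) / mu
# t_rec closely matches t_true for this smooth, weakly varying object
```

Note that the spatial frequencies from `fftfreq` are in cycles per meter, matching the 4π²u² form of Eq. 4.221.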
4.7.3
Zernike phase contrast imaging

When an optical system is used to image a weak phase object, a simpler and more direct method can be used, as developed by Frits Zernike in 1935 [Zernike 1935, Zernike 1942a, Zernike 1942b], for which he received the 1953 Nobel Prize in Physics. To best understand the Zernike method, let's first consider the overlap of two waves in terms of the addition of two phasors in the complex plane. If the phasor from a weak phase object has a small phase shift, the length of the sum of this phasor and a "reference" phasor changes very little, as shown in Fig. 4.64. If, however, the "reference" phasor is at a relative phase of 90°, then small phase changes from the object produce larger changes in the length of the sum of the two phasors; that is, small phase changes lead to larger intensity variations. The next step is to create the conditions in a full-field microscope for interfering light scattered by the object with a 90° phase-shifted "reference" wave. The usual implementation of Zernike phase contrast involves matching an annular aperture in the illumination optics with an annular phase-shifting ring in the imaging optics, as shown
Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005
Figure 4.64 Addition of phasors in the Zernike phase contrast method. Consider the case of trying to measure small phase shifts in one wave (labeled "shifted") via intensity differences when it is made to interfere with a reference wave. When the reference wave is parallel to the "shifted" wave as shown at left, the length of the resulting vector shows very little difference from the "unshifted" case, so the intensity variation is small. When the reference wave is at 90° relative to the "shifted" wave as shown at right, the small change from "unshifted" to "shifted" phase from the object leads to larger intensity variations when the reference phasor is added.
in Fig. 4.65. The annular aperture is placed in the back focal plane of the condenser; each point on the annular aperture then illuminates the entire field of view of the microscope with a plane wave from one direction, giving a net effect of an azimuthal ring of illuminating plane waves all at the same radial spatial frequency ρ. A single angle of illumination into the objective lens gets focused to a single point in its back focal plane, so the annular aperture produces a ring of light in the objective's back focal plane. In other words, the light from the condenser's annular aperture is imaged to a ring in the objective's back focal plane, where a phase ring is located to impose a phase shift of 90° (or 270° or 450° . . .) on this "reference" illumination wave. In the meantime, the illuminated sample scatters light into a wide range of angles which are collected by the objective lens, and this wide range of angles translates to a wide range of positions in the objective's back focal plane. In this way, nearly all of the light scattered by phase gradients in the sample escapes being modified by the phase ring, leading to the desired interference condition shown at right in Fig. 4.64.

A calculation of the image intensities produced by a feature of material f embedded in a matrix of a background material b (with both the feature and the matrix having a thickness t) has been carried out by Rudolph and Schmahl [Rudolph 1990, Eqs. 4.12 and 4.13]. Their calculation uses the x-ray refractive index n = 1 − δ − iβ (Eq. 3.67) and μ = 2kβ (Eq. 3.82) to write the linear absorption coefficients within the feature and background materials as μ_f = 4πβ_f/λ and μ_b = 4πβ_b/λ, respectively. The phase advances per unit thickness in the two materials are η_f = 2πδ_f/λ and η_b = 2πδ_b/λ. The phase ring has similar attenuation and phase change coefficients μ_p and η_p for a phase ring thickness t_p.
Figure 4.65 Schematic of the Zernike method for achieving phase contrast in a full-field microscope. An annular aperture is placed in the back focal plane of the condenser lens so that its ring of positions is transferred to a ring of illumination input angles onto the specimen (as indicated by the green lines); this ring of angles is in turn imaged onto a phase ring in the back focal plane of the objective lens, where it receives a 90° phase shift. The fraction of the wavefield scattered by a specimen feature at the object plane (indicated by red lines) passes mostly around the phase ring to reach its imaging point on the detector, ensuring that most of the object wavefield is not phase-shifted, thus fulfilling the condition shown in Fig. 4.64. The Bertrand lens (named after a French mineralogist) can be inserted to image the phase ring onto the detector so that it can be properly aligned relative to the image of the annular aperture.

The image intensities with a feature present (I_f) and absent (the background intensity I_b) are given by

$$\begin{aligned} I_{f,\mathrm{zpc}} = I_0 \Big[ &\, e^{-\mu_b t}\big(e^{-\mu_p t_p} + 1\big) + e^{-\mu_f t} \\ &+ 2\, e^{-(\mu_f/2)t}\, e^{-(\mu_b/2)t}\, e^{-(\mu_p/2)t_p} \cos[(\eta_f - \eta_b)t - \eta_p t_p] \\ &- 2\, e^{-(\mu_f/2)t}\, e^{-(\mu_b/2)t} \cos[(\eta_f - \eta_b)t] - 2\, e^{-\mu_b t}\, e^{-(\mu_p/2)t_p} \cos(\eta_p t_p) \Big] \end{aligned} \qquad (4.222)$$

$$I_{b,\mathrm{zpc}} = I_0\, e^{-\mu_b t}\, e^{-\mu_p t_p}. \qquad (4.223)$$
Of course, in most cases one wishes to have cos(η_p t_p) near zero, as the phase ring provides a phase shift of 90° or 270° or increments thereof. Absorption in the phase ring can help increase contrast by making the lengths of the "reference" and "shifted" phasors closer to each other. We can simplify the above expressions by ignoring absorptive effects, using cos(θ − 90°) = sin(θ) ≈ θ, and using cos(θ) ≈ 1. These weak-specimen approximations lead to a lowest-order simplification of Eqs. 4.222 and 4.223 of

$$I_{f,\mathrm{zpc}} \simeq I_0 \left[ 1 + 4\pi(\delta_f - \delta_b)\,\frac{t}{\lambda} \right] \qquad (4.224)$$

and

$$I_{b,\mathrm{zpc}} \simeq I_0 \qquad (4.225)$$

as approximate expressions for Zernike phase contrast of thin objects, with a difference of

$$|I_{f,\mathrm{zpc}} - I_{b,\mathrm{zpc}}| = 4\pi\,|\delta_f - \delta_b|\,\frac{t}{\lambda}, \qquad (4.226)$$

which is an expression that will be used in Eq. 4.270.
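The intensities of Eqs. 4.222–4.223 follow from adding the phasors of Fig. 4.64 directly: the scattered wave (feature minus background) bypasses the phase ring, while the unscattered wave acquires the ring's attenuation and phase. A numerical sketch (the function name and the parameter values used below are ours, chosen only for illustration) lets one verify the weak-object limit of Eq. 4.224:

```python
import numpy as np

def zernike_intensities(delta_f, beta_f, delta_b, beta_b, t, wavelength,
                        delta_p=0.0, beta_p=0.0, t_p=0.0, I0=1.0):
    """Exact Zernike intensities of Eqs. 4.222-4.223 via phasor addition.

    The image amplitude is (a_f - a_b) + a_b * p: the difference wave
    misses the phase ring, while the background wave passes through it
    and acquires the ring factor p.
    """
    k = 2.0 * np.pi / wavelength
    a_f = np.exp((-k * beta_f + 1j * k * delta_f) * t)    # through feature
    a_b = np.exp((-k * beta_b + 1j * k * delta_b) * t)    # through background
    p = np.exp((-k * beta_p + 1j * k * delta_p) * t_p)    # phase ring factor
    I_f = I0 * abs(a_f + a_b * (p - 1.0))**2
    I_b = I0 * abs(a_b * p)**2
    return I_f, I_b
```

With β = 0 everywhere and a lossless 90° ring (η_p t_p = π/2), the exact I_f agrees with 1 + 4π(δ_f − δ_b)t/λ to second order in the small phase shift.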
There are several additional factors to consider with the Zernike phase contrast method as implemented in x-ray microscopes:

• The width of the annular condenser aperture and corresponding phase ring must be optimized based on two competing considerations. As the annulus width is increased, more light is transmitted from the source so that exposure times are decreased. However, with a wider phase ring, a greater range of low spatial frequencies from the object also passes through the phase ring, so that the desired 90° phase shift is not realized between these object spatial frequencies and the illumination. The resulting effect is that of a high-pass filter, producing a halo-like appearance on images of larger objects [Yamamoto 1983]. One way to utilize more of the source's illumination while also minimizing the halo effect is to use a large number of small source apertures in the condenser back focal plane, and provide a matching number of small phase-shifting "dots" in the objective's back focal plane [Stampanoni 2010].

• Because of the reciprocity of condenser and objective in full-field microscopy, and detector and objective in scanning microscopy (Section 4.5.1), one can also place a phase ring in the back focal plane of the objective lens in a scanning microscope and use an annular detector region to achieve Zernike phase contrast (this was suggested by Wilson and Sheppard [Wilson 1984] and by Siegel et al. [Siegel 1990] prior to its first demonstration [Holzner 2010]). While further improvements can be achieved using refinements of the approach [Vartiainen 2015], if one uses a pixelated detector for the transmitted signal it may be better to use the method of ptychography as discussed in Section 10.4, since small focal spots in scanning microscopes require a high degree of coherence in the illumination (Section 4.4.6), meeting the criteria for ptychography.
• With compound refractive lenses (Section 5.1.1), the extended length of the lens system means that one has additional options for the implementation of the Zernike method [Falch 2018].
4.7.4
Differential phase contrast

While the Zernike method can be implemented in scanning x-ray microscopes as noted above, a simpler method is to use a segmented detector to record a differential phase contrast signal. Consider a sample with thickness t composed of material b, which has a feature of material f (Fig. 4.66). If an x-ray beam of width Δr is directed to the edge between feature and background, the beam will be refracted by an angle θ_r given by the fractional advance of the wavefront (Δϕ/2π)λ divided by the distance over which the phase undergoes that change, or

$$\theta_r = \frac{(\Delta\phi/2\pi)\,\lambda}{\Delta r} = \frac{k\,|\delta_f - \delta_b|\,t}{2\pi}\,\frac{\lambda}{\Delta r} = |\delta_f - \delta_b|\,\frac{t}{\Delta r}, \qquad (4.227)$$

where δ_f and δ_b are the phase-shifting parts of the refractive index n = 1 − δ − iβ (Eq. 3.67) for the two materials. While this refraction angle is small, so is the numerical aperture N.A. of typical x-ray focusing optics, so that this refractive shift is enough to lead to a
Figure 4.66 Refractive prism model for diﬀerential phase contrast in scanning xray microscopy. The phase advance of the beam in a feature f relative to a matrix of a background material b with thickness t over the width Δr of the beam leads to a refraction angle given by Eq. 4.227.
noticeable signal difference in a segmented transmission detector, as shown in Figs. 4.67 and 4.61. While a variety of detector configurations have been used [Morrison 1992b, Kaulich 2002a, Feser 2006, Hornberger 2008], the simplest case is a detector split into two segments along the direction of the phase gradient. When a feature is present, the intensity recorded will be

$$I_{f,\mathrm{dpc}} = I_{\mathrm{left}} - I_{\mathrm{right}} = I_0 \left[ \left( \frac{1}{2} + \frac{\theta_r}{\mathrm{N.A.}} \right) - \left( \frac{1}{2} - \frac{\theta_r}{\mathrm{N.A.}} \right) \right] = I_0\, \frac{2\theta_r}{\mathrm{N.A.}} \qquad (4.228)$$

because each of the two segments accepts light from the semi-angle given by N.A. Now the width Δr over which the phase gradient takes place is the Rayleigh resolution of the focal spot, or Δr = δ_r = 0.61λ/N.A., as given by Eq. 4.173. We thus have N.A. = 0.61λ/Δr, which along with Eq. 4.227 lets us write Eq. 4.228 as

$$I_{f,\mathrm{dpc}} = I_0\, \frac{4}{1.22}\, \frac{t}{\lambda}\, |\delta_f - \delta_b|. \qquad (4.229)$$
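Since Eq. 4.229 is just I₀ 2θ_r/N.A. with Δr = 0.61λ/N.A. substituted, the two expressions can be checked against each other numerically. A small sketch (function names and the 10 keV parameter values are our own illustrative choices):

```python
import numpy as np

def refraction_angle(delta_f, delta_b, t, dr):
    """Prism-model refraction angle of Eq. 4.227."""
    return abs(delta_f - delta_b) * t / dr

def dpc_signal(delta_f, delta_b, t, wavelength, I0=1.0):
    """Split-detector differential phase contrast signal of Eq. 4.229."""
    return I0 * (4.0 / 1.22) * (t / wavelength) * abs(delta_f - delta_b)

# Consistency check: with dr = 0.61 lambda / N.A. (the Rayleigh resolution
# of Eq. 4.173), I_0 * 2 theta_r / N.A. reproduces Eq. 4.229.
wavelength = 0.124e-9       # roughly 10 keV, in meters
NA = 1.0e-3
dr = 0.61 * wavelength / NA
theta_r = refraction_angle(3e-6, 1e-6, 1e-6, dr)
```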
The differential phase contrast signal in the case of no feature present is zero, or I_{b,dpc} = 0.

Several other approaches can be used to realize differential phase contrast in x-ray microscopes. One can use optics upstream of the objective lens [Polack 1995, Joyeux 1998], or pairs of offset Fresnel zone plates [Wilhein 2001a, Wilhein 2001b, Kaulich 2002b], or single specially designed zone plates [Chang 2006] to produce an effect like having two slightly separated, phase-shifted focal spots. A phase gradient across these two focal spots will produce a shift of interference fringes within the objective lens or within the detector acceptance angle, again leading to changes in the intensity of the detected image.

With segmented detectors, one can use one of two approaches to recover the full phase contrast image from the simple differential version:

• One can calculate the optical transfer function (Section 4.4.7) for the image recorded by each detector segment [McCallum 1995, McCallum 1996], and use this as a Fourier filter for image deconvolution [Hornberger 2007b] (see Section 4.4.8).
Figure 4.67 Differential phase contrast in a scanning x-ray microscope with a segmented detector. Phase gradients in the specimen lead to a deflection of the transmitted beam (Eq. 4.227), and this can lead to differences in the signal recorded in different segments of the transmitted beam detector. (Figure labels: x-ray beam, zone plate, OSA, specimen, segmented transmission detector, energy-dispersive detector.) Figure adapted from [Hornberger 2008].
This approach delivers the full contrast of finer features in the specimen, though it requires careful calibration of the detector response and alignment.

• Alternatively, the horizontal and vertical differential phase contrast images can be integrated using a Fourier derivative method developed for light microscopy [Arnison 2004], which has subsequently been applied to both grating-based x-ray phase contrast imaging [Kottler 2007] and to scanning x-ray microscopy [de Jonge 2008]. In this approach, differential phase contrast images Δϕ_x and Δϕ_y in orthogonal directions are combined to yield a phase contrast image ϕ using

$$\phi(x, y) = \mathcal{F}^{-1}\left\{ \frac{\mathcal{F}\{\Delta\phi_x + i\,\Delta\phi_y\}}{2\pi i\,(u_x + i\,u_y)} \right\}. \qquad (4.230)$$

This approach works especially well for low spatial frequency structures (that is, the resulting images do not suffer from the "halo" effect seen in Zernike phase contrast), though the method by itself does not correct for reduced response at higher spatial frequencies.

Even greater flexibility in differential phase contrast imaging can be achieved by using a pixelated detector in a scanning microscope [Thibault 2009b, Menzel 2010], but (as noted above) in this case it can be even better to proceed to using ptychography as discussed in Section 10.4.
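The Fourier derivative method of Eq. 4.230 amounts to a complex Fourier integration of the two derivative images. A minimal numpy sketch (the function name is ours; the zero-frequency component, where the denominator vanishes, is set to zero so that the mean phase is left undetermined):

```python
import numpy as np

def integrate_dpc(dphi_x, dphi_y, pixel_size):
    """Combine orthogonal differential phase images into phi(x, y), Eq. 4.230."""
    ny, nx = dphi_x.shape
    ux = np.fft.fftfreq(nx, d=pixel_size)
    uy = np.fft.fftfreq(ny, d=pixel_size)
    UX, UY = np.meshgrid(ux, uy)
    denom = 2.0 * np.pi * 1j * (UX + 1j * UY)
    denom[0, 0] = 1.0               # avoid division by zero at u = 0
    spec = np.fft.fft2(dphi_x + 1j * dphi_y) / denom
    spec[0, 0] = 0.0                # mean phase is undetermined; set to zero
    return np.real(np.fft.ifft2(spec))
```

For derivative images of a band-limited phase (such as a single sinusoid sampled on the grid), the reconstruction is exact up to the undetermined mean.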
4.7.5
Grazing incidence imaging

Another way to obtain phase contrast in an image is via topography in the grazing incidence image geometry [Fenter 2006], as shown in Fig. 4.68. Recall that Bragg's law of 2d sin(θ) = mλ (Eq. 4.33) is based on there being an mλ optical path length difference (or a 2mπ phase shift) between partial reflections from layers separated by a distance d, as shown in Fig. 4.9. Therefore, if one inclines the illumination system
Figure 4.68 Grazing incidence x-ray imaging of topographic features. If a specimen is illuminated at a grazing angle θ and the imaging system is inclined at the same angle θ, then one can image surface topography with high sensitivity, including the ability to image single atomic steps [Fenter 2006]. The image becomes elongated along the reflection direction, so that distances in the y direction in the image become distances y′ = y/sin(θ) along the surface.
(source plus condenser optic) by a grazing angle of incidence θ while also inclining the imaging system (objective optic plus image detector) by the same angle, surface steps will produce no image contrast if one satisfies Bragg's law with whole integer values of m. If, however, the grazing angle θ and surface step height d are set such that m = 1/2, one will have a π phase shift across the position of the edge in the imaging field, and even a slight defocus will allow this phase jump to be viewed as an image intensity variation, as can be seen by considering the example of Fig. 4.63. Thus the condition for obtaining good contrast for a surface step height of d_step is

$$d_{\mathrm{step}} = \frac{\lambda}{4 \sin(\theta)}, \qquad (4.231)$$
so one prefers larger grazing angles in order to see small steps. Consider the case of imaging at a grazing angle beyond the critical angle θ_c of Eq. 3.115. The reflectivity (which can be calculated using Eqs. 3.119–3.121) may be quite low at angles beyond the critical angle θ_c, but it is not zero (see Fig. 3.26). The payoff is that increasing θ decreases d_step. In the first demonstration of this approach, the (100) surface of orthoclase (KAlSi₃O₈, density 2.59 g/cm³) was imaged, since it naturally forms clean steps that are one unit cell or d = 0.6464 nm high. At 10 keV, this material has a phase-shifting decrement of the refractive index of δ = 5.4 × 10⁻⁶, so the critical angle for reflectivity is θ_c = 0.19°. When imaged at a grazing angle of θ = 2.7°, where the reflectivity calculated using Eq. 3.122 is R ≈ 1.5 × 10⁻⁴, one has d_step = 0.66 nm from Eq. 4.231, so that single atomic steps should be observable with high image contrast. This has indeed been observed [Fenter 2006], and this demonstration has since led to the imaging of the growth of ferroelectric epitaxial thin films [Laanait 2014] and of x-ray reaction front dynamics at the calcite–water interface [Laanait 2015]. One can also use grazing incidence phase contrast due to topography as a contrast mechanism for coherent diffraction imaging [Sun 2012] or CDI (Chapter 10).
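The numbers quoted above can be reproduced directly from Eq. 4.231. This sketch assumes the usual conversion λ[nm] ≈ 1.23984/E[keV]; the variable names are ours:

```python
import numpy as np

def step_height_for_contrast(wavelength, theta):
    """Step height giving a pi phase shift on reflection (m = 1/2), Eq. 4.231."""
    return wavelength / (4.0 * np.sin(theta))

# Worked example from the text: 10 keV x rays at a 2.7 degree grazing angle
wavelength_nm = 1.23984 / 10.0             # lambda in nm for E = 10 keV
d_step = step_height_for_contrast(wavelength_nm, np.deg2rad(2.7))
# d_step comes out near 0.66 nm, matching the 0.6464 nm orthoclase unit cell
```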
Figure 4.69 A comparison of the Gaussian and Poisson distribution functions for small integer event counts n given an expected value of n¯ . As can be seen, the Gaussian distribution provides a good approximation to the correct Poisson distribution even at very small values of n¯ of 5–10.
4.8
Image statistics, exposure, and dose

X-ray microscopes naturally involve ionizing radiation, so it is important in many cases to use the minimum exposure possible for recording an image. Having discussed image contrast mechanisms above, and x-ray absorption in Section 3.3.3, we are ready to consider the question of photon statistics in imaging and the consequences of irradiation of the object. As will be discussed in Section 11.2, the best metric for comparing irradiation levels in different materials and with different exposure levels and photon energies is the absorbed dose, which is a measure of ionizing energy absorbed per mass (usually in units of a gray, which is 1 joule absorbed per kilogram). Our discussion below is for the case of 2D imaging. For 3D imaging, dose fractionation means that much the same conclusions apply, as discussed in Section 8.6.
4.8.1
Photon statistics and the contrast parameter Θ

We first consider the question of the statistics of individual discrete events. The mathematical foundations⁶ were laid down by the French mathematician Siméon Denis Poisson; though he initially disagreed with Fresnel's theory of diffraction, he made so many other contributions to mathematics and physics that we can forgive him. Poisson considered the case of discrete events that produce an integer result (such as rolling a die) averaging to n̄, and found that the probability of having a result of n in one particular measurement is given by

$$P(n, \bar{n}) = \frac{\bar{n}^{n}}{n!}\,\exp(-\bar{n}), \qquad (4.232)$$

where the factorial is n! = n · (n − 1) · (n − 2) · · · 1. This expression is difficult to calculate directly for all but the smallest values of n and n̄, but fortunately it is well approximated
⁶ An amusing guide is available [Gonick 1993].
Figure 4.70 Image appearance versus the mean number of photons per pixel n̄. These images were simulated by using the image pixel value as the value for n̄, and then replacing that pixel with a noisy version n based on the Poisson distribution (Eq. 4.232). The "cameraman" image is commonly used in the image-processing literature, and can be found through a web search.
by a Gaussian distribution of

$$P(n, \bar{n}) = \frac{1}{\sqrt{2\pi\bar{n}}}\,\exp\!\left[-\frac{(n - \bar{n})^{2}}{2\bar{n}}\right]. \qquad (4.233)$$
The Gaussian result should be truncated to P = 0 for n < 0 so as to exclude the incorrect nonzero values of P that would otherwise result for negative (nonsensical) values of n due to the long tails of the Gaussian distribution. While we showed a zero-mean Gaussian distribution in Fig. 4.4, in Fig. 4.69 we show a comparison for integer events with different expected values of n̄ using both the Poisson and Gaussian distributions; as can be seen, the Gaussian distribution closely approximates the Poisson distribution even for very small values of n̄. Comparing Eq. 4.233 with Eq. 4.14, we make the association that the standard deviation σ is given by

$$\sigma = \sqrt{\bar{n}}. \qquad (4.234)$$

As noted in Fig. 4.4, about two-thirds of the events fall within ±σ of the mean of n̄. Finally, a set of images with differing values of n̄ are shown in Fig. 4.70.
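The comparison of Eqs. 4.232 and 4.233 shown in Fig. 4.69 is easy to reproduce numerically; a minimal sketch (function names are ours):

```python
import math

def poisson_pmf(n, nbar):
    """Poisson probability of Eq. 4.232."""
    return nbar**n * math.exp(-nbar) / math.factorial(n)

def gaussian_pdf(n, nbar):
    """Gaussian approximation of Eq. 4.233, with sigma = sqrt(nbar)."""
    return math.exp(-(n - nbar)**2 / (2.0 * nbar)) / math.sqrt(2.0 * math.pi * nbar)
```

At n̄ = 10 the two already agree to better than 1 percent of the peak value, consistent with the lower panels of Fig. 4.69.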
Let us now consider the case of looking for a certain feature f within a background material b in a sample. If we have some imaging system which delivers a unit-normalized image intensity I_f at a pixel where the feature is present, and an image intensity I_b otherwise, the (unsigned) signal we will get when comparing a feature-present pixel against a feature-absent pixel when using a mean illumination of n̄ photons per pixel will be

$$S = |\bar{n} I_f - \bar{n} I_b| = \bar{n}\,|I_f - I_b|. \qquad (4.235)$$
This signal must be detected against a background of fluctuations due to particular values of n detected in one measurement of the intensity at a pixel versus another measurement. Using the Gaussian approximation to the Poisson distribution, we can characterize these fluctuations by their standard deviations √(n̄I_f) and √(n̄I_b) for the respective unit-normalized intensities. Now if we are comparing the measurements from a feature-containing pixel versus a background-containing pixel, the fluctuations in the two measurements will be uncorrelated, in which case we can add the fluctuations together in root-sum-of-squares fashion to arrive at a noise estimate of

$$N = \sqrt{\left(\sqrt{\bar{n} I_f}\right)^{2} + \left(\sqrt{\bar{n} I_b}\right)^{2}} = \sqrt{\bar{n}}\,\sqrt{I_f + I_b}. \qquad (4.236)$$

We therefore see that the signal-to-noise ratio (SNR) in the measurement is given by

$$\mathrm{SNR} = \frac{\bar{n}\,|I_f - I_b|}{\sqrt{\bar{n}}\,\sqrt{I_f + I_b}} = \sqrt{\bar{n}}\,\frac{|I_f - I_b|}{\sqrt{I_f + I_b}} = \sqrt{\bar{n}}\,\Theta, \qquad (4.237)$$

where we have defined Θ as a contrast parameter [Glaeser 1971, Sayre 1977b] of

$$\Theta \equiv \frac{|I_f - I_b|}{\sqrt{I_f + I_b}}. \qquad (4.238)$$
The contrast parameter Θ differs from the usual definition of contrast, or fringe visibility V given in Eq. 4.182, by the use of the square root in the denominator. The SNR relationship given by Eq. 4.237 will be used to estimate required image exposures as detailed below. It can also be related to detective quantum efficiency (DQE), as will be shown in Eq. 7.34. Note that some authors in electron microscopy use a different definition of SNR = S²/N², which scales linearly with the illumination rather than as the square root [van Heel 2000], so one must exercise some care in comparing SNR results across research communities.

If we want an image to have a certain SNR, how many photons do we need? Solving Eq. 4.237 for the mean number of incident photons per pixel n̄, we find

$$\bar{n} = \frac{\mathrm{SNR}^{2}}{\Theta^{2}}. \qquad (4.239)$$
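Equations 4.238–4.239 combine into a small exposure estimator; the sketch below uses our own function names, with the Rose criterion (introduced just below as Eq. 4.240) as the default target:

```python
import math

def contrast_parameter(I_f, I_b):
    """Theta of Eq. 4.238; note the square root in the denominator."""
    return abs(I_f - I_b) / math.sqrt(I_f + I_b)

def required_photons(I_f, I_b, snr=5.0):
    """Mean photons per pixel for a target SNR, from Eq. 4.239.

    The default snr=5 is the Rose criterion of Eq. 4.240.
    """
    return snr**2 / contrast_parameter(I_f, I_b)**2
```

For Θ = 1 (I_f = 1, I_b = 0) this gives the 25 photons per pixel quoted below; for I_f = 1, I_b = 0.2 it reproduces the Θ ≈ 0.73 and SNR ≈ 2.31 entries of Table 4.2.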
And what value of SNR is acceptable? The usual criterion comes from the work of Albert Rose, who looked carefully into the response of the eye, of photographic film, and of electronic imaging systems in the early days of television [Rose 1946]; he concluded
Figure 4.71 Binary hypothesis testing and error rates. Given a certain detection limit LD for
features being present (blue curve) versus the background (red curve), binary decisions based on a critical level LC will erroneously count a fraction α of the background events as false positives, and a fraction β of the feature events as false negatives. In this example, the false positive rate α was set to 7 percent such that kα = 1.476, while the false negative rate β was set to 3 percent such that kβ = 1.881.
that human observers were satisfied with a light intensity that produced an SNR of 5 when considering the "pixels" of the retina (the rod and cone cells), or

$$\mathrm{SNR}_{\mathrm{Rose}} = 5. \qquad (4.240)$$
As a result, for an object with a contrast parameter Θ = 1 and an imaging system with no light loss, we need to illuminate the object with at least n̄ = 5² or 25 photons per pixel. In some cases even lower exposures are used. For example, in single-particle electron microscopy the SNR on images of individual molecules is much lower than 5 due to radiation damage limitations; even so, the SNR on the final 3D molecular reconstruction can be quite high, because it combines the signal from low-dose images of 10³–10⁵ individual molecules [Frank 2002]. Strictly speaking, the fluence F is defined in terms of an area rather than a pixel, which means we can calculate the required fluence from the required per-pixel exposure n̄ as

$$F = \frac{\bar{n}}{(\Delta r)^{2}}, \qquad (4.241)$$

where Δr is the physical size of a pixel.
4.8.2
Minimum detection limits Another approach to think about image statistics is to consider minimum detection limits. Consider two Gaussian distributions as shown in Fig. 4.71, with respective mean
Figure 4.72 Hypothesis testing error rate α as a function of kα, as given by Eq. 4.243. At an intensity of n̄I_b + kα σ_b, the fraction of false positive values above the critical level L_C is given by α. An identical relationship exists between β and kβ.
values of n̄I_b for the background regions and n̄I_f for the feature-containing regions, with standard deviations σ_b = √(n̄I_b) and σ_f = √(n̄I_f) respectively. Given a particular measurement intensity at one pixel, one might want to carry out binary hypothesis testing: is this pixel part of the background, or is it showing a feature? If one reduces the feature-present intensity n̄I_f down to a detection limit L_D such that its distribution begins to overlap with the background intensity n̄I_b, there will be a critical level L_C between the two distributions such that one will have two errors present in a binary hypothesis test against the critical level L_C:

• False positives are events with a value above the critical level L_C which are part of the background distribution but which are falsely counted as being features. The fraction of total background events which give rise to false positives is designated by α, and the critical level L_C is located at a position of kα σ_b above the background mean n̄I_b.

• False negatives are events with a value below the critical level L_C which are part of the feature distribution but which are falsely counted as being background. The fraction of the total feature events which give rise to false negatives is designated by β, and the critical level L_C is located at a position of kβ σ_f below the feature mean n̄I_f.

The relationship between α and kα involves the error function erf(x) of

$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} \exp(-t^{2})\, dt, \qquad (4.242)$$
Table 4.2 Minimum detection limit examples. If the background signal is low (0.05 versus 0.2 in the table), one can decide if a pixel represents a background region or a feature-present region (based on a threshold at the critical level L_C) even with quite low per-pixel incident photon count n̄.

Normalized feature intensity I_f                           1.0      1.0
Normalized background intensity I_b                        0.2      0.05
Incident number of photons n̄                              10.0     10.0
Signal separation scaling kα = kβ                          1.75     2.45
False negative, positive rate α = β (Eq. 4.243)            0.040    0.007
Critical level L_C in photons (Eq. 4.244)                  4.5      2.2
Detection limit L_D in photons (Eq. 4.245)                 10.0     10.0
Contrast parameter Θ (Eq. 4.238)                           0.73     0.93
Signal-to-noise ratio SNR (Eq. 4.237)                      2.31     2.93
which is frequently provided in numerical analysis subroutine libraries; one then has

$$\alpha = \frac{1 - \mathrm{erf}(k_\alpha/\sqrt{2})}{2}, \qquad (4.243)$$

which is plotted in Fig. 4.72; the same relationship applies to β and kβ. The critical level L_C for hypothesis testing is defined [Currie 1968] as

$$L_C = \bar{n} I_b + k_\alpha \sigma_b = \bar{n} I_b + k_\alpha \sqrt{\bar{n} I_b}, \qquad (4.244)$$

while the detection limit L_D is defined as
$$L_D = L_C + k_\beta \sigma_f = \bar{n} I_b + k_\alpha \sqrt{\bar{n} I_b} + k_\beta \sqrt{\bar{n} I_f}. \qquad (4.245)$$
If the standard deviations in the feature-present and background regions are the same (that is, if σ_f = σ_b), and if one accepts equal false positive (α) and false negative (β) rates, then the critical level lies halfway between the background and feature distribution peaks. In Table 4.2, we consider some numerical examples that might pertain to elemental detection using x-ray fluorescence, as will be discussed in Section 9.2.3. These examples show that if the background levels can be kept low, one can reliably detect the presence of features even in quite noisy, low-photon-illumination (small n̄) images.
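The entries of Table 4.2 follow directly from Eqs. 4.243–4.245 using the error function from the standard library. A sketch reproducing the first column (function names are ours):

```python
import math

def false_rate(k):
    """alpha (or beta) as a function of k_alpha (or k_beta), Eq. 4.243."""
    return (1.0 - math.erf(k / math.sqrt(2.0))) / 2.0

def detection_levels(nbar, I_b, I_f, k_alpha, k_beta):
    """Critical level L_C and detection limit L_D of Eqs. 4.244-4.245, in photons."""
    L_C = nbar * I_b + k_alpha * math.sqrt(nbar * I_b)
    L_D = L_C + k_beta * math.sqrt(nbar * I_f)
    return L_C, L_D
```

With n̄ = 10, I_b = 0.2, I_f = 1.0, and kα = kβ = 1.75, this gives α ≈ 0.040, L_C ≈ 4.5, and L_D ≈ 10.0, matching the first column of Table 4.2.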
4.8.3
Signal to noise and resolution from experimental images

We have described above an approach for predicting the photon exposure needed for imaging an object in the case where we can predict the image intensities measured with a feature present or absent, so that the contrast parameter Θ can be calculated using Eq. 4.238. We now consider a different perspective on signal to noise in imaging: the case of determining the SNR and spatial resolution of an acquired image of an unknown object.

We begin with an overall measure of image signal-to-noise. One approach is to acquire two images I₁ and I₂ of the object under identical experimental conditions, and then to compare those images. Features actually present in the object should be highly
correlated, while noise should not. We make the comparison using a method developed for waveform signal processing [Bershad 1974] which was later applied to electron microscopy [Frank 1975b]; our treatment here draws on an analysis that is consistent with the definition of SNR of Eq. 4.237 [Huang 2009a]. The signal S should be correlated between the two images, while the two noise patterns N₁ and N₂ should not, giving

$$I_1 = S + N_1 \qquad \text{and} \qquad I_2 = S + N_2. \qquad (4.246)$$
The signal S has a mean value over all 2D pixels of ⟨S⟩, while the noise patterns should have equal fluctuations above and below the signal level (assuming photon per pixel values n̄ high enough that the Gaussian noise distribution of Eq. 4.233 well approximates the Poisson distribution of Eq. 4.232, as shown in Fig. 4.69). As a result, the noise means are ⟨N₁⟩ = ⟨N₂⟩ = 0, so that the image means become ⟨I₁⟩ = ⟨I₂⟩ = ⟨S⟩. We then wish to consider the total signal and total noise for the two images using their variances:

S² = ⟨(S − ⟨S⟩)(S − ⟨S⟩)*⟩ = ⟨S²⟩ − ⟨S⟩²,    (4.247)
N² = ⟨(N₁,₂ − ⟨N₁,₂⟩)(N₁,₂ − ⟨N₁,₂⟩)*⟩ = ⟨N²₁,₂⟩,    (4.248)
where ⟨N₁,₂⟩ = 0 has been used in the final equality of Eq. 4.248. Again, the average is done over all pixel indices of one 2D image, which differs from a variance calculation for a particular pixel in a set of separately measured images. This can be shown [Huang 2009a] to lead to a correlation coefficient r between the two images of

r = ⟨(I₁ − ⟨I₁⟩)(I₂ − ⟨I₂⟩)*⟩ / √(⟨(I₁ − ⟨I₁⟩)²⟩⟨(I₂ − ⟨I₂⟩)²⟩),    (4.249)

from which one can calculate a SNR of

SNR = √(S²/N²) = √(r/(1 − r)).    (4.250)
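The two-image recipe of Eqs. 4.246–4.250 is easy to check numerically. The sketch below uses an arbitrary smooth test object with two independent Poisson noise realizations (all values are illustrative, not from the text), estimates r, and then applies Eq. 4.250:

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary smooth test "object" S, and two images of it with independent
# Poisson noise realizations, as in Eq. 4.246:
y, x = np.mgrid[0:128, 0:128]
S = 50.0 * (1.0 + np.sin(x / 7.0) * np.cos(y / 11.0))
I1 = rng.poisson(S).astype(float)
I2 = rng.poisson(S).astype(float)

# Correlation coefficient r between the two images (Eq. 4.249):
d1, d2 = I1 - I1.mean(), I2 - I2.mean()
r = (d1 * d2).mean() / np.sqrt((d1**2).mean() * (d2**2).mean())

# SNR estimate of Eq. 4.250; for this synthetic case it can be compared
# against the directly known value sqrt(var(S) / mean Poisson variance):
snr = np.sqrt(r / (1.0 - r))
```

For this synthetic object the estimate agrees with the directly computable value √(var(S)/⟨S⟩), which is the point of the method: with real data one has no access to S, but r is still measurable.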
This expression is the square root of the expression α = r/(1 − r) used in some electron microscopy papers [Frank 1975b], but as noted before some authors in electron microscopy prefer to define SNR = S²/N², which scales linearly with the illumination rather than as the square root [van Heel 2000]. Our definition in Eq. 4.250 scales as the square root of illumination [Huang 2009a], as expected from Eq. 4.237. One can also use the spatial frequency dependence of the SNR in images to estimate the spatial resolution. With a single image, one can often see a power law rolloff of signal versus spatial frequency as given by Eq. 4.97, while uncorrelated pixel-to-pixel noise fluctuations give rise to a "flat" power spectrum as discussed in Section 4.3.4. Thus one can estimate the spatial frequency u_{r,S=N} at which the signal trend reaches the noise floor, such as is shown in Fig. 4.49, and obtain a simple but useful measure of the half-period spatial resolution of

δ_{r,S=N} ≃ 1/(2u_{r,S=N}).    (4.251)

One can obtain an even better measure if, like with Eq. 4.249, one has two independently measured images of the same object. The information about the object imaged
176
Imaging physics
Figure 4.73 Image resolution versus exposure time, or fluence F on the specimen. In this case, four separate phase contrast images were acquired of a microfabricated test pattern using 5.2 keV x rays, with continuous scanning and per-pixel dwell times ranging from 20 ms to 200 ms as indicated. The larger, low-spatial-frequency features in the object have almost identical appearance in the four images, but finer, high-spatial-frequency features become more apparent as one increases the photon fluence F, as evidenced by increasing reconstructed feature power at spatial frequencies above 20 μm⁻¹. These images were obtained using x-ray ptychography (Section 10.4) in which the usual noise "floor" in the power spectrum (such as is shown in Figs. 4.19 and 4.49) is removed by the reconstruction process. Figure adapted from [Deng 2015a].
should be highly correlated, while the noise should not. In this case, one can measure the correlation of the Fourier transforms of the images as a function of spatial frequency in a method called Fourier ring correlation (FRC) or, in 3D, Fourier shell correlation (FSC) [Saxton 1982]. When considering a specific range u_{r,i} (or shell, or ring) of spatial frequencies, the Fourier shell correlation FSC₁₂ between images 1 and 2 is given by [van Heel 2005]

FSC₁₂(u_{r,i}) = Σ_{u_r∈u_{r,i}} F₁(u_r)·F₂†(u_r) / √( Σ_{u_r∈u_{r,i}} |F₁(u_r)|² · Σ_{u_r∈u_{r,i}} |F₂(u_r)|² ),    (4.252)

where at the same time one can calculate the number of pixels contained in the Fourier shell u_{r,i} as n(u_{r,i}). As was shown in the discussion of image power spectra in Section 4.3.4, at low spatial frequencies one can expect a strong correlation due to the same object information being present in both Fourier transforms, while at high spatial frequencies one will see poor correlation between two instances of uncorrelated noise. A commonly used criterion for estimating the spatial resolution in a way that is consistent with acceptable correlations in crystallography datasets is the so-called 1/2 bit criterion, with a threshold value of [van Heel 2005]

T_{1/2 bit}(u_{r,i}) = (0.2071 + 1.9102/√n(u_{r,i})) / (1.2071 + 0.9102/√n(u_{r,i})).    (4.253)
Figure 4.74 Imaging of a feature of material f in a slab of background material b, with a thickness t_{b,o} of overlying material between the x-ray beam and the feature. The feature has a width of Δ_r, and the feature/background slab has a thickness t. This simple model is used to estimate the exposure required for imaging with a given SNR.
The center of the spatial frequency shell u_{r,i,1/2 bit} at which

FSC₁₂(u_{r,i,1/2 bit}) = T_{1/2 bit}(u_{r,i,1/2 bit})    (4.254)

then gives a half-period spatial resolution δ_{r,1/2 bit} of

δ_{r,1/2 bit} = 1/(2u_{r,i,1/2 bit}).    (4.255)
The FRC (2D) and FSC (3D) methods have sometimes been used to evaluate the resolution of images obtained using iterative reconstruction methods of the type discussed in Chapter 10. If one compares two reconstructions from separate, statistically independent datasets, then the FSC/FRC method applies directly, as illustrated in Fig. 11.12. If one instead compares reconstructions started from different random initializations of one dataset, the FSC/FRC interpretation is not directly applicable, though one can still gain insights on the reliability of iterative phasing using the phase retrieval transfer function (PRTF) of Eq. 10.34.
4.8.4
Estimating the required photon exposure

From the definition of the contrast parameter Θ given in Eq. 4.238, we can produce estimates of the number of photons n̄ required to image a feature based on image intensities produced with, and without, the feature present. We present here a simplified discussion of this topic, which is explored in greater detail elsewhere [Villanueva-Perez 2016, Du 2018]. Consider a slab of a background material b with thickness t. Within this slab, we want to see if a given pixel of width Δ_r is composed of the feature material f or not (Fig. 4.74). Let us also consider an overlying thickness t_{b,o} of the background material. Using absorption contrast imaging with a 100 percent efficient optical system with 100 percent contrast transfer at the desired resolution scale, the image intensity with the
Box 4.8 Modeling protein for biological imaging estimates

For estimates of contrast, exposure, and dose when imaging biological systems, one needs to have an estimate of the linear absorption coefficient μ = 2kβ and the phase shift per wavelength δ of organic material in a cell (Eqs. 3.67 and 3.82). While absorption coefficients for protein were given in a pioneering early estimate of image contrast and dose in x-ray microscopy [Sayre 1977b, Sayre 1977a], the assumed composition was not detailed. Soon after, when there was speculation of developing x-ray lasers pumped by nuclear weapons [Broad 1986], Johndale Solem of Los Alamos National Laboratory wrote a fascinating technical report considering the possibilities of using such a laser for x-ray holography of living cells [Solem 1982a], which led to several follow-on publications [Solem 1982b, Solem 1984, Solem 1986]. Solem's work involved signal estimates from x-ray scattering from protein, but the assumed composition of proteins was still not made explicit. With the advent of laser-driven x-ray lasers [Matthews 1985, Suckewer 1985], researchers at Lawrence Livermore National Laboratory, including Jim Trebes, Louis DaSilva, and Richard London, studied their potential use for biological x-ray microscopy [London 1989, DaSilva 1992]. As part of this work, London made the reasonable assumption that the stoichiometric composition of a representative protein can be described from the average of all 20 amino acids, leading to a stoichiometric composition of H₄₈.₆C₃₂.₉N₈.₉O₈.₉S₀.₆ with a density when dry of 1.35 g/cm³ [London 1989]. The protein content of cells varies by type, but 25 percent protein in water is typical [Fulton 1982, Luby-Phelps 2000].
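London's model composition can be turned into elemental mass fractions with a few lines of arithmetic; the sketch below uses rounded standard atomic weights:

```python
# Mass fractions for London's model protein H48.6 C32.9 N8.9 O8.9 S0.6
# (Box 4.8). Atomic weights are rounded standard values.
atomic_weight = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}
stoichiometry = {"H": 48.6, "C": 32.9, "N": 8.9, "O": 8.9, "S": 0.6}

# Total mass per formula unit, in atomic mass units:
total_mass = sum(stoichiometry[el] * atomic_weight[el] for el in stoichiometry)

# Fraction of the formula-unit mass contributed by each element:
mass_fraction = {el: stoichiometry[el] * atomic_weight[el] / total_mass
                 for el in stoichiometry}
```

This gives a formula-unit mass of about 730 u, with carbon accounting for roughly 54 percent of the mass — the sort of numbers needed when computing μ and δ for protein from tabulated f₁ and f₂ values.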
feature present is given from the Lambert–Beer law (Eq. 3.76) by

I_{f,abs} = I₀ exp[−μ_f t] exp[−μ_b t_{b,o}],    (4.256)

while the intensity in the background regions is

I_{b,abs} = I₀ exp[−μ_b t] exp[−μ_b t_{b,o}],    (4.257)

with thin-sample and no-overlying-background approximate expressions of

I_{f,abs} ≃ I₀ (1 − 4πβ_f t/λ)    (4.258)

and

I_{b,abs} ≃ I₀ (1 − 4πβ_b t/λ),    (4.259)

giving a thin-sample expression of

|I_{f,abs} − I_{b,abs}| ≃ 4π |β_f − β_b| t/λ.    (4.260)
Using a unit-normalized incident flux I₀ = 1 to be scaled with mean incident photon number n̄, the contrast parameter Θ of Eq. 4.238 becomes

Θ_abs = |I_f − I_b|/√(I_f + I_b) = exp[−μ_b t_{b,o}/2] |exp[−μ_f t] − exp[−μ_b t]| / √(exp[−μ_f t] + exp[−μ_b t])    (4.261)
      ≃ 2√2 π |β_f − β_b| t/λ,    (4.262)

where in the last expression we have used the limit of a thin specimen (so e⁻ˣ ≃ 1 − x) and no overlying background material [Hornberger 2006]. If we instead use the Zernike phase contrast signal intensities given by Eqs. 4.224 and 4.225, we arrive at

Θ_zpc ≃ 2√2 π |δ_f − δ_b| t/λ,    (4.263)

while if we use the differential phase contrast signal intensity of Eq. 4.229 we arrive at

Θ_dpc = (4/1.22) |δ_f − δ_b| t/λ,    (4.264)

because in the differential phase contrast case the denominator term of √(I_f + I_b) is 1 since I_{b,dpc} = 0. (Other approaches for Θ_dpc have yielded a prefactor of √2 instead of 4/1.22 [Hornberger 2006].) As can be seen, all three contrast parameters are quite similar, except for the use of the absorptive β or phase-shifting δ parts of the refractive index n = 1 − δ − iβ. If we require SNR = 5 to meet the Rose criterion of Eq. 4.240, the required number of incident photons per pixel n̄ is

n̄ = SNR²_Rose / Θ².    (4.265)
We can therefore calculate the number of photons required for detecting the presence or absence of a feature using the thin-sample approximate expressions for each of the three contrast methods considered here. For absorption contrast, we have

n̄_abs ≃ (25/(8π²)) · λ²/(t² |β_f − β_b|²) = (25/(8π²)) · 1/(λ² t² |α_f f₂,f − α_b f₂,b|²),    (4.266)

for Zernike phase contrast we have

n̄_zpc ≃ (25/(8π²)) · λ²/(t² |δ_f − δ_b|²) = (25/(8π²)) · 1/(λ² t² |α_f f₁,f − α_b f₁,b|²),    (4.267)

and for differential phase contrast we have

n̄_dpc ≃ (25·1.22²/16) · λ²/(t² |δ_f − δ_b|²) = (25·1.22²/16) · 1/(λ² t² |α_f f₁,f − α_b f₁,b|²).    (4.268)

In each case, the latter expression uses δ + iβ = αλ²(f₁ + if₂) as given by Eqs. 3.65 and 3.67, along with α ≡ r_e n_a/(2π) as given in Eq. 3.66. The latter expressions make the scaling with wavelength more explicit. The similar forms of these three expressions allow us to make a few statements about exposure requirements in x-ray microscopes:
• Absorption contrast relies on differences in the absorptive part of the refractive index β, or of the imaginary part of the complex number of oscillation modes, f₂, while both phase contrast methods rely on the phase-shifting terms δ or f₁. As shown in Fig. 3.16, the phase-shifting terms f₁ maintain large numerical values at higher x-ray energies E, while f₂ declines approximately as 1/E². This means phase contrast is increasingly favored over absorption contrast at higher photon energies (though one must eventually consider Compton scattering at high photon energies).
• These calculations assume 100 percent detective quantum efficiency for the entire x-ray imaging system. In practice the efficiency is much lower. For example, when using zone plate optics downstream of the specimen in a full-field imaging system, one must account for the focusing efficiency of the zone plate, which might be in the 5–15 percent range in practice (theoretical values are shown in Fig. 5.15). Hard x-ray imaging detectors based on scintillators and visible light lenses also show low efficiency, as discussed in Section 7.4.7, and there are additional detector statistical considerations as outlined in Section 7.4.1.
• These calculations also assume 100 percent optical transfer of information at all spatial frequencies. This is not usually the case; for example, Fig. 4.45 shows how the OTF decreases in incoherent brightfield imaging, leading to a higher exposure for feature sizes approaching the Rayleigh resolution limit.
• As the expressions of Eqs. 4.266, 4.267, and 4.268 make clear, the photon exposure per pixel scales as the inverse square of the feature thickness t. For an isometric sample, the feature thickness is usually of the same scale as its lateral dimensions Δ_r. If we assume Δ_r = t, the radiation exposure scales as

n̄ ∝ t⁻⁴.    (4.269)

This fourth-power-law scaling of the required number of incident photons per pixel n̄ also applies to radiation dose (Eq. 4.285), and is discussed further in Section 4.9.1. This scaling presents significant challenges for high-resolution imaging, as will be discussed in Chapter 11.
• The prefactor term of 1.22²/16 in Eq. 4.268 is about seven times larger than the prefactor terms of 1/(8π²) in Eqs. 4.266 and 4.267. However, the differential phase contrast estimate is for a feature with a size equal to the Rayleigh resolution limit, while the calculations for absorption and Zernike phase contrast do not account for any of the signal transfer losses that would otherwise appear in the OTF for imaging as noted above. This is likely to bring the prefactor terms closer to each other when imaging small features near the resolution limit of an x-ray microscope. These expressions also ignore noise sources other than those due to photon statistics.

In spite of the simplistic nature of the exposure estimates provided in the above results, they provide a very helpful guide for understanding exposure requirements in x-ray microscopy (more detailed views are presented elsewhere [Schropp 2010c, Villanueva-Perez 2016, Du 2018]). In Fig. 4.75, we show a calculation of n̄ = 25/Θ²
Figure 4.75 Photon exposure n̄ = 25/Θ² for imaging 30 nm thick features of copper in silicon, or protein in ice. These calculations used the exact expressions for image intensities of Eqs. 4.256 and 4.257 for absorption contrast, and Eqs. 4.224 and 4.225 for Zernike phase contrast with a non-absorbing phase plate (slight improvements can be obtained if the phase plate is absorbing). The feature was assumed to be embedded in a background layer of silicon (for the materials science example) or ice (for the biological example) of 3, 10, 30, and 100 μm thickness, as indicated by the various color curves. The "water window" [Wolter 1952] spectral region between the carbon and oxygen K edges at 290 and 540 eV shows especially good contrast for hydrated organic materials. These calculations assume 100 percent efficiency for the x-ray imaging system, and no noise sources beyond photon statistics.
using the exact rather than thin-sample approximate expressions for Θ for two example situations:
1. The imaging of 30 nm copper features in a background of overlying silicon with transmission exp[−μ_b t_{b,o}] (see Eqs. 4.256 and 4.261). This is an example of imaging materials science specimens.
2. The imaging of 30 nm thick protein features (see Box 4.8) in a background of ice. This is an example of imaging biological specimens.
This calculation is done for a series of overlying thickness layers of the background material, to show the effects of absorption on signal loss. It is done for absorption and Zernike phase contrast only, since the differential phase contrast expression is identical to Zernike phase contrast except for the different numerical factor.
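These exposure estimates are straightforward to evaluate numerically. The sketch below applies the exact intensities of Eqs. 4.256–4.257 in the contrast parameter of Eq. 4.261, then uses n̄ = SNR²/Θ² (Eq. 4.265); the absorption coefficients are illustrative placeholders, not tabulated optical constants:

```python
import math

def photons_needed_abs(mu_f, mu_b, t, t_bo, snr=5.0):
    """Required incident photons per pixel for absorption contrast, using the
    exact intensities of Eqs. 4.256-4.257 in the contrast parameter of
    Eq. 4.261, and n_bar = SNR^2 / Theta^2 (Eq. 4.265)."""
    I_f = math.exp(-mu_f * t) * math.exp(-mu_b * t_bo)   # feature present
    I_b = math.exp(-mu_b * t) * math.exp(-mu_b * t_bo)   # background only
    theta = abs(I_f - I_b) / math.sqrt(I_f + I_b)
    return snr**2 / theta**2

# Illustrative (made-up) values: mu in 1/um, thicknesses in um.
n_thin = photons_needed_abs(mu_f=0.5, mu_b=0.05, t=0.03, t_bo=0.0)
n_deep = photons_needed_abs(mu_f=0.5, mu_b=0.05, t=0.03, t_bo=30.0)
```

The overburden enters only through the attenuation factor, so burying the same feature under a thickness t_{b,o} of background raises the required exposure by exactly exp[μ_b t_{b,o}] — the behavior traced out by the different thickness curves in Fig. 4.75.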
4.8.5
Imaging modes and diffraction

The similarity of the expressions of required exposure for absorption (Eq. 4.266) and Zernike phase contrast (Eq. 4.267) points to several further important conclusions about the exposure required in x-ray microscopes. As discussed in Section 4.6, Babinet's principle states that the optical wavefield downstream from an illuminated object is the complement of the wavefield produced by the complement of the object; that is, a pinhole and an absorptive disk produce complementary wavefields which add up to the incident beam. This means that both absorptive
and phase-shifting objects lead to scattering of the beam, so that one can expect that a scattering experiment will have a unit-normalized image intensity that can be calculated from the combination of Eqs. 4.226 and 4.260 as

I_{f,scat} ≃ 4π (|δ_f − δ_b| + |β_f − β_b|) t/λ,    (4.270)

which is effectively the combination of absorption and phase contrast. The relative strengths of absorption and differential phase contrast in scanning x-ray microscopy have also been considered using a slightly different approach [Thibault 2009b]. A related conclusion concerns the relative merits of collecting images with a 100% efficient lens, versus collecting diffraction patterns which must then be iteratively phased to yield an image as will be discussed in Chapter 10 (these iterative phasing procedures seem not to add any extra noise to the reconstructed image [Fienup 1978, Williams 2007b, Huang 2009a, Godard 2012]). The question of the dose efficiency of imaging versus diffraction was considered by Richard Henderson [Henderson 1995] in the context of electron microscopy. Henderson stated:

It can be shown that the intensity of a sharp diffraction spot containing a certain number N of diffracted quanta will be measured with the same accuracy (√N) as would the amplitude (squared) of the corresponding Fourier component in the bright field phase contrast image that would result from interference of this scattered beam with the unscattered beam [Henderson 1992]. The diffraction pattern, if recorded at high enough spatial resolution, would therefore contain all the intensity information on Fourier components present in the image. It would lack only the information concerning the phases of the Fourier components of the image which are of course lost. Thus, for the same exposure, holography should be equal to normal phase contrast in performance, and diffraction methods inferior because of the loss of the information on the phases of the Fourier components of the image.
Let us consider Henderson's conclusion in the context of assuming that a signal with amplitude b at an object pixel is scattered to reach a certain detector pixel, with no other signal present. We can then use Eq. 4.238 to calculate a contrast parameter Θ for this diffraction experiment of

Θ_diffraction = |I_f − I_b|/√(I_f + I_b) = (b² − 0)/√(b² + 0) = b.    (4.271)

The number of photons n̄ that we must then illuminate the object pixel with is found from Eq. 4.237 to be

n̄_diffraction = SNR²/Θ² = SNR²/b².    (4.272)
Now let's consider the case of imaging where this scattering amplitude is mixed with a strong illumination wave a at a detector pixel (this happens naturally in imaging systems; see for example Fig. 4.65). We obtain the maximum signal difference when the phasors for a and b are parallel (we're looking for the presence of the amplitude b, rather than its phase shift, which was what we sought in Zernike phase contrast as illustrated in Fig. 4.64). Therefore we'll assume that amplitudes a and b are both real and positive.
We then have an image intensity when the scatterer is present of

I_f = (a + b)(a + b)† = a² + b² + 2ab,    (4.273)

and we have I_b = a² when the scatterer is absent. The contrast parameter Θ is then

Θ_imaging = |I_f − I_b|/√(I_f + I_b) = ((a² + b² + 2ab) − a²)/√((a² + b² + 2ab) + a²) = (b² + 2ab)/√(2a² + b² + 2ab).    (4.274)

Assuming the scattering wave to be weak compared to the illumination, we have b ≪ a and b² ≪ ab, so that we can approximate Eq. 4.274, using the binomial approximation of Eq. 4.25, as

Θ_imaging ≃ 2ab/√(2a² + 2ab) = √2 b/√(1 + b/a) ≃ √2 b/(1 + b/(2a)),    (4.275)

from which we calculate the required exposure as

n̄_imaging = SNR²/Θ² ≃ SNR²/(2b²(1 − b/a)).    (4.276)

We therefore find a ratio of

n̄_imaging/n̄_diffraction = b²/(2b²(1 − b/a)) ≃ (1/2)(1 + b/a),    (4.277)
so that the mixing of a strong reference signal in with the diﬀracted signal – which happens in imaging – oﬀers a reduction in the required exposure n¯ of only a factor of about 2. When the actual focusing eﬃciency of xray objective lenses is considered, diﬀraction in fact becomes more favorable. This conclusion follows through for ﬂuence F, and for radiation dose. From these perspectives, one can say that scattering and coherent imaging experiments involve a required exposure that is well approximated by the lesser of the absorption and phase contrast exposures shown in Fig. 4.75. The attainable spatial resolution is limited by the ﬂuence F (photons per area) on the specimen, and the contrast of features in the specimen. Detailed comparisons between diﬀerent xray microscopes then involve the eﬃciency of whatever optics and detectors lie downstream of the specimen in the illumination path.
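The factor-of-two conclusion can be checked numerically from the exact contrast parameters of Eqs. 4.271 and 4.274; the amplitudes below are arbitrary illustrative choices:

```python
import math

def theta_diffraction(b):
    # Eq. 4.271: intensity b^2 against a dark (zero) background
    return b

def theta_imaging(a, b):
    # Eq. 4.274, exact form: scattered amplitude b mixed with reference a
    I_f = a**2 + b**2 + 2.0 * a * b
    I_b = a**2
    return (I_f - I_b) / math.sqrt(I_f + I_b)

# Weak scatterer mixed with a strong reference wave (arbitrary values):
a, b = 1.0, 0.01
ratio = theta_diffraction(b)**2 / theta_imaging(a, b)**2  # n_imaging / n_diffraction
```

For b ≪ a the exact ratio sits very close to 1/2, confirming that interference with a strong reference wave reduces the required exposure by only about a factor of two.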
4.9
From exposure to radiation dose

X-ray photons have energies far in excess of the 1.5–11 eV energy of chemical bonds (Box 3.2), so that x-ray exposure can cause radiation damage in the specimen being imaged. While the mechanisms of radiation damage are discussed in more detail in Section 11.2, we will concern ourselves here with the quantity that is used to calculate the magnitude of damage effects: the radiation dose. Dose is given by the energy absorbed per mass, leading to units of

1 gray ≡ 1 joule/kilogram    and    1 rad ≡ 100 erg/gram,    (4.278)
Figure 4.76 Radiation dose in Gy for imaging 30 nm thick features of copper in silicon, or protein in ice. These calculations are based on the photon exposures shown in Fig. 4.75; the dose to the feature (Cu at left, or protein at right) is calculated assuming that the feature is in the middle of the overlying background material of thickness as speciﬁed. These calculations assume 100 percent eﬃciency for the xray imaging system, and no noise sources beyond photon statistics.
so that 1 gray = 100 rad, with gray being the preferred unit in the modern literature (the unit, abbreviated as Gy, is named after the British radiobiologist Louis Harold Gray). When it comes to eﬀects in living systems, one must apply a slight correction called the relative biological eﬀectiveness (RBE; also called a radiation weighting factor WR ) which is 1 for X rays and electrons, 2 for protons, and 20 for alpha particles and heavy ions [Valentin 2007, Table 2]. This leads to a second unit called the “dose equivalent” or “exposure” H which is given by H = D · RBE,
(4.279)
which is in sieverts in SI units (the unit, abbreviated Sv, is named after the Swedish medical physicist Rolf Sievert). An earlier term for exposure is the REM (which stands for the rather awkward phrase "Röntgen Equivalent Man"), where 1 Sv = 100 REM. As will be discussed in Section 11.2.3, the LD50 measure for human exposures to radiation is about 4 Sv; that is, if a set of people receive a radiation exposure of 4 Sv, about half of them will die even if given basic medical care. For an object exposed to an x-ray fluence F (Section 7.1.1) of n̄ incident photons per area Δ²_r, the "skin dose" (the dose imparted to the upstream, beam-facing surface of the material) can be found from considering the energy deposition per thickness dE/dx. Since the intensity declines according to the Lambert–Beer law (Eq. 3.76) as I = I₀ exp[−μx], the fractional decrease in intensity per thickness x is given by

−(1/I) dI/dx = μ.    (4.280)

With n̄ photons each of energy E, the skin dose D (energy deposited per mass) is then
Figure 4.77 Contour plot and image of the radiation dose estimated for 10 nm resolution imaging of protein features in ice, as a function of x-ray energy and of ice thickness. The contour lines are labeled as the power of 10 of radiation dose in Gy; that is, a contour line of 9 means 10⁹ Gy. These contour lines lie above a grayscale representation of dose with lower dose as dark grey and higher dose as light grey. This figure shows that the "water window" spectral region between the carbon and oxygen K absorption edges offers the lowest dose for samples up to a few micrometers thick, and that for thicker specimens phase contrast at several keV hard x-ray photon energies begins to offer radiation dose advantages [Wang 2013, Du 2018]. Figure adapted from [Du 2018].
given by the fraction of photons lost from the transmitted beam due to absorption, or

D = n̄ E μ/(ρΔ²_r),    (4.281)

where ρ is the density of the absorbing material. One can also make use of the expression for μ from Eq. 3.45, atom number density n_a from Eq. 3.21 (with Section 3.3.5 describing the calculation for molecules and mixtures), and α as defined in Eq. 3.66 to write the skin dose as

D = n̄ 4π hc (α/(ρΔ²_r)) f₂,    (4.282)

where hc ≃ 1240 eV·nm as given in Eq. 3.7. The expression of Eq. 4.282 gives one the impression that radiation dose decreases at higher x-ray energies E, due to the fact that f₂ declines as about E⁻² (see Fig. 3.16). However, one must also account for the number of photons n̄ required to see a small structure. If we use the expression of Eq. 4.266 for the required number of photons n̄_abs for absorption contrast microscopy, we can write the necessarily delivered skin dose D
absorbed by the feature (thus using α_f, ρ_f, and f₂,f in Eq. 4.282) as

D_abs = (25/(2π)) (E²/hc) (α_f f₂,f/ρ_f) (1/(Δ²_r t²)) (1/|α_f f₂,f − α_b f₂,b|²),    (4.283)

while from Eq. 4.267 the dose for Zernike phase contrast imaging is given by

D_zpc = (25/(2π)) (E²/hc) (α_f f₂,f/ρ_f) (1/(Δ²_r t²)) (1/|α_f f₁,f − α_b f₁,b|²).    (4.284)
Let's look in particular at the expression for phase contrast imaging at multi-keV photon energies. At these energies, f₂,f ∝ E⁻², while f₁,f ≃ Z_f and f₁,b ≃ Z_b, as was shown in Fig. 3.16. Thus we see that the E² term and the E⁻² scaling of f₂,f largely cancel each other out, so at higher energies the radiation dose for Zernike phase contrast imaging tends toward a constant value. This can be seen in a more detailed numerical calculation for imaging copper in silicon, and protein in ice, in Fig. 4.76; it shows that absorption contrast for thin features tends to be minimized at lower photon energies, while the dose required for phase contrast imaging is less strongly biased toward low energies. Another view of the same type of calculations is shown in Fig. 4.77. Finally, more detailed calculations are presented elsewhere [Du 2018] which include effects such as inelastic scattering, which become of increasing importance at higher energies.
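Equation 4.281 is mostly a matter of unit bookkeeping, which the sketch below makes explicit; the material parameters are illustrative placeholders, not tabulated values for any particular material:

```python
# Skin dose from Eq. 4.281, D = n_bar * E * mu / (rho * Delta_r^2),
# with all quantities converted to SI so the result comes out in gray (J/kg).
EV_TO_JOULE = 1.602176634e-19

def skin_dose_gray(n_bar, energy_eV, mu_per_m, rho_kg_m3, pixel_m):
    """Dose in gray deposited at the beam-facing surface of the material."""
    energy_J = energy_eV * EV_TO_JOULE
    return n_bar * energy_J * mu_per_m / (rho_kg_m3 * pixel_m**2)

# Example with placeholder numbers: 1e6 photons of 520 eV into a 30 nm pixel
# of a material with mu = 1e6 1/m and density 1350 kg/m^3.
D = skin_dose_gray(n_bar=1e6, energy_eV=520.0, mu_per_m=1e6,
                   rho_kg_m3=1350.0, pixel_m=30e-9)
```

Even these placeholder values land in the 10⁷–10⁸ Gy range, illustrating why the dose contours of Fig. 4.77 carry such large exponents.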
4.9.1
Dose versus resolution

The dose expressions of Eqs. 4.283 and 4.284 reveal something else very important about x-ray microscopy. In both cases, the dose scales as the inverse of pixel size squared and thickness squared, or

D ∝ 1/(Δ²_r t²)    (4.285)

(see also Eq. 4.269). Real-life features often have a width Δ_r that matches their thickness t, so we see that the radiation dose (which, of necessity, must be imparted to image features of a certain size) tends to scale with the fourth power of decreases in their size. That is, to double the spatial resolution, one must increase the x-ray fluence F and radiation dose by a factor of 2⁴ = 16. This leads to challenges in radiation dose imparted to the specimen, as shown in Fig. 11.7 and discussed further in Section 11.3.4. In fact the story may not be quite so simple. In examination of power spectra of images such as Figs. 4.19 and 4.49, or coherent diffraction intensity recordings such as will be discussed in Chapter 10, we have always seen that there is a power law decline in Fourier plane power of I(u_r) ∝ u_r⁻ᵃ, as indicated in Eq. 4.97. However, the power law parameter a varies from example to example:
• In Fig. 4.19, we found a = 2.95. Of course, this is from a visible-light photograph, rather than an x-ray micrograph.
• In x-ray fluorescence images of trace element distributions in frozen hydrated cells [Deng 2015c] as shown in Fig. 4.49, values of a have varied from 2.78 for sulfur, to 2.81 for phosphorus, to 2.94 for potassium.
Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:01:55, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.005
The power spectrum of an image is calculated by taking the Fourier transform of the image’s transmission signal and then taking its squared magnitude (Section 4.3.4). This should be identical to the far-field or Fraunhofer diffraction pattern of the object (Section 4.3.5). For far-field diffraction patterns, again a power-law relationship seems to hold, with power-law parameters $a$ as follows:

• In 0.75 keV soft x-ray coherent diffraction imaging experiments with plunge-frozen, freeze-dried yeast cells, a value of $a \simeq 2.6$ over spatial frequencies of 1–20 μm⁻¹ and $a \simeq 4.2$ over spatial frequencies of 20–50 μm⁻¹ has been observed [Shapiro 2005]. In 7.9 keV x-ray nanodiffraction experiments, a value of $a = 3.19$ was observed from initially living hydrated fibroblasts, while a value of $a = 3.89$ was observed from chemically fixed, hydrated cells in the spatial frequency range 200–500 μm⁻¹ [Weinhausen 2014].
• Calculations based on scattering from a complex electron density for cubic 3D objects lead to a value of $a = 4$ [Howells 2009, Villanueva-Perez 2016]. An earlier paper obtains values of $a = 3$ for constant sample volume, and $a = 4$ for the case of spherical 3D objects [Shen 2004].
• In simulations of coherent diffraction patterns resulting from various objects, values of $a$ range from 3.3 for random-thickness protein distributions and simulated cells [Huang 2009a], to 3.83 for 2D and 3.95 for 3D gold particles that are approximately spherical [Schropp 2010c].
• In small-angle x-ray scattering (SAXS), a cosine expansion of a spherical function gives dominant terms with $a = 2$ and $a = 4$. Departures from a pure spherical shape enhance the $a = 4$ term. While of course there are differences between the SAXS patterns of different characteristic object shapes such as spheres and rods (see for example [Svergun 2003, Fig. 5]), it is generally observed that the SAXS intensity follows an $I(q) \propto q^{-4}$ dependence, with a relationship between momentum transfer $q$ and spatial frequency $u$ as given in Eq. 4.57.
This $q^{-4}$ dependence is sometimes referred to as Porod’s law [Porod 1982]. So what is the real answer? The safest one is to say that there may be some dependence on the specimen and the measurement method, but that the scaling is usually between the inverse third and fourth power (that is, $a$ is between 3 and 4 in Eq. 4.97). One way to find the scaling for a particular specimen and imaging mode is to acquire images with several different exposure times, and measure the dose dependence of the achieved spatial resolution as measured using power spectrum estimates (Eq. 4.97 and Fig. 4.73) or the Fourier ring correlation (FRC) approach given in Eqs. 4.252 and 4.253.
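As a minimal sketch of how one might estimate the power-law parameter $a$ from a measured (here, synthetic) azimuthally averaged power spectrum, a straight-line fit in log-log coordinates yields $-a$ as the slope; the exponent value and lognormal noise model below are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic power spectrum with a known exponent between 3 and 4
u = np.logspace(0, 2, 200)          # spatial frequencies u_r (arbitrary units)
a_true = 3.5                        # assumed exponent for this illustration
I = u ** (-a_true) * rng.lognormal(0.0, 0.1, u.size)   # I(u_r) ~ u_r^(-a), noisy

# Fit log I = -a log u + const; the slope of the linear fit is -a
slope, _ = np.polyfit(np.log(u), np.log(I), 1)
a_est = -slope   # recovers approximately a_true
```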
4.10 Comparison with electron microscopy and microanalysis

This is a book about x-ray microscopy, so it is quite natural that we have not said much about microscopies using other radiation. Nevertheless, it is important to understand x-ray microscopy in the context of other techniques, so that one can use the right microscope type for the imaging task at hand.
4.10.1 Elemental mapping

In Section 4.4.3, we included brief comments on superresolution light microscopy which, along with conventional fluorescence light microscopy, allows one to image specific fluorophores in cells. These fluorophores are detected by using shorter-wavelength light to excite certain electronic transitions in atoms and molecules, with longer-wavelength light used to detect the eventual response. Fluorescence light microscopy can be used to track the motion of single molecules within living cells, whereas in Chapter 11 we will see that x-ray microscopes are not able to take more than one image of a living cell due to radiation damage limitations. However, microscopies that can image total elemental content based on core-level electron transitions offer important complementary capabilities. Molecular fluorophores require atoms to be in a specific chemical configuration in order to be labeled and excited, while core-level electrons are largely insensitive to chemical bonds (Section 3.1.3); as a result, one can use core-level electrons to measure the total content of a particular element independently of binding affinities. In materials science, visible-light fluorescence plays a lesser role, and again core-level electron transitions are often used to find both major and minor constituents of a material.

Core-level electron transition energies can be approximated using the Bohr energy of Eq. 3.3; the actual energies are shown in Fig. 3.2 and are well tabulated [Bearden 1967, Zschornack 2007]. One can remove electrons from these states using a probe with an energy above these binding energies, after which either x-ray fluorescence or Auger electron emission results, as described in Section 3.1.1. Important differences arise when using energetic probes of different types, however.
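As a rough illustration of the Bohr-model estimate mentioned above, a hydrogen-like Kα energy can be sketched using a screening constant of 1 (a Moseley-type approximation; the function name and the screening value are illustrative assumptions — for real work one should use tabulated values [Bearden 1967]):

```python
RYDBERG_EV = 13.6057  # hydrogen ground-state binding energy in eV

def kalpha_energy_ev(z):
    """Bohr-model (Moseley-type) estimate of the K-alpha fluorescence
    energy for atomic number z, with a screening constant of 1:
    E = Ry * (z - 1)^2 * (1/1^2 - 1/2^2)."""
    return RYDBERG_EV * (z - 1) ** 2 * (1.0 - 1.0 / 4.0)

# Copper (Z = 29): estimate of about 8.0 keV, close to the measured
# Cu K-alpha line near 8.05 keV
kalpha_energy_ev(29)
```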
Electrons are the lightest charged particles, and they can be focused to exquisitely small, sub-0.1 nm focal spots in modern aberration-corrected electron microscopes (and sub-5 nm spots even in inexpensive scanning electron microscopes). However, when electrons enter a material they can sometimes swing around the strong point charge of an atom’s nucleus and produce a broad spectrum of continuum or Bremsstrahlung radiation, as will be shown in Fig. 7.4. This produces a background signal within which one must detect electron-excited x-ray fluorescence emission. In addition, in thicker materials an electron beam begins to undergo significant side-scattering (Fig. 11.4), so the volume from which x-ray fluorescence is emitted begins to become larger than the lateral beam size, thus affecting the spatial resolution. Nevertheless, scanning electron microscopes equipped with energy-resolving detectors (Section 7.4.10), sometimes known as electron microprobes as used for electron probe microanalysis (EPMA), serve as powerful, low-cost, compact laboratory instruments for imaging the distribution of various elements in a material [Jercinovic 2012, Rinaldi 2015].

Electron microscopes can be used for imaging core-level electron transitions in another way. As an electron beam enters a thin specimen, some fraction of the electrons can undergo inelastic scattering, in which they transfer energy to bound electrons in the material. An electron spectrometer can then be used to measure the energy of these inelastically scattered electrons in a technique known as electron energy-loss spectroscopy (EELS). Part of an EELS spectrum for 100 kV electrons in amorphous ice was shown in Fig. 3.15; the EELS spectra for both amorphous ice and Epon (a plastic embedding
Figure 4.78 Electron energy loss spectroscopy (EELS) measurements in amorphous ice and in Epon, a plastic embedding medium. This figure shows the zero-loss peak for amorphous ice, the plasmon resonances (shown in greater detail for amorphous ice in Fig. 3.15), and the carbon K (290 eV) and oxygen K (540 eV) edges in the energy loss spectra. For amorphous ice, the as-measured spectrum shown in dark blue includes plural inelastic scattering effects, while the light blue spectrum shows the single-scatter spectrum σinel(ΔE) calculated using a Fourier-log deconvolution approach [Johnson 1974, Wang 2009a]. Amorphous ice data courtesy of Richard Leapman, National Institutes of Health (similar to results shown in [Leapman 1995]), and Epon data courtesy of Ming Du, Qiaoling Jin, and Kai He, Northwestern University.
medium) are shown over a broader range in Fig. 4.78, where one can see a step-like rise in inelastic cross sections at 290 eV (the carbon K edge in Epon) and 540 eV (the oxygen K edge in amorphous ice). One can combine EELS with imaging systems either in scanning or in full-field (transmission) electron microscopes, in a technique that electron microscopists refer to as spectrum imaging [Jeanguillaume 1989, Hunt 1991b] and which x-ray microscopists refer to as spectromicroscopy (Section 9.1). EELS works best when the sample is no thicker than about one inelastic scattering mean free path Λ, a distance that will be shown in Fig. 4.80. For sufficiently thin specimens, EELS can offer exquisite sensitivity for trace element imaging of light elements [Aronova 2011].

Protons can also be used to excite core-level x-ray fluorescence in proton microprobes, in a technique sometimes called proton-induced x-ray emission or PIXE. Here, the sensitivity for trace element imaging can be much higher, since protons have about 2000 times the mass of electrons and thus produce dramatically lower levels of continuum or Bremsstrahlung radiation. However, the higher mass of protons means they can also transfer much more energy to the atom, including so-called “knock-on” damage in which they not only ionize (and thus disrupt) an atom’s electronic state but also displace entire atomic nuclei, as in a microscopic game of billiards. Still, the high sensitivity of proton microprobes (due to lower levels of continuum radiation) means that they play an important role in trace element analysis [Ryan 2000, Mulware 2015].

Finally, one can use X rays to remove inner-shell electrons by absorption and thus
Figure 4.79 Sensitivity–radiation dose product for the detection of a given mass of trace elements (vertical axis: mass·dose in joules; horizontal axis: atomic number Z). The plot shows methods using protons, electrons, and X rays (pf, ef, and Xf) to generate x-ray fluorescence at various characteristic lines Kα and Lα. X-ray differential absorption contrast Xa is also shown at various absorption edges K, L, M, and N. Finally, electron energy loss spectroscopy (EELS) is shown as eels for K and L edges. These estimates assume that the mass m of the element to be detected is present as a trace quantity in a “matrix” or majority material that is organic, and the dose D is as delivered to this organic matrix. The differential x-ray absorption curves Xa were calculated for an areal mass density of m/Δr² = 10⁻⁷ grams/cm², with a scaling as (m/Δr²)⁻¹ for other areal densities with a pixel size of Δr. Figure redrawn from [Kirz 1980b].
excite the production of x-ray fluorescence in scanning x-ray fluorescence microscopes or x-ray microprobes. While this will be described in more detail in Chapter 9, for the moment we note that because x-ray photons have no electrical charge, they do not swing around nuclei, so no continuum radiation is produced. As a result, x-ray-excited x-ray fluorescence spectra show a very high peak-to-background ratio, so that very high sensitivity is obtained for studies of trace elements in materials (Section 9.2.3). In addition, the x-ray beam itself does not undergo blurring due to side-scattering, since the cross section for absorption of X rays is so much higher than the elastic or inelastic scattering cross sections at most energies, as shown in Fig. 3.10.

The relative merits of using protons, electrons, or X rays to stimulate x-ray fluorescence for elemental analysis are discussed in greater detail in a number of older papers [Birks 1964, Birks 1965, Cooper 1973, Goulding 1977, Horowitz 1978, Kirz 1978, Kirz 1980a, Grodzins 1983a] and a more recent review [Janssens 2000a]. Of particular note was the first demonstration of x-ray fluorescence analysis using synchrotron radiation at the Cambridge Electron Accelerator [Horowitz 1972]. A more detailed introduction to the topic was provided by Sparks [Sparks 1980], and several other papers introduced the gains in elemental detection sensitivity available using synchrotron radiation [Sparks Jr. 1979, Gordon 1982]. Accurate predictions of sensitivity, required exposure, and associated radiation dose depend strongly on the details of the sample and the experimental setup (with further details for x-ray-induced x-ray fluorescence microscopy provided in Section 9.2), but one comparison [Kirz 1980b] which considers radiation dose is shown in Fig. 4.79. This latter comparison included estimates for electron energy-loss spectroscopy, as discussed in the following section, and differential x-ray absorption, as will be discussed in Section 9.1.1. As the figure shows, x-ray-induced x-ray fluorescence offers the best combination of sensitivity and minimal radiation dose for elements heavier than about Z = 20 (calcium), while EELS and x-ray differential absorption are better for lighter elements, but only if one can prepare sufficiently thin specimens. Other trace element mapping techniques such as laser-ablation mass spectrometry [Becker 2014] or nanoscale secondary ion mass spectrometry (SIMS) [Moore 2011] offer high sensitivity but are destructive to the specimen. Surveys of these and other trace element detection methods exist [Brown 2005].
4.10.2 Transmission electron microscopy

Electron microscopes have a fundamental advantage of small wavelength relative to x-ray microscopes, which in most cases leads to higher spatial resolution. For an electron with a kinetic energy $E_k$ given by acceleration over a given voltage $V$ (for example, $V = 300$ kV for a $E_k = 300$ keV electron), the Lorentz factor $\gamma$ of special relativity is given by
$$\gamma = 1 + \frac{E_k}{m_e c^2} = \frac{1}{\sqrt{1 - \beta^2}} \qquad (4.286)$$
where Einstein’s famous formula gives $m_e c^2 = 511$ keV as the energy associated with the rest mass of an electron, and
$$\beta = \frac{v}{c} = \sqrt{1 - 1/\gamma^2} \qquad (4.287)$$
is the electron’s velocity $v$ relative to the speed of light $c$. The relativistic momentum is given by $p = \gamma \beta m_e c$, so that the electron’s de Broglie wavelength (Eq. 3.5; see also Eq. 3.7 for the numerical value of $hc$) becomes
$$\lambda = \frac{h}{p} = \frac{hc}{pc} = \frac{hc}{\gamma \beta m_e c^2} = \frac{1240\ \mathrm{eV \cdot nm}}{\gamma \sqrt{1 - 1/\gamma^2} \cdot (511 \times 10^{3}\ \mathrm{eV})}, \qquad (4.288)$$
or 0.0037 nanometers for a 100 keV electron. Therefore the spatial resolution in electron microscopes is not limited by the electron wavelength, and electron lenses with low numerical aperture can still show very high resolution. Instead, the spatial resolution of electron microscopes is usually limited by spherical aberrations in electron
Figure 4.80 Electron mean free paths Λel and Λinel for elastic and inelastic scattering, respectively, in protein, ice, and silicon, as calculated using approximate formulae [Langmore 1992]. The shorter mean free path for inelastic scattering in protein and ice means that this interaction is more probable than elastic scattering; each inelastic scattering event involves a mean energy deposition ⟨ΔE⟩ (Eq. 4.290) of about 40 eV in soft materials. Inelastic mean free paths for low-energy electrons in several materials are shown in Fig. 6.9.
optics to about 0.15 nm, though aberration correction systems [Hawkes 2015] are now making it possible to achieve a spatial resolution of 0.05 nm or better (with electron ptychography – discussed in Section 10.4 – also delivering sub-0.1 nm resolution images [Jiang 2018]). The very high resolution of electron microscopy only applies to thin materials, and (as with x-ray microscopy) one must be concerned with radiation damage limitations, especially for studies of soft and biological materials, as discussed in Chapter 11. One can use the same considerations of image contrast and required quanta as discussed in Section 4.8.1 to understand exposure requirements in electron microscopy, and go on to estimate the relationship between image resolution, specimen thickness, and required radiation dose. However, electron interactions are somewhat different than x-ray interactions with materials:

• Whereas absorption dominates over elastic scattering in x-ray interactions, as shown in Fig. 3.10, electrons are rarely absorbed; instead, they undergo plural elastic and inelastic scattering in materials. One can obtain estimates for atomic cross sections σ and mean free paths Λ (as we have done for X rays in Section 3.2) to about 15 percent accuracy over the electron kinetic energy range of interest (30–300 keV) using some simple expressions [Langmore 1992] that extend earlier work [Langmore 1973, Wall 1974]. The resulting mean free paths shown in Fig. 4.80 indicate that electrons are strongly interacting. One can use these expressions in a transport model to understand, as a function of specimen thickness, what fraction of electrons are unscattered, what fraction undergo single and plural elastic scattering, what fraction have their energy (and thus focusing properties with electron optics) changed due to inelastic scattering, and what fraction are scattered to large angles [Langmore 1992, Du 2018]. For molecular resolution imaging, one obtains phase contrast images from the interference of unscattered and single-scattered electrons, and one can see from Fig. 4.81 that this interference declines rapidly for specimen thicknesses in the hundred-nanometer range.

Figure 4.81 Electrons sorted into interaction categories for two materials: silicon, and amorphous ice (ρ = 0.92 g/cm³), the latter of which is the background material when imaging hydrated biological specimens. This plot was calculated as described elsewhere [Du 2018] for electrons with a kinetic energy of 300 keV, representing the energy at which most intermediate voltage electron microscopes (IVEMs) operate. The electrons scattered outside the acceptance of the objective aperture [out] lead to a general darkening of the image with increased sample thickness, while the inelastically scattered electrons [inel] have had their energy and thus de Broglie wavelength changed; as a result, they contribute an out-of-focus “haze” to the recorded image unless they are removed by an in-column energy filter (delivering a so-called “zero loss” image [Schröder 1990]). This usually sets the limit on overall sample thickness, though for radiation-hard specimens one can use high-angle dark-field (HADF) imaging for thicker specimens, though with a loss in spatial resolution [Ercius 2006]. Phase contrast is obtained by mixing the unscattered and singly elastically scattered electrons [noscat and 1el]. Figure adapted from [Du 2018].

• At coarser resolution scales than for molecular imaging, one can estimate a refractive index for electron interactions in media based on the inner potential [Bethe 1928, Lenz 1954, Wyrwich 1958], whereby electrons are sped up in a medium as they “see” the strongly concentrated positive charges of the nuclei of atoms (the atoms’ electrons appear more diffuse to high-energy electrons). This inner potential $U_i$ leads to a refractive index [Reimer 1993, Eq. 3.20] of
$$n = 1 - \frac{U_i}{E}\, \frac{E + m_e c^2}{E + 2 m_e c^2}, \qquad (4.289)$$
which is about $n = 1 + 1.6 \times 10^{-5}$ for carbon at 300 keV, based on measurements of $U_i \simeq -7.8$ eV obtained using electron biprism interference experiments [Keller 1961]. Thus one has phase contrast in this case with a strength that is somewhat comparable to that seen in x-ray microscopy, where $n = 1 - \delta$ (Eq. 3.67).

• Unlike the case with X rays, electrons are not simply absorbed with the deposition of all of their energy in one interaction. Instead, they undergo inelastic scattering, with a loss per inelastic scattering event that can be calculated from an EELS measurement of the inelastic scattering energy spectrum with cross section $\sigma_{\rm inel}(\Delta E)$, such as was shown in Fig. 4.78. From such a spectrum, one can use Eq. 4.10 to calculate the average energy deposited per inelastic electron scattering event $\langle \Delta E \rangle$ as
$$\langle \Delta E \rangle = \frac{\int_0^\infty (\Delta E)\, \sigma_{\rm inel}(\Delta E)\, d(\Delta E)}{\int_0^\infty \sigma_{\rm inel}(\Delta E)\, d(\Delta E)}, \qquad (4.290)$$
provided $\sigma_{\rm inel}$ has been corrected for plural inelastic scattering effects using an approach such as Fourier-log deconvolution [Johnson 1974, Wang 2009a]. Applying this to the single-inelastic-scattering spectra shown in Fig. 4.78 yields $\langle \Delta E \rangle = 39.3$ eV for amorphous ice, and $\langle \Delta E \rangle = 38.6$ eV for Epon (similar to earlier measurements showing $\langle \Delta E \rangle \simeq 37$ eV in nucleic acids [Isaacson 1973, Isaacson 1975]). The energy $\langle \Delta E \rangle$ is well above the energy of chemical bonds (Box 3.2), but it is also well below the energy deposited per x-ray absorption event; this has consequences for atomic resolution imaging of soft materials, as described in Box 4.9.

• Once one has used the properties of electron interactions to calculate the required exposure $\bar{n}_{\rm TEM}$ of electrons per pixel of width $\Delta_r$, one can calculate the radiation dose $D_{\rm TEM}$ based on the fraction of events where an electron undergoes inelastic scattering as
$$D_{\rm TEM} = \frac{\bar{n}_{\rm TEM}}{\Delta_r^2}\, \frac{dE/dx}{\rho} = \frac{\bar{n}_{\rm TEM}\, \langle \Delta E \rangle}{\Delta_r^2\, \Lambda_{\rm inel}\, \rho}. \qquad (4.291)$$
Using $\Lambda_{\rm inel} = 1/(\sigma_{\rm inel}\, n_a)$ and a parameterized approximation [Langmore 1992] for $\sigma_{\rm inel}$ for protein, Eq. 4.291 gives an estimate for the radiation dose associated with a 1 e⁻/nm² exposure of 32, 21, and 17 kGy for 100, 200, and 300 keV electrons in protein, respectively.
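A minimal numerical sketch of two quantities derived above — the de Broglie wavelength of Eq. 4.288 and the dose-per-exposure estimate of Eq. 4.291 — can be written as follows. The protein density and the 300 keV values of ⟨ΔE⟩ ≈ 39 eV and Λinel ≈ 250 nm are assumptions read off Fig. 4.80 and the surrounding text, so the dose comes out near, but not exactly at, the quoted 17 kGy:

```python
import math

EV_TO_J = 1.602176634e-19   # electron-volt in joules
ME_C2_EV = 511.0e3          # electron rest energy m_e c^2 in eV

def electron_wavelength_nm(ek_ev):
    """De Broglie wavelength (Eq. 4.288) for kinetic energy ek_ev in eV."""
    gamma = 1.0 + ek_ev / ME_C2_EV
    beta = math.sqrt(1.0 - 1.0 / gamma**2)
    return 1240.0 / (gamma * beta * ME_C2_EV)

def dose_tem_gy(exposure_per_nm2, de_ev, lambda_inel_nm, rho_g_cm3):
    """Dose estimate (Eq. 4.291) for a given electron exposure per nm^2."""
    fluence_m2 = exposure_per_nm2 * 1e18       # electrons per m^2
    energy_j = de_ev * EV_TO_J                 # mean deposit per inelastic event
    rho_kg_m3 = rho_g_cm3 * 1e3
    return fluence_m2 * energy_j / (lambda_inel_nm * 1e-9 * rho_kg_m3)

electron_wavelength_nm(100e3)          # ~0.0037 nm for 100 keV electrons
dose_tem_gy(1.0, 39.0, 250.0, 1.35)    # ~1.9e4 Gy, the order of the quoted 17 kGy
```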
The damage caused to soft materials by these radiation doses will be discussed in Section 11.2.1. One can use these characteristics of electron interactions to estimate the required electron exposure and resulting radiation dose in electron microscopy.
4.10.3 A comparison of transmission imaging with electrons and with X rays

The calculations above allow one to compare electron and x-ray microscopy for transmission imaging of organic material in a background of ice, representative of a biological specimen imaged at cryogenic temperatures (see Section 11.3). Detailed calculations of this sort have been described for absorption contrast in x-ray microscopy in a landmark pair of papers [Sayre 1977b, Sayre 1977a], and the x-ray calculations were then extended for phase contrast x-ray microscopy [Gölz 1992, Jacobsen 1992a]. Early calculations for thick specimen imaging in electron microscopy are also available
Box 4.9 Electrons versus X rays for atomic resolution imaging

In a seminal paper published in 1970, Breedlove and Trammell considered the limitations that radiation damage sets for atomic-resolution imaging of molecules [Breedlove Jr. 1970] (other authors have since carried out similar analyses [Mueller 1977, Henderson 1995]). Let us consider a simplified version of their analysis, for the case of X rays and electrons only.

With electrons, Fig. 4.80 shows that the mean free path for 300 keV electrons in protein is Λinel = 250 nm for inelastic scattering (which deposits ⟨ΔE⟩ = 39 eV) and Λel = 420 nm for elastic scattering. This means that for each elastically scattered electron, one deposits about (420/250) · 39 = 66 eV of energy. Since one must scatter at least 25 electrons to detect the presence of an atom at a position with SNR = 5, at least 25 · (420/250) · 39 ≈ 1600 eV (electrons) of ionizing energy is deposited per atom, which is more than enough to break all of the atom’s bonds (with implications for the ultimate resolution in electron microscopy of organic materials).

With 10 keV X rays (where the wavelength is at atomic dimensions), the data plotted in Fig. 3.10 indicate that the cross section for absorption is about 13 times stronger than for elastic scattering, so by Babinet’s principle (Fig. 4.58) photon absorption will dominate the scattering process used to detect an atom. As a result one will have deposited about 25 · 13 · 10,000 ≈ 3,300,000 eV (X rays) of ionizing energy per atom in order to see it with 25 scattered photons. Thus radiation damage prevents both electrons and X rays from obtaining atomic-resolution images of soft materials, but the problem is about 2000 times worse for X rays.
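The back-of-the-envelope numbers in Box 4.9 are easy to reproduce (a sketch only; the variable names are illustrative and the inputs are the values quoted in the box):

```python
# Energy deposited per detected atom, following the Box 4.9 argument
n_detect = 25            # scattered quanta needed for SNR = 5
lambda_el_nm = 420.0     # elastic mean free path, 300 keV electrons in protein
lambda_inel_nm = 250.0   # inelastic mean free path, same conditions
de_ev = 39.0             # mean energy deposit per inelastic event (eV)

# Electrons: inelastic deposits accompanying each elastic scattering event
e_electrons = n_detect * (lambda_el_nm / lambda_inel_nm) * de_ev   # ~1640 eV

# 10 keV X rays: absorption ~13x more likely than elastic scattering
e_xrays = n_detect * 13 * 10_000.0                                  # 3.25e6 eV

ratio = e_xrays / e_electrons                                       # ~2000
```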
Of course there are ways around this fundamental limitation; in electron microscopy, single particle methods [Frank 1975a, Frank 1981, Frank 2002] are used to divide the radiation dose among many images of identical molecules, and this approach is also being used in x-ray free-electron laser experiments, as will be discussed in Section 10.6. Crystallography takes a similar approach by combining information from many identical molecules, with the added advantage that they are all at identical orientations and arranged with regular spacing in a crystal lattice. At larger length scales, the relative merits of x-ray and electron microscopy change from this atomic-resolution picture, so that x-ray microscopes become advantageous for specimen thicknesses beyond about 1 micrometer, as shown in Fig. 4.82.
[Grimm 1998]. A more recent paper uses the same set of assumptions for both x-ray and electron microscopy to provide a more direct comparison for transmission imaging [Du 2018]. In Fig. 4.82, we show this paper’s result of using Eq. 4.291 for electron microscopy, and Eq. 4.281 for x-ray microscopy, for the imaging of 10 nm resolution protein features in varying thicknesses of ice. This figure encapsulates a number of features of what we see as the relative merits of x-ray and electron transmission imaging:
Figure 4.82 Radiation dose calculated for phase contrast imaging of 10 nm protein features in ice, as a function of ice thickness (vertical axis: dose in Gy; horizontal axis: ice thickness in μm). This calculation includes 300 keV transmission electron microscopy with and without the use of an energy filter for “zero loss” imaging, 0.5 keV soft x-ray microscopy in the “water window” spectral region between the carbon and oxygen K absorption edges, and 10 keV hard x-ray microscopy. For ice thicknesses less than about 0.5 μm, electron microscopy imparts a lower radiation dose for high-resolution imaging for the reasons discussed in Box 4.9; these thicknesses are compatible with imaging macromolecules and viruses, and even archaebacteria [Grimm 1998] and peripheral regions of eukaryotic cells [Medalia 2002]. However, x-ray microscopy offers lower dose for imaging whole eukaryotic cells and tissues. A similar calculation result is shown elsewhere [Du 2018].
• For specimens with a thickness of a few hundred nm or less, electron microscopy offers lower radiation dose (and higher achievable spatial resolution) for imaging at an equivalent resolution, for the reasons described in Box 4.9. For biological applications, this applies to imaging macromolecules and even very large viruses [Xiao 2005, Xiao 2009]; archaebacteria [Grimm 1998] and peripheral regions of eukaryotic cells [Medalia 2002] are also accessible with electron microscopy. There may still be cases in which x-ray microscopy would be preferred for these smaller samples, such as if one wanted to include information on trace element distributions, as described in Section 4.10.1.

• For specimens with a thickness of a micrometer or more, electron microscopy quickly becomes too difficult due to the basic properties of electron interactions, as shown in Figs. 4.81 and 4.82. Here, x-ray microscopy offers the ability to image whole eukaryotic cells, as will be highlighted in Section 12.1. In materials science, x-ray microscopy can be used to image circuit features even within unthinned silicon wafers, as will be described in Section 12.4.

• For spectroscopic imaging of chemical binding states and electronic configurations in materials, one can compare EELS, and its near-absorption-edge cousin of energy loss near-edge spectroscopy (ELNES), against x-ray absorption near-edge spectroscopy (XANES; see Section 9.1.2). Because the inelastic mean free path for electron scattering is often significantly shorter than the absorption length 1/μ for x-ray absorption (Eq. 3.75), EELS offers greater sensitivity when considering trace quantities in thin specimens (Fig. 4.79), and furthermore EELS allows one to study several chemical elements at one time, provided the electron spectrometer has an appropriate combination of spectral range and resolution. However, ELNES spectra appear on a background of plural inelastic scattering from the plasmon modes (see Fig. 4.78), whereas x-ray interactions are largely free from multiple-scattering complications (see Fig. 3.10). Energy deposition into the plasmon modes is dominant in EELS, and absent in x-ray absorption spectroscopy, so x-ray absorption spectroscopy offers lower dose for inner-shell electron spectromicroscopy studies of light materials [Isaacson 1978, Rightor 1997], while plasmon-mode EELS can be superior in some cases [Yakovlev 2008].

Our perspective is that electron and x-ray microscopy offer important complementary capabilities.
4.11
See the whole picture

In this chapter we have embraced the odd nature of light. We have used a wave description for imaging theories based on a Fourier grating decomposition of an object, and for how a wavefield evolves as it reaches downstream planes. We have used a discrete photon model for estimating the illumination required to see a certain object, and the radiation dose imparted during imaging. As we said at the start of Section 3.3, we treat a photon as a particle on Mondays and Wednesdays, and as a wave on Tuesdays and Thursdays (allowing for three-day weekends, which are so common in the life of scientists). It is worthwhile then to conclude this chapter by reminding ourselves of a view of how these different pictures work together.
The count degeneracy parameter δ_c defines the number of photons per phase space area per coherence time [Goodman 2015, Sec. 9.3]. Consider the example of the FLASH free-electron laser in Hamburg [Singer 2012], which can deliver about 7 × 10^12 photons per pulse with about 65 percent of the pulse being delivered into a single spatially coherent mode (the self-amplified spontaneous emission or SASE mechanism of free-electron lasers – discussed in Section 7.1.8 – means that one does not get pure single-mode emission). The pulse length is about 100 femtoseconds, whereas a measurement of the typical spectral bandwidth yields a coherence time of about 2 fs, so only about 2 percent of the pulse is delivered within a coherence time. Thus one has a degeneracy parameter δ_c of about

δ_c = (7 × 10^12) · (0.65 spatial) · (0.02 temporal) ≈ 9 × 10^10 photons/pulse,  (4.292)

so there are many photons in a single spatially and temporally coherent mode. Now consider a synchrotron beamline with a flux of 10^10 photons/second after spatial filtering
to get a single spatially coherent mode. As will be seen in Table 7.1, this might come in the form of pulses of 34 ps in duration and 153 ns spacing; thus in 1 s there are 6.5 × 10^6 pulses, so there are 1.5 × 10^3 photons per 34 ps pulse. If a silicon monochromator is used to spectrally filter the beam to a bandpass of Δλ/λ = 10^-4, then the coherence time is found from 10^4 waves with a time per wave of T = λ/c = 4 × 10^-19 s, yielding a coherence time of 4 × 10^-15 s. The temporally coherent fraction of a pulse is thus (4 × 10^-15)/(34 × 10^-12) = 1.2 × 10^-4, so one arrives at a degeneracy parameter of about

δ_c = (1.5 × 10^3) · (1 spatial) · (1.2 × 10^-4 temporal) ≈ 0.2 photons/pulse  (4.293)

for this example synchrotron source. This means that separate photons do not have much overlap with each other in the optical system in today's synchrotron light source experiments (most laboratory sources have even lower degeneracy parameters). Consider now an x-ray microscope at a synchrotron light source. Because the count degeneracy parameter is small, we must treat each photon as an individual event; in other words, there is only one photon in the microscope at a time. That photon emerges from the accelerator, and it experiences the entire monochromator and exit slit so that its spectral properties are determined by its wave characteristics. The photon's wavefield is then modulated by the condenser lens so that the wavefield becomes confined to the illumination region at the sample. The wavefield interacts with the entire sample in the ways we have described in terms of Fourier decomposition and OTF (Section 4.4.7), and it then propagates to the objective lens where again the wavefield is modulated as described in Section 4.3.6. Finally, the wavefield reaches the image detector, at which point something else happens: a photon is absorbed at some point on the detector according to a probability distribution given by the wavefield times its complex conjugate.
And then the next photon comes, and the process is repeated! One can see images forming from the accumulation of photons [Hecht 2002, Fig. 1.1], or electrons [Tonomura 1993, p. 14]; these examples make it clear that each photon (or electron) must sample the entire optical system and specimen before arriving at some location with a wavedetermined probability.
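The two degeneracy-parameter estimates above (Eqs. 4.292 and 4.293) can be checked with a few lines of arithmetic. This sketch simply reuses the illustrative numbers quoted in the text; none of them are new measurements:

```python
# Count degeneracy parameter: photons delivered into a single spatially and
# temporally coherent mode. Values are the illustrative numbers from the text.

def degeneracy(photons_per_pulse, spatial_fraction, temporal_fraction):
    """Photons per spatially and temporally coherent mode in one pulse."""
    return photons_per_pulse * spatial_fraction * temporal_fraction

# FLASH free-electron laser example (Eq. 4.292): 7e12 photons/pulse, 65% in
# one spatial mode, ~2 fs coherence time out of a ~100 fs pulse.
delta_fel = degeneracy(7e12, 0.65, 0.02)

# Synchrotron example (Eq. 4.293): 1e10 photons/s in 6.5e6 pulses/s gives
# ~1.5e3 photons per 34 ps pulse; a 4e-15 s coherence time gives the
# temporal fraction of 1.2e-4.
photons_per_pulse = 1e10 / 6.5e6
temporal_fraction = 4e-15 / 34e-12
delta_sr = degeneracy(photons_per_pulse, 1.0, temporal_fraction)

print(f"FLASH:       {delta_fel:.1e} photons per coherent mode")  # ~9e10
print(f"synchrotron: {delta_sr:.2f} photons per coherent mode")   # ~0.2
```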
4.12
Concluding limerick

How does light know how to work such magic? It demands a limerick:

A microscope works with a wave
And ends with a photon so brave
One by one; not by swarm
Is how wavefields do form
Together an image to save!
5
X-ray focusing optics
Janos Kirz contributed to this chapter. In our everyday lives, we take lenses and mirrors for granted. They focus visible light and form images with ease. All this is made possible by the availability of materials that have an index of refraction n that is substantially different from 1, as described in Eq. 3.62 and Appendix B.1 online at www.cambridge.org/Jacobsen. In the case of lenses one normally uses glass with a real index of refraction of n = 1.3–1.5, while metal-coated visible-light mirrors can have near 100 percent reflectivity over a broad range of incidence angles. In the x-ray region of the electromagnetic spectrum, we have seen from Eq. 3.67 and Appendix B.2 that the refractive index n = 1 − δ − iβ is complex, and only slightly less than 1. Hence x-ray optics tends to be very different from optics for visible light, as the zoology of approaches shown in Fig. 5.1 makes clear. We discuss the principles behind these optics in this chapter.
5.1
Refractive optics

Simple refractive lenses have a focal length f as given by the lensmaker's formula (Eq. 4.165) of

1/f = (n − 1) (1/R_1 − 1/R_2),  (5.1)

where R_1 and R_2 are the radii of curvature of the lens surfaces, and n is the refractive index. For a double-convex lens for visible light, R_1 is positive, while R_2 is negative (since the centers of curvature lie on opposite sides of the lens). For glass with a refractive index difference from vacuum of n − 1 ~ 0.3–0.5, radii of order of a few centimeters lead to focal lengths of similar magnitude. For X rays with n − 1 = −δ, which is of order 10^-5, the situation becomes very different: a double-convex lens has a negative focal length and causes a parallel incoming beam to diverge rather than converge. In addition, the focal length for centimeter-scale radii of curvature will not be centimeters, but rather distances approaching a kilometer! To make matters worse, X rays will be attenuated as they pass through the lens. Hopeless? Many people, including Paul Kirkpatrick [Kirkpatrick 1948a, Kirkpatrick 1949a], Alan Michette [Michette 1991], and the author, believed that refractive optics would never be practical for X rays.
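The scale of the problem can be seen by plugging numbers into Eq. 5.1. This sketch, with illustrative radii of curvature, contrasts a glass lens for visible light with a hypothetical symmetric double-convex x-ray lens:

```python
# Lensmaker's formula for a symmetric double-convex lens (R1 = +R, R2 = -R),
# which reduces to f = R/(2(n-1)). The radii and n-1 values are illustrative.

def focal_length(n_minus_1, R):
    """Focal length of a symmetric double-convex lens of curvature radius R."""
    return R / (2.0 * n_minus_1)

R = 0.02  # 2 cm radii of curvature
f_glass = focal_length(0.5, R)    # visible light in glass, n - 1 ~ 0.5
f_xray = focal_length(-1e-5, R)   # X rays, n - 1 = -delta ~ -1e-5

print(f"glass lens: f = {f_glass * 100:.0f} cm")  # centimeter-scale focus
print(f"x-ray lens: f = {f_xray:.0f} m")          # -1000 m: weak and diverging
```

The negative kilometer-scale focal length is exactly the "hopeless" situation described above, which compound lenses (Section 5.1.1) overcome.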
[Figure 5.1 labels, from source to image: compound refractive lens (half); Fresnel zone plate; multilayer Laue lens; grazing incidence mirror; normal incidence multilayer mirror; grazing incidence multilayer mirror.]
Figure 5.1 A zoology of several x-ray focusing optics types. These optics are shown superimposed upon a set of ellipses representing large-incidence-angle reflective optics that could transfer light from the source to the image. In the case of multilayer optics, only the high refractive index layers are shown; materials with a lower refractive index lie between these layers. This figure was inspired by one shown by Spiller [Spiller 1994, Fig. 4.1].
As is frequently the case, a superﬁcial analysis is unjust. The story was set straight ﬁrst by Bingxin Yang [Yang 1993]. Given that the ratio of phase shifting to absorptive parts of the xray refractive index δ/β = f1 / f2 increases with xray energy and also with lighter materials (see Fig. 3.16), he echoed Kirkpatrick’s suggestion that refractive lenses would work best using light materials such as beryllium to focus hard X rays. Since converging lenses would have to be concave, the thickness of the material near the optic axis could be kept to a minimum, thereby minimizing absorption. Yang also considered Fresnel lenses to further reduce absorption in the material. He pointed out, as Kirkpatrick had before [Kirkpatrick 1949a], that it may be easier to make cylindrical lenses producing line foci, and use two of these in a crossed conﬁguration to generate a point focus, and furthermore he lauded “the beneﬁt of manipulating beams with multiple lenses, especially in situations where one single element could not satisfy our need.”
5.1.1
Compound refractive lenses

The experimental breakthrough came just a few years later [Snigirev 1996] in a delightfully simple way: a series of holes were drilled in a metal block to make a 1D lens as shown in Fig. 5.2 (a USA patent application on such an approach was filed by Tomie the year before [Tomie 1997]). The two surface pairs between each hole act as a cylindrical lens with focal length f = R/(2δ). It is easy to show that a set of N lenses, each of focal length f, have a net focal length of f/N when placed in close proximity to each other (that is, their spacing is small compared to f). As a result, one can make a compound refractive lens (CRL) with N lens surfaces that has a net focal length of

f = R/(2Nδ) = R/(2Nαλ^2 f_1),  (5.2)
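As a rough numerical check of Eq. 5.2, the following sketch reproduces the scale of the first CRL demonstration discussed in this section. The value of δ for the lens material at 14 keV is an assumed round number for illustration, not a tabulated constant:

```python
# Net focal length of a compound refractive lens, f = R/(2*N*delta) (Eq. 5.2).
# delta ~ 2.8e-6 is an assumed illustrative value for a light metal at 14 keV.

def crl_focal_length(R, N, delta):
    """Focal length of N closely spaced bi-concave lens elements of radius R."""
    return R / (2.0 * N * delta)

R = 0.3e-3    # hole radius: "a fraction of a millimeter"
N = 30        # number of lens elements
delta = 2.8e-6  # assumed refractive index decrement at 14 keV

f = crl_focal_length(R, N, delta)
print(f"f = {f:.1f} m")  # ~1.8 m, the scale of the first demonstration
```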
Figure 5.2 Schematic of a compound refractive lens for x-ray focusing. The first demonstration [Snigirev 1996] was done by drilling a series of holes in a metal block, leading to double-concave lens elements as indicated. Today it is more common to either use lithography with directional etching for 1D lenses, or to use mandrels pressed into materials to produce 2D lens elements, which can be combined as shown at right. In both cases parabolic profiles can be fabricated to reduce spherical aberration, though only spherical profiles are shown here.
where the second form (using Eq. 3.69) shows how the focal length scales as 1/λ^2 (more detailed calculations yield a slight modification of the focal length of compound refractive lenses [Simons 2017], and a more detailed discussion of aberrations has been presented [Osterhoff 2017]). By shrinking the radii of the holes to a fraction of a millimeter, and using the combined power of N = 30 lenses, Snigirev et al. produced a lens with a practical focal length of 1.8 m for 14 keV X rays. This demonstration generated a lot of interest, and a variety of approaches have been pursued since. Bruno Lengeler and collaborators developed the technique of shaping a series of paraboloidal indentations in beryllium with R as small as 50 μm to create 2D focusing with reduced spherical aberration [Lengeler 1999b] (as suggested by Yang), and optics of this type are commercially available from rxoptics.de. This has led to "transfocator" systems in which one can rapidly add or remove lenses to change N; they have become very useful optical elements in some synchrotron beamlines [Snigirev 2009, Vaughan 2010]. These CRL focusing systems are rugged, and can handle the high heat load present in these applications. Another approach has been to use lithographic patterning and highly directional etching techniques [Aristov 2000a] or deep x-ray lithography methods [Shabelnikov 2002, Nazmov 2004] to produce 1D lenses in materials such as silicon or polymers (Fig. 5.3), or 1D or 2D lenses in electroplated materials such as nickel using the LIGA (German: LIthographie, Galvanik und Abformung) process [Nazmov 2005, Nazmov 2007]. When used as orthogonal pairs of 1D optics, one obtains not an Airy^2 focus intensity but a sinc^2(x/a_x) sinc^2(y/a_y) focus intensity, as discussed in Section 4.4.5. These approaches are readily extensible to the production of kinoform lenses [Jordan 1970, Evans-Lutterodt 2004] as shown in Fig. 5.4.
These greatly reduce x-ray absorption and thus increase efficiency; at the high-resolution end they can be thought of as blazed
[Figure 5.3 labels: x-ray beam direction; scale bar in μm.]
Figure 5.3 Example of a 1D compound refractive lens made by nanopatterning on a silicon wafer followed by deep reactive ion etching. Shown here are a subset of a total of N = 61 double-concave parabolic lenses with a curvature of R = 9.4 μm (giving a focal length of 31 mm at 14 keV) and an aperture of 40 μm. Image courtesy of Lukas Grote of the University of Hamburg, and data courtesy of Frank Seiboth, DESY, Hamburg.
zone plates, as will be discussed in Section 5.3.1. However, they do impose 2π phase shears on the wavefield exiting the optic.
The spatial resolution that can be obtained using refractive optics is determined by several factors. At extreme radii, a ray incident on the "steep" curvature of a refractive lens will encounter a surface that is nearly parallel to the optical axis, and the ray will be reflected (rather than transmitted into, and refracted by, the surface); this sets a resolution limit for a single refractive lens that is equivalent to the case for a reflective optic discussed below (Eq. 5.10) [Evans-Lutterodt 2003]. In compound refractive lenses, the curvature of any one lens is weaker so this is less of a problem, and furthermore one can design an adiabatic adjustment into the focal length of each of the CRLs to reach, in principle, few-nanometer spatial resolution [Schroer 2005b], with < 20 nm resolution having been achieved [Patommel 2017]. For non-kinoform compound refractive lenses, absorption limits the effective aperture [Snigirev 1996]. If one limits the combined lens thickness to an attenuation length of μ^-1 = λ/(4πβ) (Eq. 3.75) so that transmission is 1/e ≈ 38 percent at the edge, the usable aperture diameter A can be written as
A = √(λR/(πβN)) = √(λR/(πNαλ^2 f_2)),  (5.3)

where in the latter form (using Eq. 3.65) we can see more explicitly the scaling with x-ray wavelength λ and complex oscillator strengths (f_1 + i f_2) as shown in Fig. 3.16. For an example [Kurapova 2007] of silicon at 21 keV with N = 100 and R = 2 μm, Eq. 5.3 gives A = 10 μm, showing that nanofocusing CRLs are usually used in conjunction with prefocusing optics to bring the beam width down to match such small apertures. This limited effective aperture A of CRLs also sets a limit on spatial resolution. The
[Figure 5.4 labels: x-ray beam axis; parabolic refractive lens; long kinoform; short kinoform; thickness step t_s.]
Figure 5.4 X-ray kinoform refractive optics. At left is shown a parabolic refractive optic. By removing material in defined thickness steps t_s, one can produce kinoform optics [Jordan 1970] with reduced x-ray beam absorption, either as long or short kinoforms [Evans-Lutterodt 2004]. The best performance is obtained when the thickness steps correspond to a phase shift of 2π, or t_s = λ/δ (Eq. 3.69), in which case the short kinoform becomes equivalent to a curved-profile zone plate [Tatchyn 1982, Tatchyn 1984].
numerical aperture is N.A. = A/(2f) in the small-angle limit, giving

N.A. = (1/2) √(Nαλ^3 f_1^2/(πR f_2)).  (5.4)
Since the Rayleigh resolution of a perfect lens is δr = 0.61λ/N.A. (Eq. 4.173), compound refractive lenses have a spatial resolution limit due to absorption of
δ_r = 1.22 √(πR f_2/(Nαλ f_1^2)).  (5.5)

Using the same example of silicon at 21 keV with N = 100 and R = 2 μm, this gives a resolution limit of δ_r(CRL) = 66 nm, which is in fact representative of what is achieved in experiments (some papers use a numerical factor other than 0.61 in δ_r and thus quote smaller values for the achievable spatial resolution [Lengeler 1999a]; at the same time, adiabatic CRLs have been used for demonstrations of sub-20 nm resolution [Patommel 2017]). Considering that f_2 decreases as about λ^2 (Fig. 3.16) in addition to the 1/√λ term in Eq. 5.5, one can see the advantages of working at higher photon energies when using CRLs. The expression of Eq. 5.5 also emphasizes the advantages of using light elements like Be [Schroer 2002] or even Li [Dufresne 2001] with more favorable ratios of phase shift over absorption, or f_1/f_2 = δ/β, for obtaining the maximum resolution and lowest absorptive losses. At energies of around 50 keV and up, Compton scattering instead sets the limit on lens aperture [Elleaume 1998, Eq. 11]. Other factors (such as the existence of high-quality deep reactive ion etch processes for silicon [Aristov 2000a, Kurapova 2007], or high heat load requirements leading to the choice of diamond [Snigirev 2002, Ribbing 2003, Nöhammer 2003]) also come into play.
Gaussian fluctuations in individual surface profiles [Lengeler 1999a] with a standard deviation of σ_s (adding in an uncorrelated fashion over 2N surfaces) lead to a
net phase fluctuation characterized by θ_s = 2πδ √(2N) σ_s/λ, with acceptable performance according to the Rayleigh quarter wave criterion (Section 4.1.2) when θ_s ≤ π/4, or

σ_s ≤ (1/(8√(2N))) (λ/δ).  (5.6)

For our example of a 21 keV CRL made of silicon with N = 100, one requires σ_s ≤ 480 nm, which is easily achieved in nanolithography processes or even in presses used to "stamp" out optics. Finally, one can measure the phase of a CRL-focused wavefield using ptychography (Section 10.4), and then add a phase correction optic to improve it [Seiboth 2017].
It is useful to consider the energy tunability of CRLs. From Eq. 5.2 we see that the focal length scales with the square of changes in photon energy (assuming f_1 has little change with energy, which is the case away from absorption edges, as shown in Fig. 3.16). Taking the derivative of Eq. 5.2 gives

df = (R/(2Nα)) d(λ^2 f_1)^-1 = −(R/(2Nαλ^2)) (2/(λ f_1) + (1/f_1^2)(df_1/dλ)) dλ.  (5.7)
At the same time, we can calculate the absorptive-aperture-limited depth of field of a CRL from DOF = 2δz = 2λ/N.A.^2 (Eqs. 4.213 and 4.214) and Eq. 5.4, giving

DOF = 2δz = 8πR f_2/(Nαλ^2 f_1^2).  (5.8)
If we set the change in focal length due to wavelength changes to be a fraction χ of the depth resolution δz (thus keeping the focal spot from being blurred due to chromatic aberrations), we find that the spectral bandpass dλ/λ ≈ dE/E used to illuminate a CRL with an aperture reaching the absorption limit (Eq. 5.3) must be kept below

dλ/λ ≤ 4πχ (f_2/f_1),  (5.9)
which for χ = 1 and silicon at 21 keV works out to be dE/E ≈ 0.3 percent.
Refractive optics went from something considered impractical to becoming very useful optics in both x-ray beamlines and in x-ray microscopy. This must be celebrated in a limerick! Christian Schroer of DESY has generated some beauties, and what follows is heavily inspired by Schroer:

We once thought that x-ray refraction
was too weak for focusing action.
But compounding the lens
adds up many bends.
We focus with great satisfaction!
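To close the section, the chapter's silicon CRL example (N = 100, R = 2 μm at 21 keV) can be run through Eqs. 5.2, 5.3, and 5.6. The optical constants δ and β used here are assumed approximate values for silicon at this energy, so the results should be read as order-of-magnitude checks against the numbers quoted above rather than as definitive values:

```python
from math import pi, sqrt

# Worked example for the silicon CRL at 21 keV discussed in this section.
# delta and beta are assumed approximate optical constants for Si at 21 keV;
# exact results depend on which tabulation of f1 and f2 one uses.
wavelength = 5.9e-11  # 21 keV photons, in meters
N, R = 100, 2e-6      # number of lens elements; radius of curvature
delta, beta = 1.1e-6, 3.5e-9  # assumed values

f = R / (2 * N * delta)                            # Eq. 5.2: focal length
A = sqrt(wavelength * R / (pi * beta * N))         # Eq. 5.3: usable aperture
sigma_s = wavelength / (8 * sqrt(2 * N) * delta)   # Eq. 5.6: roughness limit

print(f"f       = {f * 1e3:.1f} mm")   # ~9 mm
print(f"A       = {A * 1e6:.0f} um")   # ~10 um, matching the text's example
print(f"sigma_s = {sigma_s * 1e9:.0f} nm")  # a few hundred nm (~480 nm quoted)
```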
5.2
Reflective optics

The normal incidence reflectivity of X rays from single interfaces goes like R_⊥ ≈ δ^2/2 (Eq. 3.118), which is vanishingly small. However, as noted in Sections 2.2 and 3.5, x-
[Figure: plot of achieved x-ray mirror surface quality versus year (1990–2000), with axes for roughness (nm) and slope error (μrad).]
2.44 dr_N). Not only will the design wavelength pass through this pinhole unobstructed, but slightly longer and shorter wavelengths will as well. In order to estimate the full-width at half-maximum (FWHM) spectral bandwidth, we will consider the wavelength change Δλ for which the transmission is reduced to half of its value. This means we wish to know the wavelength change Δλ for which the geometric beam size at the pinhole plane is √2 d_p, so that the beam area is twice the pinhole area. Since this diameter divided by Δf gives the same tan(θ) as the condenser diameter d_z divided by f + Δf, we have

√2 d_p/Δf = d_z/(f + Δf) ≈ d_z/f  (6.12)

Δf/f = Δλ/λ = √2 d_p/d_z,  (6.13)

where we have used Eq. 6.11 in Eq. 6.13. This is for the half-width at half-maximum, so to obtain the FWHM value we must consider the wavelength change in the longer wavelength direction as well, leading to a FWHM bandwidth of

(Δλ/λ)_FWHM = 2√2 d_p/d_z.  (6.14)

That is, the monochromaticity λ/Δλ is given by the ratio of condenser diameter d_z to monochromator pinhole diameter d_p. This has driven the development of very large condenser zone plates, such as zone plates with diameters of d_z = 9 mm and outermost zone width of dr_N = 54 nm made either using UV holography [Schmahl 1993] or electron beam lithography [Anderson 2000]. In the case of electron beam lithography, it is very challenging to combine both large diameters and narrow zone widths, so that electron
6.3 Full-field microscopes, or transmission x-ray microscopes
[Figure 6.4 labels: zone plate diameter d_z; central stop diameter fraction b; pinhole diameter d_p.]
Figure 6.4 Fresnel zone plates can be used as a combination condenser lens and linear monochromator. A zone plate with diameter d_z and central stop fraction b accepts a polychromatic beam, and a pinhole of diameter d_p is placed at the focal position for the design wavelength λ. While the blue dashed line shows a shorter wavelength that will still make it through the pinhole unobstructed, the solid blue line shows a wavelength λ − Δλ for which light is spread out over twice the area of the pinhole (or a √2 larger diameter than the pinhole). This condition is then used to calculate the FWHM spectral bandwidth of Eq. 6.14.
beam writing times of 48 hours have been used! These condenser zone plates also see higher beam powers than the specimen or the objective lens (because monochromatization removes most of the power from a polychromatic beam), so thermal management is important to ensure zone plate condenser survivability.
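A quick numerical check of Eq. 6.14 for a condenser of the size just described. The d_z = 9 mm diameter is the value from the text, while the pinhole diameter is an assumed illustrative value:

```python
from math import sqrt

# FWHM spectral bandwidth of a zone-plate linear monochromator (Eq. 6.14):
# (d_lambda/lambda)_FWHM = 2*sqrt(2)*d_p/d_z.
d_z = 9e-3   # condenser zone plate diameter (from the text)
d_p = 10e-6  # monochromator pinhole diameter (assumed for illustration)

bandwidth = 2 * sqrt(2) * d_p / d_z
print(f"d_lambda/lambda (FWHM)       = {bandwidth:.1e}")
print(f"monochromaticity lambda/d_lambda ~ {1 / bandwidth:.0f}")
```

With these values the monochromaticity is a few hundred, which shows why a large ratio of condenser diameter to pinhole diameter is needed for spectroscopy-grade bandwidths.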
6.3.2
Capillary condensers

The latest microscopes that have evolved out of the Göttingen legacy have used standard x-ray beamline grating monochromators (Section 7.2.1) and a capillary optic (Section 5.2.3) as the condenser lens [Heim 2009]. The same has been true for commercial instruments [Zeng 2008]. Capillary condensers have several important advantages:
• One can project a monochromatic illumination field onto the object or specimen plane with no wavelength-selecting pinhole required. Especially when doing nanotomography in TXM systems with zone plate condensers, the millimeter-scale size of the mount for the monochromating pinhole significantly limits the ability to obtain large tilt angles when using planar specimen holders such as electron microscope grids or silicon nitride windows. Grating monochromators used with capillary condensers largely remove this limitation [Heim 2009, Schneider 2010].
• The reflection coefficient for single-bounce x-ray reflective optics can be well above 50 percent (Section 3.6), while zone plates often have diffraction efficiencies of 15 percent or lower (Section 5.3.2).
• The critical angle for grazing incidence reflectivity of θ_c = √(2δ) (Eq. 3.115) is often larger than the maximum beam convergence angle that can be produced by practical finest zone widths dr_N. For example, the critical angle for X rays reflecting
from fused silica is 37.5 mrad at 540 eV, so the net deflection angle 2θ_c can be 75 mrad, whereas to obtain the same diffraction angle (Eq. 4.30) one would need finest zones with a width of dr_N = λ/(2θ) = 15 nm in the condenser. This finest zone width is at the limit of what has been achieved in zone plates, and at that limit the efficiency is reduced because of the challenges of fabricating high aspect ratios (Section 5.3.4), so that most condenser zone plates have a much larger finest zone width dr_N. Capillary condensers bypass these limitations.
• By using a conventional grazing incidence monochromator, the "white beam" incident power from the x-ray source is spread out over a large surface area on a substrate that is easily water-cooled. Since only the monochromatic beam reaches the condenser, and again the grazing incidence condenser has the beam spread out over a large surface area, thermal engineering challenges on the condenser optic are largely removed.
• With a conventional grating monochromator, it is easy to obtain very high spectral resolution values (such as λ/Δλ ≈ 10,000 demonstrated in a TXM system [Guttmann 2011]) as needed for imaging across x-ray absorption near-edge resonances (see Section 9.1.2).
For these reasons, capillary condensers are growing in popularity in TXM systems.
Whatever the condenser optic used, the illumination spot size is given by some combination of the geometrical image of the source and any effects due to aberrations in the condenser. Following the discussion of Section 4.4.6, we point out that one spatially resolved pixel in a TXM has a flux that is determined by the source brightness delivered through the imaging system, while the number of pixels that can be illuminated simultaneously is given by the source size–angle or phase space product M_source produced by the source and accommodated by the condenser.
That is, to a first approximation each spatially resolved pixel can be illuminated by a separate spatially coherent mode M_source, up to a limit of the number of spatially resolved pixels in the detector. That is, if a detector with 2048 × 2048 pixels is used and one seeks to have good sampling with two detector pixels Δ_det per spatially resolved pixel, one could accept up to 1024 × 1024 spatially coherent modes M_source, or a phase space full angle–full width product of 1024λ in each of the directions x̂ and ŷ. In practice, this is difficult to achieve, and the delivered illumination phase spaces are often somewhat less in the horizontal, and much less in the vertical, in particular if synchrotron sources are used. To correct for this, a diffuser can be used in the specimen illumination path [Uesugi 2006], or the condenser can be "wobbled" or mechanically scanned during exposure so as to provide an even illumination field [Rudati 2011]. By these means, even illumination over fields of size 10–20 μm is routinely obtained, though at a cost of exposure time (the brightness that could be used for fast imaging of a smaller number of pixels is now shared among a larger number of pixels). One must also keep in mind the defocus aberration limit to the field of view, as discussed in Section 4.4.1.
Because of the advances in x-ray source brightness, and the fact that it is brightness that limits the per-pixel exposure time, modern TXMs can record images with exposure times well below one second. To image larger fields of view, the specimen can be translated in a step-and-repeat fashion, after which the individual images can be assembled into a larger mosaic image [Loo 2000, Liu 2012a].
In early Göttingen TXM systems, photographic films [Rudolph 1984] or nuclear emulsions were used to record the image, with detective quantum efficiencies of no more than about 9 percent [Bässler 1991]. However, charge-coupled device (CCD) cameras were soon being tested with phosphor coatings for conversion of the x-ray image into visible light [Germer 1986], and by 1993 backside-thinned CCDs were being used for direct x-ray detection [Meyer-Ilse 1993]. Today, backside-thinned CCDs deliver very high quantum efficiency, and remain the detector of choice for photon energies below the Si K absorption edge at 1.84 keV (radiation damage seems to limit detector lifetime at higher energies). At higher energies, scintillator–lens–visible light camera systems are usually used, as will be discussed in Section 7.4.7. With all of these detectors, the best practice is to use an optical magnification M sufficient to map the Rayleigh resolution δ_r onto the width of two detector pixels, or Mδ_r = 2Δ_det, in order to meet the requirements for Nyquist sampling (Eq. 4.88).
As noted in Section 2.5, the Göttingen group's work at the older BESSY synchrotron in Berlin inspired the development of similar microscopes on bending magnet beamlines at synchrotron light source facilities around the world, with microscopes on undulator beamlines following more recently (the relative advantages of these two source types are noted in Section 7.2.2). The energy range has been expanded to include TXM systems developed for studies at multi-keV x-ray energies, where one has greater penetration power for thicker specimens, and larger working distance for working with more elaborate specimen environmental chambers (Section 7.5). While synchrotron light sources are powerful and widely available, there is also a real need for TXMs in home laboratories.
Early steps in this direction included demonstrations by the Göttingen group with plasma discharge sources [Niemann 1990]. Greater success was obtained by the group of Hans Hertz in Sweden by using lasers to excite plasma emission from in-vacuum liquid jets of ethanol [Rymell 1993] and, with improved emission, from liquid nitrogen [Berglund 1998]. They have gone on to build water-window soft x-ray microscopes using these sources and a normal-incidence multilayer mirror as the condenser optic [Berglund 2000], and have developed cryo nanotomography capabilities [Bertilson 2011]. A very important step has been the development of commercially available laboratory microscopes, so that x-ray microscopy in the home lab can spread to those who are not inclined (or do not have the expertise) to build their own systems. After experience with building synchrotron-based microscopes, Wenbing Yun founded the company Xradia (now Carl Zeiss X-ray Microscopy) in the late 1990s and began delivering TXMs using characteristic line emission from microfocus x-ray sources, capillary reflectors as condensers, and zone plate objectives for imaging and nanotomography at 5.4 keV [Wang 2002]. Exposures typically take several minutes. These laboratory microscopes have been commercially successful both at Zeiss and at Yun's new company, Sigray, and their variants for use with synchrotron radiation have been installed and are operating at nearly a dozen light sources.
We end our discussion of TXMs by noting one important consideration: the objective optic lies downstream of the specimen to be imaged. If one is using an objective lens with low efficiency, such as a Fresnel zone plate where the efficiency is in the 5–20 percent
252
Xray microscope systems
[Figure 6.5 schematic: source, objective lens with central stop, order-sorting aperture (OSA), object, transmission detector, and detector(s) for other signals]
Figure 6.5 Schematic view of a scanning x-ray microscope or SXM. In this case the objective lens used to produce the fine focal spot has a central stop, and an order-sorting aperture (or order-selecting aperture; OSA) is used to isolate the focused beam, as would be required for a Fresnel zone plate objective (see Fig. 5.17). Scanning transmission x-ray microscopes (STXMs) as well as ptychography systems (Section 10.4) rely primarily on the transmission signal, while scanning fluorescence x-ray microscopes (SFXMs; Section 9.2) and scanning photoelectron emission microscopes (SPEMs) rely primarily on detection of x-ray-induced x-ray fluorescence (SFXM) or electrons (SPEM). Additional modes are possible, including luminescence (SLXM; also called x-ray excited optical luminescence or XEOL). For x-ray fluorescence at synchrotron sources, an energy-dispersive detector is usually oriented to the side rather than below the specimen; this is discussed in Section 9.2.
range (Section 5.3.2), this affects the radiation dose received by the specimen (compound refractive lenses can also be used as the objective lens [Lengeler 1999a], though they also have some absorption losses). In most cases one must record a certain number of photons per pixel in the image in order to see features of a certain size (Section 4.8), and if the objective lens has only 20 percent efficiency then one must illuminate the specimen with five times more exposure in order to obtain the required statistical significance in the image.
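Since the required photon count at the detector is fixed by counting statistics, the specimen exposure in a TXM scales inversely with the objective efficiency; a minimal sketch of this arithmetic:

```python
def txm_exposure_factor(objective_efficiency):
    """Factor by which the specimen exposure must be increased in a TXM,
    where the objective sits downstream of the specimen, so that the
    detected photon count (and thus the statistical significance) is the
    same as for a lossless objective. Efficiency is a fraction in (0, 1]."""
    if not 0.0 < objective_efficiency <= 1.0:
        raise ValueError("efficiency must be a fraction in (0, 1]")
    return 1.0 / objective_efficiency

print(txm_exposure_factor(0.20))  # 5.0: the 20 percent example above
print(txm_exposure_factor(0.10))  # 10.0: the 10 percent zone plate case
```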
6.4
Scanning x-ray microscopes

The basic idea behind a scanning x-ray microscope (SXM) is to demagnify an x-ray source and form the smallest focal spot possible, and then to acquire an image pixel by pixel through a raster scan. In principle one could scan either the probe or the sample, but in most instruments it is the sample that is mechanically scanned (so that the objective lens remains in constant alignment to the illuminating beam, thus reducing the risk of image intensity variations). As shown in Fig. 6.5, SXM systems have great flexibility in the signal that is used to form the image:
• Scanning transmission x-ray microscopes (STXMs, pronounced as "sticksems" by cool geeks like us) use primarily the transmission signal. As noted in Section 4.5.1, the shape of the sensitive area of the transmitted beam detector can play as strong a role in image quality as the condenser lens does in TXM systems, so that in the ideal case the detector should subtend an angle corresponding to
m = 1.5 times the N.A. of the STXM objective lens. The detector can also be segmented into quadrants or additional segments for differential phase contrast imaging, as discussed in Section 4.7.4, and full pixel array detectors can be used with the transmitted beam signal for ptychography, as discussed in Section 10.4. For simple area detectors, one can use a silicon photodiode to measure high flux rates as a current, or an avalanche photodiode to measure the photon count rate (including with high time sensitivity for pump–probe experiments [Stoll 2004, Puzic 2010]), with tradeoffs as shown in Fig. 7.15. One can also use a phosphor or scintillator to convert the incoming x-ray beam signal into visible light and then use optical detectors either in current or pulse-counting mode. Gas-filled proportional counters have also served well as zero-dark-noise detectors for low flux rates [Feser 1998], though with extra complications of bulk and complexity.
• Scanning fluorescence x-ray microscopes (SFXMs, or "sphixems"; also called x-ray microprobes) use an energy-resolving photon detector to collect x-ray-stimulated x-ray fluorescence signals. Most systems use energy-dispersive detectors (Section 7.4.12), where the energy of each detected photon is measured based on the number of charge–hole pair separations produced in a semiconductor (usually silicon, with an energy resolution of about 130 eV at 10 keV). SFXM is discussed in further detail in Section 9.2.
• Scanning photoelectron emission microscopes (SPEMs) use an energy-resolving electron detector to collect Auger and photoelectron spectra. At x-ray energies below about 100 eV, multilayer-coated Schwarzschild objectives have been used [Ng 2006], while at higher energies both zone plate optics [Ade 1991, Ko 1995, Marsi 1997] and grazing incidence reflective optics [Voss 1992b] have been used. The characteristics of SPEMs are discussed with photoelectron emission microscopes (PEEMs) in Section 6.5; SPEMs are used primarily for studies in materials science [Kiskinova 1999].
• Rather than looking at electrons that have been ejected, one can use the x-ray beam-induced current (XBIC) [Vyvenko 2002] signal for imaging.
• Scanning luminescence x-ray microscopes (SLXMs, or "slicksems") involve collection of x-ray-stimulated visible light emission signals generated by the same physical processes as are involved in scintillators (Section 7.4.7). In spite of early hopes of imaging visible light emission from organic materials [Jacobsen 1993], radiation damage limitations mean that luminescence is best used for studies of inorganic materials such as ceramics [Zhang 1995a, Moewes 1996]. In studies of semiconductors, the method of using core-shell electron excitation to generate luminescence [Bianconi 1978] is known as x-ray excited optical luminescence (XEOL). This approach is seeing considerable success in scanning x-ray microscopy [Martínez-Criado 2006, Martínez-Criado 2012].
Many instruments combine several of these functions; for example, nearly all SFXM systems include a STXM mode. A microscope developed at DESY in Hamburg included even more detection modes, such as desorbed ions [Voss 1997]. Scanning was first proposed in 1938 for electron microscopy [von Ardenne 1938],
[Figure 6.6 panels: Nyquist sampling versus common sampling of the probe, and unidirectional versus bidirectional scan trajectories]
Figure 6.6 Scanning schemes and sampling in scanning x-ray microscopes. For Nyquist sampling, one should choose the scan pixel spacing Δr to be half the Rayleigh resolution δr. Since the Rayleigh resolution involves the radius rather than the diameter of the probe function (Fig. 4.29), this produces significant probe overlap, and even illumination along a scan line. Unfortunately, many users of scanning microscopes choose a pixel spacing closer to the spatial resolution (here labeled "Common sampling"). At right is shown the difference in scan trajectories for unidirectional scans versus bidirectional scans. Since many scanning microscopes do not have shutters fast enough to close and then open between scan lines, some additional radiation dose is applied to the specimen in unidirectional scans, and there is also a cost of additional scan overhead time during the "flyback" phase of the probe's motion relative to the specimen.
and in 1953 for x-ray microscopy [Pattee 1953]. In x-ray microscopy, the first demonstrations came a few years later [Cosslett 1956, Duncumb 1957]. As noted in Section 2.5, the scanning fluorescence x-ray microscope developed by Horowitz and Howell [Horowitz 1972] really began to show the possibilities for using synchrotron radiation, though it only used a pinhole to define the probe beam. It was some years later before the group of Janos Kirz at Stony Brook University began to demonstrate STXMs using Fresnel zone plates as high-resolution objective lenses [Rarback 1984], as noted in Section 2.5. When zone plate optics are used, fractional central stop diameters of b = 0.2–0.5 are used along with order-sorting apertures or OSAs (also called order-selecting apertures) to isolate the first-order focus as shown in Fig. 5.17, with the exact value of b chosen to provide adequate working distance between the OSA and the specimen. Scanning microscopes are quite flexible, especially in the era of computer-controlled instruments. The scan parameters (the step size Δr from pixel to pixel, and the number of pixels Nx,y) can be adjusted over wide ranges; in effect, the image magnification and field of view are freely adjustable. There are even examples of synchrotron SFXM systems being used to scan works of art [Thurrowgood 2016] that are almost half a meter across! However, one should pay attention to proper Nyquist sampling of the scan, as illustrated in Fig. 6.6. In addition, many scanning microscopes use unidirectional scanning (also shown in Fig. 6.6) because it is easier to program in a control computer, but bidirectional scanning offers advantages in speed and reduced radiation exposure on the specimen. There has also been a trend from using a move–settle–measure or "step scan" approach to a continuous or "fly scan" approach¹ in STXMs [Jacobsen 1991, Kilcoyne 2003] and SFXMs [Kirkham 2010], as illustrated in Fig. 6.7.
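The Nyquist scan-sampling rule just noted (pixel spacing Δr = δr/2, Fig. 6.6) fixes the pixel count once a field of view is chosen; a small sketch with illustrative (assumed) numbers:

```python
import math

def scan_pixels_per_axis(field_of_view_um, rayleigh_resolution_nm):
    """Number of scan pixels along one axis for Nyquist sampling of a
    raster scan, with the step size set to half the Rayleigh resolution
    (pixel spacing delta_r / 2, as in Fig. 6.6)."""
    step_nm = rayleigh_resolution_nm / 2.0
    return math.ceil(field_of_view_um * 1.0e3 / step_nm)

# Illustrative: a 20 um field of view at 30 nm Rayleigh resolution means
# 15 nm steps, i.e. 1334 pixels per axis.
print(scan_pixels_per_axis(20.0, 30.0))  # 1334
```

Doubling the sampling density in each direction quadruples the pixel count, which is one reason users are tempted toward the coarser "common sampling" of Fig. 6.6.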
Fly scanning is faster, but one must account for the fact that the probe function is modified, and this
¹ I am not aware of anyone yet taking continuous scan images of small insects; who will do the first fly fly scan?
[Figure 6.7 diagrams: probe position versus time for step scans and for fly scans, showing the step size s, probe diameter d, velocity v, and the times to, te, and td]
Figure 6.7 Step scans, where a measure–move–settle sequence is used, versus "fly" or continuous scan motions. In step scans, one has the exposure time te plus a possible detector readout "dead" time td, followed by the time to it takes to move to the next fixed scan position. In continuous or fly scans, one has only the exposure time te and the possibility of a detector "dead" time td. Continuous or fly scans involve a pixel spacing along a scan row of s, as set by the clocking of data collection during constant probe velocity; this can be larger or (ideally) smaller than the probe diameter d. Figure adapted from [Deng 2015a].
plays a special role in coherent scanned imaging methods like ptychography, as shown first in simulations [Clark 2014], and then in experiments [Pelz 2014, Deng 2015a, Huang 2015]. Finally, in ptychography (Section 10.4) some researchers use spiral, non-rectilinear scan patterns to minimize reconstruction artifacts [Thibault 2009a], and other non-rectilinear scan approaches such as Lissajous scanning [Sullivan 2014] might also offer advantages. Scanning microscopes require a spatially coherent beam in order to deliver a diffraction-limited focus at the objective lens' resolution limit (Section 4.4.6), and with spatially incoherent (or multiple coherence mode M) sources such as most synchrotron light sources today, one must use some form of spatial filtering to coherently illuminate the objective lens, as shown in Fig. 4.43. This provides the opportunity to trade off somewhat lower spatial resolution for higher flux by opening up beamline apertures and thus increasing the source phase parameter p, as discussed in Section 4.4.6. The beam illuminating the object must also be monochromatized to match any dispersive properties of the objective lens, such as a zone plate or multilayer Laue lens (Eq. 5.33) or compound refractive lens system (Eq. 5.9), and this is usually done by using a crystal, grating, or multilayer monochromator (Section 7.2.1). The achromatic properties of single-layer-coated Kirkpatrick–Baez grazing incidence mirrors (Section 5.2.1) offer the great advantage of not requiring beam monochromatization. Because of the spatial coherence requirement, and serial rather than parallel acquisition of image pixels, scanning microscopes often have much slower imaging times than full-field or TXM systems. The highest-performance SXMs operate using undulators (Section 7.1.6) at synchrotron light sources, though there are many examples of successful STXMs at bending magnet beamlines (Section 7.1.5).
Even with the brightest sources, it is rare to have per-pixel dwell times (step scans) or transit times (fly scans) as low as 100 μs, in which case a 1000 × 1000 pixel scan would take almost two minutes to acquire even if "fly scans" were used with infinite stage acceleration/deceleration and no other time losses due to data transfer, etc.
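This timing estimate is easy to reproduce: 10⁶ pixels at 100 μs per pixel gives 100 s, or almost two minutes, even before any overhead. A minimal sketch:

```python
def ideal_scan_time_s(nx, ny, dwell_us):
    """Best-case raster scan acquisition time: pixel count times the
    per-pixel dwell (step scan) or transit (fly scan) time, ignoring
    flyback, settling, readout dead time, and data transfer."""
    return nx * ny * dwell_us * 1.0e-6

t = ideal_scan_time_s(1000, 1000, 100.0)
print(t / 60.0)  # about 1.7 minutes for a megapixel image
```

Real scans are slower still once flyback and per-line overheads of Figs. 6.6 and 6.7 are included.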
[Figure 6.8 schematics: PEEM, with an x-ray beam illuminating the sample and electron imaging optics with an area detector; SPEM, with a scanned, focused x-ray beam and an electron energy analyzer]
Figure 6.8 Photoelectron emission microscopes (PEEMs) and scanning photoelectron emission microscopes (SPEMs) both use the absorption of X rays to produce photoelectrons and Auger electrons from the surfaces of materials. In a PEEM, electron optics are used to image the emitted electrons onto an area detector, with the electron optics setting the spatial resolution (most PEEMs also include an electron monochromator for imaging emission at a selected electron kinetic energy). In a SPEM, x-ray optics are used to focus the x-ray beam to a small spot, and an electron energy analyzer is used to record the energy spectrum of electrons emitted from the surface.
An important characteristic of SXMs is that the objective lens is located upstream of the specimen rather than downstream. This means that inefficiencies in the objective lens increase total imaging time, but do not lead to higher radiation dose on the specimen. This is different from the case of a TXM, where a zone plate with 10 percent efficiency means that one must expose the specimen to a tenfold higher radiation dose in order to obtain an image with the same statistical degree of significance.
6.5
Electron optical x-ray microscopes (PEEM and others)

As noted in the previous section, scanning photoelectron emission microscopes (SPEMs) work by detecting electrons emitted from a small scanned x-ray focal spot. An alternative approach is to illuminate a larger sample area and use electron optics to image the ejected electrons; this is what is done in a photoelectron emission microscope or PEEM system [Feng 2007a], as shown in Fig. 6.8. Photoelectron emission microscopes can image a wide spectral range of emitted electrons from a small region into an electron spectrometer for microspectroscopy, or they can produce high-resolution images of a selected electron energy when using an electron monochromator in their electron optical path. Because they can work with larger x-ray illumination spots (such as x-ray beams with many spatially coherent modes M present), PEEMs can be operated with laboratory x-ray sources, and indeed commercial instruments are readily available. When PEEMs are used to image electrons with kinetic energies below about 100 eV, the instruments are sometimes called low-energy electron microscopes (LEEMs) [Bauer 1994], and for studies of magnetic materials an offshoot instrument type is the SPLEEM, with spin-resolved electron detection. We saw in Fig. 3.12 that low-energy electrons travel relatively short distances in solids. In Section 3.1.1, we noted that Auger electron emission provides element-specific information, since the electron's energy is determined by a specific quantum transition.
[Figure 6.9 plot: inelastic mean free path (nm) versus electron energy (eV), from 10 eV to 10,000 eV, for polystyrene, Si, and Au]
Figure 6.9 Inelastic mean free path Λinel for low-energy electrons in polystyrene, silicon (Si), and gold (Au) [Ashley 1976, Ashley 1978]. As can be seen, photoelectrons and Auger electrons created by the absorption of soft x-ray photons (approx. 100–1000 eV) are able to escape only from regions close to the surface if they are to be detected without having their energy changed through inelastic scattering. Inelastic mean free paths for high-energy electrons were shown in Fig. 4.80.
One can therefore use PEEMs and SPEMs for spatially resolved x-ray photoelectron spectroscopy (XPS), or electron spectroscopy for chemical analysis (ESCA). However, in order to carry out these analysis approaches, the detected electrons should not undergo any energy-changing inelastic scattering events; in other words, it is only those electrons produced within a distance of less than an inelastic mean free path Λinel of a material's surface that will emerge with their energy unchanged. The distance Λinel is typically only a few nanometers when using soft X rays (as shown for three solids in Fig. 6.9), so this means that both SPEM and PEEM are surface-sensitive imaging techniques. From deeper within a material, an electron may undergo one or many inelastic scattering events, so it will emerge with some variable energy lower than that of the Auger peak; this gives rise to low-energy "tails" on Auger peaks in the electron spectrum. As shown in Fig. 6.10, the directly detected photoelectrons and Auger electrons give one information on specific atomic transitions [Carlson 1975]. However, the electrons that have undergone multiple inelastic scattering events are also useful, since a significant part of the photoelectron spectrum consists of secondary electrons with low energies of 10–30 eV, where the inelastic scattering mean free path Λinel begins to increase to distances of many nanometers (Fig. 6.9). Low electron energy PEEM systems (or LEEM systems) exploit surface variations in electron emissivity to image surface topography and conductivity. Secondary electron emission from the surface is proportional to the number of X rays absorbed, so PEEMs are frequently used to image the x-ray absorptivity as the x-ray energy is tuned, providing one path to near-edge x-ray spectromicroscopy (Section 9.1).
[Figure 6.10 plot: counts per 100 ms versus electron binding energy (eV) at an incident photon energy of 750 eV, with labeled Ru, Si, Al, and O photoelectron and Auger peaks]
Figure 6.10 Photoelectron spectrum of aluminosilicate grown on a Ru(0001) substrate, illustrating some characteristics of x-ray photoelectron spectroscopy (XPS). The spectrum shows photoelectron peaks at energies corresponding to the electronic states indicated (e.g., Ru 3d5/2), as well as Auger peaks corresponding to O KLL (K core-shell electron ejected, with an L shell initial state filling the vacancy, and an L shell electron being emitted) and Ru MNM. The plasmon spectrum below 30 eV is modified compared to that seen from electron energy loss spectroscopy (EELS; see Fig. 3.15) by inelastic scattering of electrons before escape, due to the mean free path shown in Fig. 6.9. This spectrum was obtained as part of a study that looked at changes in the Ru layer following exposure to O2 and H2 gases in an ambient pressure XPS system [Zhong 2016]. Data courtesy of Anibal Boscoboinik, Brookhaven Lab.
If one adds sensitivity to the angle at which electrons are emitted from an illuminated surface, one can gain considerable information on electronic band structure in solids. This technique is called angle-resolved photoemission spectroscopy (ARPES). By combining ARPES with the high spatial resolution of a SPEM, one arrives at a technique that is often called nano-ARPES [Rotenberg 2014].
6.6
Concluding limerick

Too many microscope systems to keep track of? Not sure of your geek pronunciation of STXM as "sticksem" and so on? Let's try a limerick:
I like to see wide fields with TXM
and probe in small spots with my STXM
With metals it seems
one can see much with PEEMs
Tough problems? These microscopes fix 'em!
7
X-ray microscope instrumentation
Janos Kirz and Michael Feser contributed to this chapter.
The first x-ray microscopes were one-of-a-kind instruments that were operated by their builders, in a tradition that continues to this day. Although there was a brief phase in the 1950s where commercial point-projection microscopes were available (see Section 6.2, and [Cosslett 1960, Plates IX.B and X]), up until the year 2000 essentially all microscopes were custom-built. These custom-built microscopes are now joined by commercial instruments offering a wide range of capabilities. No matter whether you are using a commercial instrument, where you can pop a sample in and push a button to get an image, or a custom instrument, it is useful to understand what "makes it tick." Hence this chapter. Section 7.1 discusses x-ray sources, while Section 7.2 discusses the optical transport systems and associated equipment needed to bring the x-ray beam to the imaging system. After some brief comments on nanopositioning systems in Section 7.3, the properties of several types of x-ray detectors are covered in Section 7.4. Finally, Section 7.5 provides a short introduction to specimen environments. The degree of sophistication in modern x-ray microscopes is worth a moment's pause for thanks. It wasn't always so! What is available today makes the home-built system (Fig. 2.4) that the author first encountered look unbelievably crude, and at that point things had already made significant advances from earlier years [Kirz 1980c, Rarback 1980]. An amusing anecdote was presented by Arne Engström in 1980 [Engström 1980] as he looked back on four decades of work in x-ray microscopy:
Another trend in x-ray microscopy and x-ray microanalysis, especially in the field of the biomedical sciences, is the increasing sophistication and complexity of systems and equipment for the collection and treatment of experimental absorption data. However, this trend is not unique to this field of research.
In fact, over the last 20 to 30 years there has been such a fantastic development of commercially available instrumentation for research and development that, in retrospect, the immediate postwar conditions seem very primitive indeed. For example, I remember the presentation of an automatic recording optical microabsorptiometer applicable to cellular analysis at an AAAS meeting in Boston in 1951. The demonstrator said proudly to the audience, which consisted of researchers who were, as was usual then, working with essentially self-assembled equipment: "I can watch this machine working, while standing behind it with my hands in my pockets." From the back of the room there was a hoarse voice, "I bet you have your hands in the government's pocket."
So let's also pause for a moment of thanks for the financial support that has been provided to develop the instrumentation for x-ray microscopy!
7.1
X-ray sources

The characteristics demanded of an x-ray source depend on the type of x-ray microscope being used, as was discussed in Chapter 6. Three of the most important characteristics are as follows:
• Spectral bandwidth, as will be discussed in Section 7.2.1. Most (but not all!) x-ray microscopes require some degree of monochromaticity, with E/(ΔE) ranging roughly from 100 to 1000 or more.
• Étendue or phase space area, as will be discussed in Section 7.2.2. Scanning microscopes require that the illumination be limited to a beam size–angle product, or phase space area, of about λ in each direction so that Msource ≈ 1 (Section 6.4), while full-field microscopes (Section 6.3) can work with phase space areas up to the number of pixels in the image. One can always use spatial filtering to restrict the étendue of a source at a cost in flux, or diffusers or wobbled optics to increase the étendue, as noted in Section 6.3.
• Flux, and time structure. One needs enough photons per pixel to obtain an image with sufficient signal-to-noise ratio (SNR; Section 4.8). Oftentimes the source has a regular time structure where photons are "on," or being delivered, during a time to out of a cycle or repetition time of tr, as shown in Fig. 7.1. This might happen because of the time bunch structure of a synchrotron light source (Section 7.1.4), of a pulsed laser source used to generate hot plasmas or high harmonic gain, or of the recharge time for capacitors in a pinched-plasma source (Section 7.1.3). We define a temporal duty cycle dt of

    dt = to / tr ,    (7.1)
for which example values are given in Table 7.1. Sources with a time-averaged spectral brightness Bs,ave have a peak brightness Bs,peak given by

    Bs,peak = Bs,ave / dt ,    (7.2)
so obviously if you are trying to convince people of how great a low-repetition-rate source is, you will emphasize peak rather than average brightness! If one is trying to study high-electric-field phenomena in atomic physics, then all of the photons should be delivered in a very short time to, and one might average together signals from a large number of pulses. In "diffraction before destruction" experiments such as those discussed in Section 10.6, we again want many photons in a very short time to, followed by a long enough time (tr − to) to read out the detector and deliver a fresh sample to the beam region. In both of these cases a source with high peak brightness Bs,peak is desired, even if the duty cycle is very low. Otherwise, in microscopy we must allow for steady-state heat dissipation in the specimen as discussed in Section 11.1, so that we favor high duty cycles and high time-averaged brightness Bs,ave. Photon-counting detectors require either (a) the dead time td of the detector to be short compared to the "on" time to, or (b) that
[Figure 7.1 diagram: source output versus time, showing the "on" time to within a repetition time tr]
Figure 7.1 Time characteristics of an x-ray source. In some cases the source might have an "on" time to after which there is no x-ray emission for the remainder of a repetition time tr. This applies to the bunch structure of synchrotron light sources, or sources driven by pulsed lasers or electromagnetic discharges, leading to duty cycles (Eq. 7.1) as indicated in Table 7.1.

Table 7.1 Duty cycles dt as given by Eq. 7.1, for various x-ray sources based on the "on" time to and the repetition time tr shown in Fig. 7.1. In many cases the time structure of the source can be modified, and there are considerable variations between sources of a given type. Therefore the values shown here are representative rather than exact.

Source                                         to       tr       dt
Laboratory electron impact                     (continuous)      1
Plasma pinch [Partlow 2012]                    500 ns   0.5 ms   1.0 × 10⁻³
Synchrotron: APS@Argonne                       33.5 ps  153 ns   2.2 × 10⁻⁴
Laser-produced plasma [Martz 2012]             600 ps   0.5 ms   1.2 × 10⁻⁶
High-harmonic gain (HHG) [Popmintchev 2018]    20 fs    1 ms     2.0 × 10⁻¹¹
XFEL: LCLS@Stanford [Emma 2010]                50 fs    8.33 ms  6.0 × 10⁻¹²
one collect far fewer than one photon per "on" time on average. Photon-integrating detectors are better able to handle many photons arriving in an "on" time to, as will be shown in Fig. 7.15. Having described these general characteristics of x-ray sources, we take a short detour into photometry before going on to discuss specific x-ray source types.
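The duty cycles of Table 7.1 and the peak brightness of Eq. 7.2 follow directly from Eq. 7.1; a minimal sketch reproducing two table rows:

```python
def duty_cycle(t_on_s, t_rep_s):
    """Temporal duty cycle d_t = t_o / t_r of Eq. 7.1."""
    return t_on_s / t_rep_s

def peak_brightness(b_avg, d_t):
    """Peak spectral brightness B_peak = B_avg / d_t of Eq. 7.2."""
    return b_avg / d_t

# Two rows of Table 7.1:
print(duty_cycle(500e-9, 0.5e-3))   # plasma pinch: 1.0e-3
print(duty_cycle(50e-15, 8.33e-3))  # XFEL (LCLS): about 6.0e-12
# A source with a tiny duty cycle has an enormous peak-to-average
# brightness ratio (shown here for a nominal average brightness of 1):
print(peak_brightness(1.0, duty_cycle(50e-15, 8.33e-3)))  # ~1.7e11
```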
7.1.1
Photometric measures

The characteristics of a source can be described by several photometric measures. We use the simple term "intensity" or I to refer to the square of a wave's complex amplitude (Box 4.1). Definitions of photometric terms and their symbols vary; those listed below represent a mixture of the recommendations in the Gold Book of the International Union of Pure and Applied Chemistry (IUPAC), the holy writ of optics [Born 1999], and common usage among x-ray microscopists.
• Flux Φ is the number of photons per second (photons/s). Usually this is used in
[Figure 7.2 plot: brightness (photons/s/mm²/mrad²/0.1% BW) versus year, from x-ray tubes through bending magnets, undulators, MAX IV, and XFELs, with Moore's law shown for comparison]
Figure 7.2 History of the maximum available x-ray source brightness. This has traced a path from conventional electron impact sources, to early parasitic use of synchrotrons, to dedicated storage rings as light source facilities, to low-emittance facilities with undulators, to x-ray free-electron lasers (XFELs) and the first multi-bend achromat storage ring source (MAX IV in Sweden). The increase in available source brightness has been greater than the well-known Moore's "law" in microelectronics [Moore 1965], which noted that the number of transistors that could be incorporated into a single integrated circuit doubled about every two years. Those who have been around computing for a while know how remarkable this trend has been; x-ray sources have seen even greater advances! Figure adapted from [Jacobsen 2016b] with the kind permission of the Società Italiana di Fisica.
connection with a specified spectral bandwidth (typically 0.1 percent), giving the spectral flux, but the prefix "spectral" is often left out.
• Fluence F is the cumulative number of photons per area (photons/m²). Again, this is usually used in connection with a specified spectral bandwidth, but the prefix "spectral" is often left out.
• Irradiance IE is the power received per area (W/m²); see Eq. B.47 in Appendix B at www.cambridge.org/Jacobsen.
• Spectral intensity Is is the photon flux from a source per solid angle dΩ with a given bandwidth BW (photons/s/sr/BW).
• Spectral brightness Bs is the photon flux per area per solid angle per bandwidth (which in the synchrotron radiation community is usually expressed not in base SI units but in photons/s/mm²/mrad²/0.1% BW). For a Gaussian source, one can write the spectral brightness as

    Bs = Φ / (Σx Σ′x Σy Σ′y (ΔE/E)) ,    (7.3)

where the sizes Σ and divergences Σ′ are given in Eq. 7.12 for an undulator with finite electron beam emittance effects included. The coherent flux within a given
spectral bandwidth is Φc = Bs · λ2 , as was shown in Eq. 4.199. This photometric term is sometimes called “brilliance” in parts of the synchrotron radiation community inﬂuenced by the early planning documents of the European Synchrotron Radiation Facility. The highest available xray source brightness by year is shown in Fig. 7.2. Note that from Eq. 3.7 one can show that ΔE Δλ Δλ =− =  , E λ λ
(7.4)
and the spectral bandwidth leads to the coherence length as given in Eq. 4.181.
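The relation Φ_c = B_s · λ² is easy to evaluate once the practical brightness units are converted to SI; the sketch below shows the unit bookkeeping (the brightness value used is purely illustrative, not from any particular source):

```python
def coherent_flux(brightness_practical, wavelength_m):
    """Coherent flux within the same spectral bandwidth as the brightness.

    brightness_practical: photons/s/mm^2/mrad^2 per bandwidth
    wavelength_m: x-ray wavelength in meters
    Returns photons/s per bandwidth, using Phi_c = B_s * lambda^2 (Eq. 4.199).
    """
    # 1 photon/s/mm^2/mrad^2 = 1e6 * 1e6 photons/s/m^2/rad^2
    brightness_si = brightness_practical * 1e12
    return brightness_si * wavelength_m**2

# Illustrative: B_s = 1e19 photons/s/mm^2/mrad^2/0.1% BW at lambda = 0.1 nm
# gives Phi_c = 1e19 * 1e12 * (1e-10)^2 = 1e11 photons/s/0.1% BW.
print(coherent_flux(1e19, 0.1e-9))
```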
7.1.2 Laboratory x-ray sources: electron impact

Most laboratory x-ray sources are based on the same physics used by Röntgen: electrons accelerated to strike a target inside a vacuum chamber. Metals are usually chosen because they combine a high melting point, high thermal conductivity, and high electrical conductivity. The electron beam is often produced by heating a filament to a high enough temperature that an increased fraction of electrons have an energy above the work function energy (Eq. 3.19); they can then be extracted by an accelerating voltage, focused to a spot, and made to strike a metal target. Only a small fraction (∼0.1 percent) of the electron beam power is converted to X rays; the rest ends up as heat. The target emits both a broad spectrum of Bremsstrahlung radiation (German for "braking radiation," caused by electrons swinging around the dense positive charges of nuclei), and characteristic X rays produced by removing core-level electrons from the anode material. The characteristic X rays (x-ray fluorescence; Section 3.1.1) form narrow peaks in the emission spectrum with well-tabulated energies [Bambynek 1972] and spectral widths [Krause 1979b]; for example, aluminum Kα₁ line emission falls within 0.43 eV full-width at half-maximum (FWHM), while for copper the FWHM linewidth is 2.11 eV. The emitted X rays are unpolarized.
There are many details involved in optimizing an electron impact source for a specific application, as discussed in several books [Cosslett 1960, Dyson 1973] and journal articles [Green 1961, Green 1968]:
• One can change the target material to choose among various emission lines, as shown in Fig. 7.4. One can then use absorption filters to significantly reduce the transmitted fraction of continuum X rays below a fluorescence line.
• One can change the energy of the electron beam to excite higher-energy x-ray fluorescence lines, or to enhance the continuum spectrum at lower energies, as shown in Fig. 7.5.
• A fundamental limitation is that the target must not be made to melt (otherwise one has an electron beam evaporation system, such as is used to deposit thin metal films on surfaces). The anode can be water-cooled, and furthermore one can direct the electron beam spot onto a region near the outer radius of a rotating disk, so that one region on the disk is heated in "flashes" and has time to be cooled by heat
Downloaded from https://www.cambridge.org/core. Stockholm University Library, on 29 Oct 2019 at 01:13:57, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.008
X-ray microscope instrumentation
Figure 7.3 Schematic representation of the effect of changing the takeoff angle from an electron impact source. The electrons will penetrate some distance into a target, and spread out laterally (for simulation results on a polymer, see Fig. 11.4, though for high-density metal targets the electron scattering lengths will be shorter). At a small takeoff angle relative to the surface, the apparent source size will be smaller, but more of the emitted X rays will be reabsorbed over their long path through the target material. At larger takeoff angles, fewer X rays will be self-absorbed but the source will appear to be larger. Takeoff angles of 25° are not uncommon in practice.
conduction into the non-irradiated regions while it spins around. Water-cooled rotating anode x-ray sources can involve kilowatts of electron beam power into the anode.
• One can embed a series of very small metal targets in diamond film, which offers very high heat conductivity and x-ray transparency. With a series of metal target regions in a plane illuminated by electrons coming from above, an edge-on view of the plane can have X rays from all of these small targets add up for increased flux [Yun 2016]. Sources of this type are commercially available from Sigray.
• An alternative to cooling the target material is to preheat it into a liquid jet, and direct a "hotter" electron beam onto that jet [Hemberg 2003], since there is no need to worry about melting. Sources of this type are commercially available from Excillum.
• For maximum x-ray flux, one can direct a significant electron current into a large-sized spot. If the electron beam is focused not to a round spot but to a line, one can take advantage of the higher cooling provided by the non-irradiated material along the sides of the line, yet obtain a roughly circular apparent source size by using a shallow takeoff angle (Fig. 7.3; 25° is not uncommon, and 6° is sometimes used) along the direction of the line.
• One can instead go to a very small, micrometer-sized or smaller electron beam spot on a thin target in a microfocus source. Lateral beam spreading will be greatly reduced in a thin target (see Fig. 11.4; while that's for a polymer rather than a metal, you get the idea), leading to a smaller x-ray source. The total x-ray flux will be much lower, but due to the nature of heat conduction into the large non-irradiated area (see Eq. 11.4) the x-ray flux per area can be much higher. This
[Figure: log–log plot of spectral intensity (photons/s/sr/200 eV BW/watt on target) versus photon energy (3–30 keV), showing the Cu and W Kα, Kβ, Lα, Lβ, and Lγ emission lines above the continuum.]
Figure 7.4 Laboratory x-ray source spectra for targets made of W (tungsten) and Cu (copper),
calculated for an electron beam accelerating voltage of 40 kV. This calculation (courtesy of Michael Feser) is for a 25° takeoff angle from the target, and includes absorption of a 250 μm thick beryllium window. Since the actual emission lines are typically 1–3 eV wide, the ratio of line to continuum or Bremsstrahlung emission will change on this plot as one changes the assumed bandwidth (BW).
makes for higher source brightness, and furthermore the ratio of line emission to continuum radiation is also improved. Microfocus sources have been used for point-projection x-ray microscopy (Section 6.2), and even for propagation-based phase contrast imaging (Section 4.7.2) using scanning electron microscopes to produce the small electron beam source [Mayo 2003].
Electron-impact laboratory x-ray sources are readily available from a number of commercial manufacturers, and with a variety of source characteristics. As one example, one can purchase a gallium liquid metal jet source with a total Kα brightness of 6.5 × 10¹⁰ photons/s/mm²/mrad², with 58 percent going into the Kα₁ line at 9224.8 eV (with a linewidth of 2.59 eV FWHM), and 29 percent going into the Kα₂ line at 9251.7 eV (with a 2.66 eV linewidth). If one could isolate the Kα₁ line with 100 percent efficiency, this would yield a spectral brightness of 1.3 × 10¹¹ photons/s/mm²/mrad²/0.1% BW.
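That last number follows from rescaling the natural width of the Kα₁ line to the 0.1 percent bandwidth convention; a quick check using only the figures quoted above:

```python
total_brightness = 6.5e10   # photons/s/mm^2/mrad^2, all Ga K-alpha lines
ka1_fraction = 0.58         # fraction of the flux in the K-alpha_1 line
ka1_energy_eV = 9224.8
ka1_width_eV = 2.59         # FWHM linewidth

# Brightness within the K-alpha_1 line itself
ka1_brightness = total_brightness * ka1_fraction

# The line occupies only Delta E / E = 2.59/9224.8 ~ 0.028% bandwidth,
# so expressed per 0.1% BW the spectral brightness is correspondingly larger:
bw_line = ka1_width_eV / ka1_energy_eV
spectral_brightness = ka1_brightness * (1e-3 / bw_line)

print(f"{spectral_brightness:.2g} photons/s/mm^2/mrad^2/0.1% BW")  # ~1.3e11
```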
[Figure: log–log plot of spectral intensity (photons/s/sr/200 eV BW/watt on target) versus photon energy (3–100 keV) for a tungsten target at 40, 90, and 160 kV, showing the Kα, Kβ, Lα, Lβ, and Lγ lines above the continuum.]
Figure 7.5 Laboratory x-ray source spectra for a W (tungsten) target calculated for three different electron acceleration voltages (40, 90, and 160 kV). As in Fig. 7.4, this calculation (courtesy of Michael Feser) is for a 25° takeoff angle, and again includes absorption due to a 250 μm thick beryllium window.
7.1.3 Unconventional laboratory x-ray sources

An alternative way to generate X rays is to create a hot, high-density plasma (an extreme limit of this was discussed in Box 2.1). This can be done in the laboratory by focusing an intense laser pulse on a material, or by using pulsed electromagnetic "pinching." This plasma can produce X rays by two distinct mechanisms. One is simply blackbody radiation based on the temperature of the plasma. The Planck blackbody radiation distribution of photons versus photon energy has a maximum at
E_{\rm peak} = 2.821\, k_B T, \quad (7.5)
where the Boltzmann constant is k_B = 8.62 × 10⁻⁵ eV/K (Eq. 3.20). That is, to produce a blackbody peak at a soft x-ray energy of 300 eV, one requires a temperature of 1.2 × 10⁶ K, while 10 keV requires a temperature of 4.1 × 10⁷ K. Even at somewhat lower temperatures, one can create a plasma with fully or partially ionized atoms, and have sufficient thermal energy to excite electronic transitions to drive x-ray fluorescence. Because most of the atoms are at least partially ionized, the energies of various electron orbitals will be shifted, and the emission energies will be different than those listed in standard tabulations of neutral atoms. Hot plasma sources are pulsed sources (unless you can arrange to have a long-lived
hot, dense plasma, in which case a magnetic confinement fusion energy researcher would love to chat with you!). Therefore one needs to replace the "target" material after each pulse. In pulsed electromagnetic pinch sources, the medium can be a gas, in which case replenishment is straightforward; such sources have been used for soft x-ray microscopy [Niemann 1990] and tomography [Duke 2014]. If one instead uses a high-intensity laser as a means of generating a hot plasma, one early approach to producing a continuously regenerated target was to use the key components of a magnetic tape reel-to-reel system [Michette 1988, Michette 1994, Michette 1997] (one could then set an x-ray experiment to a soundtrack of music from the 1960s!). A more recent approach has been to use pulsed laser excitation on a narrow stream of liquid or liquid droplets of ethanol, ammonium hydroxide, or liquid nitrogen for emission at various lines in the 360–500 eV range [Rymell 1993, Rymell 1995, Berglund 1998, Martz 2012]. These latter sources can deliver a time-averaged spectral brightness of ∼1 × 10¹² photons/s/mm²/mrad²/0.1% BW [Martz 2012] at 500 eV, and are being used in a successful series of compact laboratory transmission x-ray microscopes [Berglund 2000, Takman 2007, Fogelqvist 2017]. With these pulsed plasma sources, the peak brightness B_{s,peak} can be a million times higher (Table 7.1 and Eq. 7.2), but the time-averaged brightness B_{s,ave} depends on the repetition rate of the laser, which is often limited by cost and laser cooling considerations. Can one create X rays in a laboratory in a more finely controlled manner than with a hot plasma, and with greater source brightness than with an electron impact source?
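Before moving on, the two temperatures quoted for Eq. 7.5 are easy to verify by inverting E_peak = 2.821 k_B T:

```python
K_B = 8.62e-5  # Boltzmann constant in eV/K (Eq. 3.20)

def blackbody_peak_temperature(e_peak_eV):
    """Temperature whose Planck photon distribution peaks at e_peak_eV (Eq. 7.5)."""
    return e_peak_eV / (2.821 * K_B)

print(f"{blackbody_peak_temperature(300):.2g} K")     # ~1.2e6 K for 300 eV
print(f"{blackbody_peak_temperature(10_000):.2g} K")  # ~4.1e7 K for 10 keV
```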
One very active research field involves the excitation of electrons in an atom by successive electric field cycles in high-intensity lasers, in a method called high harmonic generation, or HHG [L'Huillier 1993], which has seen considerable development for the production of extreme ultraviolet (EUV) and soft x-ray beams [Rundquist 1998, Bartels 2002]. Because all of the electrons in the laser focus are driven synchronously by the laser's electric field, their emission is coherent, from a spot size of typically 50–200 μm and into an angle consistent with one spatially coherent mode, or M_source = 1. These sources provide considerable light output (∼5 × 10¹⁰ photons/s/1% BW) at EUV photon energies of 92 eV, so they have been used in impressive coherent diffraction imaging experiments with sub-wavelength spatial resolution [Gardner 2017]. The photon flux drops off at higher photon energies [Heyl 2017, Fig. 3], though in one recent example significant emission was obtained over a wide enough photon energy range that one could obtain carbon XANES (Section 9.1.3) spectra, and even EXAFS (Section 9.1.7) spectra [Popmintchev 2018] around the Sc L edges near 400 eV and the Fe L edges near 700 eV. At 300 eV or λ = 4.1 nm, the coherent flux with a 1 kHz driving laser is about 10⁹ photons/s/1 percent BW, which with M_source = 1 works out to a time-averaged brightness of about 6 × 10¹² photons/s/mm²/mrad²/0.1% BW. With a duty cycle dt (Table 7.1) of 2 × 10⁻¹¹, the peak brightness is 3 × 10²³ in the same units.
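The peak-brightness figure is simply the time-averaged brightness divided by the duty cycle, and the quoted average brightness is consistent with the coherent flux through Φ_c = B_s λ²; a sketch using the numbers in the text:

```python
wavelength = 4.1e-9      # m, i.e. 300 eV photons
avg_brightness = 6e12    # photons/s/mm^2/mrad^2/0.1% BW (time-averaged)
duty_cycle = 2e-11       # dt from Table 7.1

# Peak brightness: the same photons compressed into the pulses
peak_brightness = avg_brightness / duty_cycle
print(f"peak: {peak_brightness:.0e}")  # ~3e23 in the same units

# Consistency with the quoted coherent flux of ~1e9 photons/s/1% BW:
# convert mm^-2 mrad^-2 -> m^-2 rad^-2 (factor 1e12), apply Phi_c = B_s*lambda^2,
# then scale from 0.1% BW to 1% BW (factor 10)
coherent_flux_1pct = avg_brightness * 1e12 * wavelength**2 * 10
print(f"coherent flux: {coherent_flux_1pct:.0e} photons/s/1% BW")  # ~1e9
```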
7.1.4 Synchrotron light sources

Target melting is a limitation of electron impact sources, and target vaporization is inevitable in laser or pinch sources, or in HHG. So is there a way to remove the target
and its limitations? It sounds a bit like the Zen koan about one hand clapping... In fact this can be realized by subjecting relativistic electron beams to strong magnetic fields. While all the ingredients to understand this were in place from Maxwell's theory of electromagnetism in the 1860s [Maxwell 1861] and Einstein's theory of special relativity [Einstein 1905], the discovery of synchrotron radiation was an accident that happened to those with prepared minds [Blewett 1998]. In an era when vacuum chambers were often fabricated by glassblowers, researchers looking to see if there were electrical sparks from a balky synchrotron accelerator at General Electric noticed the steady emission of light from the electron beam. They soon came to understand its origin [Elder 1947], based on earlier theories outlined by Iwanenko and Pomeranchuk [Iwanenko 1944] and filled in more completely by Schwinger [Schwinger 1949]. There were then early experiments [Hartman 1988, Robinson 2015], first at Cornell in the 60–200 eV region [Tomboulian 1956], and then at NIST in Gaithersburg in the 25–70 eV region [Madden 1963, Codling 1965]. Even then, at many accelerators synchrotron radiation was more of a curiosity, or even a nuisance if one's goal was to accelerate charged particles to high energies for collisions with other particle beams or fixed targets. One such particle physics machine was the DESY synchrotron in Hamburg, where the beam was ramped up every 20 msec to 7.5 GeV before being "dumped" onto a fixed target. While Kenneth Holmes had considered synchrotron radiation from this machine for x-ray generation in the mid-1960s, it was in 1969 that an increase in current to 10 mA made it attractive for then-PhD-student Gerd Rosenbaum to work with Holmes to build a focusing x-ray monochromator at a VUV beamline. This gave them a 150-fold gain in intensity over an x-ray tube, so that with Jean Witz they were able to record x-ray diffraction from a muscle fiber.
They then installed a separate access tunnel to the synchrotron within which to install their monochromator, and an experimental "bunker" (what we would now call a hard x-ray beamline and a hutch) for further studies in small-angle diffraction [Rosenbaum 1971, Huxley 1997, Holmes 1998]. Already in their first paper [Rosenbaum 1971] they even predicted that the x-ray beam could, in the future, be focused down to as small as 200 μm! These early efforts at parasitic use of synchrotron radiation from machines built for high-energy physics eventually led to the development of synchrotron storage ring light sources. It is worthwhile therefore to make the terminology clear, even though the word "synchrotron" is often used as a synonym of "storage ring":
• A synchrotron is an accelerator in which dipole magnets are used to establish a closed-loop orbit (usually approximately circular). In today's examples, a radio frequency (RF) cavity is used to add kinetic energy to charged particles. Synchrotrons can be used to ramp up the charged particle energy by simultaneously increasing the power (and thus voltage) in the RF cavity, and increasing the field in the dipole magnets. At the top of the energy ramp, one can deflect the charged particle beam into a fixed target, or make a collider by causing collisions with a counter-rotating charged particle beam (such as protons on protons at the Large Hadron Collider – the LHC – at CERN near Geneva).
• A storage ring is a synchrotron that is designed to be operated primarily at one single, fixed kinetic energy for the charged particle beam.
• A synchrotron light source is a storage ring optimized for the production of electromagnetic radiation.
Thus synchrotron light sources are designed to maintain a constant beam energy and near-constant current over time, so as to be optimized for the production of steady XUV and x-ray radiation beams. The storage ring mode was first used for parasitic x-ray production on the SPEAR ring at Stanford, and this in turn inspired the development of synchrotron light sources such as Aladdin in Wisconsin, NSLS at Brookhaven, BESSY in Berlin, and others. Today, synchrotron light sources with electron beam energies of 1–2 GeV excel at producing soft X rays up to about 1000 eV, machines with an energy of about 3 GeV work well in producing hard X rays up to about 10 keV, and machines in the 6–8 GeV range produce X rays with energies up to about 100 keV. Beam currents tend to be in the 100–500 mA range, and since the beam is organized into discrete electron bunches that "surf the wave" in the RF accelerating cavities, photons arrive in 1–100 picosecond pulses spaced 1–200 nanoseconds apart, as indicated in Table 7.1 (the details vary by facility). Along with the development of undulators (Section 7.1.6), this has enabled the tremendous increase in available x-ray source brightness shown in Fig. 7.2. This gives rise to a generational history of light sources, with the first generation representing parasitic use, the second generation representing machines designed from the start to produce synchrotron radiation (the 1980s), the third generation being machines with lower emittance and undulator x-ray sources (the 1990s), and the fourth generation being diffraction-limited storage rings (beginning with MAX IV in Sweden around 2017).
The GeV-range electron beam kinetic energies E listed above for various storage rings are, of course, highly relativistic, in that the Lorentz factor γ (Eq. 4.286) of
\gamma = 1 + \frac{E}{m_e c^2}
is in the thousands, since the electron rest mass times the speed of light squared is m_e c² = 511 keV (Eq. 3.29). From Eqs. 4.286 and 4.287 we can also see that the velocity v of electrons in these storage rings is very close to the speed of light c, since
1 - \beta = \frac{c - v}{c} = 1 - \left(1 - \frac{1}{\gamma^2}\right)^{1/2} \simeq \frac{1}{2\gamma^2}. \quad (7.6)
This will become important when considering undulator sources. At the highly relativistic energies of electrons in synchrotron light sources, radiation from a moving charge such as a dipole oscillating transverse to its linear motion is compressed into an angle of about 1/(2γ), or less than a milliradian about the viewing direction. When combined with the high x-ray flux that can be achieved, this leads to spectacularly high intensity and brightness when compared to laboratory x-ray sources.
Storage rings have a built-in economy factor to their operation: one only has to accelerate the electrons up to relativistic velocities once, upon beam injection. After that, electrons will lose energy due to spontaneous radiation emission as they travel in their orbit, but the RF cavities need only supply enough power to exactly compensate for that loss, which might be a few hundred keV per electron per orbit (compared to several GeV per electron to achieve beam injection). Even so, they are expensive facilities, costing roughly 10⁸–10⁹ US dollars to construct and about one-tenth of that cost for annual operations. Therefore they are usually developed as regional, national, or even international facilities, as catalogued by the web site www.lightsources.org, with dozens in operation worldwide and most including at least one x-ray microscopy beamline. While these are expensive facilities to construct and operate, they can usually host about 20–50 beamlines running simultaneously, so the per-simultaneous-experiment cost is not far from that of top-end electron microscopes. While some industrial users pay a fee to use these sources for proprietary work, in nearly all cases one obtains no-fee access via peer-reviewed scientific user proposals.
To understand the properties of synchrotron radiation sources in more detail, we must consider the phase space characteristics of the electron beam. The magnetic lattice of the storage ring causes the beam to undergo a series of focusing and defocusing operations, but as discussed in connection with Eq. 4.189 the product of the electron beam size times divergence at various beam foci is a constant. In storage rings, the product is described by the emittance ε of the electron beam, while β gives the ratio of the size over divergence at one particular local focus (both quantities can have separate values in the horizontal and vertical planes). The relationships between these quantities and the standard deviation source size of the electron beam are
\sigma_x = \sqrt{\epsilon_x \beta_x} \quad \text{and} \quad \sigma_y = \sqrt{\epsilon_y \beta_y}, \quad (7.7)
while the divergences are characterized by
\sigma'_x = \sqrt{\epsilon_x / \beta_x} \quad \text{and} \quad \sigma'_y = \sqrt{\epsilon_y / \beta_y}. \quad (7.8)
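As a numerical illustration of the Lorentz factor and Eq. 7.6, for a 7 GeV machine such as the Advanced Photon Source:

```python
import math

ME_C2_EV = 511e3  # electron rest energy in eV (Eq. 3.29)

def lorentz_gamma(kinetic_energy_eV):
    """Lorentz factor gamma = 1 + E/(m_e c^2)."""
    return 1.0 + kinetic_energy_eV / ME_C2_EV

gamma = lorentz_gamma(7e9)                      # 7 GeV electrons
one_minus_beta = 1 - math.sqrt(1 - 1 / gamma**2)

print(f"gamma = {gamma:.0f}")              # ~13,700
print(f"1 - beta = {one_minus_beta:.1e}")  # ~2.7e-9, i.e. ~1/(2 gamma^2)
```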
(The ratio of vertical to horizontal emittance ε_y/ε_x is known as the emittance coupling of the machine; values of 10 percent are not uncommon in today's machines.) The emission of radiation by a single electron has its own intrinsic emittance ε_r of
\epsilon_r = \frac{\lambda}{2\pi}, \quad (7.9)
where the result of Eq. 7.9 is only approximate, depending on the criterion chosen to characterize size and divergence [Elleaume 2003, Eq. 33]. (This is given by others [Kim 1986] as ε_r = λ/(4π), and both of these 1σ values differ from the FWHM value of ε_r = λ for M_source = 1 as discussed in Section 4.4.6.) When collecting radiation from an infinitesimal-emittance source of length L along the electron beam's trajectory, the intrinsic beta function associated with radiation emission [Elleaume 2003, Eq. 33] is
\beta_r = \frac{L}{\pi}, \quad (7.10)
and the intrinsic source size σ_r and divergence σ'_r are
\sigma_r = \frac{\sqrt{2\lambda L}}{2\pi} \quad \text{and} \quad \sigma'_r = \sqrt{\frac{\lambda}{2L}}. \quad (7.11)
In the Gaussian approximation, the net source size is given by a convolution of electron beam and intrinsic photon emittances, leading to net source sizes and divergences characterized by
\Sigma_x = \sqrt{\sigma_r^2 + \sigma_x^2} \quad \text{and} \quad \Sigma'_x = \sqrt{\sigma'^2_r + \sigma'^2_x},
\Sigma_y = \sqrt{\sigma_r^2 + \sigma_y^2} \quad \text{and} \quad \Sigma'_y = \sqrt{\sigma'^2_r + \sigma'^2_y}. \quad (7.12)
In circa-2016 storage rings, the horizontal emittance ε_x is often about 100 times larger than the radiation wavelength, while the vertical emittance is a few times the wavelength, but fourth-generation storage rings are beginning to appear which use multi-bend achromat lattice designs with near-wavelength emittance in both directions [Eriksson 2014].
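Equations 7.7–7.12 chain together straightforwardly; the sketch below uses illustrative numbers (the emittance, β, and undulator length here are made up for the example, not those of any particular facility):

```python
import math

def net_source(emittance, beta, wavelength, undulator_length):
    """Net source size Sigma and divergence Sigma' (Eq. 7.12), Gaussian approximation.

    All inputs in SI units: emittance in m-rad, beta and lengths in m.
    """
    sigma_e = math.sqrt(emittance * beta)     # electron beam size, Eq. 7.7
    sigma_e_p = math.sqrt(emittance / beta)   # electron beam divergence, Eq. 7.8
    sigma_r = math.sqrt(2 * wavelength * undulator_length) / (2 * math.pi)  # Eq. 7.11
    sigma_r_p = math.sqrt(wavelength / (2 * undulator_length))              # Eq. 7.11
    size = math.sqrt(sigma_r**2 + sigma_e**2)             # quadrature sum, Eq. 7.12
    divergence = math.sqrt(sigma_r_p**2 + sigma_e_p**2)
    return size, divergence

# Illustrative: 3 nm-rad horizontal emittance, beta_x = 10 m,
# lambda = 0.1 nm, and a 2.4 m long undulator
size, div = net_source(3e-9, 10.0, 0.1e-9, 2.4)
print(f"Sigma_x = {size*1e6:.1f} um, Sigma'_x = {div*1e6:.1f} urad")
```

Note that σ_r σ′_r from Eq. 7.11 multiplies out to λ/(2π), recovering Eq. 7.9.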
7.1.5 Bending magnet sources

Electrons are kept on a roughly circular orbit in the storage ring by a series of dipole or bending magnets distributed around the machine's magnetic lattice. Within a dipole magnet with magnetic field B, the electron's radius of curvature ρ_e is given by
\rho_e = \frac{\beta E}{e B} \quad \text{or} \quad \rho_e\,({\rm meters}) \simeq \frac{10}{2.998} \frac{\beta E\,({\rm GeV})}{B\,({\rm T})}, \quad (7.13)
where e is the electron charge, E the electron beam kinetic energy, and β the relativistic velocity (nearly 1) as given in Eq. 7.6. That is, the few-Tesla magnetic field in a bending magnet gives rise to the centripetal acceleration needed to maintain the circular orbit. Because all acceleration of charged particles results in the emission of radiation, bending magnets are good x-ray sources. In the vertical, the emission is concentrated within a 1σ angle of 1/(2γ) as noted above, while in the horizontal one might collect radiation over an acceptance angle θ_{x,a} of an arc of radiation that depends on limiting apertures, with some bending magnet beamlines accepting light over several milliradians in the horizontal. Dipole magnets at synchrotron light sources tend to have large β_{x,y} values (large size σ_{x,y} relative to angular divergence σ′_{x,y}) so that the beam size does not change much over the length of the dipole or bending magnet. (This also means that the equivalent intrinsic radiation source size of Eq. 7.11 is unimportant in bending magnet sources.) The effective source size σ_{θx} in the horizontal is even larger [Williams 2005, Eq. 20] as one "views" the arc of the source over the full-width acceptance angle θ_x, giving a 1σ equivalent source size of
\sigma_{\theta x} \simeq \frac{\rho_e}{16} \theta_x^2. \quad (7.14)
For the Advanced Photon Source at Argonne, the source size is characterized by σ_x = 83 μm and σ_y = 35 μm, while if one accepts θ_x = 1.5 mrad from the source the effective horizontal source size becomes σ_{θx} = 5.5 μm (using Eq. 7.13 to obtain a bending radius of ρ_e = 38.9 m). This gives a net result of
\Sigma_{x,{\rm BM}} = \sqrt{\sigma_x^2 + \sigma_{\theta x}^2}, \quad (7.15)
so in this particular case, with σ_x = 83 μm, the extra source size factor of σ_{θx} = 5.5 μm (Eq. 7.14) is negligible.
The spectrum of bending magnet radiation (Fig. 7.6) is quite strong in the THz, infrared, and visible-light ranges [Williams 2005], and its output increases up to a critical energy E_c, after which it declines steeply, as shown in Fig. 7.6. This critical energy (the dividing line in terms of matching emitted power above and below E_c) is given by
E_c = \frac{3 e \hbar B \gamma^2}{2 m_e} \quad (7.16)
or
E_c\,({\rm keV}) = 0.665\,(E^2\,[{\rm GeV}])\,(B\,[{\rm T}]), \quad (7.17)
where the second expression allows for simple calculations. For the Advanced Light Source in Berkeley, with E = 1.9 GeV and B = 1.27 T, standard bending magnets have a critical energy of E_c = 3.0 keV, while for the Advanced Photon Source at Argonne, with E = 7.0 GeV and B = 0.6 T, the critical energy is E_c = 19.6 keV. Bending magnet radiation is linearly polarized in the horizontal plane, due to the direction of the electron's bend. Radiation slightly above or below the orbit is elliptically polarized. At lower energies, the output is strong enough that these light sources are also among the brightest sources for broad-spectrum infrared light, and several synchrotron light sources host infrared spectromicroscopy programs.
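The practical forms of Eqs. 7.13 and 7.17 reproduce the numbers quoted above:

```python
def bend_radius_m(energy_GeV, field_T, beta=1.0):
    """Bending radius from Eq. 7.13; beta ~ 1 for multi-GeV electrons."""
    return (10 / 2.998) * beta * energy_GeV / field_T

def critical_energy_keV(energy_GeV, field_T):
    """Bending magnet critical energy from Eq. 7.17."""
    return 0.665 * energy_GeV**2 * field_T

print(f"ALS: Ec = {critical_energy_keV(1.9, 1.27):.1f} keV")  # ~3.0 keV
print(f"APS: Ec = {critical_energy_keV(7.0, 0.6):.1f} keV")   # ~19.6 keV
print(f"APS: rho_e = {bend_radius_m(7.0, 0.6):.1f} m")        # ~38.9 m
```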
7.1.6 Undulator sources

Bending magnets at synchrotron light sources are very bright, and have served as excellent sources for both full-field (TXM) and scanning (SXM) x-ray microscopes. However, it was made clear in Chapter 5 that quasi-monochromatic radiation is required for refractive (Eq. 5.9) and diffractive (Eq. 5.33) x-ray optics, and the highest-resolution Kirkpatrick–Baez mirrors used for practical hard x-ray nanofocusing applications have multilayer reflective coatings (and thus limited spectral bandpass) so as to give larger grazing incidence reflection angles and higher spatial resolution [da Silva 2017]. Thus it is highly desirable for x-ray microscopy to take the broad spectrum of a bending magnet (Fig. 7.6) and somehow compress most of the output into a narrow spectral line, to decrease the horizontal source size relative to the extended source of a bending magnet, and to decrease the angular divergence. This is what undulators do.
Undulators were first discussed [Ginzberg 1947, Motz 1951] as microwave sources, and they were first demonstrated at Stanford [Motz 1953] as a way to generate microwave radiation from a 100 MeV electron linear accelerator (linac). The history of their development at storage rings has been nicely described [Winick 1981], with a landmark for x-ray science being the installation in 1980 of a permanent magnet undulator at the SPEAR ring at Stanford, which was able to produce keV photon energies [Halbach 1981]. The idea quickly spread, so that all synchrotron light sources built since then have included undulators for delivering bright x-ray beams.
An undulator is named for the motion of the electron beam in the device's magnetic field. By creating a sinusoidal magnetic field with N_u periods each of period length λ_u (usually using permanent magnet blocks to produce a field that oscillates up and down),
[Figure: log–log plot of spectral intensity (10¹⁹ photons/s/mrad²/0.1% BW) versus photon energy (0.1–100 keV), with the critical energy E_c marked.]
Figure 7.6 Spectral intensity I_s for a standard bending magnet source at the Advanced Photon
Source at Argonne National Laboratory. Bending magnet sources have strong output at lower photon energies, which increases toward a critical energy E_c (Eq. 7.17). The brightness is found by dividing by Σ_{x,BM} (Eq. 7.15) and σ_y (Eq. 7.7), giving in this case a numerical factor of 340 mm⁻², or a spectral brightness of about B_s ≈ 3 × 10¹⁶ photons/s/mm²/mrad²/0.1% BW at E_c = 19.6 keV. Calculation provided by Roger Dejus of the Advanced Photon Source.
a relativistic electron in a straight section of the storage ring lattice is made to oscillate side to side in an undulating motion. When viewed head-on, the N_u periods look like a set of N_u in-phase dipole radiation emitters, so that one has a spectral concentration into a bandwidth of ΔE/E ≈ 1/N_u, and a coherent superposition of radiation within an angular range of ∼1/N_u. However, these properties are changed by special relativity in two important ways. The first is that from the electron's perspective, the magnetic field lattice period λ_u is contracted to λ_u/γ, where γ is the Lorentz factor of Eq. 4.286. The second is that when one goes from the frame of the electron's average motion back into the laboratory frame, a relativistic Doppler shift applies which shortens the wavelength of the radiation by another factor of 1/(2γ). The net effect goes like λ_u/(2γ²), which explains how one can obtain nanometer-scale radiation wavelengths from centimeter-scale magnetic periods using multi-GeV storage rings. A more detailed calculation of the emission wavelength involves considering the time difference between the emission of electric field peaks (at the location of magnetic field peaks, at undulator half-periods), which travel at the speed of light, and the transit of the electron to the next field peak at a velocity slightly less than that of light (Eq. 7.6, with an added time delay due to the slight transverse sinusoidal motion of the electron, which is in turn affected by the magnetic field strength). The end result
[Figure: spectral brightness (photons/s/mm²/mrad²/0.1% BW) versus photon energy (0–50 keV), showing the tuning ranges of undulator harmonics m = 1, 3, 5, 7, and 9 between K = 0.5 and K = 2.61.]
Figure 7.7 Tuning curve of brightness versus photon energy for the λ_u = 3.3 cm undulator ("Undulator A") at the Advanced Photon Source at Argonne National Laboratory. Shown here is the tuning range for each undulator harmonic as given by Eq. 7.18, over a range of undulator K values (Eq. 7.19) of K = 0.5 to 2.61. Calculation provided by Roger Dejus of the Advanced Photon Source.
is that the undulator emits radiation at a series of harmonic wavelengths λm given by [Hofmann 1978, Coisson 1982, Krinsky 1983]

λm = (λu / (2mγ²)) (1 + K²/2 + γ²θ²),    (7.18)
with only the odd harmonics (m = 1, 3, 5, . . .) appearing on-axis (θ = 0) if the electron beam emittance is sufficiently small. In this expression, K is a measure of the peak magnetic field strength B0 given by

K = eB0λu / (2πme c),    (7.19)
where e is the charge and me is the rest mass of the electron. The value of B0, and therefore K, can be adjusted by changing the mechanical separation between the top and bottom halves of a permanent magnet structure. Thus one can tune the wavelength, and therefore the photon energy, of the radiation so as to place undulator spectral peaks at the desired photon energy over quite a wide range (an example of an undulator radiation tuning range for the Advanced Photon Source at Argonne is shown in Fig. 7.7). The parameter K gives the maximum deflection angle θe of the electron from its normal trajectory according to

θe = K/γ.    (7.20)
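The tuning behavior of Eqs. 7.18 and 7.19 is easy to explore numerically. The sketch below assumes the 7 GeV electron energy of the Advanced Photon Source and evaluates only the on-axis (θ = 0) case:

```python
import math

def undulator_energy_keV(E_e_GeV, lam_u_m, K, m=1, theta=0.0):
    """Harmonic photon energy from Eq. 7.18, converted from wavelength to keV."""
    gamma = E_e_GeV * 1e3 / 0.511  # Lorentz factor (electron rest mass 0.511 MeV)
    lam_m = (lam_u_m / (2 * m * gamma**2)) * (1 + K**2 / 2 + (gamma * theta)**2)
    return 1.2398e-9 / lam_m       # E[keV] = (1.2398e-9 keV.m) / lambda[m]

# APS "Undulator A": lambda_u = 3.3 cm, with K tunable from 0.5 to 2.61
for K in (0.5, 2.61):
    print(f"K = {K:4.2f}: m = 1 harmonic at {undulator_energy_keV(7.0, 0.033, K):.1f} keV")
```

This reproduces the m = 1 tuning range of roughly 3–13 keV shown in Fig. 7.7, and the odd harmonics follow at m times the first-harmonic energy.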
[Figure: plot of spectral brightness (10¹⁹ photons/s/mm²/mrad²/0.1% BW, 0–4) versus photon energy (0–50 keV), with harmonic peaks labeled n = 1 through n = 7.]
Figure 7.8 Spectral brightness of a λu = 3.3 cm undulator (referred to locally as "Undulator A") at the Advanced Photon Source at Argonne National Laboratory. Undulator sources have a series of harmonic peaks at wavelengths given by Eq. 7.18, which are tunable by adjusting the mechanical gap between top and bottom halves of permanent magnet undulators (thus tuning the on-axis magnetic field strength, and undulator K value of Eq. 7.19); the plot here is for K = 1.5. The even harmonics only show up on-axis due to convolution of the undulator output with the electron beam divergence. Calculation provided by Roger Dejus of the Advanced Photon Source.
Since the radiation wavefield from one magnetic period extends out to an angle of about 1/γ about the electron's forward motion, this means that large values of K start to produce discontinuities in time of the electric field received on axis. This moves one from a more sinusoidal field pattern over time, with no harmonics, when K ≪ 1, to a more square-wave-like field pattern, with increasing strength of high harmonics, as K is increased beyond 1. This explains the increase in the brightness of high harmonics at the expense of the first harmonic, as shown in Fig. 7.7. (At much higher K values, like K = 3 or more, one moves from the properties of an undulator to what is called a "wiggler," which produces higher-energy X rays but with less brightness [Krinsky 1983]; wigglers are generally of less interest for x-ray microscopists.) Returning to undulators, if one ignores electron beam emittance, the angular width of the m = 1 harmonic is given by
Δθ = (1/γ) √((1 + K²/2)/(2Nu)),    (7.21)

and the spectral bandwidth is given by

Δλ/λ ≈ 1/(mNu).    (7.22)
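For a sense of scale, Eqs. 7.21 and 7.22 can be evaluated for a device like APS Undulator A; the period count Nu = 72 used below is an assumed, illustrative value:

```python
import math

gamma = 7.0e3 / 0.511      # 7 GeV electrons
N_u, K, m = 72, 1.5, 1     # assumed period count; K = 1.5 as in Fig. 7.8

# Eq. 7.21: angular width of the central cone of the m = 1 harmonic
dtheta = (1 / gamma) * math.sqrt((1 + K**2 / 2) / (2 * N_u))
# Eq. 7.22: fractional spectral bandwidth of harmonic m
bandwidth = 1 / (m * N_u)

print(f"central cone width ~ {dtheta * 1e6:.1f} microradians")
print(f"fractional bandwidth ~ {bandwidth:.3f}")
```

Both the microradian-scale divergence and the percent-scale bandwidth are far narrower than for bending magnet radiation.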
Thus undulators reach the goal of compressing radiation into the desired spectral bandwidth and divergence angle. As was noted, the most common undulator magnet structure today uses a set of permanent magnet blocks mounted on upper and lower support girders, with mechanical adjustment of their separation distance used to tune the peak field strength and thus the K value. This arrangement generates linearly polarized beams, with the polarization in the horizontal plane. More complex arrangements of magnets can make the electron beam undulate in the vertical plane (not recommended in today's storage rings, since vacuum chambers are usually wider than they are tall to accommodate large horizontal beam emittance), or follow a helical path. Elliptically polarizing undulators (EPUs) have four sets of magnets that can be shifted along the beam axis with respect to each other to provide a full variety of electron beam motions, creating any form of linear or circular polarization. One can also use currents in coils to produce the magnetic field, with superconducting undulators gaining in popularity at present. As can be seen from Fig. 7.8, undulators give a time-averaged brightness Bs,ave that is orders of magnitude higher than what is available from bending magnet sources. They are the brightest sources of X rays available at synchrotron light sources, and are in the highest demand. However, their brightness is affected by the parameters of the electron beam in the storage ring. The presence of the θ² term in Eq. 7.18 means that finite electron beam divergences in the storage ring will effectively put some θ ≠ 0 radiation onto the undulator axis, which is why even harmonics m = 2, 4, . . . begin to appear in undulator spectra with third-generation storage rings, as shown in Fig. 7.8.
As noted in Section 4.4.6, the phase space area of a coherent photon beam is approximately equal to λ; at the wavelength of peak emission power, undulator radiation from an infinitesimal electron beam can be approximated [Kim 1986, Onuki 2003, Kim 2017] as being from a Gaussian source characterized by a FWHM size (Fig. 4.4) of 2.35σr with

σr = √(2λL)/(2π) = √(2λNuλu)/(2π),    (7.23)

and a FWHM angular divergence of 2.35σr′ with

σr′ = √(λ/(2Nuλu)).    (7.24)
The electron beam itself has an emittance, or phase space area, which leads to an electron beam size characterized by Eq. 7.7 and a divergence characterized by Eq. 7.8. In the Gaussian approximation, the combined net source size and divergence is as given by Eq. 7.12. In third-generation storage rings, the horizontal emittance is often about 100 times larger than the radiation wavelength (and "natural" undulator emittance), so that Msource ≈ 100 in the x̂ direction, while the vertical emittance is a few times the wavelength, so that Msource ≈ 3–10 in the ŷ direction. Fourth-generation storage rings use a multi-bend achromat lattice design with near-wavelength emittances (and thus Msource ≈ 1, so that they are called diffraction-limited storage rings) in both directions [Eriksson 2014]. This is continuing a remarkable trend in the increase in available x-ray brightness, as shown in Fig. 7.2.
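These mode-counting estimates follow directly from Eqs. 7.23 and 7.24, whose product reduces to λ/(2π). In the sketch below, the electron beam emittance values are illustrative third-generation numbers, not parameters of any specific storage ring:

```python
import math

lam = 1.0e-10            # 0.1 nm x-ray wavelength
N_u, lam_u = 72, 0.033   # assumed undulator period count and period length (m)
L = N_u * lam_u          # undulator length

sigma_r = math.sqrt(2 * lam * L) / (2 * math.pi)   # Eq. 7.23: source size
sigma_rp = math.sqrt(lam / (2 * L))                # Eq. 7.24: source divergence
photon_emittance = sigma_r * sigma_rp              # reduces to lambda/(2 pi)

eps_x, eps_y = 2.5e-9, 2.5e-11   # illustrative electron emittances (m·rad)
print(f"M_source,x ~ {eps_x / photon_emittance:.0f}")
print(f"M_source,y ~ {eps_y / photon_emittance:.1f}")
```

This gives Msource of order 100 horizontally and of order unity to a few vertically, in line with the third-generation values quoted above.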
Much information about undulator radiation can be found in recent books about the topic [Onuki 2003, Kim 2017].
7.1.7 Inverse Compton scattering sources

The approximately centimeter-scale magnetic period length λu in permanent magnet undulators is dictated by the achievable field strength in the magnetic materials, while in electromagnetic undulators the period is limited by the achievable current density in wire windings of decreasing size. This means that one needs GeV electron beam energies to produce keV photons. What if one could dramatically shrink λu from about a centimeter to about a micrometer? From Eq. 7.18 we see that we could then reduce γ² by a factor of 10⁴, and thus go from around 1 GeV beam energies to roughly 10 MeV, which can be obtained with much smaller and less expensive accelerators. It might seem that the way to achieve such a short period is to replace the mechanical structure providing a sinusoidal magnetic field with a way of producing an electromagnetic field with a periodicity of about a micrometer: an intense visible-light laser field! While the magnetic field associated with light is not very large (Appendix B.3 at www.cambridge.org/Jacobsen), the electric field introduces transverse velocity kicks to the electrons in a way that will be described in the next section. But just as one can use a classical description of light as electromagnetic waves on Mondays, Wednesdays, and Fridays, and a quantum description in terms of photon momentum the rest of the week, one can consider the transfer of momentum produced by backscattering a visible light photon from an energetic electron in a process known as inverse Compton scattering (Eq. 3.28). While this approach has not yet found application in the sub-micrometer x-ray microscopy which is the focus of this book, it has been developing rapidly [Hajima 2016], and it has been used for larger-scale x-ray imaging [Achterhold 2013] using a commercial source [Eggl 2016] from Lyncean Technologies that was compact enough for one (well-funded) laboratory to purchase and operate.
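The scaling argument can be made concrete: for a head-on collision, the laser field acts like an undulator of effective period λlaser/2, so for small K the backscattered photon energy is approximately 4γ²Elaser. The 800 nm laser wavelength and electron energies below are illustrative values, not parameters of any particular source:

```python
E_laser_eV = 1239.8 / 800.0   # photon energy of an 800 nm laser (~1.55 eV)

def compton_keV(E_e_MeV):
    """Approximate head-on inverse Compton photon energy, E_x ~ 4 gamma^2 E_laser."""
    gamma = E_e_MeV / 0.511
    return 4 * gamma**2 * E_laser_eV / 1e3

for E_e in (25, 45):
    print(f"{E_e} MeV electrons -> ~{compton_keV(E_e):.0f} keV X rays")
```

Tens-of-MeV electrons thus suffice for keV photons, versus the several GeV needed with centimeter-period undulators.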
7.1.8 X-ray free-electron lasers (FELs)

As we have seen above, the key to the increased brightness provided by undulators is the correlation of radiation wavefields produced by one electron traversing all the Nu periods in the undulator. This increases the radiation power only linearly with the length Nuλu of the undulator, though it does compress the radiation into narrow spectral peaks (Eq. 7.22) and a narrower angular distribution (Eq. 7.21). What about a large number of electrons together? In a storage ring, the Ne electrons in a bunch are stochastically distributed due to radiation emission over an orbit of the machine, so each and every time one adds up the electric fields from the radiation of individual electrons via an incoherent sum, leading to an intensity that scales with Ne (see Eq. 4.9). Most storage ring operating modes have many bunches circulating simultaneously, and again these bunches are uncorrelated on the timescale of λ/c, so the x-ray flux scales linearly with the electron beam current in the ring. What if we could somehow produce some correlation of the radiation from the Ne
[Figure: measured pulse energy (10⁻⁸ to 10⁻⁴ joules) versus undulator length (0–14 m), showing the spontaneous emission, self-amplification buildup, exponential gain, and saturation regimes, with simulation overlaid.]
Figure 7.9 Illustration of the self-amplified spontaneous emission (SASE) principle in free-electron lasers (FELs). An electron bunch enters a long undulator with an initially random distribution of electron positions as shown at left, but as the undulator radiation field builds up the electrons begin to be bunched in position, leading to a correlation of their electric fields. This continues through the exponential gain regime until saturation occurs. Shown here are the experimental measurements of per-pulse radiation energy versus undulator length for the first soft x-ray SASE FEL at DESY [Rossbach 2003], where the saturation regime agrees very well with simulation results [Saldin 1999]. Inset figures of the electron bunches provided by Rolf Treusch of DESY.
electrons at a specific wavelength and viewing angle? There are a lot of electrons in a bunch (Ne = 1 × 10¹¹ in each bunch when the Advanced Photon Source at Argonne runs in 24-bunch mode), so the potential gains are huge! This is the name of the game for free-electron lasers. In a typical visible-light laser, an optical cavity with high-reflectivity mirrors at either end is used to build up a strong light field within the gain medium so as to use the mechanism of stimulated emission from atoms. Thinking mainly of visible light (though considering 14.4 keV X rays too), John Madey at Stanford proposed a scheme of using an undulator with an optical cavity to cause bunching of the electrons in the cavity's standing wave field distribution, in a process he termed "stimulated emission of Bremsstrahlung" [Madey 1971], with further theoretical developments [Hopf 1976, Colson 1977] providing insight from semiclassical theories. However, making an optical cavity for X rays using normal incidence mirrors is not practical, as Sections 3.6 and
4.2.4 made clear. How then to achieve the required bunching without an optical cavity? The answer arrived in the form of self-amplified spontaneous emission, or SASE (pronounced like "sassy") [Kondratenko 1980, Bonifacio 1984]. As an electron bunch traverses a long undulator, the spontaneous radiation in the upstream end eventually builds up to produce enough of an electric field that individual electrons gain positive or negative velocity kicks depending on whether they are "surfing downhill" or "climbing uphill" on the field; this causes progressive bunching of the electron beam, as shown in Fig. 7.9. Because the SASE process is affected by the particular initial configuration of the electrons in the bunch, there is a "startup from noise" characteristic to the SASE radiation pulse, such that there are bunch-to-bunch fluctuations in the details of the electron correlations, leading to fluctuations in the exact radiation wavelength, beam direction, and beam energy (one can "seed" the pulses by using harmonics of longer-wavelength coherent radiation sources to improve things in seeded FELs). The SASE effect only applies to those electrons that are within a single spatial coherence mode (or Msource ≈ 1 as described in Eq. 4.198) of the emitted radiation, which means an electron beam emittance smaller than the photon emittance, or λ, is required. In spite of early attempts to develop storage ring FELs, the transverse and longitudinal spreading of the bunch that occurs in repeated orbits in a storage ring means that linear accelerators offer overwhelming advantages for FEL operation.
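The payoff from bunching can be illustrated with a toy phasor sum: Ne emitters with random phases give a summed intensity of order Ne, while perfectly bunched (in-phase) emitters give Ne². A minimal sketch:

```python
import cmath
import random

random.seed(1)
N_e = 10_000

# Incoherent sum: random emission phases, so the intensity averages to N_e
field = sum(cmath.exp(1j * random.uniform(0.0, 2.0 * cmath.pi)) for _ in range(N_e))
I_incoherent = abs(field) ** 2

# Coherent sum: all emitters in phase, so the intensity is exactly N_e**2
I_coherent = abs(N_e * cmath.exp(0j)) ** 2

print(f"incoherent intensity / N_e:   {I_incoherent / N_e:.2f}")
print(f"coherent intensity / N_e**2:  {I_coherent / N_e**2:.2f}")
```

With 10¹¹ electrons per bunch, even partial bunching within a single coherence mode therefore yields enormous gains.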
Thus, while one would like to use storage rings due to their economy of operation as noted in Section 7.1.4, FELs use linear accelerators or "linacs" in which many GeV of energy is invested in each electron in order to produce keV X rays from one pass through the undulator, after which the electron beam is "dumped." While one can switch successive electron bunches into separate long undulators, this still means that one can at most operate a few experimental stations at once, unlike the 40–60 that operate simultaneously at some storage ring synchrotron light sources. This makes FELs much more expensive to build and operate, thus affecting the availability of beamtime. In addition, in order to maintain the required magnetic field uniformity in 100-meter-long undulators, most SASE FEL undulators run at fixed magnetic field strength, so that photon energy tuning requires adjustment of the linac accelerating energy, which is less straightforward. In spite of their access limitations, and the "startup from noise" fluctuation limitations of SASE FELs, the achievable gains in instantaneous brightness are huge, approaching a factor of the number of electrons Ne in the bunch as one nears saturation. Therefore x-ray FELs (XFELs) have enabled radical and exciting new destroy-but-diffract approaches to x-ray microscopy, as described in Section 10.6. For approaches such as scanning microscopy, spectromicroscopy, or tomography, the specimen must survive multiple illumination pulses, so in these cases one must worry about the anti-Goldilocks condition and the "no-fly zone" discussed in Section 11.1.1. The development of FELs, and the first demonstration of the SASE effect with EUV [Rossbach 2003] and hard x-ray radiation [Emma 2010], involves a rich interplay of personalities and big science politics.
One can get a glimpse of this from personal perspectives provided by Madey [Madey 2016], and by Claudio Pellegrini, who helped arrive at the SASE picture [Bonifacio 1984] and who played a major role in instigating hard x-ray FEL development [Pellegrini 2012, Pellegrini 2017]. The various stories are
so complex that they have even shaped how patent law applies to university research [Cai 2004].
7.2 X-ray beamlines

In a visible-light microscope, one would never simply place the specimen on top of the filament of a light bulb or the exit aperture of a laser, mount an objective lens, and be done with it. The same applies to x-ray microscopes: some sort of optical transport system is used to deliver the x-ray beam from the source to the specimen, and to condition the illumination properties as described at the start of Section 7.1. This system is called a "beamline" at synchrotron light sources and XFEL facilities, and in fact this word is also sometimes used to describe x-ray beam transport systems for other source types, too. The beamline should provide some or all of the following:
• the required degree of monochromaticity (Section 7.2.1);
• the proper étendue and coherent phase space (Section 7.2.2);
• apertures and shutters (Section 7.2.3);
• radiation shielding (Section 7.2.4);
• management of thermal loads (Section 7.2.5);
• vacuum or gas environment (Section 7.2.6).
Sometimes several of these functions are provided by a combined system, such as a spherical grating monochromator, which both images the source onto an exit slit in one direction and monochromatizes it at the same time. Given that there are entire books on beamline design [Peatman 1997], we will only give a short overview of the topic.
7.2.1 Monochromators and bandwidth considerations

X-ray microscopes using grazing incidence reflective optics (Section 5.2), like Kirkpatrick–Baez or Wolter mirrors without multilayer coatings, can use a broad spectral bandwidth (broadband radiation). Refractive optics require a higher degree of monochromaticity (Eq. 5.9), as do diffractive optics such as Fresnel zone plates (Eq. 5.33) and reflective optics with multilayer coatings (Section 4.2.4). Simple absorption contrast in contact (Section 6.1) or point projection microscopy (Section 6.2) can also make use of broadband radiation. Electron optical x-ray microscopes (Section 6.5) often require narrowband illumination so as not to "blur out" the energy spectrum of emitted electrons, which would be problematic for the types of electron lenses that are highly chromatic. Propagation-based phase contrast methods (Section 4.7.2) require that E/ΔE be larger than the number of Fresnel fringes used to obtain a phase contrast image, while holography and coherent diffraction imaging have much more demanding requirements on E/ΔE, as discussed in Chapter 10 (such as the requirement of Eq. 10.54 for ptychography). It is also preferable to be able to tune the photon energy so as to maximize contrast, or to be able to excite a desired x-ray fluorescence emission line (Section 9.2) or near-edge absorption resonance (Section 9.1.2).
One can always narrow the spectral bandpass ΔE of a source by using a monochromator as a spectral filter (Section 7.2.1), though this comes at a cost in efficiency within the selected bandpass. X-ray monochromators come in three basic types: crystals, multilayers, and gratings. Crystals tend to be used above about 2 keV, and gratings below, with multilayers spanning both ranges and offering greater spectral bandpass ΔE (or, equivalently, lower spectral resolution). Crystal monochromators work on principles described in Section 4.2.3, though they are usually used in pairs: one crystal might deflect the beam at an upwards angle, and a second crystal then deflects it by the same angle so as to provide a beam that is offset in height but at the same angle as the x-ray beam emerging from the source. (Upward deflection is used at most of today's synchrotron light sources due to the smaller vertical source size, though at the newest, lowest-emittance light sources horizontal deflection is sometimes chosen based on mechanical stability considerations.) In a double-crystal monochromator (DCM) [Schwarzschild 1928, Smith 1934], the first crystal absorbs significant energy from the incident beam, so cooling must often be provided. For small energy tuning ranges, one can cut a rectangular channel in a monolithic silicon crystal and thus obtain a DCM with very high stability, but for larger tuning ranges one needs a mechanism to tilt two separate crystals to the desired Bragg angles while also translating the second crystal to keep it centered in the first crystal's diffracted beam. While one can use Si ⟨111⟩ crystals down to about 2 keV, and exotic materials such as YB66 down to about 1.2 keV [Wong 1999], at even lower photon energies or longer wavelengths the lattice spacing of available crystals no longer allows for Bragg diffraction. The relatively low absorption of multi-keV or "hard" X rays in silicon leads to diffraction efficiencies of >90 percent in many cases (Fig. 4.12), and the high quality of readily available silicon crystals leads to good coherence preservation from polished crystals. For microscopy, they can provide a restrictively small spectral bandpass (for example, 1.4 eV at 10 keV for Si ⟨111⟩, as shown in Fig. 4.12), giving a monochromaticity of E/(ΔE) ≈ 7000. Since many x-ray focusing optics might need a monochromaticity of only about 1000 or less (see Eqs. 5.9 and 5.33), this overly restrictive spectral bandwidth reduces the flux that could otherwise be used. Because of the narrowness of the Darwin width, it can be very helpful to use a mirror optic to collimate the x-ray beam so that it is more nearly parallel when it reaches the crystal monochromator. To work at longer wavelengths, one must increase the d spacing of the crystal lattice (Fig. 4.9), and the way that can be achieved is to use synthetic multilayers, as described in Section 4.2.4. As with DCMs, one usually uses these multilayers in pairs in a double multilayer monochromator (DMM) so that the beam angle is unchanged. Double multilayer monochromators can produce illumination with a spectral bandwidth in the neighborhood of 1 percent (with considerable flexibility around that number according to the design choices made for the multilayer coating), thus providing more flux for those imaging systems that can tolerate the larger bandwidth. When using a DMM to increase the accepted spectral bandwidth and thus the accepted flux from a non-monochromatic x-ray source, one must keep in mind the dispersive limits to spatial resolution in compound refractive lenses, as considered in Eq. 5.9, or in Fresnel zone plates, as considered in Eq. 5.33. An especially clever arrangement is to combine the
function of a single multilayer monochromator and condenser lens into one optic for laboratory transmission x-ray microscopes [Hertz 1999, Berglund 2000]. At photon energies below about 2 keV, grating monochromators become the common choice. These are mirrors operated at grazing incidence because of the critical angle θc given by Eq. 3.115. The mirror surface may be flat in a plane grating monochromator, or curved in a spherical grating monochromator, in which case one can combine dispersive and focusing properties (as described in Section 5.2.1) in one optic. Because the grating is used at grazing incidence angles, one can achieve nanometer effective grating periods as seen from the beam direction by using micrometer-scale structures on the grating surface. This means that methods including laser interferometry and mechanical ruling can be used to produce the grating structure. Because the grating is operated at grazing incidence, it also tends to be somewhat long (several centimeters), so that one might need to slightly adjust the grating period between the upstream and downstream ends; such variable line space gratings are usually produced by mechanical ruling. The grating grooves can be blazed (arranged at a shallow angle relative to the substrate) so that one approaches specular reflectivity in the dispersion direction, thus improving efficiency, which can exceed 20 percent for soft X rays (note that the grating entrance and exit angles as shown in Fig. 4.8 are normally chosen not to be equal to the substrate grazing incidence reflection angles, so that one separates the zeroth or undiffracted order from the desired diffraction order). Of particular note is the Rowland circle condition for spherical or toroidal grating monochromators, where one matches the dispersion and focusing properties of the optic, as worked out by H. A. Rowland in 1882 and described in more recent publications [Namioka 1959, Samson 1967, Peatman 1997].
Spherical grating monochromators (SGMs) in particular have been used for many scanning x-ray microscopes [Winn 2000, Warwick 2002], where spectral monochromaticities of λ/(Δλ) ≈ 3000 or more are obtained so as to match the intrinsic ∼1 eV spectral linewidth of carbon XANES resonances (Section 9.1.3) near 300 eV photon energy. With SGMs, one must pay attention to the presence of second-order diffraction, which can cause some 600 eV light to be present when rotating the grating to obtain 300 eV light at the exit slit; one therefore uses either order-sorting mirrors (Fig. 3.27) or transmission through a material with an absorption edge just below the energy of the second-order light, such as a gas filter [Winesett 1998, Warwick 2002]. Of special note is the use of large-diameter Fresnel zone plates as combined linear monochromators and condenser lenses, as discussed in Section 6.3.1. These were central to many of the transmission x-ray microscopes (TXMs) developed by the Göttingen group at the BESSY storage ring. However, as noted in Section 6.3.2, more recent TXMs have used grating monochromators and capillary condenser lenses, as these provide better power handling, higher monochromaticity, and increased working distance near the specimen [Schneider 2012, Sorrentino 2015].
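Returning to the crystal monochromators discussed earlier in this section: their tuning range follows directly from Bragg's law, λ = 2d sin θB, using the standard 2d ≈ 6.27 Å lattice spacing of Si ⟨111⟩:

```python
import math

TWO_D_SI111 = 6.271e-10   # 2d spacing of Si <111>, in meters

def bragg_angle_deg(E_keV):
    """Bragg angle for Si <111>; None below the low-energy cutoff (sin > 1)."""
    lam = 1.2398e-9 / E_keV           # x-ray wavelength in meters
    s = lam / TWO_D_SI111
    return math.degrees(math.asin(s)) if s <= 1.0 else None

for E in (2.0, 10.0, 25.0):
    print(f"{E:4.1f} keV -> Bragg angle {bragg_angle_deg(E):.2f} degrees")
```

The Bragg angle approaches 90° near 2 keV, which is the geometric origin of the low-energy limit of Si ⟨111⟩ monochromators noted above.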
7.2.2 Coherence and phase space matching

Various x-ray microscopes make different demands on x-ray source size and divergence characteristics. As discussed in Section 4.4.6, the size and angular distribution of an
[Figure: phase space plots of x′ angle (μrad, ±40) versus x position (μm, ±80) at several distances z from a 20 μm wide slit, with the beam width versus z also indicated.]
Figure 7.10 Evolution of the phase space area occupied by a light beam focused on a slit, and at various distances from the slit. Shown here is a beam with a convergence semi-angle θ = 30 μrad focused onto a slit with a width of w = 20 μm. When the beam propagates to positions of z = 1 meter and z = 2 meters downstream of the slit, the phase space area becomes "tilted" in the clockwise direction due to geometric propagation of the beam. If a focusing optic were to be placed at z = 2 meters, it would tilt the phase space area in a counterclockwise direction before propagation would again tilt it clockwise towards the vertical at the focus. Depending on the focal length of that second optic, it might lead to a phase space distribution with a different width and angular divergence, but the product of the two would remain the same due to Liouville's theorem (Eq. 4.189).
x-ray source is sometimes referred to as its étendue. The product of size times angle at a source or focus position, divided by the wavelength, gives the number of source modes Msource. While one can use optics to image light from a small source with large divergence onto a large focal spot with small divergence (and vice versa), the size–angle product is unchanged due to Liouville's theorem (Eq. 4.189). The exception is that one can reduce the number of source modes through the use of a spatial filter, where an aperture placed at a focus is used to throw away light in exchange for decreasing the number of source modes Msource. Point projection microscopy requires a reasonably small source to minimize penumbral blurring (Section 6.2) or to produce a specified degree of spatial coherence, while a large divergence leads to a large field of view. In propagation-based phase contrast imaging methods (Section 4.7.2), the ratio of the imaging field width to the width of the total number of Fresnel fringes recorded gives an estimate of the number of source modes Msource that can be accepted in each of the x and y image directions. In full-field imaging (Section 6.3), one can accept as many source modes Msource as there are pixels in the detector in each direction, so that bending magnet sources at synchrotrons serve nicely as sources for full-field TXM systems [Feser 2012]. However, exposure times
depend on the number of photons in a single mode Msource, since one mode maps onto one pixel; therefore brightness still comes into play, and especially at synchrotron radiation sources one often has fewer source modes Msource than image pixels N, so that one has to wobble or scan the condenser optic during exposure (Sections 6.3.1 and 6.3.2). In scanning microscopy (Section 6.4) one obtains the smallest focused beam size only when Msource ≈ 1 (see Section 4.4.6 and Eq. 4.198), and the coherent imaging methods of holography, CDI, and ptychography discussed in Chapter 10 also require Msource ≈ 1 (see for example Fig. 10.12). The role of beamline optics is to transfer light from the source into the x-ray microscope with the proper number of source modes Msource, and with the appropriate ratio of size versus divergence. One very useful tool for thinking about how to do this with beamline optics is to consider how a light beam can become rearranged in phase space [Hastings 1977, Pianetta 1978, Smilgies 2008, Ferrero 2008, Huang 2010a]. Consider a source with uniform illumination across a specified width and angular divergence, as shown in the x̂ direction in Fig. 7.10 (one must track changes in phase space separately in the x̂ and ŷ directions). As the beam propagates downstream by a given distance, the light rays with large divergence expand out to large positions, so the phase space box becomes tilted in the clockwise direction as indicated. If one were to then put a focusing lens in place, large positive angles would become large negative angles, so the phase space box would become tilted in the counterclockwise direction. If the lens were to produce an image of the beam which is smaller in width and therefore (necessarily) with larger divergence, the phase space box would become narrower in width and taller, while an enlarged image of the source would lead to a phase space box that was wider but not as tall. It is also worthwhile considering the role of a slit.
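The clockwise and counterclockwise tilting just described is exactly the action of ray-transfer matrices on (x, x′) ray coordinates, and since drift and thin-lens matrices have unit determinant, the phase space area is conserved (Liouville's theorem, Eq. 4.189). A minimal sketch:

```python
import numpy as np

def drift(z):   # free-space propagation by a distance z
    return np.array([[1.0, z], [0.0, 1.0]])

def lens(f):    # thin lens of focal length f
    return np.array([[1.0, 0.0], [-1.0 / f, 1.0]])

# Uniform "box" of rays: 20 um wide, 30 urad divergence semi-angle
rng = np.random.default_rng(0)
rays = np.stack([rng.uniform(-10e-6, 10e-6, 5000),    # x positions (m)
                 rng.uniform(-30e-6, 30e-6, 5000)])   # x' angles (rad)

area0 = np.linalg.det(np.cov(rays))        # phase space area measure (emittance^2)
rays2 = lens(1.0) @ drift(2.0) @ rays      # drift 2 m, then focus with f = 1 m
area2 = np.linalg.det(np.cov(rays2))

print(f"area measure before: {area0:.3e}  after: {area2:.3e}")
```

The drift tilts the ray distribution clockwise and the lens tilts it counterclockwise, but the determinant (the area measure) is unchanged.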
If the slit is narrower than the beam at a focus position, the divergence of the beam is unaffected (ignoring diffraction effects), so the phase space box becomes narrower in width but has an unchanged height. This gives rise to a smaller net phase space area and a reduced number of source modes Msource. If the slit were to be at a position different than a focus position (such as at z = 2 m in the example of Fig. 7.10), even a slit that is wider than the original source (such as a 40 μm wide slit as shown in Fig. 7.10) would intercept part of the phase space box in a way that a 1:1 refocused beam would have an altered angular distribution. Therefore the phase space model becomes a good way to understand the effects of optics and apertures at various points in a beamline, with calculations handled easily using matrix methods in classical optics. If one extends it to incorporate partial coherence and diffraction effects, one essentially has the Wigner distribution approach that has been used with synchrotron radiation sources [Kim 1986]. One can manage both source monochromaticity and étendue at the same time with beamline optics. In a spherical grating monochromator (SGM) operated in the Rowland circle condition, polychromatic light focused on the entrance slit results in monochromatic light being focused on the exit slit of the monochromator. That exit slit can then serve as both a spectral filter (by controlling how large a slice of the spectrally dispersed beam is accepted) and a spatial filter (by adjusting the size of a slit at the focus) in one direction, while in the orthogonal direction a separate optic might be used to image the source onto the exit slit, where again one can do spatial filtering. This arrangement is
7.2 X-ray beamlines
Figure 7.11 Slit types that can be used for aperturing an x-ray beam. The intended transmitted beam path is shown in red. For visible light and electrons, knife-edge slits are often preferred because they are relatively easy to prepare and are tolerant of angular alignment errors. However, rays just off the intended path are only partly absorbed, due to the penetration of X rays into thin material (especially for higher-energy X rays), so the slit appears to be soft-edged and wider than intended. In addition, these rays can be refracted by the slit material (see Fig. 3.19). With rectangular slits, one solves the problem of x-ray penetration if the slit face is perfectly aligned, but any misalignment of the edge (or, with a perfect rectangle, angular misalignment of the rectangular block) will again produce a somewhat soft-edged slit transmission function with refraction. For this reason, hard x-ray slits are sometimes made from cylinders so that one has a sharp cutoff of transmission and no requirement for rotational alignment along the axis of the cylinder.
well suited to soft x-ray scanning microscopes [Winn 2000, Warwick 2002]. One can open the slits for higher flux with lower spatial and temporal coherence (noting that the focus spot will increase in size, as was shown in Fig. 4.42), or close the slits for higher spatial resolution and better spectroscopy at the price of a decrease in flux.
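The "matrix methods in classical optics" mentioned above can be sketched with ray-transfer (ABCD) matrices acting on (position, angle) pairs. The source size, divergence, focal length, and slit width below are illustrative assumptions, not numbers from the text:

```python
import numpy as np

def drift(L):
    """Free-space propagation by a distance L (meters)."""
    return np.array([[1.0, L], [0.0, 1.0]])

def lens(f):
    """Thin focusing lens of focal length f (meters)."""
    return np.array([[1.0, 0.0], [-1.0 / f, 1.0]])

# A uniformly filled phase space "box": rows are position (m) and angle (rad)
rng = np.random.default_rng(0)
n_rays = 5000
rays = np.vstack([rng.uniform(-50e-6, 50e-6, n_rays),    # +/-50 um half-width
                  rng.uniform(-10e-6, 10e-6, n_rays)])   # +/-10 urad divergence

# Drift 2 m, focus with a 1 m focal length lens, drift 2 m to the 1:1 image;
# the rightmost matrix acts on the rays first
system = drift(2.0) @ lens(1.0) @ drift(2.0)
at_image = system @ rays

# A slit is not a matrix: it simply clips away part of the phase space box,
# reducing the phase space area (and hence the number of modes)
slit_half_width = 20e-6
survivors = at_image[:, np.abs(at_image[0]) < slit_half_width]
print(f"{survivors.shape[1]} of {n_rays} rays pass a 40 um slit at the image")
```

Note that the composite matrix for this 2f–2f geometry works out to magnification −1, so the slit at the image plane selects rays purely by their source position, exactly as the phase-space picture suggests.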
7.2.3 Slits and shutters

Slits are conceptually simple items in a beamline, but their implementation requires care. Consider the three different slit types shown in Fig. 7.11; especially for hard X rays, obtaining a clear slit cutoff for a well-defined x-ray beam width is not so straightforward. For coherent diffraction imaging methods, one must also consider scattering from any roughness on the edges of the slit (sometimes referred to as "hot lips") as seen by the x-ray beam, which sometimes leads to complex scatter shield arrangements like that shown in Fig. 10.17. Furthermore, slits are usually placed at intermediate focus positions of the x-ray beam, and sometimes they are used upstream of monochromators so that they must be able to absorb the entire spectral output of the source; this presents challenges for slit cooling so as to maintain a stable slit position and width. Shutters serve to interrupt the beam entirely. Safety shutters in the front end of the
Downloaded from https://www.cambridge.org/core. Stockholm University Library, on 29 Oct 2019 at 01:13:57, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.008
beamline serve to protect personnel from the x-ray beam, from the electron beam in case the orbit is lost at synchrotron light sources, and (again at synchrotron light sources) from gamma rays that can be created by collisions of storage ring beam electrons with residual gas molecules. These safety shutters are interlocked so that one cannot access the x-ray beam path unless the safety shutter is closed. Safety shutters must be able to absorb the entire beam power, so they are often massive water-cooled blocks with slow opening and closing times. For this reason, smaller and lighter shutters are often added as part of the microscope instrumentation (see for example [Chapman 1999]) to protect the sample from unnecessary exposure, to define the exposure time, or to synchronize the exposure with some other stimulus. The beam incident on the microscope tends to be small in dimension and low in power, so a small movement of a light object can suffice to intercept it. Fast shutters may be actuated by an electromagnet, or even a piezoelectric actuator, with millisecond or even faster response times.
7.2.4 Radiation shielding

From the calculations summarized in Section 4.9.1, it is clear that specimens in x-ray microscopes will be exposed to a significant radiation dose. However, as noted in Section 11.2.3, it's nice to minimize the radiation dose received by the experimenters! For soft X rays (around 1 keV and below), the thickness of vacuum pipes and windows is sufficient to completely absorb any stray x-ray beam, and even air is a pretty good absorber (Fig. 7.12). As one gets up to harder X rays at 10 keV and above, vacuum pipes still provide sufficient shielding, but lead (Pb) is used to shield areas where higher-energy X rays might strike and produce x-ray scattering. Since one might use electron acceleration voltages of many tens of keV in laboratory sources, or have an electron beam energy of many GeV in storage rings, radiation shielding must account for the possibility that the electron beam can go in unanticipated directions and create higher-energy X and gamma rays. For that reason, hard x-ray experiments at synchrotron light sources are often carried out inside steel- or lead-shielded rooms from which the experimenter is excluded; these "hutches" (a word someone from farm country in the USA will associate with enclosures for pet rabbits) have a security system involving keys and area search buttons to ensure that nobody is inside and the door is closed before a safety shutter can be opened.
7.2.5 Thermal management

The power of the x-ray beam emitted from synchrotron radiation sources is typically between hundreds of watts and many kilowatts. Undulators in particular concentrate much of this power into a narrow cone, so the power density can be higher than in an arc welder. These beams can melt through thick stainless steel valves and other equipment not designed to handle them, and unfortunately this fact has been demonstrated more than once. The first surfaces to see the beam are typically water-cooled and oriented at grazing incidence to the beam to spread out the heat load. One can use the low-pass energy "filtering" property of grazing incidence mirrors (Fig. 3.27) to remove all
Figure 7.12 X-ray 1/e attenuation length μ⁻¹ in meters (Eq. 3.75), as a function of photon energy, in air and helium, plus 304 stainless steel
(used for ultra-high-vacuum chambers and fittings) and lead (Pb). Soft X rays are effectively stopped by a few millimeters of air, though more caution must be used if helium gas is in the beam path. The walls of vacuum chambers are sufficient to stop x-ray beams in most cases, though lead shielding is used when extra protection is required, and in particular for locations where there is the potential for gamma ray scattering.
the x-ray power above a certain photon energy, thus reducing the power load on all downstream components. Silicon, diamond, and the copper-based alloy Glidcop are preferred materials for dealing with high power densities because of their good thermal properties. In beamlines with double-crystal monochromators, the first crystal is cooled either by water or by liquid nitrogen. Remarkably, the thermal expansion coefficient of silicon at liquid nitrogen temperature is vanishingly small, making such cryogenic monochromators popular at undulator beamlines in spite of their cost and complexity. Downstream of a monochromator the power in the beam is reduced, typically by three orders of magnitude, so dealing with the heat load is much easier. Still, with precision optical elements, avoiding damage is not enough: thermal distortions of an optic's figure must be carefully controlled. Beamline optical designs therefore typically involve detailed finite element calculations to predict the local thermal response of all optical components, and to design the geometry, the materials used, and the cooling arrangement to keep such distortions within allowable limits.
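The shielding and filtering arguments in this section all rest on exponential attenuation, I/I₀ = exp(−t/μ⁻¹), with the 1/e attenuation length μ⁻¹ of Eq. 3.75 as plotted in Fig. 7.12. A minimal sketch follows; the attenuation lengths used here are rough, made-up order-of-magnitude assumptions for illustration, and real values should come from tabulated data such as that referenced in Appendix A:

```python
import math

def transmitted_fraction(thickness_m, attenuation_length_m):
    """Exponential attenuation I/I0 = exp(-t / (1/mu)), cf. Eq. 3.75."""
    return math.exp(-thickness_m / attenuation_length_m)

# Illustrative (assumed, not tabulated) 1/e attenuation lengths
cases = [
    ("soft x rays, 3 mm of air, assumed 1/e length 1 mm", 3e-3, 1e-3),
    ("10 keV x rays, 2 mm of steel, assumed 1/e length 0.2 mm", 2e-3, 0.2e-3),
]
for label, t, ell in cases:
    print(f"{label}: I/I0 = {transmitted_fraction(t, ell):.1e}")
```

Even with these rough numbers, the exponential makes the point of Fig. 7.12: a few 1/e lengths of material reduces the transmitted beam to a negligible fraction.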
7.2.6 Vacuum issues, and contamination and cleaning of surfaces

Since most x-ray sources involve electron beams accelerated in a vacuum environment, ultra-high-vacuum (UHV) conditions (typically about 10⁻⁹ Torr or lower, or 10⁻⁷ Pascal or lower) are highly desirable so as to minimize electron scattering and beam degradation. At most synchrotron light sources, UHV conditions are maintained in the storage ring vacuum chamber, and in x-ray beamlines up until one reaches a window of some sort. With soft x-ray beamlines this might be a 100 nm thick silicon nitride window (Section 7.5.1) separating the UHV beamline from the optics and specimen at atmospheric pressure, while for hard x-ray beamlines a 0.3–0.5 mm thick beryllium window is often used to separate upstream UHV regions from a downstream region which might be at vacuum (allowing for a thinner window) or which might be filled with helium gas. With coherent x-ray beams, these windows must be polished so that thickness variations do not impose phase structure on the beam, and considerable care is taken in their handling (solid beryllium is quite safe to handle, but fine beryllium dust is extremely hazardous and leads to the severely debilitating lung condition called berylliosis¹). Polyimide or Kapton windows are also sometimes used to separate vacuum from atmospheric-pressure environments at hard x-ray beamlines. Another advantage of maintaining a beamline at UHV conditions is that it minimizes the condensation of contaminants on surfaces (see Eq. 11.13). This helps keep grazing incidence optics clean, which is especially important for soft x-ray beamlines. If one instead has residual hydrocarbons in the vacuum chamber from fingerprints, organic solvents, and so on, the x-ray beam can ionize the molecules and the damaged fragments can then stick to the surface where the beam has hit (a process called radiation-induced cracking). This can be bad news at a beamline, because one might then peer in a vacuum window and see that an expensive grazing incidence mirror or grating now has an ugly brown streak on it, producing a severe loss of reflectivity at the carbon edge and difficult incident flux normalization problems for carbon XANES spectromicroscopy. The achievement of UHV conditions is a religion with its own bible [O'Hanlon 2003].
It involves careful choices of materials and pump types, and proper cleaning and handling of components. One important rite in this religion is "baking out" a vacuum system, where it is heated to elevated temperatures (e.g., 100 °C) while being pumped, to accelerate water and hydrocarbon desorption from surfaces. If the "baked" system pressure gets down into the 10⁻⁸ Torr range, chances are good that the base pressure will drop another order of magnitude or more after the system is cooled back down to room temperature. Resistive electrical heating tapes are sometimes wrapped around the vacuum chamber, and aluminum foil is often used to wrap the system and keep the heat in relative to the room-temperature environment. This is why one sometimes sees beamline components covered with aluminum foil, which some in the USA might otherwise associate with baking a turkey at the Thanksgiving holiday.
7.3 Nanopositioning systems

X-ray microscopes require nanopositioning to move the specimen into the focal region, and mechanical scanning of either the optic or (more typically) the specimen in scanning
¹ The author's research lab head during college summers at Los Alamos was Herb Anderson, who had contracted berylliosis during a laboratory fire in the Manhattan Project days, so that by the early 1980s he toted around a portable oxygen system.
microscopy. Once moved, the specimen needs to remain in a stable position relative to the optic with a tolerance of a small fraction of the desired spatial resolution. Like UHV technology, nanopositioning technology represents another religion with its own prophetic writings (see for example [Fleming 2014, Ru 2016]), so I will only convey here some brief points from my own belief system. For coarse positioning over long distances (centimeters or more), micropositioning stages with ball-bearing slides and gear-reduction electric motors are usually used. The motors are either stepping motors, where gear-like permanent magnets provide a set of stable positions at fixed rotational intervals that electrically driven coils must move past, or direct current (DC) motors, where the speed of rotation is proportional to the current supplied. With finer gear-reduction systems, slower velocities can be traded off for finer motion precision, down to about 50–100 nm depending on the model chosen. Encoders provide a check on the actual motion achieved (sometimes by optical measurement of motor shaft rotation, or of the actual linear position of the stage), and this feedback is used by a motor controller to reach the desired position. Encoders are required when using DC motors; with stepping motor stages, they can provide reassurance against missed motor steps. Because micropositioning stages often have a screw pushing against a surface with a retention spring holding things in place, most motor controllers will employ a backlash correction strategy in which the last stage motion always involves pushing with that screw (moves in the other direction are accomplished by going past the desired endpoint, and then driving back). These stages can carry significant mass (kilograms or more, depending on the model chosen), and special vertical translation stages are sometimes made by the horizontal motion of driving one wedge into another, with ball-bearing guides between the wedges.
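The backlash-correction strategy just described (always finish a move pushing with the screw, overshooting and returning when moving the other way) can be sketched in a few lines. The Stage class here is a hypothetical stand-in for a real motor-controller API, not any particular vendor's interface:

```python
class Stage:
    """Hypothetical single-axis stage with screw-plus-spring backlash."""

    def __init__(self):
        self.position = 0.0        # reported position, micrometers

    def _raw_move(self, target_um):
        # A real controller would step the motor and consult the encoder here
        self.position = target_um

    def move_with_backlash(self, target_um, overshoot_um=20.0):
        """Approach every target from the same side, so the final motion
        always pushes with the drive screw against the retention spring."""
        if target_um < self.position:
            # Backward move: go past the target, then drive forward to it
            self._raw_move(target_um - overshoot_um)
        self._raw_move(target_um)

stage = Stage()
stage.move_with_backlash(100.0)   # forward move: direct
stage.move_with_backlash(40.0)    # backward move: via 20.0, then 40.0
print(stage.position)             # 40.0
```

The essential design choice is that the mechanical slack in the screw–spring pair is taken up in the same direction for every final approach, so repeated moves to the same target land in the same place.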
The electric motors can generate heat (as can the infrared light sources in encoders) [Nazaretski 2013], so there can be some thermal drift of the stage after a motion has been carried out. Finer motion is achieved by using piezoelectric crystals, which expand or contract in response to an applied voltage. These can be used to achieve sub-nanometer displacements, or, if a mechanical leveraging arrangement is used, one can expand the range to tens of micrometers at a loss in stiffness and compliance. Over these smaller ranges, the motion is usually constrained to be in the desired direction through the use of a flexure system. Because piezos are notorious for hysteresis, some sort of nanometer-precision encoder is used, which is often a capacitance micrometer, though laser interferometers and mechanical strain gauges can also be used. One can think of the pushing strength of these piezos in terms of capacitance: stiffer flexures and longer ranges require larger drive piezos, which then require voltage amplifiers with larger electrical current limits in order to quickly supply the per-volume charge imbalance needed to cause the piezo to expand with a short response time. In order to do long-range coarse positioning with nanoscale fine movement, one often uses a stack of x–y–z motorized micropositioning stages with an x–y piezo stage on top (because the depth of field of most x-ray microscopes is several micrometers, stepper motor stages usually offer fine enough motion for focusing). Unfortunately this can lead to rather large stage assemblies, with long mechanical paths between the optic and the specimen. Let's consider the example of a U-shaped stage assembly that has two stacks
that are each 10 cm high, separated by 10 cm. If the U were made out of stainless steel, a 1 °C change in temperature would lead to an expansion of each of the three arms by a distance of Δx = α x ΔT = (18 × 10⁻⁶/°C) · (10⁻¹ m) · (1 °C) = 1.8 × 10⁻⁶ m or 1.8 μm, assuming a representative value for the coefficient of thermal expansion of stainless steel (one can also use materials with significantly lower thermal expansion coefficients, such as Invar metal or Zerodur ceramic). For the two vertical arms of the U, a 1 °C change of temperature would make both the optic and the specimen move upwards by 1.8 μm together, while the bottom arm of the U would have the two stacks move apart by 1.8 μm. This might be OK in that the depth of field (Eq. 4.215) of most x-ray microscopes is much larger than 1.8 μm (except for high-resolution soft x-ray microscopes), and movements of both the optic and the specimen by 1.8 μm relative to an illumination width of typically 50–200 μm are inconsequential. However, the stage stacks are not usually made of uniform material (there might be a mix of materials, plus various interfaces for bearing slides and so on), so there will likely be some differential in the up–down thermal expansion. If this happens during acquisition of one full-field image, the image will be blurred, while in a scanning microscope the scan field will become distorted. If one takes a sequence of images such as are required for spectromicroscopy or for tomography, one will have to worry about registration of the images to correct for thermal drift. For this reason, many high-quality experimental facilities at synchrotron light sources follow the practice of high-end electron microscopy labs by taking special care to install heating and cooling systems that can maintain temperature stability to within 0.1 °C.
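The arm-expansion estimate above is a one-line calculation; here it is in code, with the stainless steel coefficient from the text and an assumed typical Invar coefficient included only for contrast:

```python
# Thermal expansion of a stage arm: delta_x = alpha * L * delta_T
alpha_steel = 18e-6    # per deg C, representative value used in the text
alpha_invar = 1.2e-6   # per deg C, assumed typical value for Invar

def expansion_m(alpha_per_C, length_m, delta_T_C):
    """Length change of an arm for a given temperature change."""
    return alpha_per_C * length_m * delta_T_C

dx_steel = expansion_m(alpha_steel, 0.10, 1.0)   # 10 cm arm, 1 deg C
dx_invar = expansion_m(alpha_invar, 0.10, 1.0)
print(f"steel: {dx_steel * 1e6:.2f} um, Invar: {dx_invar * 1e6:.2f} um")
# steel gives 1.80 um, matching the estimate in the text
```

This is why low-expansion materials buy an order of magnitude or more in drift for the same temperature excursion, at the cost of machinability and price.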
Thermal drifts can also be compensated using various image registration steps (as has been demonstrated in spectromicroscopy [Jacobsen 2000] and in tomography [Gürsoy 2017b]), but an even better approach is to use a laser interferometer to measure any remaining changes in relative position between optic and specimen. Laser interferometers have been used as absolute position encoders to correct for nonlinearities in piezo motion in x-ray microscopes [Shu 1988], and to correct for the relative position of specimen and optic in 2D scanning microscopy [Kilcoyne 2003] and in 3D ptychographic tomography [Holler 2012]. They have also been touted as solutions to correct for high-frequency motion such as vibration [Kilcoyne 2003], though it is difficult to correct for vibrations at frequencies above about 100 Hz when one considers the bandwidth of digital servo control systems and the combination of weight, piezo size, and piezo drive current limit in many nanopositioning stages. As a result, for vibration it can be argued that there is no substitute for solid mechanical design, so that one can have a system that is stable in the typical vibration environment of modern synchrotron light source facilities (Fig. 7.13) or well-designed microscopy laboratories. These points are discussed in more detail in connection with a recently developed compact STXM [Takeichi 2016]. When it comes to precision mechanical design, there are yet more holy writs [Jones 1988, Smith 1992], so we restrict ourselves to a few comments:
• One can first minimize vibration excitations by design of the laboratory. Modern
Figure 7.13 X-ray microscopes require a high degree of stability between the objective lens (often a zone plate optic) and the specimen, in spite of ambient vibrations. This plot shows floor vibrations at the Advanced Photon Source at Argonne National Laboratory (January 2013), in units of RMS displacement in meters per square root Hz, as a function of frequency f. Vertical motion is often about ten times larger than horizontal motion. The spikes at 60, 120, and 180 Hz are likely due to vibration from electrical transformers, pumps, and other equipment operating at the USA alternating current electrical line frequency of 60 Hz, along with its harmonics. Data courtesy of Curt Preissner, Argonne Lab.
light source and electron microscope facilities make significant investments in the construction of the floor, often using thick slabs that are mechanically isolated from the "regular" building floor with its heating and cooling systems, mechanical vacuum pumps, electrical transformers, and so on. Even the dirt under the slabs must be paid attention to, with various schemes of compaction and deep support pillars being employed. The microscope itself should also be acoustically shielded, especially if there are noisy pumps or air outlets nearby. Commercially available optical tables mounted on air supports offer one way to minimize vibration for microscopes mounted on top, though at synchrotron light sources one must pay attention to the absolute position control of the air supports so that the whole table stays in a stable position relative to the 50–200 μm size that is typical for x-ray beams. One can also place a large granite or concrete block on foam mounts on the floor, and then build the microscope on this large, weakly coupled mass, although the foam mounts can slowly compress over time, leading to a need to occasionally adjust the microscope to remain centered on the x-ray beam.
• One must be cognizant of minimizing mechanical strain. If two plates are bolted together, changes in temperature will demand that they both expand or contract, and the resulting strain can lead to mechanical creep. With optics such as grazing incidence mirrors, one must also pay particular attention to the clamping mechanism so that it does not create strain and thus bumps or dips on the surface of a very expensive, exquisitely figured and polished optic. A good solution is to use kinematic mount design principles [Smith 1992] such as balls in grooves.
• A design mantra worth repeating is "small, light, and stiff." Smaller objects undergo less thermal drift, and objects that are small, light, and stiff have higher mechanical resonance frequencies. This is useful for two reasons: the first is that the typical vibration excitation environment shows a roll-off in amplitude at higher frequencies (Fig. 7.13), and the second is that for "white noise" acoustical excitation (noise that has the same power at all frequencies) the amplitude of motion tends to decline with frequency to the negative third power (because the third derivative of x = x0 sin(ωt), or d³x/dt³, corresponds to impulses, and impulses are reflective of the input noise).
• Have realistic and sensible goals for what needs to be highly stable, and what does not. One needs nanometer-scale stability of the optic relative to the specimen (image positions), but only micrometer-scale stability of the optic relative to the x-ray beam. Tabletop atomic force microscopes provide great examples of this: the entire microscope might vibrate a bit as a unit when placed on "noisy" tabletops, but the important thing is that the scanning probe tip and the specimen do not vibrate relative to each other.
These points are not important to most scientific users of x-ray microscopes, though they certainly can appreciate the outcomes of good instrument design!
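The "small, light, and stiff" mantra can be made quantitative through the standard mechanical resonance frequency f₀ = (1/2π)√(k/m) for a stiffness k and mass m (a textbook relation, not one derived in this chapter); the stiffness and mass values below are arbitrary illustrations:

```python
import math

def resonance_hz(stiffness_N_per_m, mass_kg):
    """f0 = (1/2 pi) sqrt(k/m): stiffer and lighter means higher resonance."""
    return math.sqrt(stiffness_N_per_m / mass_kg) / (2.0 * math.pi)

# Illustrative numbers (assumptions, not measurements of any real stage)
f_heavy = resonance_hz(1e6, 10.0)   # 1 MN/m flexure, 10 kg assembly
f_light = resonance_hz(2e6, 5.0)    # twice as stiff, half the mass
print(f"resonance: {f_heavy:.0f} Hz -> {f_light:.0f} Hz")
```

Doubling the stiffness while halving the mass doubles the resonance frequency, pushing it further above the floor-vibration spectrum of Fig. 7.13, where the excitation amplitude has already rolled off.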
7.4 X-ray detectors

While the comic character Superman may have x-ray eyes, most of the rest of us do not—and besides, all the x-ray image analysis methods discussed in earlier chapters require quantitative, digital images rather than observations recorded as sketches in a notebook in the manner of van Leeuwenhoek more than three centuries ago. The topic of x-ray detectors is broad, ranging from x-ray detectors for astronomy on rockets and satellites [Fraser 1989] to the very large market of x-ray detectors for medical imaging [Spahn 2013, Panetta 2016], where in the latter case the energy range of interest is from about 20 keV to 100 keV or more. For x-ray microscopes in the 0.2–15 keV range, detectors have often been treated as a bit of an afterthought, and far less money has gone into their development than into the development of accelerators for storage ring light source facilities, or the development of x-ray nanofocusing optics. Fortunately there is enough of a crossover from x-ray detector requirements for macromolecular crystallography and powder diffraction (both of which have significant commercial markets for academic and industrial applications) that x-ray microscopy has benefitted tremendously from advanced detectors in recent years, and we have hope that this trend will accelerate with the concentrated investments now being made in detectors for x-ray free-electron laser (XFEL) facilities. X-ray detectors work by absorbing x-ray photons, and generating either electrons and holes in semiconductors, electrons from photocathodes, visible light in scintillators, or heat in superconducting calorimeters. We will discuss these in turn, but first let us outline the general characteristics that are valuable in x-ray microscopes:
• Detective quantum efficiency (DQE): DQE will be formally defined in Eq. 7.34, but for a photon-counting detector with no dark noise it is in essence the fraction of incident photons that are recorded.
• Dead time: the time t_dead after the arrival of one photon before the detector is ready to record a subsequent photon. This can be due to a shaping time in a pulse detection circuit, or the time required to restore an equilibrium charge distribution in a detector, or the time required to transfer a signal onwards during which photon arrivals cannot be recorded.
• Dark noise: the signal that is (incorrectly) reported even though no x-ray photons are incident. This is essentially zero in most photon-counting detectors, while in charge-integrating detectors it can reflect both thermal excitation of electron–hole separations and readout noise in the analog charge measurement electronics.
• Dynamic range: the ratio of the maximum to minimum signal value that can be successfully recorded. At the low end, this is affected by dark noise, while it can be limited at the high end by dead time (limiting the maximum flux rate) or by saturation (limiting the maximum signal that can be measured successfully, such as full well capacity for per-pixel charge collection in CCD cameras).
• Solid angle: for the detection of radiation that goes into a large angular distribution, such as x-ray fluorescence, an important parameter is the solid angle of detection Ω, which is given in Eq. 9.24 as Ω = 2π(1 − cos θ) ≈ πθ², where θ is the semi-angle from the center to the outer radius of a circular detector.
• Energy resolution: for x-ray fluorescence microscopy (Section 9.2) using energy-dispersive detectors, as will be discussed in Section 7.4.12, one requires that the detector be able to record the energy of detected photons with some specified energy resolution ΔE. This is discussed further in connection with Eq. 7.30.
• Number of pixels: some detectors count all photons that arrive within their active area (this is the case for many energy-dispersive detectors for x-ray fluorescence microscopy, as well as basic detectors in scanning transmission x-ray microscopes). Other applications require area detectors (Section 7.4.4) with spatial resolution, which is most often in the form of a regular array of pixels N_x × N_y.
• Pixel size: for pixelated area detectors (Section 7.4.4), the size of a pixel Δ_det is important for experimental design (see for example Fig. 10.18).
• Spatial resolution: in pixelated area detectors, there can be some distribution of the signal from one photon into a set of neighboring pixels. In the ideal case, the spatial resolution is much smaller than the pixel size so that there is no spreading of the signal into neighboring pixels.
• Frame rate: for pixelated area detectors, this is the rate at which one can read out entire images. This can either be a burst rate, for a limited number of images to be stored in an internal buffer and later transferred, or a sustained frame rate for continuous image transfer. A high frame rate is especially important for imaging methods like ptychography (see Section 10.4.1).
These form the essential characteristics of x-ray detectors.
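As a quick check on the solid-angle entry in the list above, the exact expression Ω = 2π(1 − cos θ) of Eq. 9.24 can be compared with its small-angle approximation πθ²; the detector semi-angles used here are arbitrary examples:

```python
import math

def solid_angle_sr(theta_rad):
    """Solid angle of a circular detector of semi-angle theta (Eq. 9.24)."""
    return 2.0 * math.pi * (1.0 - math.cos(theta_rad))

for theta_deg in (5, 15, 30):
    theta = math.radians(theta_deg)
    exact = solid_angle_sr(theta)
    approx = math.pi * theta ** 2
    print(f"theta = {theta_deg:2d} deg: Omega = {exact:.4f} sr, "
          f"pi theta^2 = {approx:.4f} sr, "
          f"fraction of 4 pi = {exact / (4.0 * math.pi):.4f}")
```

The small-angle form is accurate to better than a percent for semi-angles of a few degrees, while the fraction-of-4π column makes plain how little of an isotropic fluorescence signal even a generously sized detector collects.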
7.4.1 Detector statistics

In Section 4.8, we discussed the statistics of imaging in terms of an intensity measured with a feature present I_f, the intensity measured with a feature absent or the background intensity I_b, and the mean number of incident photons n̄ in the Gaussian approximation to Poisson statistics (Fig. 4.69). That also led to a discussion of false positives and false negatives, and minimum detection limits (Section 4.8.2). We need one other result from statistics to understand detector performance: the statistics of a chain process, where a primary event a causes a second event b before detection, so that one must account for the variance in both processes a and b. This is known as a Markov chain, with an expected signal S̄ of

\bar{S} = \bar{S}_a \bar{S}_b \qquad (7.25)

and a variance [Breitenberger 1955, Gillespie 1991] of

\sigma_S^2 = \bar{S}_b^2 \sigma_a^2 + \bar{S}_a \sigma_b^2. \qquad (7.26)
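The cascade statistics of Eqs. 7.25 and 7.26 can be verified with a minimal Monte Carlo sketch. This is our own illustration with made-up parameter values, for the special case where both stages are Poisson-distributed so that σ_a² = S̄_a and σ_b² = S̄_b:

```python
import math
import random

def poisson(mean, rng):
    """Draw a Poisson variate via Knuth's method (adequate for modest means)."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def cascade_trial(mean_a, mean_b, rng):
    """One trial: primary events a, each spawning a Poisson number of secondary events b."""
    return sum(poisson(mean_b, rng) for _ in range(poisson(mean_a, rng)))

rng = random.Random(1)
mean_a, mean_b, trials = 20.0, 5.0, 20000
samples = [cascade_trial(mean_a, mean_b, rng) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((s - mean) ** 2 for s in samples) / (trials - 1)

# Eq. 7.25: mean = S_a * S_b; Eq. 7.26 with Poisson stages: var = S_b^2 * S_a + S_a * S_b
predicted_mean = mean_a * mean_b                       # 100
predicted_var = mean_b**2 * mean_a + mean_a * mean_b   # 600
```

The empirical mean and variance land within a few percent of the Markov-chain predictions, which is the behavior Eqs. 7.25 and 7.26 encode.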
We outline here some further statistical considerations that are specific to detectors. The first of these involves energy resolution. If one photon leads to the generation of a mean value q̄ of quanta in a detector, then the full-width at half maximum (FWHM) of the distribution of the number of quanta detected (essentially the resolution of the detector) is given by FWHM = 2.35σ = 2.35/√q̄, as shown in Fig. 4.4. However, this is based on the assumption that the detected quanta q̄ are produced directly by a photon with energy E in a process that takes an energy W to create detectable quanta, or

\bar{q} = \frac{E}{W}. \qquad (7.27)

If there are some secondary processes that follow (such as electrons produced by x-ray absorption undergoing a number of inelastic scattering events in the detector material), one might arrive at a distribution that should have been based on the statistics of the initial process, but is instead described by the final processes that lead to the detected quanta q̄. In order to account for these effects, Ugo Fano introduced [Fano 1947] the use of a parameter F (now called the Fano factor F) which can in principle be calculated from detailed knowledge of the underlying physics but which also allows one to sweep all of that under the rug of one empirical correction factor! The Fano factor F then becomes [Knoll 2010]

F = \frac{\text{observed variance in } \bar{q}}{\text{Poisson-predicted variance in } \bar{q}} \qquad (7.28)

so that the variance σ_q² in the number of quanta is then given by

\sigma_q^2 = F\,\frac{E}{W}. \qquad (7.29)
With this factor taken into account, the FWHM energy resolution of an x-ray detector is given by

\Delta E_{\rm FWHM} = 2.35\,\frac{\sigma_q}{\bar{q}}\,E = 2.35\sqrt{\frac{F}{E/W}}\,E \qquad (7.30)

before accounting for any additional degradation in energy resolution due to electronics noise in measuring the charge per photon. Consider the case of a 10 keV photon in silicon. Jumping ahead to employ a result for silicon at room temperature that will be discussed in Section 7.4.5, x-ray absorption produces one electron–hole pair per

W_{\rm Si} = 3.65 \text{ eV} \qquad (7.31)

deposited, with a Fano factor of

F_{\rm Si} = 0.118 \qquad (7.32)

using recent results [Lowe 2007, Mazziotta 2008]. Therefore, one 10 keV photon produces on average q̄ = (10 × 10³/3.65) = 2740 electron–hole pairs, giving

\Delta E_{\rm FWHM} = E \cdot 2.35\sqrt{\frac{F}{E/W}} = (10 \times 10^{3} \text{ eV}) \cdot 2.35\sqrt{\frac{0.118}{2740}} = 154 \text{ eV} \qquad (7.33)

as the FWHM energy resolution of a silicon detector (with the 1σ value being lower by a factor of 2.35) if there are no other noise sources beyond Fano-modified Poisson statistics.

The next consideration involves the detective quantum efficiency (DQE). After several earlier explorations of metrics for detector performance [Rose 1946, Jones 1952, Shaw 1963], detector DQE became accepted as the fundamental approach [Jones 1959, Dainty 1974]. For a number n̄_a of incident quanta on the detector, the number of these events that are actually recorded by the detector is called the noise-equivalent quanta (NEQ), leading to a DQE of

{\rm DQE} = \frac{\rm NEQ}{\bar{n}_a} = \frac{{\rm SNR}_{\rm output}^2}{{\rm SNR}_{\rm input}^2} \qquad (7.34)
where the second form written in terms of the signal-to-noise ratio (SNR) is true only for the case where the noise follows a Poisson distribution (Section 4.8.1). The NEQ is reflective of the initial x-ray photon absorption event without consideration for processes that have intrinsic gain in the number of quanta (examples where there is intrinsic gain in the number of quanta include photomultipliers, where a cascading process is used to generate many electrons from one photoabsorption event).

Let us examine first a detector that detects a fraction a of a mean value n̄_a of the quanta incident on it. In this case the mean signal is given by S̄_a = a n̄_a, and the variance is given by σ_a² = a n̄_a in the Gaussian approximation to Poisson statistics. The signal-to-noise ratio SNR on the input signal is

{\rm SNR}_{\rm input} = \frac{\bar{n}_a}{\sqrt{\bar{n}_a}} = \sqrt{\bar{n}_a}, \qquad (7.35)

while SNR_output is given by

{\rm SNR}_{\rm output} = \frac{a\bar{n}_a}{\sqrt{a\bar{n}_a}} = \sqrt{a\bar{n}_a}, \qquad (7.36)

so the DQE is given by

{\rm DQE}_a = \frac{{\rm SNR}_{\rm output}^2}{{\rm SNR}_{\rm input}^2} = \frac{(\sqrt{a\bar{n}_a})^2}{(\sqrt{\bar{n}_a})^2} = \frac{a\bar{n}_a}{\bar{n}_a} = a. \qquad (7.37)
The DQE is just given by the fraction a of photons that are detected, as expected. What about the case where there is a cascading gain? Let n̄_a be the number of incident photons as before, and a be the fraction that are absorbed. If an absorption event then leads to the production of n̄_b secondary quanta, and the probability of detecting these secondary quanta is b, then one has a Markov chain as described in Eqs. 7.25 and 7.26. In this case the mean detected signal S̄ is given by

\bar{S} = \bar{S}_a \bar{S}_b = (a\bar{n}_a)(b\bar{n}_b). \qquad (7.38)

If we assume that the individual variances are determined by the Gaussian approximation to Poisson statistics, we have σ_a² = a n̄_a and σ_b² = b n̄_b and a net variance of

\sigma_S^2 = b^2\bar{n}_b^2\,a\bar{n}_a + a\bar{n}_a\,b\bar{n}_b. \qquad (7.39)

The SNR is then

{\rm SNR}_{\rm out} = \frac{\bar{S}}{\sigma_S} = \frac{ab\bar{n}_a\bar{n}_b}{\sqrt{ab^2\bar{n}_a\bar{n}_b^2 + ab\bar{n}_a\bar{n}_b}} = \frac{\sqrt{ab\bar{n}_a\bar{n}_b}}{\sqrt{1 + b\bar{n}_b}}. \qquad (7.40)

The DQE then becomes

{\rm DQE}_{ab} = \frac{\left(\sqrt{ab\bar{n}_a\bar{n}_b}/\sqrt{1 + b\bar{n}_b}\right)^2}{(\sqrt{\bar{n}_a})^2} = \frac{ab\bar{n}_b}{1 + b\bar{n}_b} = a\,\frac{1}{1 + 1/(b\bar{n}_b)}. \qquad (7.41)
That is, the DQE is reduced by a factor of 1/[1 + 1/(b n̄_b)] relative to the case of a single-stage process (DQE_a). Consider the example of x-ray photons with an energy of, say, 10 keV incident on a scintillator that produces green light with λ = 500 nm, or a photon energy (Eq. 3.7) of 1240/500 = 2.48 eV. If the scintillation process were to be 100 percent efficient, one would obtain a mean number of n̄_b = 4032 visible photons. Even if the scintillation process were 10 percent efficient, and only 10 percent of the visible photons were detected by a visible light detector, one would have

\frac{1}{1 + 1/(b\bar{n}_b)} = \frac{1}{1 + 1/(0.1 \cdot 0.1 \cdot 4032)} = 0.976,

so in this example the extra reduction in DQE caused by the secondary statistics is very small.

Now let us consider the case [Feser 2002] of a one-stage detector with DQE_a = a as above, but with “dark noise,” or a signal that is recorded even with no incident x-ray photons. We assume that the mean dark noise N̄_d is subtracted from the signal via “background subtraction,” but that there are fluctuations in the dark noise that are uncorrelated with the signal and that can be characterized by a variance σ_d². The SNR_d of this detector with dark noise is

{\rm SNR}_d = \frac{a\bar{n}_a}{\sqrt{a\bar{n}_a + \sigma_d^2}} = \frac{\sqrt{a\bar{n}_a}}{\sqrt{1 + \sigma_d^2/(a\bar{n}_a)}}, \qquad (7.42)

which along with Eq. 7.35 means that the DQE becomes

{\rm DQE}_d = \frac{({\rm SNR}_d)^2}{(\sqrt{\bar{n}_a})^2} = \frac{a}{1 + \sigma_d^2/(a\bar{n}_a)}. \qquad (7.43)

Therefore the DQE is affected primarily at low detected count values, when a n̄_a < σ_d². A more specific example of a type of dark noise is the equivalent noise charge (ENC) or q_ENC of charge-integrating detectors (which can in fact vary with incident flux rate). The effect of q_ENC on DQE is given in Eq. 7.56.
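The numbers in the two worked examples above can be reproduced in a few lines of Python. This is only our check of Eqs. 7.30 and 7.41 using the silicon values of Eqs. 7.31–7.32; the function names are ours, not from the text.

```python
import math

W_SI = 3.65    # eV per electron-hole pair in silicon (Eq. 7.31)
F_SI = 0.118   # Fano factor for silicon (Eq. 7.32)

def fwhm_energy_resolution(E, W=W_SI, F=F_SI):
    """Fano-limited FWHM energy resolution of Eq. 7.30 (all energies in eV)."""
    q_bar = E / W                       # mean quanta per photon (Eq. 7.27)
    return 2.35 * math.sqrt(F / q_bar) * E

def cascade_dqe_factor(b_nb):
    """DQE reduction factor 1/[1 + 1/(b*n_b)] from Eq. 7.41."""
    return 1.0 / (1.0 + 1.0 / b_nb)

dE = fwhm_energy_resolution(10e3)              # 10 keV photon in silicon
factor = cascade_dqe_factor(0.1 * 0.1 * 4032)  # the scintillator example above
```

Running this gives ΔE_FWHM ≈ 154 eV and a cascade factor of ≈ 0.976, matching Eq. 7.33 and the scintillator example.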
7.4.2 Detector statistics: dead time

Many detectors have a “dead time” t_dead after the arrival of one photon, during which they cannot successfully record the arrival of another photon. For example, in a silicon detector an x-ray absorption event separates a certain number of electrons and “holes” (see Section 7.4.5 below) from each other. Inelastic scattering of these charges in the detector might lead to different travel times of individual charges to the detection circuit, so a pulse-shaping circuit is used to integrate over the maximum straggling time and give the same peak height for each x-ray photon of the same energy. However, if another x-ray photon arrives during this time, the pulse-shaping circuit will interpret the combination as a single photon (within the same pulse-shaping time) but with higher energy (more charge received during that time). As a result, the second x-ray photon will be “missed” while the first photon also has its energy overestimated (which is not necessarily a problem if a single pulse energy threshold is used). This is the case of a “nonparalyzable” detector [Knoll 2010]. To calculate the effects of this loss, let us consider photons arriving at an actual average rate f_0 while the detector records a lower, incorrect rate of f. For each single detected photon there is a dead time of t_dead, so the fraction of time that the detector is in a “dead” state is given by f t_dead, so that the
Figure 7.14 Measured detector count rate f as a function of actual (incident) count rate f_0 for various values of detector dead time t_dead. This is shown both for a nonparalyzable model (Eq. 7.46) as solid lines, and for a paralyzable model (Eq. 7.48) as dashed lines.
number of photons “missed” is given by f_0 f t_dead. In other words, we have [Knoll 2010]

f_0 - f = f_0 f\,t_{\rm dead}. \qquad (7.44)

One can use this to find either the actual count rate f_0 from the measured count rate f as

f_0 = \frac{f}{1 - f\,t_{\rm dead}}, \qquad (7.45)

or one can calculate the expected measurement rate f from the actual count rate f_0 as

f = \frac{f_0}{1 + f_0 t_{\rm dead}}, \qquad (7.46)

which asymptotically approaches a limit of

\lim_{f_0 \to \infty} f = \frac{1}{t_{\rm dead}} \qquad (7.47)

at very high count rates. In the case where the detector becomes paralyzed for a time that varies according to the signal level, one instead obtains a result [Knoll 2010] of

f = f_0 \exp[-f_0 t_{\rm dead}]. \qquad (7.48)

Examples of measured f versus actual f_0 count rates from Eq. 7.46, as well as from Eq. 7.48, are shown for several dead times t_dead in Fig. 7.14.
In the preceding paragraph, an implicit assumption was that the actual photon rate was continuous over time, whereas in Fig. 7.1 and Table 7.1 we saw that synchrotron light sources (among others) in fact have rather small duty cycles d_t. While the general case is well studied [Westcott 1948, Cormack 1962], an approximate treatment [Knoll 2010] is to say that the repetition time t_r gives an approximate dead time of t_dead,source ≈ t_r/2. At least for synchrotron sources and most pulse-counting x-ray detectors, we have t_r ≪ t_dead, so the results of Eqs. 7.44–7.47 are very good approximations of the correct answer, though more complete studies are available [Sobott 2013].

We can also think of detector dead time t_dead as providing a reduction in the DQE [Feser 2002]. As before, the fraction of time that the detector is in a “dead” state is given by f t_dead, so if we consider a counting interval t_dwell (such as a pixel dwell time in a scanning microscope, giving f = n̄/t_dwell), then the mean number of counted photons n̄ relative to the mean number of incident photons n̄_0 during that interval is given by

\bar{n} = \bar{n}_0 (1 - f\,t_{\rm dead}) = \bar{n}_0 \left(1 - \frac{\bar{n}\,t_{\rm dead}}{t_{\rm dwell}}\right), \qquad (7.49)

from which we can find

\bar{n} = \frac{\bar{n}_0}{1 + \bar{n}_0 t_{\rm dead}/t_{\rm dwell}}. \qquad (7.50)

If, as before, we also account for the detector only recording a fraction a of the photons incident upon it during the non-dead time intervals, the output signal is a n̄. We can then solve for the SNR as

{\rm SNR}_{\rm out} = \frac{a\bar{n}}{\sqrt{a\bar{n}}} = \sqrt{a\bar{n}}, \qquad (7.51)

leading to a DQE of

{\rm DQE}_{\rm deadtime} = \frac{({\rm SNR}_{\rm out})^2}{({\rm SNR}_{\rm in})^2} = \frac{a\bar{n}}{\bar{n}_0} = \frac{a}{1 + \bar{n}_0 t_{\rm dead}/t_{\rm dwell}}. \qquad (7.52)

That is, the DQE begins to be reduced if n̄_0 t_dead approaches the pixel dwell time t_dwell.
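The nonparalyzable and paralyzable dead-time models of Eqs. 7.44–7.48 are simple to code. The sketch below is ours, and uses the 200 ns dead time quoted later for Eiger-class detectors purely as an illustration.

```python
import math

def measured_rate(f0, t_dead, paralyzable=False):
    """Expected measured rate f for actual rate f0 (Eq. 7.46, or Eq. 7.48 if paralyzable)."""
    if paralyzable:
        return f0 * math.exp(-f0 * t_dead)
    return f0 / (1.0 + f0 * t_dead)

def true_rate_nonparalyzable(f, t_dead):
    """Invert the nonparalyzable model to recover the actual rate (Eq. 7.45)."""
    return f / (1.0 - f * t_dead)

t_dead = 200e-9                          # 200 ns, illustrative
f_meas = measured_rate(1.0e6, t_dead)    # 1 MHz actual rate -> about 0.833 MHz measured
```

Inverting the measured rate recovers the actual rate, and pushing f_0 ever higher shows the measured rate saturating below 1/t_dead = 5 MHz, as in Eq. 7.47 and Fig. 7.14.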
7.4.3 Detector statistics: charge integration

Another approach to signal collection in x-ray detectors is to use a charge-integration circuit rather than a pulse-counting circuit. As can be seen from the above considerations of detector dead time t_dead, this is especially advantageous if one has incident x-ray flux rates f_0 that approach a reasonable fraction of 1/t_dead. It is also important if one begins to have an appreciable chance (in a Poisson statistics sense) of collecting more than one photon per “on” time t_0 in a pulsed source (Fig. 7.1), which is likely to be the case with scanning microscopes on diffraction-limited storage rings [Denes 2014]. The challenge is that detectors and their charge-integration circuits have their own sort of dark noise, which is referred to as the equivalent noise charge (ENC) expressed as a number of electrons q_ENC. We start with the initial photon absorption signal S_a = a n̄_a with variance σ_a² = a n̄_a, and this time include [Chen 2002] a subsequent step of conversion into
Figure 7.15 Detective quantum efficiency for photon-counting and charge-integrating detectors as a function of incident flux, with a = 0.95 in all cases. For a pixel array detector, the flux represents the flux incident upon one pixel. The photon-counting detector is assumed to be a nonparalyzable detector with t_dead = 200 ns, with a DQE calculated using Eq. 7.52. The charge-integrating detector is assumed to be a silicon detector with q_ENC = 50 electrons, with a DQE calculated using Eq. 7.56, shown for pixel dwell times of t_dwell = 1.00 ms and t_dwell = 0.01 ms. The values of t_dead = 200 ns and q_ENC = 50 electrons are roughly consistent with the Eiger and Jungfrau detectors, respectively, developed at the Paul Scherrer Institut in Switzerland. Note that with a pixel dwell time t_dwell = 1.00 ms (representative of some scanning microscopes today) the detector receives 10 photons per pixel at a flux of 10 kHz, while with a pixel dwell time of t_dwell = 0.01 ms the detector receives 10 photons per pixel at a photon flux of 100 kHz. This explains the flux values below which the DQE of charge-integrating detectors becomes quite low, so that q_ENC dominates. This plot is for a photon energy of E = 300 eV, for which one photon generates q̄ = E/W_Si = 82 electron–hole pairs (Eqs. 7.27 and 7.31), which is only a little bit larger than q_ENC; 10 keV photons generate a mean charge of q̄ = 2740, so the charge-integrating DQE for low photon flux would improve considerably. For photon counting, the detector dead time t_dead dictates the flux above which the DQE starts to drop, as was shown in Fig. 7.14. Figure inspired by one shown by Michael Feser [Feser 2002].
detected quanta with q̄ = E/W (Eq. 7.27) and a variance (Eq. 7.29) of σ_q² = F E/W. We add to the net variance in the Markov chain process (Eq. 7.26) an extra term of

\sigma_{\rm ENC}^2 = q_{\rm ENC}^2, \qquad (7.53)

which we assume is uncorrelated with the photon events or their subsequent conversion (ENC is, after all, a fluctuation in a dark signal). The net variance is then

\sigma_S^2 = a\bar{n}_a \frac{E^2}{W^2} + a\bar{n}_a F \frac{E}{W} + q_{\rm ENC}^2 \qquad (7.54)
so the SNR is

{\rm SNR}_{\rm output} = \frac{\bar{S}}{\sigma_S} = \frac{a\bar{n}_a E/W}{\sqrt{a\bar{n}_a \frac{E^2}{W^2} + a\bar{n}_a F \frac{E}{W} + q_{\rm ENC}^2}} = \frac{\sqrt{a\bar{n}_a}}{\sqrt{1 + F\frac{W}{E} + \frac{W^2}{E^2}\frac{q_{\rm ENC}^2}{a\bar{n}_a}}}. \qquad (7.55)

Since the input SNR is just √n̄_a, the DQE is then

{\rm DQE}_{\rm integrating} = \frac{({\rm SNR}_{\rm output})^2}{({\rm SNR}_{\rm input})^2} = \frac{a}{1 + F\frac{W}{E} + \frac{W^2}{E^2}\frac{q_{\rm ENC}^2}{a\bar{n}_a}}, \qquad (7.56)
which differs from a previous result [Feser 2002] by application of Markov chain statistics. In the DQE expression of Eq. 7.56, the term a in the numerator is the same noise-equivalent quanta (NEQ) term as appeared in the basic detector DQE expression of Eq. 7.37. For detection of n̄_a = 50 photons at 500 eV with a detector with a = 0.9, and using values for W and F from Eqs. 7.31 and 7.32, a detector readout circuit with q_ENC = 100 electrons leads to a term in the denominator of Eq. 7.56 of

1 + F\frac{W}{E} + \frac{W^2}{E^2}\frac{q_{\rm ENC}^2}{a\bar{n}_a} = 1 + (0.118)\frac{3.65}{500} + \frac{3.65^2}{500^2}\cdot\frac{100^2}{0.9 \cdot 50} = 1 + 0.000\,86 + 0.011\,84 = 1.012\,70,

whereas if the photon energy is increased to 5 keV the denominator becomes 1 + 0.000 09 + 0.000 20. This means that the DQE is almost not reduced at all compared to an ideal detector, where DQE = a as was found in Eq. 7.37, provided q_ENC is made small enough.

Another potential challenge with charge integration involves dynamic range. Counting photons demands that digital counters (with especially simple circuitry) have more bits to count more photons per frame time. In charge integration, one might face limits such as the full well capacity of CCD detectors as noted in Section 7.4.5, or the equivalent in charge-integrating pixel array detectors. If, however, charge is collected on a capacitor C to yield a voltage V = q/C as it is integrated, one can trigger a reset circuit when V_reset = q_reset/C is reached, at which point one quickly drains the capacitor, and then lets it continue to integrate further charge (possibly with further reset events, which can be digitally counted as N_reset) until the end of the measurement. If the voltage V_analog = q_analog/C is then measured, one can calculate the total charge accumulated as

q_{\rm total} = C V_{\rm analog} + N_{\rm reset} \cdot C V_{\rm reset} = q_{\rm analog} + N_{\rm reset} \cdot q_{\rm reset}, \qquad (7.57)

and if photons at a fixed energy are detected one can then calculate their number. In this way one has the potential to combine single-photon sensitivity with high dynamic range. This is the approach used in a series of detectors developed at Cornell University [Ercan 2006, Weiss 2017], while a group at the Paul Scherrer Institut has developed an approach where successive capacitors are “switched in” to increase dynamic range [Bergamaschi 2011].
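As a check on the arithmetic of the 500 eV example above, Eq. 7.56 can be evaluated directly. The function below is our own sketch using the silicon values of Eqs. 7.31–7.32; it is not from any detector library.

```python
def dqe_integrating(E, n_a, a, q_enc, W=3.65, F=0.118):
    """Charge-integrating detector DQE of Eq. 7.56 (E and W in eV, q_enc in electrons)."""
    denominator = 1.0 + F * (W / E) + (W / E) ** 2 * q_enc ** 2 / (a * n_a)
    return a / denominator

# The worked example: 50 photons at 500 eV, a = 0.9, q_ENC = 100 electrons
dqe_500eV = dqe_integrating(E=500.0, n_a=50.0, a=0.9, q_enc=100.0)
dqe_5keV = dqe_integrating(E=5000.0, n_a=50.0, a=0.9, q_enc=100.0)
```

At 500 eV this reproduces the denominator of 1.012 70 found above, and at 5 keV the DQE moves still closer to the ideal value a, as the text describes.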
We have presented above the essential ingredients for understanding the DQE of a variety of x-ray detector systems. For detectors exposed to monochromatic illumination, the main choice one faces is whether to operate in pulse-counting mode or in charge-integrating mode. The answer depends on the flux one must detect, and the per-pixel dwell time in the case of a scanning microscope, as shown in Fig. 7.15.
7.4.4 Pixelated area detectors

Full-field microscopy (Section 6.3) and ptychography (Section 10.4) are imaging methods that require pixelated area detectors. Area detectors are characterized by their number of pixels N_x and N_y, and the detector pixel size Δ_det. In full-field microscopes, image magnifications of about 1000 are common, so it is preferable to have pixel sizes of Δ_det ≈ 5–30 μm in order to obtain nanometer pixel sizes in the image. Therefore either scintillator detectors (Section 7.4.7) or CCD detectors (Section 7.4.5) tend to be preferred for full-field microscopy. It is also advantageous to have pixel counts N_x × N_y of at least 1024 × 1024, so as to obtain a large field of view, with 2048 × 2048 being especially popular (N = 2^n is well matched to fast Fourier transforms (FFTs), as discussed in Section 4.3.3). In methods such as ptychography (Section 10.4), the detector pixel size Δ_det is less crucial because the detector is placed in the Fraunhofer or far-field regime (Section 4.3.2), so the main disadvantage of larger pixel sizes is the need for a longer flight tube leading from the specimen to the detector (this is either evacuated or filled with helium gas). This makes it easier to use pixel array detectors (Section 7.4.5) with their larger pixel size of Δ_det ≈ 50–200 μm. At the other extreme of small pixel size, photocathodes with electron optics and electronic readout have been used to make area detectors with submicrometer Δ_det [Polack 1980, Polack 1983, Kinoshita 1992, Shinohara 1996]. Fewer researchers have taken up this approach in recent years, perhaps due to concerns about DQE, linearity of response, and field distortions.

In scanning x-ray microscopes (see Section 6.4), spatial resolution is provided by the focused x-ray beam, and picture elements or “pixels” are provided by the set of beam positions on the specimen.
Therefore the detector need not have any spatial resolution, allowing the use of detectors such as gas flow proportional counters with very low noise for soft x-ray detection at low flux levels [Feser 1998], or avalanche photodiodes for very fast response time for pump–probe experiments [Stoll 2004, Puzic 2010]. At the same time, controlling the sensitive area of the detector can be important, as discussed in Section 4.5.1, for the detector area plays the same role as the condenser aperture does in full-field imaging. One also needs to control the detector area for dark-field microscopy (Section 4.6, and in particular Fig. 4.59). Some degree of detector segmentation or even full pixelation is needed for techniques such as phase contrast, as discussed in Section 4.7, while ptychography (Section 10.4) requires an array detector. Pixelation can also be valuable in fluorescence microscopy, both for increasing the overall count rate of an energy-dispersive detector as well as for advanced data interpretation approaches [Ryan 2010].

One important negative characteristic of a pixelated area detector involves charge sharing. If a photon arrives near the boundary between two pixels, the charge can spread
into adjacent pixels (see Fig. 11.4 for an example in plastic; the charge spreading is smaller in the higher-density material silicon). In a photon-counting detector, this could lead to none of the pixels having enough charge deposited to reach the pulse threshold needed to record the photon event. In a charge-integrating detector, one might mistakenly record this as the arrival of more than one photon (each with lower energy than that of the incident photon) in multiple adjacent pixels. With monochromatic illumination and single photons arriving per integration time, the charge sum adds up to that produced by one photon, and the center of the charge distribution shows the arrival position at sub-pixel resolution (“droplet analysis” provides this information [Livet 2000]). If the pixels are large so that few events are affected by charge sharing, one can use charge integration to measure both the position (pixel location) and energy (Eq. 7.30) of each photon. This requires low readout noise and a photon flux per pixel well below the frame rate so as to avoid pileup, so in the past it has been applicable mainly to x-ray astronomy [Janesick 2001]. However, as frame rates have increased, one can start to use charge-integrating detectors in synchrotron experiments either for sub-pixel spatial resolution or for per-pixel energy resolution [Soman 2013, Cartier 2016, Ihle 2017]. One way to reduce charge sharing in a multi-element or pixelated area detector is to overlay on the detector a collimator grid to block sensitivity near pixel boundaries. This comes at the cost of area-averaged DQE.

Frame rates for area detectors are another important consideration for applications such as tomography or spectromicroscopy in full-field imaging, or for ptychography. Frame rates for CCD detectors are usually in the 20–400 Hz range, as limited by the rate of clocking charge through the CCD pixels to a small number of analog-to-digital converter circuits.
Pixel array detectors and CMOS detectors have pulse-counting or charge-integrating electronics at each pixel, so that frame rates of 500–5000 Hz are now commercially available, with faster frame rates on the horizon. Visible-light CMOS detectors (which can be coupled with scintillators) can reach sustained frame rates of 30 kHz or more [Mokso 2017], especially on smaller regions of the detector array. These cameras are enabling high-speed tomographic imaging of dynamic processes, with capabilities summarized in Fig. 8.10.

The area detectors above all offer electronic readout, which is strongly preferred in today’s digitized world. Early x-ray microscopes used photographic film or nuclear emulsions with a pixel size Δ_det of about a micrometer or less [Niemann 1980], and contact microscopy (Section 6.1) and in-line holography (Section 10.2) have used photoresists with an effective spatial resolution of about 50 nm. For film, nuclear emulsions, and photoresists, there is no particular limit on “pixel” count except for the field of view of subsequent image enlargement and/or digitization devices.
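The droplet analysis mentioned in this section amounts to a charge-weighted centroid over a small readout region. The sketch below is our own illustration of the idea with made-up charge values, not a published algorithm; the charge sum of 2740 electrons corresponds to the 10 keV photon in silicon discussed in Section 7.4.1.

```python
def droplet_centroid(frame, threshold=0.0):
    """Total charge and charge-weighted (row, col) centroid of one photon 'droplet'.

    frame is a 2D list of per-pixel charge in electrons; threshold rejects
    pixels whose charge is consistent with readout noise.
    """
    total = cy = cx = 0.0
    for y, row in enumerate(frame):
        for x, q in enumerate(row):
            if q > threshold:
                total += q
                cy += q * y
                cx += q * x
    return total, (cy / total, cx / total)

# A single 10 keV photon (2740 electron-hole pairs, Eq. 7.27) whose charge is
# split across two adjacent pixels by charge sharing; values are illustrative.
frame = [[0, 0, 0],
         [0, 2000, 740],
         [0, 0, 0]]
charge, (row, col) = droplet_centroid(frame, threshold=50.0)
```

The charge sum recovers the photon energy, while the centroid lands between the two pixels, illustrating how sub-pixel position information is obtained.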
7.4.5 Semiconductor detectors

Semiconductor-based detectors are the most commonly employed detectors in x-ray microscopes today. This is particularly true for silicon detectors, which have the virtue of benefitting from the worldwide digital and analog circuit industries. This makes it relatively inexpensive and easy to obtain ultrapure, single-crystal silicon wafers for sensors, and low-noise, high-speed electronics for signal processing and readout. The literature on semiconductor detectors is voluminous, with many excellent books [Spieler 2005, Knoll 2010, Lowe 2014] outlining the key concepts and a plethora of research papers describing the most recent developments. As a result, we only make a few comments of particular relevance to the developers and users of x-ray microscopes:

• For transistors and diodes, dopants are added at low concentration to silicon to provide donor energy levels just below the conduction band in negatively doped or n-type semiconductors (where phosphorus is frequently chosen as the dopant), or acceptor energy levels just above the valence band in positively doped or p-type semiconductors (where boron is a typical dopant). The energy gap E_g between these levels is only about E_g = 1.12 eV in silicon and E_g = 0.67 eV in germanium at room temperature, so with silicon one can promote an electron from the valence band to a donor level in n-type silicon using a wavelength shorter than about λ = hc/E_g (Eq. 3.7) or λ ≈ 1100 nm. This means that most charge-integrating and electrical-current-mode silicon detectors must be shielded against seeing visible light, and furthermore with germanium detectors one can have unacceptably high dark current if the sensor is not kept at low temperature and shielded from infrared light. A film of about 100 nm of aluminum can do the trick for visible-light shielding, with perhaps 10 nm of chromium lying underneath to “wet” the surface and thereby help prevent pinhole defects.
• When one moves into the regime of ionizing photons (≳ 50 eV) with appreciable radiation momentum, conservation of momentum in the Si lattice changes the process to one involving the creation of electron–hole pairs, where a “hole” is a positively charged pseudoparticle representing a “missing” electron (the “hole” can travel through the silicon lattice as one electron hops into a hole to one side, leaving a hole on its other side). The transport of both electrons and holes involves coupling to Raman modes in the semiconductor lattice, which are the highest-energy collective vibrational modes of nuclear displacements from equilibrium positions (phonon modes). This leads to a simple estimate [Shockley 1961] of the threshold energy for creating an electron–hole pair of

W \approx 2.2 E_g + r E_r, \qquad (7.58)

where E_g is the energy gap noted above, r is the mean number of inelastic collisions of an electron with Raman modes (Shockley estimated r ≈ 17.5 for Si), and E_r is the energy of a Raman mode (E_r = 0.063 eV for Si). The factor of 2.2 in Eq. 7.58 includes the fact that one must excite both an electron and a hole (thus, in a way, explaining the factor of 2), and the parabolic shape of the energy surfaces in the Brillouin zone. Shockley employed some fortuitous guesswork to arrive at Eq. 7.58, which gives a fairly good estimate of W_Si (Eq. 7.31) of about 3.5 eV. More exact predictions involve Monte Carlo modeling of electron, hole, and phonon interactions [Fraser 1994], and these calculations are in strong agreement with the observed value of W_Si = 3.65 eV (Eq. 7.31), as well as the Fano factor of F_Si = 0.118 (Eq. 7.32).
• Electrons have about three times the mobility that holes have (1350 versus 450 cm² V⁻¹ s⁻¹ in Si). Therefore it takes an average electron a time of about 20 ns to traverse a 300 μm thick silicon sensor layer with a bias voltage of 30 V [Spieler 2005]. In order to collect all electrons in one pulse, charge amplifiers make use of a pulse-shaping time of tens of nanoseconds. This leads to the detector dead time t_dead and its consequences, as discussed in Eqs. 7.44–7.52.
• The persistence of electron–hole separation is temperature dependent, as is the buildup of a dark signal. Therefore it is advantageous to cool silicon detectors to temperatures of about 240 K if they are used for integration times of tens of seconds or longer. At colder temperatures, one begins to interfere with the Fermi–Dirac distribution function (Eq. 3.19), which is the mechanism providing the necessary quiescent current in semiconductor junctions.
• One can arrange for signal gain via an “avalanche” effect based on a bias voltage near but just below a breakdown voltage in avalanche photodiodes (APDs), which produce a linear signal amplification effect, or a bias just above the breakdown voltage in single-photon avalanche diodes (SPADs) [Cova 2004]. These detectors can deliver very fast response for pump–probe experiments of magnetic spins [Stoll 2004, Puzic 2010].
• Silicon sensors that are 200–500 μm thick (limited by electron and hole transport distances) work well for x-ray energies up to about 15 keV. At higher energies, sensors made of higher-density materials such as Ge, GaAs, or CdTe become advantageous, even though these materials are less readily available in high-purity single-crystal form.
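Shockley's estimate in Eq. 7.58 is easy to evaluate with the silicon parameters quoted above; this is just arithmetic on the stated values, yielding roughly the 3.5 eV mentioned in the text.

```python
def shockley_pair_energy(E_g, r, E_r):
    """Shockley's estimate (Eq. 7.58) of the energy W to create one electron-hole pair (eV)."""
    return 2.2 * E_g + r * E_r

# Silicon values from the text: E_g = 1.12 eV, r ~ 17.5, E_r = 0.063 eV
W_estimate = shockley_pair_energy(E_g=1.12, r=17.5, E_r=0.063)
```

This returns about 3.57 eV, within a few percent of the measured W_Si = 3.65 eV of Eq. 7.31.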
Finally, as hard materials with controllable electrical conductivity, silicon sensors are relatively robust against radiation damage (Section 11.4), and one can sometimes “free” radiation-induced trapped charge distributions by periodically heating up or annealing a silicon detector chip [Doering 2011]. Of course this should only be attempted after consultation with the manufacturer of the detector chip! Integrated circuits in silicon are not always so robust against radiation damage, in particular because they employ thin oxide layers as electrical insulators and these layers are known to be more susceptible. Among industry-standard processes for making complementary metal-oxide semiconductor circuitry, some are known to be more radiation resistant than others, and not all of these processes are compatible with high-performance silicon sensor fabrication, as will be noted below. Therefore the process choice for fabricating application-specific integrated circuits (ASICs) for detector signal processing should take radiation damage into account, at least if the circuitry layer is subject to x-ray dose (this is less of a concern in pixel array detectors with separate sensors and ASICs, for reasons discussed below).

Pixelation and readout in direct-exposure semiconductor-based area detectors is accomplished in several ways:

• In charge-coupled devices (CCDs), bias voltages supplied to an array of electrodes
create something like "charge buckets" which contain photon-generated charge within each pixel during data collection [Janesick 2001]. These bias voltages are then manipulated to "clock out" the charge from pixel to pixel (like water being tipped from one bucket to the next), finally reaching a charge-sensitive preamplifier (fabricated as part of the CCD). There can be one preamplifier per CCD, in which case all pixels have to be clocked through that one amplifier, or multiple amplifiers [Denes 2009], with the ultimate limit being the location of an amplifier on each side of each row. The sequence of per-pixel charges can then be transferred to an off-CCD-chip digitizing circuit. This means that photons that arrive during the charge transfer process can be incorrectly interpreted as if arriving at a different position, so CCDs are often used with a shutter that can be closed during readout. (An alternative is to make a frame-store CCD where one uses a 2Nx × Ny pixel array, shields the outer Nx/2 pixels on either side from seeing light, and first quickly clocks charge out from each of the center Nx/2 regions of the array into their respective outer Nx/2 regions; one can then allow the center regions to again see X rays while the outer regions are clocked out to the preamplifiers at a more leisurely rate [Denes 2009].) The fact that there are no per-pixel electronics means that the CCD pixel size Δdet can be reasonably small, such as 5–30 μm; however, larger pixel sizes offer greater "full well capacity" (height of the bucket walls) before one reaches saturation and charge starts to leak over the voltage barrier into adjoining pixels. Full well capacities approach one million or so electrons per pixel, which for 10 keV X rays creating (10,000 eV)/(3.65 eV) ≈ 2740 electron–hole pairs per photon means that only about 365 photons can be recorded per pixel per integration time.
The CCD chip can be thinned to expose the backside (or non-circuit-containing side) to radiation provided the exposed layer is "passivated" to remove dangling silicon bonds (typically done using ion implantation and annealing), and these backside-thinned CCDs give high DQE as well as increased radiation damage resistance, since their oxide layers are restricted to the unexposed, circuit side of the chip. (A variant involves p–n junctions [Meidinger 2006].) The comments on charge sharing, droplet analysis, and energy resolution given for charge-integrating detectors in Section 7.4.4 all apply to direct x-ray detection with CCDs. Visible-light CCDs can also be used with scintillators [Gruner 2002] for x-ray area detectors, using either fiber-optic bundles or lens imaging to transfer the signal to the CCD. Finally, when only single photons are recorded per pixel per integration time, one can use the droplet analysis and energy resolution capabilities noted for charge-integrating area detectors in Section 7.4.4. The status of x-ray CCD capabilities is presented in a recent book chapter [Strüder 2016].
• In pixel array detectors (PADs) one separates the sensor and the readout electronics into two separate chips: a sensor chip (Section 7.4.6), and an ASIC chip. This allows one to use a separately optimized process for fabricating each chip (such as high resistivity for the sensor), and in particular the readout electronics chip (an ASIC) can be fabricated using standard integrated circuit fabrication processes and facilities. The two chips are usually connected by a "bump-bond" process in which matching metal pads are fabricated on the bottom of the
sensor and the top of the ASIC, small blobs of a ductile metal such as indium are placed on the metal pads on one chip, and then the two chips are mechanically pressed together to electrically connect each sensor pad to its respective ASIC pad (this process is also used for some flip-chips in commercial electronics). This bump-bonding process usually limits the pixel size to the Δdet = 50–200 μm range (and leaves each sensor pixel with relatively high capacitance), though 25 μm has been demonstrated in one x-ray PAD [Ramilli 2017]. Because most of the X rays are stopped in the sensor chip, the ASIC receives less x-ray dose, which allows a broader array of ASIC processes to be considered. Of particular importance has been a series of x-ray pixel array detectors developed at the Paul Scherrer Institut (PSI) in Switzerland, starting with the Pilatus detector with per-pixel photon counting [Broennimann 2006], followed by the Jungfrau detector with charge-integrating electronics [Jungmann-Smith 2016], and more recently the AGIPD detector with analog storage to allow up to 352 frames to be recorded at a burst frame rate of 5 MHz before chip readout [Henrich 2011]. Work at PSI also led to the establishment of the company Dectris in Switzerland, which supplies commercial versions of several of these detectors. As noted in the discussion of high dynamic range and charge integration (Eq. 7.57), Cornell University has developed a series of charge-integrating PADs, including one developed with the former USA company ADSC [Angello 2004] and one that later on became the basis for the CSPAD detector at the free-electron laser at SLAC [Philipp 2011, Herrmann 2013]. The Medipix series of ASICs developed at CERN [Campbell 1998, Ballabriga 2007, Llopart 2002] have also served as the basis for several research and commercialized x-ray PADs.
With per-pixel electronics, one can use either pulse-counting or charge-integrating readout schemes (or even a hybrid of the two [Weiss 2017]) and have a detector optimized for the expected flux rate (see Fig. 7.15). In addition, with charge-integrating PADs, the droplet analysis and energy resolution capabilities described in Section 7.4.4 become an option. Recent book chapters provide greater detail on pulse-counting [Brönnimann 2016] and charge-integrating [Graafsma 2016] PADs.
• In CMOS detectors the sensor and CMOS per-pixel readout electronics are fabricated together in one monolithic device. Because no bump-bonding is required, the pixel size Δdet can be smaller than with PADs. However, most standard CMOS processes are not compatible with high-quality x-ray sensors (see Section 7.4.6) and also show sensitivity to radiation damage in the oxide layers, as noted in Section 11.4. If one uses frontside illumination (illumination on the same side as the circuitry), one can lose some of the signal due to absorption in the circuitry, leading to a limited fill factor (along with increasing risk of radiation damage to the circuitry). The approach of the PERCIVAL detector developed by a collaboration led by DESY in Germany [Wunderer 2015] is to use backside thinning as has been done for CCDs, with Δdet = 27 μm pixel size and frame rates up to 120 Hz. Another approach to overcome these limitations is to use silicon-on-insulator (SOI) technology, so that one can separate the high-resistivity sensor layer from the electronics readout layer. This approach has been used for both
charge-integrating detectors with Δdet = 17 μm pixel size [Nishimura 2016] and photon-counting detectors with Δdet = 60 μm pixel size [Arai 2011], with use in application experiments planned for the future. Another approach is to have an epitaxially grown layer surrounded by two layers of p-doped silicon; in this case one can have a response that depends on which layer an x-ray photon was absorbed in, so that one can use droplet analysis for charge sharing amongst pixels, as well as per-photon energy resolution analysis, for single-photon-per-pixel exposures as discussed in Section 7.4.4. This has led to the development of charge-integrating detectors with Δdet = 12 and 25 μm pixel size [Doering 2016], which again are planned to be tested in application experiments in the future. Apart from CMOS sensors for direct x-ray detection, what is far more common is the use of visible-light CMOS detectors together with scintillators, as will be discussed in Section 7.4.7. These three detector types have been compared against each other for x-ray crystallography, where PADs were found to be superior [Allé 2016]. There are additional semiconductor x-ray detector types beyond CCDs, PADs, and CMOS detectors. One example involves the use of thin-film transistors and amorphous selenium films [Parsafar 2015], but these are detectors with performance optimized for medical x-ray imaging at 50–150 keV photon energies, and are less well suited for x-ray microscopy.
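The full-well arithmetic from the CCD discussion above can be sketched in a few lines (a minimal illustration; `photons_at_saturation` is a hypothetical helper, with WSi = 3.65 eV taken from Eq. 7.31):

```python
W_SI_EV = 3.65  # mean energy per electron-hole pair in silicon (Eq. 7.31)

def photons_at_saturation(full_well_electrons, photon_energy_ev):
    """How many photons of a given energy fit in one pixel before
    the full-well capacity is reached."""
    pairs_per_photon = photon_energy_ev / W_SI_EV
    return full_well_electrons / pairs_per_photon

pairs = 10_000 / W_SI_EV                           # pairs per 10 keV photon
n_photons = photons_at_saturation(1_000_000, 10_000)
print(round(pairs), round(n_photons))  # 2740 365
```

This makes the dynamic-range limitation concrete: a one-million-electron full well saturates after only a few hundred 10 keV photons per pixel per integration time.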
7.4.6
Sensor chips for direct x-ray conversion As noted above, in PADs one can separate the function of x-ray absorption and charge generation into a sensor chip, which is then coupled with a separate ASIC chip for signal readout using methods such as bump-bonding. The sensor chip can then have its fabrication properties such as resistivity optimized for its role, without concern for the material properties and processing steps needed for the ASIC readout chip. If the sensor is thick enough, it can also greatly reduce the x-ray dose received by the ASIC chip. Most sensors for PADs are made out of high-resistivity silicon, in thicknesses ranging up to about 1 mm. For higher-energy X rays, the low density of silicon can mean that not all photons are absorbed in the sensor chip, reducing DQE and increasing the dose on the ASIC chip. For energies well above 10 keV, one can instead use sensors made of higher-density materials such as germanium, which has been coupled with the Medipix3 ASIC in the LAMBDA detector [Pennicard 2014], or CdTe, which has been used with Medipix2 ASICs [Aamir 2011] and which is available as a sensor material on several commercial detectors.
7.4.7
Scintillator detectors: visible-light conversion We use the term "scintillator detector" to refer to the coupling of a luminescent material (which converts x-ray absorption into visible-light emission) with a visible-light detector.
Luminescent materials absorb ionizing radiation and release the energy as visible light. The generic term "luminescence" refers to the case in which the absorption event does not lead to a blackbody emission spectrum (Eq. 7.5) [Garlick 1958]. Luminescence therefore includes both fluorescence (which is a rapid, direct electron transition process permitted by a spin-allowed transition with Δj = 0; see Eqs. 3.16 and 3.17), and also the spin-disallowed process of phosphorescence [Blasse 1994, Appendix 3], which occurs at times longer than 1 ns after the absorption of the ionizing radiation trigger. Luminescent materials in powder form tend to be called phosphors, while single crystals tend to be called scintillators [Blasse 1994, Chapter 8], though there is not a high degree of uniformity of terminology in the literature (for example, structured scintillators [Olsen 2009] do not use single-crystal materials). Because most (but not all!) x-ray microscopy work employs single-crystal luminescent materials, x-ray microscopists tend to simply speak of scintillator detectors for all cases.² After ionizing radiation has been absorbed, there can be a competition between nonradiative return to the material's ground state by phonon modes (heat), versus the transfer of at least some energy to one or more luminescent centers (also called activators). As an example, the gemstone ruby is made of crystalline aluminum oxide with chromium atoms replacing some (typically about 1 percent) of the aluminum atoms (written as Al2O3:Cr). These dopant atoms form Cr3+ ions in the aluminum oxide lattice with orbital distortions due to the lattice, so that excitations of the aluminum oxide lattice can stimulate transitions at wavelengths that are different from transitions in Cr alone. Therefore while pure aluminum oxide is colorless, the Cr dopant in ruby gives rise to emission at the ruby-red wavelength of λ = 694 nm.
One can use different activators to obtain emission at different wavelengths so as to match the peak efficiency of a visible-light photodetector or camera, and other materials to reduce effects such as afterglow, which is caused by electrons or holes becoming trapped at defects or contaminants in the material, with slow, thermally excited release long after the exciting ionization event. X-ray scintillators used in x-ray microscopy include CsI:Tl, and Lu3Al5O12:Eu (lutetium aluminum garnet, or LuAG or LAG, doped with europium as one example activator) crystals grown by liquid phase epitaxy, with other materials used in certain cases as well [Martin 2006]. One tabulation of the properties of various scintillators, including luminosity (photons per MeV of energy deposited), decay time, and visible-light emission peak is available from an internet search of scintillator.lbl.gov. While phosphors and scintillators are sometimes used as visible-light converters for single area detectors in scanning transmission x-ray microscopes [Kilcoyne 2003], luminescence allows one to make pixelated area detectors. One approach for soft x-ray detection is to simply coat the surface of a visible-light CCD with a thin phosphor [McNulty 1992] (with P41 having especially favorable properties amongst soft x-ray phosphors [Yang 1987b]). For hard x-ray applications, thicker luminescent materials are required in order to stop an appreciable fraction of the x-ray beam, but one must then consider transfer of the visible-light intensity pattern through the thickness of the material and onto the visible-light camera. One of the first approaches was to create
² After all, being sloppy about the terminology gives one a chance to see a luminescence specialist cringe!
a columnar structure that would confine visible light laterally, and fill it into an extended depth with a luminescent material. This was first done with ∼10 μm column pitch [Duchenois 1985, Bigler 1985, Thacker 2009] and then at just below 1 μm pitch [Deckman 1989, Olsen 2009]. At ∼10 μm resolution, another approach is to coat a luminescent material onto a tapered fiber-optic bundle to transfer light to the camera [Gruner 2002]. For x-ray microscopists, the most common approach for realizing micrometer-scale spatial resolution pixelated area detectors is to use a luminescent screen followed by visible-light microscope objectives to project a magnified image onto a CCD or CMOS detector. (A 45° mirror is often used in the optical path so that the visible-light camera does not get damaged by exposure to the direct x-ray beam.) This was done first with structured scintillators [Flannery 1987, Deckman 1989], but today most researchers prefer the uniformity of response of high-quality single-crystal scintillators. Because there is a large commercial market for sensitive and high-speed pixelated area detectors for visible light, one can obtain burst frame rates of hundreds of kHz using on-camera frame storage [Fezzaa 2008], or sustained frame rates of tens of kHz using specialized camera interfaces [Mokso 2017]. Finally, because many 2–3 eV visible-light photons are generated from each X ray, the DQE of the two-stage process is dominated by the efficiency of absorption in the luminescent material (Eq. 7.41) even though the efficiency for collecting the visible-light photons might be low. When using single-crystal scintillators with microscope objective optics, thin scintillators are preferred so that all of the visible light is generated within the depth of field (DOF; Eq. 4.215) of the microscope objective, while thick scintillators are preferred to stop a large fraction of the X rays. This trade-off is shown in Fig.
7.16, where the absorption of LuAG is shown along with spatial resolution based on a simplifying formula [Koch 1998] of

Δdet ≈ √[(0.70 μm / N.A.)² + (0.28 · t · N.A.)²]   (7.59)

where t is the thickness of the scintillator (in μm). The first term is the diffraction limit to resolution, the second term is due to DOF, and the numerical coefficients are for capturing 90 percent of the signal within the line spread function of a microscope objective with a given N.A. As this figure shows, one has to decide whether to emphasize high DQE or high spatial resolution at harder x-ray energies. Scintillator-based detectors tend to have relatively high dark noise from stray light scattering, and low dynamic range. However, they offer a spatial resolution in the submicrometer range, which is one to two orders of magnitude smaller than what can be achieved with CMOS or PAD area detectors.
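The trade-off in Eq. 7.59 can be tabulated numerically. This is a sketch assuming the quadrature form of the Koch formula as quoted in the text (the helper name and the "balancing" choice of N.A., where the two terms are set equal, are illustrative, not from the text):

```python
import math

def scintillator_resolution_um(t_um, na):
    """Eq. 7.59 in quadrature form: diffraction-limit term plus
    depth-of-field term, both in micrometers (t_um = scintillator
    thickness in micrometers)."""
    diffraction = 0.70 / na            # diffraction limit of the relay optic
    depth_of_field = 0.28 * t_um * na  # blur from finite scintillator depth
    return math.hypot(diffraction, depth_of_field)

for t in (2, 5, 10, 20, 50):
    # N.A. where the two terms balance: 0.70/na = 0.28*t*na
    na_balance = math.sqrt(0.70 / (0.28 * t))
    print(f"t = {t:2d} um: balance N.A. ~ {na_balance:.2f}, "
          f"resolution ~ {scintillator_resolution_um(t, na_balance):.2f} um")
```

Running this shows the qualitative behavior of Fig. 7.16: thicker scintillators (better hard x-ray absorption, hence DQE) force one to lower N.A. and coarser resolution.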
7.4.8
Gas-based detectors Gas-based detectors have a long history in x-ray science, dating from Röntgen's experiments that showed that X rays cause ionization in gases. In x-ray microscopy they were the first type of detector used for scanning transmission x-ray microscopes (Fig. 2.4). They are less in favor in today's semiconductor era, but they still offer some very nice
Figure 7.16 Spatial resolution and x-ray absorption in Lu3Al5O12 (LAG or LuAG) scintillators of 2, 5, 10, 20, and 50 μm thickness. The spatial resolution versus relay optic N.A. (left and bottom axes, narrow plot lines; see Eq. 7.59) shows that thin scintillators are preferred so that higher resolution and more light collection can be obtained with high-N.A. optics. Absorption versus x-ray energy (right and top axes, broader plot lines) shows that thick scintillators are preferred so as to stop a larger fraction of the x-ray beam at multi-keV x-ray energies, thus improving DQE. Therefore one has to make trade-offs between resolution and DQE for a given experiment.
properties: a dark noise of essentially zero due to a large activation energy W (Eq. 7.27) for creating ion–electron pairs (W = 33.9 eV for air, 24.8 eV for nitrogen, 25.5 eV for argon, and 26.8 eV for methane [Weiss 1955, Wolff 1974]), and near 100 percent DQE for detection of photons that are absorbed in the gas. With a low electric field applied across a gas, one can use the measured ionization current created in a volume sealed using Kapton (polyimide) chamber windows as a monitor of the x-ray flux transmitted through the ionization chamber. If one instead uses a wire at a bias voltage relative to the chamber wall to produce a large electric field near the wire's small radius, one can begin to get a gas multiplication effect. The initial ion is accelerated enough to cause ionization in another gas molecule, so that at voltages of about 1000–2000 V one x-ray absorption event generates an ionization pulse involving (E/W) · M ion–electron pairs. The gas multiplier M ranges from about 10² to 10⁴ depending on the gas and the voltage [Hendricks 1972, Wolff 1974]. (One often uses a gas such as Ar so that electrons and ions do not get trapped by the closed-shell noble gas prior to an ionization event, with a small quantity of an organic molecule such as methane to serve as a quench gas to terminate a pulse discharge [Collinson 1963, Agrawal 1988]; 90 percent argon and
10 percent methane, or P10, is one common gas mixture for proportional counters.) It is therefore possible to measure the energy of the absorbed photon in this "proportional counter" regime, though the energy resolution is not very high (Eq. 7.30) due to the large value of W (the Fano factor for several gases is around F ≈ 0.17 [Alkhazov 1967]). As the voltage is raised further, lightning can strike: one obtains a complete breakdown of the gas upon an initiating ionization event, so that one x-ray pulse creates a very large charge pulse independent of the x-ray photon's energy. This is the Geiger–Müller regime of gas detectors. In both the proportional and Geiger–Müller regimes, there is a recovery time for ions to recombine with electrons so that the counter is ready for another x-ray event, and this is affected by space-charge limitations near the bias wire. This can limit the maximum count rate of gas proportional counters to less than 1 MHz, though higher counting rates can be achieved by operating at below-atmosphere pressures and with multiple bias voltage wires arranged along an extended x-ray beam path [Feser 1998]. It is also advisable to have a low but steady gas flow to compensate for the outgassing of contaminants and to remove molecules that have been not just ionized but fragmented after x-ray absorption.
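The energy-resolution penalty of the large gas W can be estimated with a short sketch, assuming the standard Fano-limited form ΔE_FWHM = 2.35·√(F·W·E) for Eq. 7.30 (this is a statistical lower bound only; avalanche gain variance makes real proportional counters worse):

```python
import math

def fano_limited_fwhm_ev(photon_ev, w_ev, fano):
    """Fano-limited energy resolution: FWHM = 2.35 * sqrt(F * W * E).
    A lower bound; gas gain fluctuations degrade this in practice."""
    return 2.35 * math.sqrt(fano * w_ev * photon_ev)

# Argon (W = 25.5 eV and F ~ 0.17 from the text), 5.9 keV photons
dE = fano_limited_fwhm_ev(5900.0, 25.5, 0.17)
print(f"{dE:.0f} eV FWHM ({100 * dE / 5900:.1f} percent)")
```

Even at this idealized limit the resolution is several hundred eV, far worse than silicon with its W = 3.65 eV, which is why proportional counters are used for counting and flux monitoring rather than spectroscopy.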
7.4.9
Superconducting detectors The energy resolution of silicon detectors is limited by the WSi = 3.65 eV needed to create a single electron–hole pair, as indicated by Eq. 7.33. One approach to reduce W dramatically, and thus dramatically improve the energy resolution of energy-dispersive detectors (Section 7.4.12), is to use the energy associated with the sharp onset of electrical resistance in a superconducting film operated near its critical temperature. Consider the case of aluminum, which has a critical temperature of 1.175 K corresponding to a thermal energy of kBT = 0.0010 eV. In principle this should lead to the creation of 3.65/0.0010 = 3650 more quasiparticle events than in a silicon detector, with a decrease in the energy resolution by a factor of 1/√3650 = 0.016, so that the 154 eV energy resolution of silicon (Eq. 7.33) would be reduced to about 2.5 eV. Thus the strategy is to have an x-ray photon stopped in an absorbing material which will heat up by a minuscule amount, and to detect that heat by the change in resistance in a superconducting film [Moseley 1984]. This is best accomplished by using the electrothermal feedback of a transition edge sensor [Irwin 1995], leading to an estimate of the fundamental limit to energy resolution of

ΔE_FWHM = 2.36 √[ 4 kB T² C (1/αs) √(n/2) ],   (7.60)

where C is the heat capacity of the superconductor, n ≈ 4–6 characterizes the thermal impedance between a superconducting film and its substrate, and

αs = (T/R)(dR/dT)   (7.61)

is a unitless measure of the sharpness of the change in electrical resistance R of the superconductor at its critical temperature (values of αs in the hundreds are representative for many superconducting alloys). The first x-ray detectors to use this principle
achieved an energy resolution of 2.6 eV with 1 keV photons [Irwin 1996]. A major limitation has been the time required for a detector element to "cool down" once again after it has absorbed an x-ray photon, thus limiting count rate, but this can be addressed by providing many detector elements in a pixelated array with multiplexed signal readout [Chervenak 1999, Irwin 2004]. X-ray detectors that can combine high energy resolution with high count rate are under active development, and could offer a path to imaging differing chemical states of elements in scanning x-ray fluorescence microscopy by exploiting XANES-like shifts in fluorescence emission energies.
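The back-of-envelope scaling argument at the start of this section can be written out directly (this sketch mirrors the numbers quoted in the text rather than deriving them from first principles; the Poisson √N scaling of resolution with quantum count is the only physics assumed):

```python
import math

W_SI = 3.65             # eV per electron-hole pair in silicon (Eq. 7.31)
QUANTUM_TES = 0.0010    # eV, per-quantum energy scale quoted for Al in the text
SI_RESOLUTION_EV = 154  # silicon EDS resolution quoted from Eq. 7.33

gain = W_SI / QUANTUM_TES            # ~3650x more quanta than in silicon
improvement = 1.0 / math.sqrt(gain)  # Poisson-statistics scaling of resolution
print(round(gain), round(SI_RESOLUTION_EV * improvement, 1))  # 3650, ~2.5 eV
```

This matches the 2.6 eV at 1 keV achieved by the first transition edge sensor detectors [Irwin 1996] remarkably well for so crude an estimate.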
7.4.10
Energy-resolving detectors While the method of x-ray emission spectroscopy [Meisel 1989] is ripe for combination with microscopy, with a few early examples, the detection of element-specific x-ray fluorescence (Section 9.2) is the main method in x-ray microscopy that requires detectors that can report the energy of detected photons. There are two main types of energy-resolving x-ray detector systems now in use in x-ray microscopes: wavelength-dispersive detectors, which use Bragg diffraction in crystals or from gratings, and energy-dispersive detectors, which measure the number of quanta created in a detector after an x-ray absorption event. We outline only a few key ideas of each.
7.4.11
Wavelength-dispersive detectors Wavelength-dispersive detectors are essentially diffraction spectrometers, with a detector that has pixels at least along the dispersion direction (that is, a linear pixel array detector). No energy resolution from the detector is required, as different photon energies are translated into different arrival positions (spatial resolution) on the detector. Because in scanning x-ray microscopy they are used to collect the signal from a small excitation spot, their design is somewhat simplified relative to x-ray spectrometers used to collect signals from larger areas. Because x-ray fluorescence yields are larger at multi-keV x-ray energies and above (Fig. 3.7), wavelength-dispersive detectors used for x-ray microscopy tend to use Bragg diffraction from crystals (though there are very successful grating-based spectrometers for soft x-ray emission [Nordgren 1989, Chuang 2017]). One of the challenges of volume gratings like crystals is that Bragg's law requires one to match both the input and the output angle with the crystal's d spacing, as shown in Fig. 4.9. Because the Darwin width, or angular spread within which one has strong diffraction, is so narrow (Fig. 4.12), one faces special challenges not present with simple plane gratings (Fig. 4.8). One way to overcome this limit is to scan the grating through a rotation range to collect a spectrum, but this is not practical in scanning x-ray microscopes where short pixel dwell times are desired in order to image large fields of view at high resolution. Instead, it is more common to bend the crystal [DuMond 1930] both to provide a degree of imaging from the source to a focus, like in the Rowland circle condition for spherical gratings, and to keep a larger width of the crystal near the Bragg condition. This can be done in reflection mode [Johann 1931], or in transmission mode [Cauchois 1932]. These older
designs have been summarized [Sandström 1957], while newer designs with multiple crystals have appeared [Alonso-Mori 2012, Honkanen 2014]. As noted in Section 9.2, an especially interesting wavelength-dispersive spectrometry system at the European Synchrotron Radiation Facility in France uses a polycapillary array to increase the solid angle of collection to Ω = 1.48 sr (though with only 14 percent efficiency), while also providing the crystal with a parallel rather than a diverging beam. This system delivers 4–40 eV energy resolution [Szlachetko 2010], and because different x-ray energies arrive at different pixels on a linear array detector one can accommodate higher overall flux rates than in single-element energy-dispersive detectors.
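Since wavelength-dispersive spectrometers rest on Bragg's law nλ = 2d sin θ, the angle-matching constraint described above can be made concrete with a minimal sketch (the Si(111) d spacing and the hc conversion constant below are illustrative values I am supplying, not taken from the text):

```python
import math

HC_KEV_ANGSTROM = 12.398  # hc in keV*Angstrom (approximate conversion constant)

def bragg_angle_deg(energy_kev, d_angstrom, order=1):
    """Bragg angle from n*lambda = 2 d sin(theta), in degrees."""
    wavelength = HC_KEV_ANGSTROM / energy_kev
    s = order * wavelength / (2.0 * d_angstrom)
    if s > 1.0:
        raise ValueError("photon energy too low to satisfy the Bragg condition")
    return math.degrees(math.asin(s))

# e.g. a Si(111) crystal, d ~ 3.136 Angstrom, analyzing 8 keV fluorescence
print(f"{bragg_angle_deg(8.0, 3.136):.1f} degrees")
```

Because the Darwin width around this angle is only arcseconds wide, a flat crystal at fixed angle accepts almost none of a diverging fluorescence cone, which is why the bent-crystal (Johann, Cauchois) and polycapillary-collimation schemes described above matter.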
7.4.12
Energy-dispersive detectors Energy-dispersive detectors (sometimes called energy-dispersive spectroscopy detectors or EDS detectors) work by using Eq. 7.27 of E = qW to measure the photon energy E for each photon (q = 1) as it arrives. The resulting energy resolution for silicon EDS detectors of ∼150 eV FWHM is usually sufficient to separate fluorescence lines, though there can be overlaps as discussed in Section 9.2.1. As can be seen from Eq. 7.30, one will obtain improved energy resolution using materials with lower values of W (recall from Eq. 7.31 that WSi = 3.65 eV, while WGe = 2.96 eV), though silicon remains popular because of the quality of material that can be obtained and its more highly developed processing technologies. Considerable detail on the various technologies for EDS detectors, such as lithium-drifted silicon or Si(Li), or silicon drift diodes (SDDs), is provided elsewhere [Spieler 2005, Lowe 2014], so we will only make a few general comments here:
• Because measuring the energy of a photon requires collecting all of the electrons or holes it liberates, one must allow enough time for all charges to reach an amplifier (not leaving enough time leaves one with a "ballistic deficit" in charge collection [Loo 1988]). This would make one prefer a smaller active detector area, while increasing the solid angle of the detector (Eq. 9.23) would make one prefer a larger active detector area. The required compromise affects the detector dead time tdead, thus limiting the maximum count rate to the MHz range (Eq. 7.44) per detector element even if dead-time corrections are made. One way to go beyond this limit is to essentially have a pixelated energy-resolving detector so that the aggregate rate can be much higher; this is the approach taken by the 384 "pixels" of the MAIA detector system [Ryan 2010].
• If some of the electron–hole separations created by the absorbed photon fail to reach the amplifier, one obtains an incorrect, lower value for the photon energy.
This can happen when some charges are trapped by defects or impurities in the detector material. Incomplete charge collection manifests itself as a "step" and a "tail" in the spectral response from a single photon energy, as illustrated in Fig. 7.17.
• In order to minimize the contribution of thermal excitations adding "extra" charge to a photon's measurement, the detector is often cooled below room temperature so as to "sharpen" the Fermi–Dirac distribution function of Eq. 3.19.
Figure 7.17 Factors in the response of an energy-dispersive spectroscopy (EDS) detector. Shown here is the representative response to 8.00 keV photons, with a peak broadening given by Eq. 7.30, as well as both a "step" and a "tail" on the low-energy side of the spectrum, which are both due to incomplete collection of the photon-absorption-produced electron–hole charge separation. Figure adapted from [Sun 2015], following the notation of [Van Grieken 2002].
• Incoming photons can trigger the process of x-ray fluorescence in the materials that make up the detector. If the fluorescence occurs deep within the detector material, the fluorescent photon will be reabsorbed (Section 9.2.4) in the detector so the correct total charge will still be reported. If, however, that fluorescence event takes place near the surface of the detector material, the energy of that fluorescent photon can be lost. This gives rise to "escape peaks" in the fluorescence spectrum; for example, with 10.00 keV photons into Si, where the Si Kα fluorescence line is at 1.74 keV, there will be a small escape peak at 10.00 − 1.74 = 8.26 keV in the x-ray fluorescence spectrum.
• As was noted above Eq. 7.44, if two photons arrive within the pulse-shaping time they can be interpreted as a single photon with twice the energy. This is called pileup, and it means that strong fluorescence at 4 keV can produce a pileup peak at 8 keV.
• In order to avoid the buildup of contaminants on the active detector surface, as well as to shield from visible light, most detectors use a thin beryllium or aluminized silicon nitride entrance window. This affects how close the detector can be placed to the specimen, thus limiting collection solid angle; it also limits the ability to detect x-ray fluorescence lines below 1–2 keV, depending on the window material
316
Xray microscope instrumentation
and thickness. Xray ﬂuorescence selfabsorption is strong in these cases (Section 9.2.4), and ﬂuorescence yields are low (Fig. 3.7), though there are successful demonstrations of the use of sub1 keV xray ﬂuorescence in xray microscopy [Kaulich 2009, Kaulich 2011, Hitchcock 2012, Bufon 2017]. Together these eﬀects lead to the characteristics of actual energydispersive spectra, such as that shown in Fig. 9.12.
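The escape-peak and pileup energies described above can be checked with a few lines of arithmetic; this is a minimal sketch, with function names of our own choosing and only the Si Kα energy taken from the text:

```python
# Energies of two common artifact peaks in an energy-dispersive (EDS) spectrum.
# The 1.74 keV Si K-alpha energy is the value quoted in the text.
def escape_peak_kev(incident_kev, detector_fluorescence_kev=1.74):
    """Escape peak: a detector fluorescence photon escapes, so its energy is lost."""
    return incident_kev - detector_fluorescence_kev

def pileup_peak_kev(e1_kev, e2_kev):
    """Pileup: two photons within the pulse-shaping time are read as one event."""
    return e1_kev + e2_kev

print(escape_peak_kev(10.00))      # 10.00 keV photons in Si: escape peak at 8.26 keV
print(pileup_peak_kev(4.0, 4.0))   # strong 4 keV fluorescence: pileup peak at 8 keV
```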
7.5 Sample environments

X-ray microscopy is all about the specimen under study, and one of the advantages of using X rays for microscopy is their high penetrating power relative to electron beams. While environmental electron microscopy offers one the chance to image specimens in partial gas pressures and in liquid environments that are up to a few hundred nanometers thick [Ross 2016], inelastic and plural elastic scattering set fundamental limits (Fig. 4.81), while x-ray microscopes are able to study specimens that are micrometers or even millimeters thick. Therefore one can consider sample environments that are much closer to what is "natural" for the study at hand, provided one is aware of the limitations set by radiation damage (Chapter 11). For soft and biological materials, radiation damage can be minimized by maintaining the specimen at cryogenic temperatures using the methods discussed in Section 11.3.1. For many other materials, one can observe them in native conditions (in situ) or even as they normally operate (operando³), though as noted in Section 11.4, at doses of about 10⁹ Gy one does begin to see changes in some lithium battery materials [Nelson 2013] and in silicon-on-insulator materials [Polvino 2008]. Even with these limits, there is considerable headroom for x-ray microscopy to accommodate a wide range of sample environments.

Two of these environmental conditions involve gases and fluids. As was shown in Fig. 7.12, X rays are able to penetrate significant distances in air and in helium gas, depending on the x-ray energy. Therefore while it is helpful to provide a helium gas environment where possible so as to minimize x-ray beam absorption and scattering, and while the presence of oxygen can increase radiation damage effects in soft materials studies [Coffey 2002, Braun 2009] as noted in Section 11.2.1, there is no absorption problem in providing an air or other gas environment for a specimen if required. Fluids are also easily accommodated: Fig. 2.5 showed that the 290–540 eV "water window" spectral range [Wolter 1952] provides great absorption contrast for the study of hydrated organic materials, and there is appreciable phase contrast for such specimens at multi-keV energies, as was shown in Fig. 4.77. One can also incorporate microfluidics systems to provide for fluid exchange and reactions within an x-ray experiment [Ghazal 2016]. However, for studies of fluids and gases one usually needs a way to obtain an isolated environment, which leads to the need for x-ray windows. There are several materials that one can use for x-ray windows with different absorption properties, as shown in Fig. 7.18.

³ This is often written as in operando, as one would expect from in vivo and in vitro, but apparently the correct Latin usage is operando. Consult your local Jesuit to be sure.
[Figure: 1/e attenuation length (meters), from 10⁻³ down to 10⁻⁸ m, versus photon energy from 100 to 10,000 eV, for borosilicate glass, Kapton, silicon nitride, and silicon, with the 100 nm and 1000 nm window thicknesses marked at right.]

Figure 7.18 Attenuation lengths μ⁻¹ (Eq. 3.75) in several materials used for thin x-ray windows. Since the transmission through a thickness t goes as exp[−μt] (Eq. 3.76), and x-ray window thicknesses tend to be between 100 and 1000 nm, those thicknesses are indicated at right on the plot. Shown here are the curves for silicon nitride (Si3N4; ρ = 3.17 g/cm³), borosilicate glass (81 percent SiO2, 13 percent B2O3, 3.5 percent Na2O, 2 percent Al2O3, and 0.5 percent K2O; ρ = 2.23 g/cm³), silicon (Si; ρ = 2.329 g/cm³), and Kapton (stoichiometry H10C22N2O5; ρ = 1.42 g/cm³).
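Reading an attenuation length Λ = 1/μ off a plot like Fig. 7.18 lets one estimate window transmission from Eq. 3.76; a minimal sketch, where the attenuation length used is purely illustrative rather than a tabulated value:

```python
import math

# Window transmission T = exp(-mu t) = exp(-t / L), with L = 1/mu the 1/e
# attenuation length (Eq. 3.76). The value of L below is illustrative only.
def transmission(thickness_nm, atten_length_nm):
    return math.exp(-thickness_nm / atten_length_nm)

# For a given material and photon energy, a 100 nm window always transmits
# more than a 1000 nm one:
t_100 = transmission(100.0, atten_length_nm=1.0e4)
t_1000 = transmission(1000.0, atten_length_nm=1.0e4)
print(t_100, t_1000)
```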
Polyimide (of which Kapton is one variant) was used for x-ray windows in early zone plate x-ray microscopes [Schmahl 1980], but its relatively high absorption in the soft x-ray range and its lack of stiffness can be problematic. Silicon windows offer great properties and have been used with much success in soft x-ray microscopy, but the methods needed to produce them [Medenwaldt 1995] are somewhat costly in terms of equipment required and manual monitoring. Thin-walled borosilicate glass capillaries (pulled to a wall thickness of about 100 nm) have proven to be very successful [Weiß 2000] for soft x-ray tomography of samples that can be loaded inside, such as cells or particles in liquid suspension. However, they are not easily manufactured as ∼100 nm thin flat substrates. For that requirement, silicon nitride has nearly perfect properties of low absorption, high stiffness, great radiation damage resistance, and reasonably low-cost production methods (as will be described in Section 7.5.1). However, other alternatives are worth considering, such as silicon carbide membranes, which are said to be tougher and to offer better compatibility for cell culturing [Altissimo 2018]. Graphene is another interesting material, since monolayer films can support high pressure differentials and block the transport of water [Bunch 2008, Liu 2015a]. This has allowed graphene to be used as a very thin membrane for encapsulating protein crystals [Wierman 2013], for liquid cells in electron microscopy [Park 2016, Textor 2018], and indeed for multiple imaging modalities [Matruglio 2018].

[Figure: fabrication steps — 1: UV exposure of photoresist through a photomask; 2: develop photoresist; 3: silicon nitride RIE (reactive ion etch); 4: strip photoresist; 5: KOH etch of Si, with the ⟨100⟩ wafer orientation giving 54.7° sidewalls.]

Figure 7.19 A process for producing thin silicon nitride windows [Pawlak 1987]. It begins with a double-side-polished ⟨100⟩ silicon wafer upon which has been grown an Si3N4 layer using low-pressure chemical vapor deposition (LPCVD). A positive photoresist is then spun on each side, and one side is exposed through a contact mask so that liquid development exposes regions of the Si3N4 surface. A reactive ion etch (RIE) with a fluorinated gas (for example, 8.5 percent O2 with CF4) is then used to selectively etch the nitride layer and expose the underlying silicon only at the desired locations. Next, an anisotropic wet etch along the {111} planes is made to remove silicon in the exposed areas, creating free-standing windows.

With the right window material, one can leverage the capabilities of lithographic fabrication methods to make sample environments with the right combination of fluids or gases and temperature. This has been important for x-ray microscopy studies of battery and catalysis materials, as will be discussed in Section 12.4 as well as in recent reviews [Weker 2016, Lin 2017]. The details of what makes for an ideal sample environment are very application-specific, so we emphasize here the basic considerations of window materials.
7.5.1 Silicon nitride windows

Silicon nitride windows were first made in connection with early plans to use x-ray shadow printing of absorption masks for proximity x-ray lithography (an approach that caused much excitement for possible future integrated circuit production [Spears 1972, Spiller 1993] at a time when optical lithography was thought to be limited in spatial resolution to many micrometers). The original process for silicon nitride window fabrication [Bassous 1976] was developed at the IBM Research Center in New York in the 1970s, and was later simplified [Pawlak 1987] to the one shown in Fig. 7.19. The process steps are as follows:

• One starts with a double-side-polished silicon wafer with the surface oriented along
the ⟨100⟩ direction. This wafer is then placed in a low-pressure chemical vapor deposition (LPCVD) system in which a silicon nitride layer of the desired thickness is grown. It is important to use a gas mixture and temperature combination that produces films with low internal stress [Sekimoto 1982, Temple-Boyer 1998, Toivola 2003]. The resulting film might have a stoichiometry that differs slightly from Si3N4.
• A positive photoresist is then spun on both sides, after which UV exposure through a contact mask is used to define the top-surface silicon etch areas. The photoresist is then developed to expose these areas on the silicon nitride film.
• A reactive ion etch (RIE) is carried out using a fluorinated gas such as 8.5 percent O2 with CF4. This removes the silicon nitride in the exposed areas to expose the underlying silicon wafer, with the differential etch rate of photoresist in the RIE gas protecting the other areas of silicon nitride.
• The photoresist is then stripped using an O2 RIE, which does not harm the silicon nitride.
• The wafer is then wet-etched in a solution of ethylene diamine, pyrocatechol, and water, which produces a highly anisotropic etch along the {111} planes of silicon [Finne 1967]. This wet-etch step can take several hours, and it does not affect the silicon nitride layer.
The result of this process is to produce silicon nitride windows on silicon wafer frames. These have been used as low-electron-backscatter, x-ray-transparent windows for the fabrication of Fresnel zone plates, and as the windows in specimen environmental chambers for in situ and operando studies [Yang 1987a, Neuhäusler 2000, de Groot 2010]. Mini-chambers with silicon nitride windows and flow tubes for periodic flushing with culture medium have been used for studies of initially living cells [Pine 1992], including the study shown in Fig. 11.8, while silicon nitride windows offer one option in microfluidic chambers used in x-ray studies [Ghazal 2016].
Silicon nitride windows are now commercially available from several vendors, including versions with electrodes for specimen heating and for electrochemistry. They are also used for electron microscopy [Ring 2011], and in microelectromechanical systems (MEMS) devices, where their mechanical properties have been studied [Zwickl 2008] and where circular-area windows have been fabricated [Serra 2016]. Silicon nitride windows can be used to separate the UHV environment of an x-ray beamline from the atmospheric pressure environment of an x-ray microscope: 100 nm thick silicon nitride windows can withstand a pressure difference of 1 atmosphere over an area of 1 mm² with no radiation-induced weakening over years of operation (scaling to larger areas has been studied in detail [Sekimoto 1982]). This took some courage to try at first, including convincing the people in charge of the vacuum system at synchrotron light sources that the windows would not break and vent the accelerator to air! With precautions such as differential pumping arrangements and electropneumatically driven gate valves interlocked to pressure gauges, this was eventually permitted [Rarback 1984], and this approach is now widely used in x-ray microscopes.
7.6 Concluding limerick

Instrumentation may seem prosaic, but it's the poetry through which we express our goals of learning more about the world at the nanoscale.

If we want to do great x-ray science
on our tools we must place our reliance
Our detector and source
we choose wisely; of course!
Our instruments are in compliance.
8 X-ray tomography
Doğa Gürsoy contributed to this chapter.

Up until now we have concentrated on two-dimensional (2D) imaging of thin specimens. However, one of the advantages microscopy with X rays offers is great penetrating power. This means that X rays can image much thicker specimens than is possible in, for example, electron microscopy (as discussed in Section 4.10). For this reason, tomography (where one obtains 3D views of 3D objects) plays an important role in x-ray microscopy. There are entire books written on how tomography works [Herman 1980, Kak 1988], and on its application to x-ray microscopy [Stock 2008], so our treatment here will be limited to the essentials. Examples of transmission tomography images are shown in Figs. 12.1, 12.6, and 12.9, while fluorescence tomography is shown in Fig. 12.3.

Our discussion of x-ray tomography will be carried out using several simplifying assumptions:
• We will assume parallel illumination, even though there are reconstruction algorithms [Tuy 1983, Feldkamp 1984] for cone-beam tomography where the beam diverges from a point source.
• We will assume that we start with images that provide a linear response to the projected object thickness t(x, y) along each viewing direction. In the case of absorption contrast transmission imaging, this can be done by calculating the optical density D(x, y) = −ln[I(x, y)/I0] = μt(x, y) as given by Eq. 3.83, with μ being the material's linear absorption coefficient (LAC) of Eq. 3.75. In phase contrast imaging, one may have to use phase unwrapping methods [Goldstein 1988, Volkov 2003] to first obtain a projection image which is linear with the projected object thickness, since ϕ = kδt (see Fig. 3.17).
• We will assume that there is no spatial-frequency-dependent reduction in the contrast of image features as seen in a projection image. That is, we will assume that the modulation transfer function (MTF) is 1 at all frequencies u (see Section 4.4.7).
One can always approach this condition by doing deconvolution (Section 4.4.8) on individual projection images before tomographic reconstruction, or by building an actual MTF estimate into optimization approaches (Section 8.2.1).
• We will assume that the first Born approximation applies (Section 3.3.4): we can approximate the wavefield that reaches a downstream plane in a 3D object as being essentially the same as the wavefield reaching an upstream plane.
• We will assume that the 3D object volume lies entirely within an x-ray microscope's depth of field limit of DOF ≃ 5.4δr²/λ as given in Eq. 4.215. This is known as the pure projection approximation. Cases where this does not apply are discussed in greater detail in Section 10.5.

Together, these assumptions allow us to assume that we obtain pure projections through the object, with no information encoded regarding differences along the illumination direction. We will furthermore limit our discussion here to standard tomography with a single axis of rotation (at least until we consider double-tilt tomography and laminography in Section 8.5.2). Single-axis tomography is also referred to as computed axial tomography, which in medical imaging is referred to as a CAT scan.¹

In tomography based on absorption contrast, one wants to find a balance between having enough absorption to produce contrast in the projection images, but not so much that one has trouble getting the beam to emerge from the thick specimen, or that one absorbs too much radiation dose in the specimen. The optimum linear absorption coefficient (LAC) μ for absorption contrast tomography is

μ_opt = 2/D,    (8.1)

where D is the diameter of the specimen [Grodzins 1983b]. One possibility is to tune the photon energy to satisfy Eq. 8.1 as indicated by Eq. 3.75.

As is often the case in scientific discoveries, several people contributed to the origins of what we know today as tomography. Important mathematics advances relevant to tomography were made by Johann Radon in 1917 [Radon 1917, Radon 2007]. Some ideas of determining object slices from line projections were developed by Allan Cormack [Cormack 1963, Cormack 1964], though with only limited experimental demonstration. In January 1968, David De Rosier and Aaron Klug submitted the first [De Rosier 1968] of several [Crowther 1970a, Crowther 1970b, Klug 1972] papers on tomography in electron microscopy, while in August 1968 Godfrey Hounsfield of the company EMI in the UK submitted a patent filing for medical CAT scan methods (his first publication demonstrating its operation came several years later [Hounsfield 1973, Ambrose 1973]). Cormack and Hounsfield shared the 1979 Nobel Prize in Medicine, and Klug won the 1982 Nobel Prize in Chemistry.
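The balance expressed by Eq. 8.1 can be sketched in a few lines; a minimal illustration, where the specimen diameter is an assumed value:

```python
import math

# Grodzins optimum (Eq. 8.1): mu_opt = 2/D, which corresponds to a
# whole-specimen transmission of exp(-mu_opt * D) = exp(-2), about 13.5 percent.
def mu_opt(diameter):
    """Optimum linear absorption coefficient for a specimen of diameter D."""
    return 2.0 / diameter

D_um = 50.0                      # assumed 50 micrometer specimen diameter
mu = mu_opt(D_um)                # 0.04 per micrometer
print(math.exp(-mu * D_um))      # transmission at the optimum: exp(-2)
```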
8.1 Tomography basics

In Fig. 8.1 we show the collection of a particular row from a 2D projection image p(x′, z)_θ as an object slice f(x, y)_z is rotated. This row is called a slice projection p(x′)_{z,θ}, and it records the transmission image through a particular object slice f(x, y)_z chosen along the z direction (the key tomographic variables are listed in Table 8.1). Mathematically, a slice projection is given by the famous Radon transform [Radon 1917, Radon
¹ Veterinarians might request cat CAT scans; those who like sporting dogs might want to see the lab results.
[Figure: an illuminating beam passes through a 3D object; each object slice (of height Δz) maps onto one slice projection row of the 2D projection recorded on the detector.]
Figure 8.1 Schematic representation of the collection of 2D projection images from a 3D object as it is rotated. A parallel beam is assumed to illuminate the object, with transmission images p(x′, z)_θ being recorded on an area detector at each angle θ. Because the object is rotated in θ about the z axis, each object slice f(x, y)_z is imaged onto a separate line projection on one row z′ on the detector (with height Δz = Δ_det). Since z′ on the detector maps exactly onto z on the object, we can write the line projection at the particular detector row z′ as p(x′)_{z,θ}. The set of line projections (or one-dimensional images in the x′ direction, as recorded by the detector) as the object is rotated by θ gives rise to a sinogram for one slice position z, as will be shown in Fig. 8.2.

Table 8.1 Notation used for tomographic quantities, both for the Fourier representation discussed in Section 8.1, and for the matrix representation discussed in Section 8.2, where it is explained in Eqs. 8.5 and 8.6 how multidimensional images are represented by 1D arrays.

Name | Fourier formula | Matrix formula | Matrix dimensions
Object | f(x, y, z) | f | 1
Object slice at z | f(x, y)_z | f_z | 1
Object slice pixel indices | [x, y] | x | 1 (from x)
Object slice in Fourier space | F{f(x, y)_z} | |
Projection image | p(x′, z)_θ | p_θ | 1
Slice projection at z | p(x′)_{z,θ} | p_{z,θ} | 1
Slice projection pixel indices | x′ | x′ | 1 (from x′)
Slice projection in Fourier space | F{p(x′)_{z,θ}} | |
Projection matrix | | W_θ | 2 (from f_z, p_{z,θ})
Rotation angles | θ | θ | 1

2007] of

p(x')_{z,\theta} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y, z)\, \delta(x \cos\theta - y \sin\theta - x')\, dx\, dy, \qquad (8.2)
where δ() is the Dirac delta function of Eq. 4.84. Our goal is to use this set of slice projections to reconstruct the 2D object slice f(x, y)_z at the object row z, and then to combine all the reconstructed object slices to produce a 3D view of the object. Because we assume that there is no blurring of the projection image from row to row z′, we can
[Figure: an oblique view and a top-down view of one object slice with features, the resulting sinogram (half of it marked as redundant angles), and the Fourier space coverage with angular spacing Δθ and frequency spacing Δu_y.]
Figure 8.2 Two ways to represent slice projections p(x′)_{z,θ} in tomography. At left is shown one object slice f(x, y)_z at a particular z location in the 3D object (Fig. 8.1), both in an oblique view and in a top-down view. Different rotation angles θ allow one to record different slice projections p(x′)_{z,θ} as 1D images; the axes of these slice projections for a few specific rotation angles θ are shown in separate colors. Features in the object appear at right-of-center slice projection positions x′ for half of the projection angles, and at left-of-center positions x′ for the other half. One can assemble the set of all slice projections for all angles θ to form a sinogram, which is a 2D image with axes x′ for positions along each slice projection p(x′)_{z,θ}, and θ for each rotation angle. (The name sinogram is drawn from the fact that one feature will appear to trace out a sinusoidal curve along the vertical axis, as shown.) Because the slice projection taken at θ = 350° should just be the same as the slice projection taken at θ = 170° except that the x′ direction is reversed, rotation of the object over 180° is sufficient to construct the entire sinogram (this is why half of the sinogram is shown as having "redundant angles"). Each slice projection with N_x′ pixels is assumed to be a true projection with no depth information about the object. This means that its Fourier transform yields information over N_x′ pixels in the u_x′ direction in Fourier space, and over only one pixel (the zero-spatial-frequency pixel) in the orthogonal direction. This leads to a filling of information in the Fourier space representation of the object slice, or F{f(x, y)_z}. This happens both for the selected, colored angles shown at left, and for any additional angles (shown in gray) over which slice projections were acquired, with an angular spacing Δθ. The Fourier space coordinates [u_x, u_y] (with a spacing Δu_y in the u_y direction indicated) correspond to the object slice real space coordinates [x, y].
map the detector row z′ directly onto the object row z, or z′ = z. In addition, because we assume these rows are imaged separately, we can process each of the N_z object slices separately; this makes standard tomography data processing a trivially parallelizable problem in computing.

If you were to view several photos of a simple 3D object taken from different views, you might try to construct a 3D model by simply extending (or backprojecting) the object along your viewing direction in each view. For an object slice f(x, y)_z at position z, you would take the 1D slice projections and backproject them within the f(x, y)_z image array. This approach will be shown in the top row of Fig. 8.4. As more and more projections are obtained (that is, as N_θ is increased), you would expect the reconstructed
object slice image f(x, y)_z to improve, yet you would not have to wait for all projection angles to be acquired before starting to see an image appear. However, this simple real-space backprojection operation produces an inaccurate representation of the object slice for reasons that will be discussed below.

As we rotate an object slice through θ, the set of slice projections p(x′)_{z,θ} we obtain can be put together in a 2D "image" in which the horizontal axis is the distance x′ along each slice projection, and the vertical axis is the rotation angle θ. This image is called a "sinogram" because a feature at one position in the object slice traces out a sine curve along the vertical axis, as shown in Fig. 8.2. Because the Radon transform of Eq. 8.2 is invertible, once we have assembled a sinogram from a complete set of slice projections p(x′)_{z,θ} we can, in principle, recover the object slice f(x, y)_z.
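The construction of a sinogram can be sketched with a small discrete Radon transform; a minimal illustration in which the phantom, grid size, and nearest-bin integration scheme are all of our own choosing:

```python
import numpy as np

# Discrete version of the Radon transform of Eq. 8.2: for each rotation angle
# theta, compute x' = x cos(theta) - y sin(theta) for every pixel and sum the
# pixel values falling into each detector bin x'.
def sinogram(slice_2d, angles_deg):
    n = slice_2d.shape[0]
    yy, xx = np.mgrid[0:n, 0:n] - (n - 1) / 2.0   # pixel coordinates about center
    vals = slice_2d.ravel()
    rows = []
    for theta in np.deg2rad(angles_deg):
        xp = xx * np.cos(theta) - yy * np.sin(theta)
        bins = np.clip(np.round(xp + (n - 1) / 2.0).astype(int), 0, n - 1)
        row = np.zeros(n)
        np.add.at(row, bins.ravel(), vals)        # line integrals along the beam
        rows.append(row)
    return np.array(rows)                          # shape (N_theta, N_x')

# An off-center disk phantom traces out a sine curve in the sinogram, and each
# slice projection sums to the same total (a pure projection loses no signal).
n = 64
y, x = np.mgrid[0:n, 0:n] - (n - 1) / 2.0
phantom = (((x - 10.0) ** 2 + y ** 2) < 8.0 ** 2).astype(float)
sino = sinogram(phantom, np.arange(0.0, 180.0, 2.0))
```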
8.1.1 The Crowther criterion: how many projections?

How fine a rotation step Δθ should we use in tomography? It is easiest to think about this question by considering tomographic data in transverse–axial Fourier space. Let's consider one slice projection p(x′)_{z,θ} in the Fourier plane. As shown in Fig. 8.2, we remap the information from N_x′ pixels in the slice projection to N_x′ discrete spatial frequencies (see Section 4.3.3). Because the slice projection has no depth information (due to our assumption of obtaining pure projections), its extent in the orthogonal direction in Fourier space is precisely one pixel (the center, zero-spatial-frequency pixel). If we then assemble the Fourier transforms of the slice projections F{p(x′)_{z,θ}} over all rotation angles θ, we wind up with a representation of the object slice in the Fourier plane F{f(x, y)_z}, as shown at right in Fig. 8.2. If we can remap each Fourier transform of each slice projection F{p(x′)_{z,θ}} onto a regular grid in the Fourier plane (this Fourier plane interpolation operation is conceptually straightforward, but tricky in detail, as we will see below), we can simply carry out an inverse Fourier transform F⁻¹{} of that remapped data in the coordinates [u_x, u_y] to obtain the object slice image f(x, y)_z. That is, we obtain a full 2D view of the slice from the set of 1D line projections, or a 3D view of the object from a set of 2D projection images. That is the beauty of tomography!

For those who are familiar with crystallography, or with techniques for reconstructing images from diffraction plane intensities as will be discussed in Chapter 10, it is good to remember that our Fourier space information is obtained computationally from images. When one takes the Fourier transform of a 2D projection image, or a 1D line projection, one obtains complex information in the Fourier plane.
The Fourier plane magnitudes (which when squared give diffraction intensities) and phases can be inverse Fourier transformed with no loss of information. In other words, there is no phase problem in tomography.

If we acquire projections at too few rotation angles N_θ, we will not be able to fill information into all pixels in the Fourier plane. These "unfilled pixels" will by default have an information content of zero, and this will lead to artifacts in the reconstructed image, as will be shown in Fig. 8.4. To avoid this problem and provide information at all points in Fourier space, one should make the angular step Δθ about equal to the distance of one pixel in Fourier space, or Δθ ≃ Δu_y, so that there are no "unfilled pixel"
[Figure: (a) top view of one object slice with the rotation axis, illumination direction, and a projection image of N_t pixels of width Δt; (b) the single line this projection fills in transverse–axial Fourier space at θ = 0°; (c) the filling of Fourier space at angular spacing Δθ as the object is rotated.]
Figure 8.3 Schematic representation of the Crowther criterion in conventional tomography. Each slice of the object is mapped onto one row of a detector, as shown in Fig. 8.1. One then obtains 1D pure projection images of the object with N_x transverse pixels of width Δt in the transverse direction (a), and a depth of precisely one pixel at zero spatial frequency in the axial direction (because there is no way to distinguish between different axial positions in a pure projection). For an angle θ = 0°, the Fourier transform of this image yields an array with N_u_x = N_x pixels in the transverse or u_x direction and N_u_z = 1 pixels in the axial or u_z direction in transverse–axial Fourier space (b). As the object is rotated, so is the information obtained in Fourier space, so (u_x, u_z) Fourier space is filled in as shown in (c). The Crowther criterion of Eq. 8.3 is effectively a statement that one must provide complete, gap-free coverage of all pixels around the periphery in transverse–axial Fourier space. Figure from [Jacobsen 2018].
gaps between polar-angle projection slice lines at the periphery (see Fig. 8.3).

Now if we have square object slices with N_x pixels on a side, we can write r = N_x/2 and note that the distance around the periphery of the array edge is 8r. But in fact the true periphery of our sampling is a circle, not a square, so the correct number of angular sampling points required is not 8r but 2πr. That is, if the object slices have N_x × N_x pixels, we need to acquire projections over not 4N_x but 4(2π/8)N_x angles, giving a requirement of N_θ = πN_x if rotating the specimen over 360°. Because projections taken 180° apart yield the same information in the pure projection approximation, we in fact need only

N_θ = (π/2) N_x    (8.3)

rotation angles. The condition of Eq. 8.3 is known as the Crowther criterion [Crowther 1970b]. One can be a bit more exact and make these angles exactly match rectangular voxel positions at the outer surface of a 3D data cube (thus changing the spacing to be even not in θ but in 1/cos θ), and for data sampled in this way one can obtain a further improvement in the tomographic reconstruction by using polar Fourier transforms in an approach called equal slope tomography [Miao 2005]. One can also relax the Crowther criterion by a factor of 1/N_A when using multislice methods (Section 10.5) to reconstruct N_A axial or depth planes from one viewing direction [Jacobsen 2018]. Finally, as we will see when we compare filtered backprojection with algebraic reconstruction tomography methods in Section 8.2, one can in fact frequently get away with far fewer
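As a quick numerical check, Eq. 8.3 can be evaluated for a typical slice width; a minimal sketch:

```python
import math

# Crowther criterion (Eq. 8.3): number of rotation angles over 180 degrees
# needed for gap-free coverage of Fourier space for an N_x-pixel-wide slice.
def crowther_angles(n_x):
    return math.ceil(math.pi / 2.0 * n_x)

print(crowther_angles(256))   # 403 angles for a 256-pixel-wide object slice
```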
than the Crowther number N_θ of projections when using algebraic approaches with their additional object constraints, though it is important to emphasize that these constraints only minimize artifacts that would otherwise appear due to the missing information, without actually substituting for that missing information.
8.1.2 Backprojection, filtered backprojection, and gridrec

Examination of the Fourier space representation of an object slice F{f(x, y)_z} shown at right in Fig. 8.2 makes clear another detail in Fourier-based tomographic reconstruction methods. The overlap of information from each projection is very high at low spatial frequencies [u_x, u_y] (near the center of the Fourier space representation), and it becomes low or even undersampled at high spatial frequencies (near the periphery of the Fourier plane representation, with undersampling taking place when the Crowther condition of Eq. 8.3 is not met). Therefore it is essential to correct for these variations in data weighting by applying a radially dependent filter g(u_r) (sometimes called a "ramp filter"),

g(u_r) = |u_r|,    (8.4)

to the data in the Fourier plane before carrying out the inverse Fourier transform operation to recover the object slice f(x, y)_z. This filter normalizes out the weighting of all spatial frequencies, leading to the filtered backprojection method. However, we must exercise care, because unthinking application of the filter of Eq. 8.4 will lead us to magnify the presence of noise in the reconstructed image. Recall examples like those of Figs. 4.19 and 4.49, where we found that image signals tend to decline with spatial frequency as u_r^(−a) where a ≃ 3–4, while noise due to Poisson statistics has a "flat" floor independent of spatial frequency u. Therefore, just as in image deconvolution (Section 4.4.8), we need to multiply the backprojection filter function g(u_r) with either a Wiener filter W(u) as in Eq. 4.207, or with some other function such as a Hamming filter [Harris 1978].

Another thing to note in the Fourier space representation of an object slice shown at right in Fig. 8.2 is that the Fourier-transformed slice projections F{p(x′)_{z,θ}} provide data in a 1D array along a polar angle, while the Fourier transform of the object slice F{f(x, y)_z} is on a Cartesian grid. The pixels in one representation do not sit exactly on top of the pixels in the other, so some scheme of remapping the data must be found. If one were to do this in real space, a simple bilinear or cubic interpolation might suffice, with only local-to-one-pixel errors present; however, one pixel in Fourier space contributes to all pixels in real space, so imperfect interpolation can affect the entire object slice image f(x, y)_z. The approach used to handle this Fourier space grid mapping is one inspired by work in synthetic aperture radio astronomy [Brouw 1975], which was then brought over to x-ray tomography [O'Sullivan 1985].
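The filtering step can be sketched in one dimension; a minimal illustration that applies the standard |u| ramp weighting to a single slice projection via the FFT:

```python
import numpy as np

# Ramp filtering of one slice projection: weight its Fourier transform by |u|
# (the ramp filter), then return to real space. This is the "filter" step of
# filtered backprojection, shown here without Wiener or Hamming apodization.
def ramp_filter(projection):
    u = np.fft.fftfreq(projection.size)           # spatial frequency of each bin
    return np.real(np.fft.ifft(np.fft.fft(projection) * np.abs(u)))

# Because the u = 0 (mean) component is weighted by zero, a uniform
# projection filters to exactly zero, and any filtered output has zero mean.
flat = ramp_filter(np.ones(64))
```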
The idea is this: one wishes to convolve the Fourier space polar line projection data F{p(x′)_{z,θ}} with a smooth but limited-in-extent convolution kernel W(u_x, u_y) that will provide a data sampling on the Cartesian grid. Now the discrete Fourier transform has the property that the real space data are assumed to be cyclically periodic (Eq. 4.95), when in fact we normally assume that the object fits fully within the slice projection p(x′)_{z,θ}; in other words, we assume
Downloaded from https://www.cambridge.org/core. Stockholm University Library, on 29 Oct 2019 at 01:38:47, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.009
Figure 8.4 Comparison of simple backprojection, filtered backprojection using gridrec, and the maximum likelihood expectation maximization (MLEM) algorithm [Richardson 1972, Lucy 1974] (Section 8.2.2), which is one of a class of algebraic, iterative reconstruction algorithms. In all cases, projections were generated from a Shepp–Logan phantom [Shepp 1974], with the number of rotation angles θ varying from Nθ = 4 at left to Nθ = 256 at right. Since this phantom was digitized over N_x = N_y = 256 pixels, the reconstructions for Nθ = 256 at right approximately satisfy the Crowther criterion of Eq. 8.3, so good reconstructions are obtained in both filtered backprojection and MLEM cases. When fewer angles are recorded, backprojection and filtered backprojection show severe artifacts which could be erroneously interpreted as image features; in such cases, algebraic iterative reconstruction methods such as MLEM are strongly preferred.
that the object can be contained within a compact support. For this reason, one also desires that the real space representation of the convolution kernel W(x′) be able to greatly suppress the line projection at its edges (since a narrow function in Fourier space produces a broad function in real space, this condition is reasonably easy to satisfy). So as to have constant results at all polar angles θ where one calculates the 1D Fourier transform of the slice projection F{p(x′)_{z,θ}} and maps it into the 2D Fourier grid of F{f(x, y)_z}, the kernel should be nearly symmetric in 2D, and separable so that W(x, y) = W(x)W(y). A good choice is to use a prolate spheroid function,² or a polynomial approximation to such a function for computational speed [Nuttall 1981, Xiao 2001]. After regridding, the effects of this convolution kernel can be removed by dividing the object slice f(x, y)_z by the kernel W(x_s, y_s) [O'Sullivan 1985]. This fast computational approach for Fourier space regridding and filtered backprojection reconstruction has been
² At the Advanced Photon Source in the USA, this prolate spheroid might carry a Chicago Bears football logo and have rather pointy ends, while at the Australian Synchrotron it might carry St. Kilda Saints markings and have more rounded ends.
named the gridrec algorithm [Dowd 1999], with additional papers describing its performance [Marone 2012]. The combination of the polar weighting filter function g(u_r) of Eq. 8.4 and the gridrec regridding operation makes filtered backprojection reconstruction approaches an excellent choice for many uses. The computation has no iterative sequences, and furthermore it exploits the fast Fourier transform (FFT) algorithm, which involves ∼2N² log(N) computational steps to process an N² array (rather than the N⁴ steps that the brute-force discrete Fourier transform would require; see Eq. 4.89). Gridrec delivers fast, high-quality reconstructions, but only if one has met the Crowther criterion (Eq. 8.3) for angular sampling. If instead one has a smaller number Nθ of projection directions, both simple backprojection and filtered backprojection deliver low-quality tomographic reconstructions. The good (large Nθ) and the bad (small Nθ) results for filtered backprojection via gridrec are shown together in Fig. 8.4.
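The FFT-versus-brute-force cost contrast quoted above can be checked directly: the discrete Fourier transform can be written as an explicit matrix operation (Eq. 4.96), which gives the same answer as the FFT but at ~N² operations per 1D transform instead of ~N log N. A minimal check:

```python
import numpy as np

def dft_matrix(n):
    """The discrete Fourier transform as an explicit N x N matrix (Eq. 4.96):
    applying it costs ~N^2 operations per transform, versus ~N log N for
    the FFT, which is why gridrec leans on the FFT."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)

x = np.random.default_rng(0).standard_normal(64)
assert np.allclose(dft_matrix(64) @ x, np.fft.fft(x))  # identical results
```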
8.2 Algebraic (matrix-based) reconstruction methods

The reconstruction methods described above are based on a Fourier optics understanding of the tomography reconstruction problem. An entirely different approach was formulated by Gordon and Herman in 1970 (their first paper in fact appeared in print the following year [Gordon 1971]), leading to the development of algebraic reconstruction tomography (ART) [Gordon 1970] as the first of a class of iterative matrix-based methods. Normally we would write the object slice f(x, y)_z as a two-dimensional array in the coordinates [x, y]. However, we can also choose to index the 2D pixels in this array by a single index x, which follows the sequence (i_x, i_y) of

x = {(0, 0), (1, 0), . . . , (N_x − 1, 0), (0, 1), (1, 1), . . . , (N_x − 1, N_y − 1)} = {0, 1, . . . , (N² − 1)},
(8.5)
where in the second form of x we have assumed that N_x = N_y = N. This allows us to write the object slice as a 1D array f_z with N² elements. Let us also write the slice projection p(x′)_{z,θ} as a 1D matrix p_{z,θ}, where the indices i_x go as

x′ = {0, 1, . . . , (N_x − 1)}.
(8.6)
How are these two matrices related to each other? Well, one pixel in the slice projection involves a weighted sum of a subset of pixels from the object slice as shown in Fig. 8.5, which naturally involves the dimensionality of both the object slice x and the slice projection x′. That is, there is a 2D projection matrix W_θ with N_x × N² matrix elements that allows us to relate the object slice f_z to the slice projection p_{z,θ} as

p_{z,θ} = W_θ f_z
(8.7)
for a given angle θ. The projection matrix W_θ is illustrated in Fig. 8.5. Let's consider the deceptively simple expression of Eq. 8.7 in greater detail. Because the 2D projection matrix W_θ is something that can be calculated exactly, we can obtain the N_x values of the slice projection p_{z,θ} if we know the N² values of the object f. Now
[Figure 8.5 labels: object slice; x-ray beam direction; slice projection; weights for one slice projection pixel]
Figure 8.5 Algebraic reconstruction tomography (ART) methods are matrix-based methods for
recovering the object slice f_z via solution of the equation p_{z,θ} = W_θ f_z for all values of θ (Eq. 8.7). Shown here is the weighting matrix W_θ that tells which of the x pixels in the 2D object slice f_z contribute to one of the x′ pixels in the 1D image of the slice projection p_{z,θ}. The weighting matrix W_θ will be sparse, because for any one pixel in p_{z,θ} there will only be a few object pixels that appear along the projection direction (the red column through the object slice in the figure).
there is no point to having N_x > 2N due to the Nyquist sampling theorem of Eq. 4.88; as a result, we have an overdetermined problem in that the calculation of p_{z,θ} = W_θ f_z of Eq. 8.7 involves obtaining the N_x values in p_{z,θ} from the N² values in f_z. However, the inverse is certainly not true: even if we could obtain the pseudoinverse W⁺_θ of the matrix W_θ and thus write

f_z = W⁺_θ p_{z,θ},
(8.8)
we would not arrive at an unambiguous answer, since we would be trying to determine N² values from N_x measurements. That is, we would have far more unknowns than knowns. Obviously as slice projections from more and more projection angles Nθ were added to our data, we could get closer and closer to a deterministic measurement. However, while it is easy to manipulate two equations to solve for two unknowns, it is far more difficult to algebraically manipulate N_x Nθ knowns to solve for N² unknowns. The better solution is to turn to numerical optimization approaches, which leads us to a little detour.
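To make Eq. 8.7 concrete, here is a sketch of the weighting matrix W_θ for the trivial geometry θ = 0, where each detector bin simply sums one column of the object slice. A real implementation uses interpolated ray weights at each tilted angle, but the matrix shape (N_x rows by N² columns) and its sparsity are the same.

```python
import numpy as np
from scipy import sparse

def projection_matrix_0deg(n):
    """Sparse W_theta of Eq. 8.7 for the simplest geometry, theta = 0:
    detector bin i_x sums the object pixels in column i_x.  Rows index the
    N_x detector bins; columns index the N^2 object pixels, flattened with
    the single index of Eq. 8.5 (i_x varying fastest)."""
    ix = np.tile(np.arange(n), n)      # i_x of every flattened object pixel
    iy = np.repeat(np.arange(n), n)    # i_y of every flattened object pixel
    cols = iy * n + ix                 # the single index of Eq. 8.5
    return sparse.csr_matrix((np.ones(n * n), (ix, cols)), shape=(n, n * n))

n = 8
f = np.arange(n * n, dtype=float).reshape(n, n)    # object slice f[i_y, i_x]
p = projection_matrix_0deg(n) @ f.ravel()          # p_{z,theta} = W_theta f_z
assert np.allclose(p, f.sum(axis=0))               # column sums, as expected
```

Only n of the n² entries in each row are nonzero here, which is why sparse storage (and the iterative solvers discussed next) are the practical route.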
8.2.1 Numerical optimization

Matrix equations of the form p_θ = W_θ f of Eq. 8.7 appear in a wide range of problems (we shall see another example in Section 9.3). As a result, an entire branch of applied mathematics has arisen around their solution, calling them numerical optimization problems.

[Figure 8.6 labels: ℓ1 norm; ℓ2 norm]
Figure 8.6 Common "norms" used in optimization problems. The one-norm or ℓ1 (Eq. 8.12) has a sharp minimum, while the two-norm or ℓ2 (Eq. 8.13) has a softer minimum. When combining a main cost function C0 with several regularizers λiCi such as in Eq. 8.15, the two-norm provides a better balance between various costs towards finding an overall minimum.

A significant subset of this literature deals with the generic matrix equation

y = Ax
(8.9)
where y is usually some measurement, A is a matrix modeling the measurement process (with a pseudoinverse A⁺), and x is a model of the object. It is worth noting that the discrete Fourier transform can be written as a matrix operation (Eq. 4.96). In the algebraic reconstruction tomography example seen above, we were not able to solve the equivalent (Eq. 8.8) of x = A⁺y exactly. This is characteristic of situations where numerical optimization methods are used. We might reasonably seek to find a minimum residual to the difference between the two sides of the equation, or a minimum value of

min ‖y − Ax‖_p
(8.10)
generally, or in the case of algebraic tomography methods based on Eq. 8.7, a minimum of

min ‖p_{z,θ} − W_θ f_z‖_p ,
(8.11)
where the subscript p indicates the specific measure of the minimum, as will be illustrated below. In other words, we can define a data-matching cost function C0 or objective function that should be minimized. The function C0 is often formed from one of several norms [Tarantola 2005], including

ℓ1 :  C0 = ‖y − Ax‖1 = Σ_i |y_i − (Ax)_i|    (8.12)

ℓ2 :  C0 = ‖y − Ax‖2 = [ Σ_i |y_i − (Ax)_i|² ]^(1/2)    (8.13)

where ℓ1 is called the one-norm and ℓ2 is called the two-norm. These different norms drive different behaviors in the convergence of the solution, as illustrated in Fig. 8.6, so that the two-norm ℓ2 is usually preferred.
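Minimizing the two-norm of Eq. 8.13 over x is ordinary least squares. A small numpy check, using an assumed random measurement model (illustrative only), that the two-norm minimizer recovers the object from an overdetermined, mildly noisy measurement:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))                   # measurement model (assumed)
x_true = rng.standard_normal(5)                    # the unknown object
y = A @ x_true + 0.01 * rng.standard_normal(20)    # noisy measurement y = Ax + n

# The x minimizing the two-norm ||y - Ax||_2 (Eq. 8.13), via least squares
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.linalg.norm(x_hat - x_true) < 0.1        # close to the true object
```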
While we may not know exactly what the object x is, we may have some knowledge of its characteristics. In coherent diffraction imaging (Chapter 10) as solved using an optimization approach, this might include knowledge that the object fits entirely within a subregion of a sampled real-space array from which we have diffraction intensities; this is known as a finite support constraint. In spectromicroscopy (Chapter 9), we might know that materials with known spectra comprise part of the specimen, so these spectra should play a role in the solution [Mak 2014]. Yet another constraint can be non-negativity [Lee 1999], such as in the case of processes A that are based on x-ray absorption, where negative absorption (that is, addition of energy to the transmitted beam) would violate basic physics. For those items of prior knowledge that can be quantified in terms of additional error terms Ci, we can incorporate them into an overall cost function by adding them as regularizers. Example regularizers include the following:
• Sparsity: one may have reason to favor models of the specimen x which are "sparse," meaning that many entries are zero. In the case of spectromicroscopy, one may have regions of a specimen that are phase-segregated, so that certain pixels in the image contain only one material with one spectroscopic signature. Ideally one would then like to minimize the zero-norm or ℓ0, which measures how many entries in x are nonzero. Since this turns out to be an "NP-hard"³ regularizer to minimize in optimization [Natarajan 1995], it has been shown that the one-norm ℓ1 serves as a good proxy for sparsity [Tibshirani 1996], leading to a regularizer of C_sparsity = ‖x‖1.
• Total variation (TV): the most "safe" or "conservative" version of a reconstructed object x is the one that has the least amount of structural variation consistent with what is demanded by the recorded data. Therefore a common regularizer is to minimize the total variation V of the object, which is defined as

V = Σ_{i=0}^{N−2} |x_{i+1} − x_i| .    (8.14)
The concept of TV minimization⁴ first arose in Fourier analysis [Jordan 1881], and there are a variety of approaches to its definition in multiple variables [Clarkson 1933]. The application of TV regularization in an optimization approach often allows one to obtain reconstructed images even in cases where one might seem to have incomplete data, such as in compressive sensing [Candès 2006, Donoho 2006], where one first transforms the data onto a basis set where its information is nicely sparsified and separable (principal component analysis provides one such transform, as will be discussed in Section 9.3.1).
The basic error minimization cost of C0 = ‖y − Ax‖2 (Eq. 8.13) might not be on the same numerical scale as any of the separate regularizer costs. As a result, the net cost is written using a set of regularization terms λi for each of the regularizers, leading to a total cost of

C_total = C0 + λ1C1 + λ2C2 + · · ·    (8.15)

³ NP-hard means non-deterministic polynomial-time hard; think "slow to compute."
⁴ TV minimization is almost always a good approach, unless there's something really good on television tonight!

[Figure 8.7: a cost function C_total plotted against the variable being adjusted, showing several local minima and the global minimum.]
Figure 8.7 Local versus global minima in a cost function C_total. One of the challenges of optimization strategies involves avoiding traps in local minima. Numerous strategies can be applied to avoid being trapped in a local minimum, including multiscale optimization and simulated annealing.
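Equations 8.14 and 8.15 can be written down directly. A minimal numpy sketch, with a single TV regularizer and an arbitrary illustrative λ:

```python
import numpy as np

def total_variation(x):
    """Total variation V of Eq. 8.14: summed absolute neighbor differences."""
    return np.sum(np.abs(np.diff(x)))

def total_cost(x, y, A, lam_tv):
    """C_total = C_0 + lambda_1 C_1 of Eq. 8.15, with one TV regularizer."""
    c0 = np.linalg.norm(y - A @ x)     # data-matching two-norm cost, Eq. 8.13
    return c0 + lam_tv * total_variation(x)

# A smooth candidate and a jagged one, judged against the same data;
# the TV term penalizes the jagged one (lambda = 0.1 is an arbitrary choice).
A = np.eye(4)
y = np.ones(4)
smooth = np.ones(4)
jagged = np.array([0.0, 2.0, 0.0, 2.0])
assert total_cost(smooth, y, A, 0.1) < total_cost(jagged, y, A, 0.1)
```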
Think of making a multi-item purchase where each individual item i is bought using a separate currency (dollars, euros, yen, pesos, and so on); obviously the total cost must include factors λi that account for currency exchange rates in order to know the true cost in a single currency.
As one learns in calculus, the minimum of a simple function f(x) can be found by setting its first derivative to zero, and ensuring that its second derivative is positive. However, when it comes to finding a minimization of Eq. 8.15, we have a plethora of variables (each of the pixel values in an object slice f_z, in the example of tomography) even before we consider any regularizers λiCi. Therefore, taking a derivative along a single variable is not guaranteed to lead us in the right direction, and we must also take care not to get trapped in any local minima as we try to find our way to a global minimum (Fig. 8.7). In fact, because the problem is usually underdetermined, local minima might be abundant, and furthermore there may be only small differences between many of the local minima and the global minimum. Furthermore, the solution might be so unknown at the outset that one starts with solution parameters chosen at random (in fact, using incomplete knowledge rather than random starts can often lead one to getting trapped in local minima; random starts are often better). So how do we tweak a myriad of variables to find a global minimum to C_total when we start with random numbers? Some of us might throw up our hands in exasperation, but applied mathematicians will rub their hands in glee: this is a rich playground! Of course there are many books written on the topic (see for example [Nocedal 2006]), so we will make only a few remarks here.
• The calculations become iterative. From the random start, one tries to tweak the solution towards smaller individual cost terms C1, C2, and so on in the total cost C_total of Eq. 8.15.
But since tweaks affecting one cost term will usually affect another, one must iterate through this process and see how the cost function becomes minimized.
• The simplest approach is to go in the direction of reducing one cost term C1, and then in the direction of reducing another cost term C2, and so on, in a successive optimization approach. However, one can obtain better convergence, at the cost of more calculations, by using a simultaneous optimization approach where the vector sum of all individual minimization steps is used. The successive and simultaneous optimization schemes are compared against each other in Fig. 8.8. While simultaneous optimization involves more calculations, it also involves a more direct path to the global minimum, without zigzagging in a way that could lead one into local minima. It turns out that the original ART algorithm [Gordon 1970] involves a successive optimization approach (which can be traced back to work by the Polish mathematician Stefan Kaczmarz [Kaczmarz 1937]), while the simultaneous iterative reconstruction tomography (SIRT) algorithm [Gilbert 1972] involves a simultaneous optimization approach. These two fundamental differences in approach arise again in coherent diffraction imaging (Chapter 10), where the classical error reduction algorithm [Fienup 1978] inspired by Gerchberg and Saxton [Gerchberg 1972b] and its variants such as the hybrid input–output algorithm [Fienup 1982a] are successive approaches, while the difference map algorithm [Elser 2003] represents a simultaneous approach.
• One can first downsample the data to lower resolution, and carry out a more rapidly computed optimization where many of the local minima might be "blurred out," after which optimization can be carried out on the full dataset to refine the solution. This is now known as multiscale optimization, though multiresolution image processing has an older history [Rosenfeld 1984].
If one thinks of this in Fourier space, one starts by reconstructing the image only at low spatial frequencies, and then one gradually "marches" out to higher spatial frequencies in iterative reconstructions, in an approach called "frequency marching" [Chen 1999].
• If you took a calculus class, you probably learned how to use the chain rule to differentiate a more complex function in terms of a series of elementary differentiation operations. Well, computers have taken calculus classes too! Automatic differentiation [Griewank 2008] refers to an approach in which a computer applies the chain rule to the mathematical operations of a cost function as written in computer code. This gives the computer an efficient way to estimate a steepest descent path, and it is available in various toolkits such as TensorFlow by Google. In the case of iterative phase retrieval, as will be discussed in Chapter 10, this approach was first suggested by Jurling and Fienup [Jurling 2014] and later applied to x-ray ptychography data reconstruction [Nashed 2017, Kandel 2019].
Computer optimization problems are strongly related to machine learning and pattern recognition. Consider the example of artificial neural networks, where a set of "stimuli" x produce various weighted responses A in various "neural" pathways to produce differing outcomes y. If one has a set of known correct outcomes y from a set of known inputs x, one can use optimization to find the response matrix A in y = Ax, and approaches of this sort have been applied to tomography reconstruction problems [Pelt 2013, Parkinson 2017]. As an example, if one has a "training set" of images x of
[Figure 8.8 panels: successive optimization and simultaneous optimization, each showing iterates x1, x2, x3, ... moving between the minima of cost terms C1 and C2 from an initial guess.]
Figure 8.8 When trying to solve for the global minimum between multiple terms in a total cost function C_total, one approach is to take a step towards the minimum of each cost term in turn. Demonstration of the iteration process for ART and SIRT on a two-dimensional solution space: ART converges rapidly, while SIRT has somewhat slower convergence but is more robust to measurement noise.
people that one knows can be sorted into three categories y (images of Jane, or of Diane, or of Robert), one can "train" the neural network by finding the matrix A using optimization. With this "trained neural network," one can then supply a new dataset x of images and have high confidence that y = Ax will recognize the new images as being of Jane, Diane, or Robert (though sometimes one can spoof [Sharif 2016] machine learning!). However, this is a gross oversimplification of how convolutional neural networks (CNNs) work. For example, CNNs might mimic the brain by having multiple computational "layers" with decision points made after certain layers, in which case the operations of a CNN clearly are not represented by a single matrix A nor with linear operators. Problems where one would like to solve y = Ax by minimizing ‖y − Ax‖ will appear again in Section 9.3.2, and in Section 10.3.6.
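Returning to the successive and simultaneous schemes compared in Fig. 8.8, the two update styles can be sketched in a few lines of numpy. This is a generic illustration with an assumed dense matrix W, not a tomography-scale implementation: the successive sweep is a Kaczmarz-style row-by-row projection as in ART, while the simultaneous step applies the summed correction from all rows at once as in SIRT.

```python
import numpy as np

def art_sweep(W, p, f, relax=1.0):
    """One successive (Kaczmarz-style) sweep, as in ART: project the current
    guess onto the hyperplane of each measurement row in turn."""
    for i in range(W.shape[0]):
        w = W[i]
        f = f + relax * (p[i] - w @ f) / (w @ w) * w
    return f

def sirt_step(W, p, f, relax=None):
    """One simultaneous (SIRT-style) step: apply the summed correction from
    all measurement rows at once.  The default step size keeps it stable."""
    if relax is None:
        relax = 1.0 / np.linalg.norm(W, 2) ** 2
    return f + relax * W.T @ (p - W @ f)

rng = np.random.default_rng(2)
W = rng.standard_normal((30, 10))   # generic well-conditioned "projection" matrix
f_true = rng.standard_normal(10)
p = W @ f_true                      # noise-free, consistent measurements

f_art = np.zeros(10)
for _ in range(50):
    f_art = art_sweep(W, p, f_art)
assert np.linalg.norm(f_art - f_true) < 1e-3   # successive sweeps converge fast
```

On consistent, noise-free data both schemes converge to the same answer; the robustness differences noted in the Fig. 8.8 caption appear once measurement noise is added.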
8.2.2 Maximum likelihood and expectation maximization

Another approach to solve the matrix equation of p_{z,θ} = W_θ f_z in tomography (Eq. 8.7) is to seek to maximize the likelihood of the solution for the object slice f_z, rather than minimizing a cost function C_total. If the object is assumed to be f(x) and the ideal observed image is p̃(x′) in the one-dimensional case, one should be able to write the observed image [Lucy 1974, Lucy 1994] as

p̃(x′) = ∫ f(x) g(x′|x) dx,    (8.16)

where g(x′|x) is the probability of getting a signal at the observed image pixel p̃(x′) based on the object signal at pixel x. That is, g(x′|x) is a blurring function in the measurement process, which could be a Gaussian blur given by Eq. 4.14 as P(x′, x), or it could be some more complicated blurring function such as an intensity point spread
function of an imaging optic (Section 4.4.3). However, we should also assume that the actual observed image p(x′) has noise n(x′) so that

p(x′) = p̃(x′) + n(x′),
(8.17)
where n(x′) might often be Poisson noise due to limited photon statistics (Section 4.8.1). It would therefore be more likely that sharp fluctuations in the actual observed image p(x′) are due to photon statistical fluctuations n(x′) rather than sharp structure in the object f(x). In other words, the blurred image with the maximum likelihood (in a Bayesian statistics sense [Richardson 1972]) is p̃(x′). Using a statistical method called expectation maximization (EM) [Dempster 1977] to seek the solution with maximum likelihood, one can write the next iterate j + 1 of the guess of the object f_j(x) using a convolution notation [Fish 1995] as

f_{j+1}(x) = f_j(x) { [ p(x) / ( f_j(x) ∗ g(x) ) ] ∗ g(−x) } .    (8.18)

This update rule is at the heart of the maximum-likelihood expectation maximization (MLEM) method as applied to tomography [Shepp 1982], though in fact there is a great diversity of notation used to write Eq. 8.18, as shown in Appendix C online at www.cambridge.org/Jacobsen. The update rule of Eq. 8.18 was first proposed by Richardson [Richardson 1972] and by Lucy [Lucy 1974] based on a Bayesian statistics approach with no assumption on the characteristics of the noise n(x′). However, the EM approach stipulates Poisson noise, yet arrives at the same update rule of Eq. 8.18; this has led to the statement [Carasso 1999] that "the equivalence of these two methods is curious in view of their different underlying hypothesis." In addition, the act of dividing the observed image p(x) by the convolution of the present guess of the object f_j(x) and the probe or blurring function g(x) in Eq. 8.18 can lead to noise amplification in the reconstruction process if too many iterations are carried out. In real space there may be true "zeroes" in the object, and moreover the Wiener filter approach used to avoid divide-by-zero errors in Fourier deconvolution (Section 4.4.8) is not applicable in real space.
For this reason, MLEM is often combined with regularization schemes such as minimizing total variation (Eq. 8.14).
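A minimal 1D sketch of the Richardson–Lucy/MLEM update of Eq. 8.18, assuming a normalized blurring kernel g. The small floor added to the denominator is just a guard against divide-by-zero, standing in for the more principled regularization discussed above.

```python
import numpy as np

def richardson_lucy(p, g, n_iter=50):
    """Richardson-Lucy / MLEM update of Eq. 8.18 in one dimension.

    p: observed blurred image; g: normalized blurring kernel g(x).
    The convolution with the mirrored kernel implements the g(-x) step,
    and the small denominator floor guards against divide-by-zero.
    """
    f = np.full_like(p, p.mean())               # flat, positive starting guess
    g_mirror = g[::-1]                          # g(-x)
    for _ in range(n_iter):
        blur = np.convolve(f, g, mode="same")   # f_j(x) * g(x)
        ratio = p / np.maximum(blur, 1e-12)
        f = f * np.convolve(ratio, g_mirror, mode="same")
    return f
```

For example, deconvolving a blurred point source concentrates the estimate back toward the original pixel while keeping it non-negative and (away from array edges) conserving the total signal.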
8.3 Analysis of reconstructed volumes

The goal of tomography is to arrive at a 3D representation of the specimen, often with each voxel quantified in terms of its linear absorption coefficient (Section 3.3.3) or its electron density (Eq. 10.72). What does one do with these data? The simplest method is to view object slices f(x, y)_z in succession in a movie, where each slice is a grayscale image in the movie sequence. However, often this is only an intermediate step in the analysis of a specimen. If the specimen is composed of materials with distinctly different densities, one can form isodensity surfaces (surfaces that contain a set of voxels with approximately the same numerical values for optical density) and then "spin"
[Figure 8.9 panels: reconstructed optical density; overlay of segmentation; segmentation alone]
Figure 8.9 Segmentation of tomographic reconstructions into identifiable features allows one to carry out important quantitative analyses of their volume, distribution, and connectivity. However, renderings of segmented volumes are usually represented by clear boundaries between features, which can give an impression of higher spatial resolution and feature separation than might actually exist in the reconstructed optical or electron density. This is illustrated here in a subregion of an x-ray tomography dataset of a charcoal sample [Vescovi 2018], where the raw reconstructed optical density is shown at top left. Otsu thresholding [Otsu 2007] was first applied to the data to find three somewhat distinct values of optical density in the reconstruction, after which a 3D connected-component analysis was used to remove few-voxel blocks. The resulting segmentation masks were then smoothed by convolution with a Gaussian kernel, after which isodensity surfaces (surfaces at threshold values of each of the three classes) were generated and rendered in separate colors for each class of optical density using Vaa3D [Peng 2010]. These isosurfaces are shown overlaid on the optical density data at top right, and by themselves at bottom. The point of this illustration is that one can gain a false impression of the sharpness and feature separation in a tomographic dataset if only isosurface renderings are shown; representative grayscale images of the actual reconstructed optical density should also be shown to provide full disclosure of the characteristics of the tomographic reconstruction. Figure made using images provided by Ming Du, Northwestern University.
the resulting 3D rendering on a computer display to obtain an amazingly realistic view of the object. There is a wide range of more sophisticated approaches to segmentation (such as watershed methods, and adaptive thresholding) which have been used for some
time in electron tomography [Sandberg 2007] and in medical imaging [Taha 2015], and the throughput of these approaches has been improved through the use of graphical processing units (GPUs) [Smistad 2015]. More recently, deep learning or multi-layered convolutional neural network approaches [LeCun 2015] have emerged as offering improved performance following "training" by manual annotation of features in a small number of images; progress in this area is summarized in a recent review [Rehman 2018]. Application of these various methods has enabled quantitative studies of the pore structures which determine reactant access in catalysts [da Silva 2014], the changes in the lacunar system of bone with different osteoporosis drug treatments [Mader 2013], and the volume distribution of subcellular organelles [Parkinson 2008]. A tutorial on segmentation in soft x-ray tomography of cells is available in the Journal of Visualized Experiments [Darrow 2017]. While segmentation gives one an all-important ability to quantify volumes, connectivity, and other characteristics of distinct features within a specimen, one should be aware that renderings of the surface boundaries of segmented regions also give one the appearance of sharper boundaries within the specimen than is justified based on the examination of the actual reconstructed density. In other words, the segmented volume can give the appearance of higher spatial resolution and feature separation than is present in the actual reconstructed volume (Fig. 8.9).
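The threshold-then-clean pipeline described for Fig. 8.9 can be sketched with numpy and scipy.ndimage. This is an illustrative 2D sketch, not the pipeline used for the figure: the Otsu search is written out explicitly, and the few-voxel minimum size is an arbitrary choice.

```python
import numpy as np
from scipy import ndimage

def otsu_threshold(img, nbins=256):
    """Otsu's method: choose the threshold that maximizes the between-class
    variance of the gray-level histogram."""
    hist, edges = np.histogram(img, bins=nbins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    w = hist / hist.sum()
    best_t, best_var = centers[0], -1.0
    for i in range(1, nbins):
        w0, w1 = w[:i].sum(), w[i:].sum()
        if w0 == 0.0 or w1 == 0.0:
            continue
        m0 = (w[:i] * centers[:i]).sum() / w0     # mean below threshold
        m1 = (w[i:] * centers[i:]).sum() / w1     # mean above threshold
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, centers[i]
    return best_t

def segment(img, min_voxels=4):
    """Threshold with Otsu's method, then use a connected-component
    analysis to drop few-voxel specks."""
    mask = img > otsu_threshold(img)
    labels, n = ndimage.label(mask)
    if n == 0:
        return mask
    sizes = ndimage.sum(mask, labels, index=np.arange(1, n + 1))
    keep_labels = 1 + np.flatnonzero(sizes >= min_voxels)
    return np.isin(labels, keep_labels)
```

Both steps are also available prepackaged (for example in scikit-image); the sketch just makes the logic explicit, and it carries over to 3D arrays unchanged.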
8.4 Tomography in x-ray microscopes

As was noted at the beginning of this chapter, the high penetration power of x rays makes tomography an obvious method to implement in x-ray microscopes, because without it the overlap of structures in depth can make 2D images difficult to interpret. Examples of the use of tomography in x-ray microscopy are shown in Figs. 12.1, 12.6, and 12.9 for transmission imaging, while fluorescence tomography is shown in Fig. 12.3. One very simple yet powerful way to carry out high-resolution x-ray tomography is to use a scintillator screen and a high numerical aperture (N.A.) microscope objective to image the x-ray intensity distribution downstream of an object onto a visible-light camera [Flannery 1987]. This allows one to exploit the rapid development of high-frame-rate visible-light cameras, so that projection images can be obtained at sustained frame rates of 30 kHz [Mokso 2017] and above, thus enabling x-ray studies of dynamic processes in materials (see Fig. 8.10; this involves complications in data processing for time-evolving samples [Ruhlandt 2017], while opening new opportunities in materials science [Villanova 2017]). The use of scintillators limits the spatial resolution to something just below 1 μm, where several complicating factors arise as discussed in Section 7.4.7. One can improve the spatial resolution further by using a scintillator/microscope objective/visible-light camera system in the point projection geometry discussed in Section 6.2, an approach that is employed in several commercially available x-ray microtomography systems.
[Figure 8.10 axes: tomogram time (seconds, logarithmic from 10⁻² to 10⁶) versus voxel size (μm, logarithmic from 0.01 to 10).]
Figure 8.10 A sampling of experimental results reported for x-ray tomography at various voxel sizes (approximately equal to the 3D spatial resolution in most cases), and the time required to acquire one tomogram. Because of the increased photon flux required for high-resolution imaging (Section 4.9.1), there is usually a tradeoff between resolution and tomography acquisition time. Data compilation organized by Francesco De Carlo of Argonne Lab, with similar plots presented and discussed elsewhere [Maire 2014, Villanova 2017].
It took a longer time for nanoscale x-ray tomography to be realized, and different developments took place with scanning and full-field microscopes:

• The first demonstration of x-ray nanotomography used a scanning transmission x-ray microscope (STXM; Section 6.4) to acquire Nθ = 9 projections over a 105° tilt range at 345 eV, delivering images of a two-plane test structure at 100 nm xy and 600 nm z resolution [Haddad 1994]. Three-dimensional imaging was combined with spectromicroscopy (Chapter 9) for the first time in STXMs, first by imaging a set of serial sections of a 3D object [Hitchcock 2003] (which, of course, is not the same as tomography) and then by rotating an object over Nθ = 61 projections over 180° as it was imaged at two XANES resonance energies (530.0 and 532.2 eV) near the oxygen K edge [Johansson 2007].

• Transmission x-ray microscopes (TXMs; Section 6.3) generally offer much faster imaging times, therefore making it easy to collect more projections. As a result, the first TXM tomography demonstration [Lehr 1997] involved an increased number of projections (Nθ = 33) over a 160° tilt range at 517 eV, delivering 50 nm xy resolution. Most transmission tomography in x-ray microscopes today is done using TXMs.

• Soft x-ray microscopes operating at E […]

[…] for E2 > E1 we can make the approximation

    μ_{m,2}/μ_{m,1} ≈ [μ_{m,1}(E1/E2)^{s_E}]/μ_{m,1} = (E1/E2)^{s_E},   (9.10)

in which case Eq. 9.9 becomes

    ρ_x = { ln[(I1/I0,1)(E1/E2)^{s_E}] − ln[(I2/I0,2)(E2/E1)^{s_E}] } / (μ_{x,2} − μ_{x,1}).   (9.11)

Finally, if (μ_{x,2} − μ_{x,1}) is accurately known and all non-specimen-dependent background signals are properly subtracted, the error in the measurement will be dominated by photon statistics. Let us assume that (ρ_x μ_x) ≪ (ρ_m μ_m) so that most of the absorption is due to the matrix. In this case we can write

    I1 ≈ exp[−ρ_m μ_{m,1}] I0,1.   (9.12)

If we assume that the illumination is provided by a mean exposure of n̄ photons, the ratio I1/I0,1 becomes

    I1/I0,1 = exp[−ρ_m μ_{m,1}] (n̄ ± √n̄)/(n̄ ± √n̄),   (9.13)

where we have used the Gaussian approximation for Poisson statistics discussed in Section 4.8.1. Assuming the errors ±√n̄ to be uncorrelated so that they add in a root-sum-of-squares fashion, Eq. 9.13 becomes

    I1/I0,1 ≈ exp[−ρ_m μ_m](1 ± √(2/n̄))   (9.14)

and (within the simplifying approximations we have made) Eq. 9.11 becomes

    ρ_x ≈ { ln[e^{−ρ_m μ_{m,1}}(E1/E2)^{s_E}(1 ± √(2/n̄))] − ln[e^{−ρ_m μ_{m,2}}(E2/E1)^{s_E}(1 ± √(2/n̄))] } / (μ_{x,2} − μ_{x,1})
        ≈ { s_E ln(E1/E2) + ln(1 ± √(2/n̄)) + s_E ln(E1/E2) − ln(1 ± √(2/n̄)) } / (μ_{x,2} − μ_{x,1})
        ≈ [2 s_E ln(E1/E2) ± 2/√n̄] / (μ_{x,2} − μ_{x,1}),   (9.15)

where in the last step we have made use of the Taylor series approximation ln(1 + x) ≈ x for x ≪ 1, and added uncorrelated errors in a root-sum-of-squares fashion. Separating this into the measurement and its error, we have

    ρ_x ± Δρ_x ≈ 2 s_E ln(E1/E2)/(μ_{x,2} − μ_{x,1}) ± (2/√n̄)/(μ_{x,2} − μ_{x,1}),   (9.16)
[Figure 9.2 panels: left, schematic of an idealized carbon absorption edge (absorption cross section σ versus photon energy), showing the K edge at E_K above the n = 1 core level, excitations from the core level into electronic states below the ionization potential, the continuum (fully ionized) above it, continuum resonances, and a width ΔE; right, optical density of tyrosine (0.0–2.0) over 280–310 eV.]
Figure 9.2 X-ray absorption near-edge structure (XANES) or near-edge x-ray absorption fine structure (NEXAFS) in x-ray absorption spectroscopy. A schematic representation is shown on the left, and a spectrum from the amino acid tyrosine is shown at right (data from [Kaznacheyev 2002]). An x-ray absorption edge occurs at a photon energy sufficient to completely remove a core-level electron (Fig. 3.3). Because chemical binding happens at energies within a few eV of the Fermi energy (Box 3.2), there can be unoccupied or partially occupied electronic states just a few eV below an absorption edge. In XANES, a photon with an energy below the ionization potential can promote an inner-shell electron into this electronic state, leading to an absorption resonance at an energy a few eV below the absorption edge. In fact, one can have many resonances of this type, for example as shown in Fig. 9.5. There can also be continuum resonances above the absorption edge; electrons that get promoted into these states don't stay there very long, leading to broad resonances due to the Heisenberg uncertainty principle (ΔE)·(Δt) ≥ ℏ (Eq. 3.24).
so the fractional error in the measurement is given by

    Δρ_x/ρ_x ≈ (1/√n̄) · 1/[s_E ln(E1/E2)].   (9.17)
With s_E = 3 and E1 ≈ 2.7E2 (so that ln(E1/E2) ≈ 1), the fractional error reaches 1 percent when one uses n̄ = 1100 photons per pixel in the illumination. One can increase the illumination n̄ to achieve higher sensitivity provided all background signals are properly controlled, but this illustrates why it is hard to measure concentrations below about 1 percent using differential absorption.
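As a quick numerical check of Eq. 9.17 (a sketch of my own, assuming ln(E1/E2) ≈ 1 so that the n̄ ≈ 1100 figure can be reproduced):

```python
import numpy as np

def fractional_error(n_bar, s_E, E1_over_E2):
    """Fractional error of a differential-absorption concentration
    measurement, Eq. 9.17: dRho/rho ~ 1/(sqrt(n_bar) * s_E * ln(E1/E2))."""
    return 1.0 / (np.sqrt(n_bar) * s_E * np.log(E1_over_E2))

# With s_E = 3 and ln(E1/E2) = 1, about 1100 photons/pixel gives ~1% error:
print(fractional_error(1100, 3, np.e))
```

Because the error scales as 1/√n̄, reaching 0.1 percent sensitivity would require a hundred times more photons per pixel, which is the dose-driven limit mentioned above.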
9.1.2 Living near the edge: XANES/NEXAFS

The schematic representation of an x-ray absorption edge shown in Fig. 3.3 represents one basic transition for an isolated atom. However, atoms are neither that simple, nor are they always so lonely, leading to two effects in x-ray absorption spectroscopy. The first is XANES (x-ray absorption near-edge structure, a term that comes from the EXAFS community), which is also called NEXAFS (near-edge x-ray absorption fine structure, a term favored by chemists who consider the discrete electronic states involved). The second is EXAFS (extended x-ray absorption fine structure; Section 9.1.7). While XANES
[Figure 9.3 panels: measured and calculated C 1s spectra (288–308 eV, with a ×10 magnified region) and O 1s spectra (532–552 eV), with a central state-energy diagram of orbital configurations showing the ground state at 0.0 eV and the four main transitions: C 1s → π* at 287.4 eV, C 1s → σ* at 305.0 eV, O 1s → π* at 532.0 eV, and O 1s → σ* at 551.0 eV.]
Figure 9.3 Carbon and oxygen edge inner-shell spectra of carbon monoxide (CO) in the gas phase, measured under electron scattering conditions equivalent to x-ray absorption [Hitchcock 1980] (higher-resolution x-ray absorption spectra have also been published [Domke 1990]). Spectra and upper-level orbital plots from ab initio quantum calculations performed using the program GSCF3 [Kosugi 1980, Kosugi 1987] are also shown. The state-energy diagram in the center shows the four main transitions. The hatched line indicates the core-level ionization potential.
happens within a few eV of the ionization potential, EXAFS can extend to energies several hundred eV higher. Fortunately for our existence, atoms can form chemical bonds, and those bonds have energies of about 1.5–11 eV (see Box 3.2). This produces available electronic states that are a few eV below the last occupied atomic orbital (as set by the Fermi energy, discussed in Section 3.1.3), yet these states may not be fully occupied within an ensemble of molecules. As a result, an x-ray photon with an energy a few eV below the ionization potential has the chance to excite an inner-shell electron into occupying that electronic state. This leads to an absorption resonance a few eV below the absorption edge as shown in Fig. 9.2. The typical features of a XANES spectrum can be placed into the following categories, with most of these features shown in the carbon and oxygen near-edge spectra of carbon monoxide (CO) shown in Fig. 9.3 along with illustrations of σ and π orbitals.

1. Valence excitations involve core-shell or ground-state electrons being excited to valence states, such as the C 1s → π* transition in CO at 287.4 eV. In this context, valence refers to compact orbitals constructed from atomic levels of the outermost occupied principal quantum number (n = 2 for CO).
2. Rydberg transitions involve promotion of a core-shell electron to levels constructed from atomic levels with n greater than the valence shell (n > 2 for CO). Examples are the 3s and 3p Rydberg features at 292.4 and 293.8 eV in C 1s excited CO.

3. Ionization continuum corresponds to direct ionization of the core electron. The threshold or ionization potential for the C 1s and O 1s edges of CO are indicated in Fig. 9.3 by a hatched line. For atoms and molecules with weak XANES features, this corresponds to the position of the absorption edge (Fig. 3.3).

4. Multielectron excitations have valence (occupied) to valence (unoccupied) excitations occurring simultaneously with core to valence excitations. The relatively sharp feature at 300 eV in CO is a two-electron excitation: (C1s² … π²π*⁰) → (C1s¹ … π¹π*²).

5. Continuum resonances involve valence electronic states above the core-level ionization potential, so that promotion of a core electron to this state rapidly decays into direct ionization. From the Heisenberg uncertainty principle (Eq. 3.24) of ΔE ≥ ℏ/(Δt), these short-lived states produce broad spectral features above the ionization potential. The C 1s → σ* transition at 305 eV and the O 1s → σ* transition at 551 eV are both examples of continuum resonances.

While this categorization is couched in the orbital language of atoms and molecules, corresponding features exist in the XANES spectra of solids. In this case there are valence excitations (corresponding to core→conduction band excitation), excitons (core-excited electrons temporarily trapped by the core hole potential), direct ionization transitions, multielectron features (such as shake-up and shake-off features [Mukoyama 1994]), and multiple-scattering resonances in the ionization continuum. The energy locations of XANES resonances can shift with the atom's oxidation state, sometimes by several eV.
Consider the case of an atom becoming more oxidized, such as iron going from Fe²⁺ to Fe³⁺ or, equivalently, from Fe(II) to Fe(III). The higher oxidation state number means that the iron atom has "given away" more of its electrons to another atom, so that fewer remain in its own inner-shell orbitals. As a result, there are fewer electrons to help in partial screening of the nuclear charge, and those that remain are more tightly bound. Those electrons then face a longer (in energy) jump out to a partially occupied molecular or valence orbital, so that the XANES peak shifts to higher energy at a higher oxidation state. The shift scales with the square of the oxidation state [Johnson 1936], as might be expected from the Z_screen term in the modified Bohr model of Eq. 3.12.
9.1.3 Carbon XANES

Given the variety of ways that carbon atoms can form molecular bonds, it is not surprising that the carbon absorption edge is particularly rich in XANES resonances. This can lead to a powerful way to image chemical speciation in organic materials, as was first demonstrated for imaging polymer blends [Ade 1992]. As an example, consider the immiscible polymers polycarbonate (PC) and poly(ethylene terephthalate) (PET), for which carbon XANES spectra are shown in Fig. 9.4. At about 285 eV (which is
[Figure 9.4 panels: carbon XANES spectra (optical density 0.0–2.0 over 285–295 eV) of PC and PET, with resonances marked at 285.36 eV (PET) and 285.69 eV (PC); chemical structures of the PC and PET repeat units; STXM images of the blend at 285.36 eV and at 285.69 eV showing contrast reversal (scale bar in μm).]
Figure 9.4 Carbon near-edge or XANES spectra and images of a blend of two immiscible polymers: polycarbonate (PC), and poly(ethylene terephthalate) (PET). The carbon XANES spectra at left show that PET has higher absorption at 285.36 eV, while PC has higher absorption at a photon energy that's only 0.33 eV higher. This leads to a complete reversal of contrast between images taken at those two photon energies. Microtome-sectioned 100 nm thick films were prepared by G. Mitchell and R. Cieslinksi, Dow Chemical, and images and spectra were acquired [Ade 1994] with H. Ade, North Carolina State University, using a soft x-ray scanning transmission x-ray microscope (STXM).
about 5 eV below the carbon ionization potential at about 290 eV), these two materials have aromatic bond resonances that appear about 0.3 eV apart from each other, so that images taken at the two resonance energies show a complete contrast reversal, as shown in Fig. 9.4. Because of this, carbon near-edge spectromicroscopy has become quite popular, with a number of synchrotron light source microscopes optimized just for this. While one could probably write an entire book just on this topic, we limit ourselves to some key comments here:

• Detailed understanding of the spectrum of a specific molecule requires highly accurate quantum mechanical calculations of the electronic states of the molecule and their occupancy. This is often done by methods such as density functional theory (DFT) [Jones 2015], though one can also gain insights by studying the progression of spectral peaks between molecule types with added chemical binding states [Stöhr 1992].

• One can excite core-level electron transitions to near-edge states using either x-ray absorption as discussed here, or electron energy loss spectroscopy (EELS) at the near edge (energy loss near-edge structure, or ELNES) as discussed in Section 4.10.1. Compared to EELS, x-ray XANES provides a lower-dose way to use core-level electrons for spectromicroscopy [Isaacson 1978, Rightor 1997] because one excites only the desired transition. At the same time, the similarities between ELNES and XANES mean that one can gain valuable insights into XANES spectra from consulting EELS data [Hitchcock 1994], as shown in Fig. 9.3.

• Because the core-level electron state undergoes little modification due to chemical binding effects, XANES core-level electron transitions into valence orbitals are generally easy to understand and interpret. Transitions from a low-energy valence state to a higher-energy valence state that are driven by UV or visible light involve
[Figure 9.5 panels: carbon XANES mass absorption coefficient (10⁴ cm²/g) spectra over 287–292 eV for alanine (aliphatic), cysteine (side-chain SH), arginine (C=N/π*), and glutamine (NH₂), plus tyrosine (aromatic) over 284–292 eV, each with its chemical structure.]
Figure 9.5 Carbon XANES spectra of several amino acids, along with calculations of the specific electronic transitions and organic functional groups that produce the observed resonances [Kaznacheyev 2002]. These resonances allow one to identify major biochemical organizational themes in biological specimens, but the ∼1 percent local mass concentration limit of soft x-ray spectromicroscopy and the ratio of natural peak widths to the overall carbon XANES spectral range means that one is far from being able to detect individual proteins within the complex environment of a cell.
a more complex initial state. The simplicity of x-ray XANES comes at a cost of ionizing radiation damage.

• Carbon XANES spectromicroscopy has proven to be of tremendous value in studies of polymers [Ade 1992]. Because polymers such as Kevlar exhibit strong molecular alignment, one can use the difference in x-ray absorption between different beam polarization directions for linear absorption dichroism studies (Fig. 12.5) to observe the polymer orientation in the plane of the thin section [Ade 1993]. Several reviews describe the details of carbon XANES spectroscopy of a large number of polymer types [Urquhart 1999, Dhez 2003, Watts 2011b]. Associations of functional groups with specific energies in the carbon near-edge spectral region are shown in Fig. 9.6.

• Carbon XANES has the potential for imaging major biochemical organization motifs in biological specimens. The 20 amino acids that serve as the functional units for proteins have had their carbon XANES spectra carefully measured, and compared with theoretical calculations [Kaznacheyev 2002], so that one can associate specific resonances with specific functional groups, as shown in Fig. 9.5. As an example, tyrosine has a particularly distinctive spectrum with especially strong pre-edge resonances, as shown in Fig. 9.2. However, given that the intrinsic energy width of most carbon XANES resonances is about 0.2 eV and the range over which they appear is only about 5 eV, there are a limited number of "energy channels" available without significant spectroscopic overlap. In addition, the ∼1 percent sensitivity limit of differential absorption mapping (Section 9.1.1) means that one is far from detecting individual proteins in the complex environment of a cell. Finally, radiation damage to carbon XANES spectra (Section 11.2.1) and the lack of availability of many soft x-ray microscopes with cryogenic specimen transfer and imaging conditions (Section 11.3) limit what can be done.
The best examples of the use of carbon XANES for biological studies published thus far involve
[Figure 9.6 chart: functional-group assignments arranged along a 284–291 eV photon energy axis. Schäfer et al.: aliphatic, aromatic, phenol, carbonyl, carboxyl, C–O O-alkyl, COOH carboxyl. Schumacher, Scheinost et al.: aromatic, phenol, C–H aliphatic, Rydberg, carbonate, carbamate. Urquhart and Ade: urea, acetic, aromatic, acetate, ketone, amide, aldehyde.]
Figure 9.6 Bonding motifs in carbon near-edge (XANES) spectra, as reported by several authors. These include carbonyl core transitions in polymers (Urquhart and Ade; [Urquhart 2002]), and organic components of soil and environmental specimens by one set of authors (Schumacher/Scheinost et al. [Scheinost 2001, Schumacher 2005]) and by another (Schäfer et al. [Schäfer 2003]). Many additional tables of peak assignments exist (see for example [Solomon 2012]), but definitive assignments usually require careful spectroscopic studies of thin films of the isolated components, along with density functional theory calculations, for reliable interpretation.
studies of protein-mediated packing of DNA in sperm [Zhang 1996], and overall sperm biochemical organization [Mak 2014].

• Because of strong absorption at the carbon K edge, most carbon XANES studies are carried out on specimens that are 100–200 nm thick (approximately equalling the absorption length μ⁻¹ of Eq. 3.75, for reasons described in Section 9.1). However, 3D images employing carbon XANES contrast have been obtained by using serial sectioning of materials [Hitchcock 2003], and polymers have been studied in 3D using tomography combined with oxygen-edge XANES spectroscopy [Johansson 2007] and also with tomographic spectromicroscopy at the fluorine K edge [Wu 2018].

• In carbon XANES spectroscopy, care must be taken to understand the radiation dose at which spectroscopic resonances begin to be modified. This is addressed in more detail in Section 11.2.1, and low-dose data acquisition strategies are discussed in Section 11.2.7.

• One can combine the insights gained by radiation-dose-limited x-ray spectromicroscopy with those obtained using resonant soft x-ray scattering, which can
probe finer length scales, but only by averaging over larger illuminated areas [Ade 2008]. These comments only touch the surface of this very important application of x-ray microscopy, with recent applications revisited in Chapter 12.
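The contrast reversal of Fig. 9.4 can be made quantitative: with optical density images at as many energies as there are components, and known per-component absorption spectra, each pixel is a small linear system. A minimal two-component sketch (the matrix values are illustrative placeholders, not measured PC/PET coefficients):

```python
import numpy as np

# Assumed OD-per-unit-thickness of components A and B at two energies
# (illustrative numbers only; real work uses measured reference spectra).
M = np.array([[3.0, 1.0],    # energy 1: strong for A, weak for B
              [1.0, 2.5]])   # energy 2: weak for A, strong for B

def unmix(od_e1, od_e2):
    """Solve OD = M @ t per pixel for the two component thicknesses t."""
    od = np.stack([od_e1.ravel(), od_e2.ravel()])
    t = np.linalg.solve(M, od)          # exact for 2 energies, 2 components
    return t.reshape((2,) + od_e1.shape)

# Forward-simulate pixels that are 0.2 thick in A and 0.1 thick in B:
tA, tB = 0.2, 0.1
od1 = np.full((4, 4), M[0, 0] * tA + M[0, 1] * tB)
od2 = np.full((4, 4), M[1, 0] * tA + M[1, 1] * tB)
thickness = unmix(od1, od2)
print(thickness[0, 0, 0], thickness[1, 0, 0])  # recovers 0.2 and 0.1
```

With more energies than components, the same idea becomes an overdetermined least-squares fit (`np.linalg.lstsq`), which is how full stacks of XANES images are commonly reduced to component maps.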
9.1.4 XANES in magnetic materials

Electrons have spin, so for atoms with partially filled orbitals (Table 3.1) one can have unpaired electrons which can couple with their partners in nearby atoms. This can take the form of parallel coupling, leading to ferromagnetism, or antiparallel coupling, leading either to antiferromagnetism if the opposite spins are equal or ferrimagnetism if not. Paramagnetism and diamagnetism arise when spins are coupled only when induced by an external magnetic field, while spintronic materials have long-range correlations that can be associated with structuring of the material [Stöhr 2006]. These include skyrmions, where the coupling is between orthogonal spin orientations. When circularly polarized light is incident with its helicity aligned with or against the direction of electron spins, one can have a change in the x-ray absorption cross section, with the degree of enhancement depending on the density of states for the specific spin direction. This difference in absorption at XANES resonances for magnetic moments parallel or antiparallel to the x-ray beam direction is termed x-ray magnetic circular dichroism (XMCD), as shown in Fig. 9.7. While there were earlier observations of x-ray magnetic linear dichroism (XMLD) [van der Laan 1986], the first observations of XMCD were for the Tb M edge [van der Laan 1986] and the Fe K edge [Schütz 1987]. The first observation of strong L edge XMCD using Fe reflectivity [Kao 1990] and especially Ni absorption [Chen 1990] inspired the development of magneto-optical sum rules [Thole 1992] so that one can obtain a quantitative determination of element-specific spin and orbital magnetic moments. While there are other ways to remove nonmagnetic image contrast, by taking images with opposite circular polarization and subtracting them one can obtain XMCD images that show the degree of magnetization along the x-ray beam direction.
This was first done [Stöhr 1993] with a photoelectron emission microscope (PEEM; see Section 6.5). However, because the mean free path for inelastic scattering of low-energy electrons is typically under 5 nm (Fig. 6.9), one obtains a clean PEEM signal from only the outer few nanometers of the material, and surface properties such as topography can add complexity to image interpretation. Transmission imaging removes those limitations: the 100–150 nm penetration of 700 eV soft X rays is well matched to the thickness of the magnetic thin films used in hard disk drives, while the sensitivity extends down to 1 nm thick layers in Co [Macià 2012]. Transmission XMCD imaging was first demonstrated [Fischer 1996] using a TXM (Section 6.3) on a bending magnet beamline where radiation was collected above and below the plane of the orbit so as to obtain a high degree of partial circular polarization (an example image is shown in Fig. 12.10). Another way to achieve this is to use electromagnetically driven elliptically polarizing undulators (EPUs; see Section 7.1.6) to dynamically adjust the direction of circular polarization.
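The polarization-subtraction step is often normalized to an asymmetry ratio, so that nonmagnetic contrast cancels; a minimal sketch (the function name and the assumed 5% dichroic contrast are mine, not from the text):

```python
import numpy as np

def xmcd_asymmetry(img_plus, img_minus, eps=1e-12):
    """XMCD asymmetry image from transmission images taken with opposite
    circular polarization. Thickness and composition contrast cancel in
    the ratio; the sign tracks the magnetization projected along the beam."""
    img_plus = np.asarray(img_plus, dtype=float)
    img_minus = np.asarray(img_minus, dtype=float)
    return (img_plus - img_minus) / (img_plus + img_minus + eps)

# Two domains with +/-5% dichroic contrast on the same nonmagnetic background:
base = np.full((2, 2), 1000.0)
up   = base * np.array([[1.05, 0.95], [1.05, 0.95]])
down = base * np.array([[0.95, 1.05], [0.95, 1.05]])
print(xmcd_asymmetry(up, down))
```

Pixels with opposite magnetization come out with opposite sign (here ±0.05), while the common background divides out.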
[Figure 9.7 plot: Fe L-edge absorption (0.0–1.5, over 700–730 eV) for opposite helicities, showing the L3 (2p3/2) and L2 (2p1/2) resonances and their difference (the dichroism effect, dipping to −0.5), with an inset diagram of the spin-split density of states (DOS) near EF and the 2p3/2 and 2p1/2 core levels in an applied H field.]
Figure 9.7 X-ray magnetic circular dichroism (XMCD) in iron with an applied external magnetic field B leading to an auxiliary field H in the material (see Eq. B.12 in online Appendix B at www.cambridge.org/Jacobsen). This effect can be used for high-contrast x-ray transmission imaging of magnetic domains in thin films (as will be shown in Fig. 12.10). When circularly polarized light is incident upon magnetic domains aligned parallel or antiparallel to the x-ray beam direction, a significant difference is observed in the L2 and L3 XANES resonances depending on whether the helicity of the beam points in the same or the opposite direction as the electron spin. The difference between these two is the dichroism effect. The inset diagram shows how the density of states (DOS) for the two spin directions differ near the Fermi edge [Mathon 2001, Stöhr 2006], thus giving rise to dichroism. Data courtesy of Elke Arenholz, then of Lawrence Berkeley National Laboratory.
With a spatially coherent beam and a pixelated area detector, one can use ptychography from an EPU beamline for imaging magnetic materials [Shi 2016]. Fourier transform x-ray holography offers yet another approach to XMCD imaging (Fig. 10.9), with the advantage that one does not need to place an optic near the specimen, so that there are fewer space constraints for pole pieces for providing external magnetic fields. The dynamics of magnetic spins can be studied using the pulse structure of the electron beam in synchrotron light sources (Table 7.1) along with fast-gating detectors. In scanning microscopes (Section 6.4), this can be done using fast-response avalanche photodiodes [Stoll 2004], while time-gated CCD detectors have been used in TXM systems [Wessels 2014]. The application of x-ray microscopy to the study of magnetic materials will be discussed further in Section 12.4, with an example image shown in Fig. 12.10.
[Figure 9.8 plot: f1 and f2 (−20 to 100) for tyrosine over 280–290 eV, with f1 features marked at 284.59, 284.87, 287.97, and 288.31 eV and f2 resonances at 285.09 and 288.59 eV.]
Figure 9.8 Carbon near-edge spectrum of the complex oscillator strength (f1 + if2) for the amino acid tyrosine. The experimentally measured x-ray absorption spectrum shown in Fig. 9.2 was used to determine f2(E), with a strong aromatic absorption resonance at 285.09 eV and a strong COOH π* transition at 288.59 eV; the f2(E) data were then spliced into tabulated data over a larger energy range, after which f1(E) was calculated using a numerical implementation [Jacobsen 2004] of the Kramers–Kronig transform of Eq. 3.111. As the photon energy increases towards the aromatic absorption resonance, the f1(E) value reaches zero at 284.59 eV, which is where f2(E) begins its sharp rise, and it reaches its most negative value at 284.87 eV. Unfortunately the sharp negative resonance in f1(E) comes at the midpoint in the rise of absorption f2(E), so that one can reduce but not avoid radiation damage by using phase contrast near-edge spectroscopy to detect selected molecule types.
9.1.5 XANES in phase contrast

When discussing the x-ray refractive index n = 1 − αλ²(f1 + if2) of Eq. 3.65, we noted that the Kramers–Kronig relationship (Section 3.4.1) of Eq. 3.111 provides a way to calculate the phase-shifting part of the complex oscillator strength f1(E) from knowledge of the absorptive part f2(E) over a wide range of photon energies E. One can "splice in" a near-edge absorption spectrum into tabulated data for f2(E) over a larger energy range [Palmer 1998, Jacobsen 2004, Yan 2013, Watts 2014], and thus obtain a near-edge spectrum for the real or phase-shifting part of the complex oscillator strength [f1(E) + if2(E)]. We showed in Fig. 3.21 a comparison of calculated versus interferometrically measured near-edge f1(E) values at the carbon K edge, and how this departs from tabulated values of f1(E). One challenge in using XANES spectromicroscopy at high spatial resolution is the possibility of radiation-induced modifications to the underlying material, as will be discussed in Section 11.2.1. Could one carry out phase contrast XANES spectromicroscopy with a lower radiation dose than one has with absorption contrast XANES? There are some possibilities, but several complicating factors arise [Jacobsen 2004]. To illustrate the possibilities, we show in Fig. 9.8 the f2(E) near-edge spectrum of the amino acid tyrosine (obtained from the absorption data shown in Fig. 9.2), and the f1(E) phase-shifting spectrum obtained by the "splicing-in" method. Unfortunately the sharp dips in the f1 spectrum lie halfway up the strong absorption resonances, so one can reduce but not eliminate the extra absorption due to the XANES resonance. One could instead exploit the zero-crossing value of f1(E) at an energy below the absorption resonance, but that does not undergo a sharp change with energy. Finally, absorption spectra arise via Fermi's Golden Rule of Eq. 3.18 directly from the overlap of quantum mechanical states, leading to simple physical interpretation, while phase spectra have the added complication of the Kramers–Kronig integral to deal with in their interpretation. Still, near-edge phase contrast effects have been explored experimentally in differential phase contrast [Hornberger 2007a, Figs. 3.8 and 3.9] and in ptychography [Maiden 2013, Shapiro 2014, Farmand 2017, Hirose 2017], so it is possible they could end up being practical and important. In the meantime, other approaches that exploit XANES phase-shift resonances include resonant soft x-ray reflectivity (RSoXR) [Wang 2005, Wang 2007], since it involves both δ = αλ²f1 and β = αλ²f2, as shown by Eq. 3.120.
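The numerical Kramers–Kronig step in the "splicing-in" method can be sketched as a principal-value sum on a uniform energy grid (a schematic of my own, not the implementation of [Jacobsen 2004]; real use requires f2 tabulated over a very wide energy range and a more careful treatment of the singularity):

```python
import numpy as np

def kk_f1_from_f2(E, f2, Z):
    """Estimate the phase-shifting part f1(E) from the absorptive part
    f2(E) via f1(E) = Z - (2/pi) P-integral of E' f2(E') / (E'^2 - E^2) dE',
    with the principal value handled crudely by skipping the pole."""
    dE = E[1] - E[0]
    f1 = np.empty_like(f2)
    for i, Ei in enumerate(E):
        denom = E**2 - Ei**2 + np.where(E == Ei, 1.0, 0.0)  # dodge 0/0
        integrand = E * f2 / denom
        integrand[i] = 0.0                                   # skip the pole
        f1[i] = Z - (2.0 / np.pi) * np.sum(integrand) * dE
    return f1

# A single absorption resonance: f1 dips below Z just below the peak
# and rises above Z just above it, as in Fig. 9.8.
E = np.linspace(280.0, 300.0, 401)
f2 = np.exp(-(E - 290.0)**2 / (2 * 0.5**2))
f1 = kk_f1_from_f2(E, f2, Z=6.0)
```

Even this crude version reproduces the qualitative shape discussed above: a dip in f1(E) on the rising side of the f2(E) resonance, and an excursion above the baseline on the other side.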
9.1.6 Errors in XANES measurements

When carrying out absorption spectromicroscopy in a scanning transmission x-ray microscope or STXM using Fresnel zone plate optics, one must worry about blocking the "zero order" or undiffracted light that might be transmitted through the central stop, as shown in Fig. 5.17. As a result, the central stop should be made quite thick, as indicated by Eq. 5.45. In addition, if a grating monochromator used to select the photon energy delivers some second-order light, and if the zone plate has a mark:space ratio other than 1:1, second-order light from the monochromator (at twice the desired photon energy) can be delivered by second-order diffraction by the zone plate (at half the first-order focal length) to exactly the same focal position as the desired photon energy. This can reduce the apparent strength of XANES peaks, as illustrated in Fig. 9.9. So as to minimize the presence of higher-energy photons, some synchrotron beamlines use double-bounce mirrors to suppress second-order light as shown in Fig. 3.27. An alternative approach is to use a gas cell or a thin filter made of a material with an absorption edge above the desired photon energy E, but below the second-order photon energy 2E, to selectively absorb second-order light. Finally, one can also attempt to correct for the presence of higher monochromator orders in the analysis of a spectrum by measurement over an extended range and doing least-squares fitting [Yang 1986].
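The saturation effect shown in Fig. 9.9 follows from a two-channel Beer–Lambert model; a small sketch (my own parameterization, assuming the 2E light passes through the specimen nearly unabsorbed):

```python
import numpy as np

def apparent_od(true_od, f_2nd, t_2nd=1.0):
    """Apparent optical density when a fraction f_2nd of the incident
    beam is second-order (2E) light with specimen transmission t_2nd
    (close to 1 for a thin organic film well above the carbon edge).
    The measurable OD saturates near -ln(f_2nd * t_2nd)."""
    return -np.log((1.0 - f_2nd) * np.exp(-np.asarray(true_od, dtype=float))
                   + f_2nd * t_2nd)

# Apparent OD for true OD of 1, 2, and 4 at various 2E fractions:
for f in (0.01, 0.03, 0.10):
    print(f, apparent_od([1.0, 2.0, 4.0], f))
```

Even 1 percent second-order contamination caps the measurable optical density near −ln(0.01) ≈ 4.6, which is why strong resonances appear flattened on contaminated beamlines.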
Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:07:34, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.010
[Figure: optical density (0–2.5) versus photon energy (282–292 eV), with curves for 2× energy fractions of 0%, 1%, 3%, 10%, and 30%.]

Figure 9.9 Effect of the presence of second-order monochromator light on the measurement of a XANES spectrum. Shown here is a subset of the carbon near-edge absorption spectrum of the amino acid tyrosine (a larger spectral range was shown in Fig. 9.2) with an assumed thickness of 230 nm, representing an optical density OD = μt (Eq. 3.83) of about 1 at a photon energy of 290 eV. This spectrum was assumed to contain a specified fraction of light at twice the photon energy, such as from the second diffraction order of a grating monochromator. As can be seen, the presence of even a small amount of second-order light can reduce the apparent height of XANES resonances, and limit the maximum optical density that can be observed using a specific x-ray beamline.
9.1.7 Wiggles in spectra: EXAFS

As the incident photon energy E is tuned above an x-ray absorption edge so that a core-level electron is removed from an atom, the excess energy above the ionization potential EI goes into kinetic energy of the electron, or

Ek = E − EI.  (9.18)

This liberated electron has a nonrelativistic momentum p given by Ek = p²/(2me), where me is the electron’s mass, so that the momentum becomes

p = √(2me Ek),  (9.19)

giving a de Broglie wavelength (Eq. 3.5) λe of

λe = h/p = h/√(2me Ek).  (9.20)
Now let’s assume that there is an occupied electron orbital in a neighboring atom that is a distance r away. The ejected electron’s wavefunction can be reflected by this charge
[Figure: optical density (0–1.0) versus photon energy (9,600–10,200 eV).]

Figure 9.10 Extended x-ray absorption fine structure (EXAFS) spectrum of a zinc foil. The above-edge “wiggles” in EXAFS are due to constructive and destructive self-interference of the de Broglie wave nature of ejected electrons reflecting off of electron shells in a neighboring atom. Data from Matt Newville of the University of Chicago, via the XAFS spectra library found at cars.uchicago.edu/xaslib.
“surface” back onto itself, so that it forms a standing wave when

nx λe = r.  (9.21)

Therefore one expects there to be slight constructive and destructive modifications to the x-ray absorption cross section at electron kinetic energies Ek that go like

Ek = nx²h²/(2me r²) = nx²(hc)²/(2me c² r²) = nx²(1240 eV·nm)²/(2(511 × 10³ eV)r²),  (9.22)

where in the latter version we have made use of hc from Eq. 3.7, and the relativistic energy corresponding to the mass of the electron of me c² = 511 keV. If we assume a representative interatom spacing of r = 0.25 nm, we find that the first spectral “wiggle” comes at a photon energy E − EI = Ek of 24 eV above an absorption edge, with subsequent wiggles at energies spaced farther apart according to the nx² dependence in Eq. 9.22. Because these modulations occur over an extended energy range above the absorption edge (as shown in Fig. 9.10), they are referred to as extended x-ray absorption fine structure or EXAFS.

Of course there is a deeper story to EXAFS analysis, and its historical origins go back to the early days of x-ray physics [Stumm von Bordwehr 1989]. A key advance was to realize that by considering the absorption spectrum on a scale of √Ek, the relationship of Eq. 9.22 becomes linear in nx, the number of de Broglie waves between an atom and nearby electron shells. One can then apply a Fourier transform to the spectrum so
as to measure the distance r to a nearby occupied electron shell with a high degree of accuracy [Sayers 1971]. Analysis of EXAFS spectra is practically an industry unto itself, and it is covered in detail in monographs [Teo 1986]. In a crystal, the arrangement of neighboring atoms is replicated precisely over many unit cells, so the EXAFS modulations are strong and one can measure interatomic distances to very high precision. The orientation of neighboring atoms is not precisely repeated in liquids, yet there is still a nearest-neighbor distance that can be measured from weaker EXAFS modulations [Eisenberger 1975]. However, in gases, the distances to neighboring atoms are random, and no EXAFS wiggles are observed. Because EXAFS analysis usually requires a careful measurement of the spectrum over a range of several hundred eV above an absorption edge, we are not aware of it being combined with high-resolution imaging for applications in EXAFS spectromicroscopy – but it would not be surprising if this were done in the future.
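The wiggle energies predicted by Eq. 9.22 are easy to evaluate numerically; here is a quick check of the r = 0.25 nm example above:

```python
# Quick check of Eq. 9.22: EXAFS "wiggle" kinetic energies for an
# assumed interatomic distance r = 0.25 nm.
HC = 1240.0       # hc in eV*nm (Eq. 3.7)
ME_C2 = 511e3     # electron rest energy in eV

def wiggle_energy(n_x, r_nm=0.25):
    """Kinetic energy E_k (eV) at which n_x de Broglie waves span r."""
    return (n_x * HC)**2 / (2 * ME_C2 * r_nm**2)

# First few wiggles above the edge, spaced as n_x squared:
energies = [wiggle_energy(n) for n in (1, 2, 3)]
```

The first value comes out near 24 eV, as quoted in the text, with subsequent wiggles at four and nine times that energy.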
9.2 X-ray fluorescence microscopy

Our discussion of XANES/NEXAFS and EXAFS has involved measurements of changes in x-ray transmission as the incident photon energy is tuned (absorption spectroscopy). Another very powerful approach in x-ray microscopy is to use a focused x-ray beam to excite the emission of x-ray fluorescence photons as discussed in Section 3.1.1, so that one can detect the presence of specific elements with atomic number Z by observing the emission of fluorescence X rays at energies approximated by Eq. 3.14 and its equivalents for other electronic transitions. In practice, one uses the experimentally determined energies of x-ray fluorescent emission lines from tabulations [Bambynek 1972, Krause 1979a, Elam 2002], including in computer-readable formats as described in Appendix A. Because one is looking for distinctive signals (x-ray fluorescence lines) against a dark background (low scattering, as will be described below), x-ray-induced x-ray fluorescence provides one of the best combinations of sensitivity for trace element analysis and minimum damage, as was shown in Fig. 4.79.

If one had an achromatic imaging optic and an energy-resolving area detector, one could perform full-field x-ray microscopy using x-ray fluorescence lines. However, both optics and detectors present challenges for realizing this approach:

• Achromatic optics for nanoscale full-field x-ray imaging are not yet readily available. Fresnel zone plates (Section 5.3.1) and compound refractive lenses (Section 5.1.1) are strongly chromatic, so that a single energy-resolving area detector would be at the in-focus image position for only one x-ray fluorescence energy. Grazing incidence reflective x-ray optics are achromatic if they do not have multilayer coatings, but as discussed in Section 5.2 they tend to have very small fields of view due to off-axis aberrations. In theory, the Wolter geometry shown in Fig. 5.10 could overcome this limitation, but it is challenging to realize with grazing incidence optics.
Earlier attempts at fabricating two grazing incidence Wolter optics for
2D focusing proved challenging [Onuki 1992, Hasegawa 1994]. More recently, Wolter-type optics have been developed using a Kirkpatrick–Baez-type geometry, so that a separate optic pair is used to focus in each direction (thus requiring four optics for 2D focusing) [Matsuyama 2014, Yamada 2017]. These optics have demonstrated 50 nm resolution with unspecified efficiency [Matsuyama 2017], so this limitation might be overcome.

• It is difficult to obtain detectors that provide both spatial and spectral information, at least for high signal rates. As will be discussed in Section 7.4, most x-ray detectors work by converting an absorbed x-ray photon into a number of quanta (such as electrons and holes in semiconductor detectors). If readout of a detector pixel’s quanta occurs over a time T (whether due to a choice in data collection time, or fundamentals such as capacitance in a detector element), one has a choice of interpreting the number of quanta in terms of the energy of a single x-ray photon, or of the number of photons at an already-known x-ray energy, but not both. How then to measure the energy of single photons while also allowing for an appreciable overall signal rate? This requires a large number of pixels with charge integration. The MAIA x-ray fluorescence detector [Ryan 2010, Ryan 2014] offers 384 pixels, though this is not sufficient for most imaging applications. At very low signal rates where one can expect no more than one photon per pixel per acquired image, standard CMOS cameras have been used for energy-resolved image detection [Scharf 2011, Ordavo 2011, Zhao 2017]. However, there is not yet a good solution available that combines many pixels, high spatial resolution, and high spectral resolution while also accommodating high signal rates.

Difficult does not mean impossible, and full-field x-ray fluorescence microscopy at several-micrometer resolution has already been demonstrated [Takeuchi 2009, Garrevoet 2014] with a “light sheet” illumination approach to 3D imaging.
More recently, 0.5–1.0 μm spatial resolution was obtained using Kirkpatrick–Baez optics as noted above, and a charge-integrating CCD camera [Matsuyama 2019]. However, the above challenges remain for extension to nanoscale imaging with high throughput.

Therefore the main approach to nanoscale imaging using x-ray fluorescence is to use a scanning microscope to provide spatial resolution, and a non-spatially-resolved energy-dispersive detector to measure the energy of each emitted fluorescent photon. (While the MAIA detector mentioned above has energy-resolving pixels, its many detector elements are used primarily to increase the overall count rate, and per-pixel collimation is used to preferentially detect fluorescent photons emitted from the focused x-ray beam spot position while reducing the contribution of scattered photons from other locations.) This scanning approach is known by several names, including scanning x-ray fluorescence (SXF), x-ray fluorescence microscopy (XFM), scanning x-ray fluorescence microscopy (SXFM), or scanning fluorescence x-ray microscopy (SFXM). In keeping with the nomenclature of scanning transmission electron microscopes (STEMs) and STXMs, we will use SFXM here.² A schematic of a typical layout for SFXM at a synchrotron light source is shown in

² If you’re sufficiently sophisticated, you can pronounce SFXM as “sficksem.”
[Figure: schematic showing undulator, monochromator, zone plate objective, order sorting aperture (OSA), raster-scanned sample, XRF detector, and transmission detector.]

Figure 9.11 Schematic representation of a scanning fluorescence x-ray microscopy (SFXM) experiment with an undulator source at a light source facility. The undulator delivers an x-ray beam polarized in the horizontal direction, so there is a minimum of elastic scattering for a detector at 90° horizontal to the incident beam. An energy-dispersive detector (Section 7.4.12) is used to record the x-ray energy of each fluorescent photon at each specimen scan position, and a transmission detector can also be used, which might be a simple silicon photodiode, or a segmented transmission detector for differential phase contrast [Hornberger 2008], or a pixelated detector for a variety of contrast modes [Thibault 2009b] including ptychography (Section 10.4). Figure modified from [Deng 2015b].
Fig. 9.11. As described in Section 7.1.4, these facilities use dipole magnets to steer the circulating electron beam in the horizontal direction, and these “bending magnets” are one type of x-ray source for experimental end stations; another type is an undulator, and most undulators also deflect the electron beam back and forth in the horizontal direction. As a result, the x-ray radiation in the plane of both bending magnet and undulator sources is very strongly linearly polarized in the horizontal direction, and Eq. 3.34 then tells us that there is zero elastic scattering at 90° horizontally from the incident beam direction. This means that an x-ray detector placed at that angle relative to the specimen will receive a minimum of elastic scattering [Dzubay 1974], thus reducing the overall flux on the fluorescence detector and minimizing a strong elastic peak in the detected spectrum (see Fig. 9.12), although there are other fluorescence detector placement options, as will be discussed in Section 9.2.2. For a detector with a circular sensitive area of radius r located a distance z from the specimen such that it extends over a semi-angle of θ = tan⁻¹(r/z), the detected solid angle Ω will be given by

Ω = ∫(ϕ=0→2π) ∫(θ′=0→θ) sin θ′ dθ′ dϕ = 2π(1 − cos θ)  (9.23)

Ω ≈ πθ²  for θ ≪ π/2.  (9.24)
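Eqs. 9.23 and 9.24 can be compared numerically; the detector dimensions in this sketch are illustrative rather than taken from the text:

```python
# Compare the exact solid angle of Eq. 9.23 with the small-angle
# approximation of Eq. 9.24 for a circular detector of radius r at
# distance z from the specimen.
import math

def solid_angle(r, z):
    """Exact solid angle (sr), Eq. 9.23."""
    theta = math.atan2(r, z)
    return 2 * math.pi * (1 - math.cos(theta))

def solid_angle_small(r, z):
    """Small-angle approximation (sr), Eq. 9.24."""
    return math.pi * math.atan2(r, z)**2

# Illustrative example: a 10 mm radius detector 50 mm from the specimen.
exact = solid_angle(0.010, 0.050)
approx = solid_angle_small(0.010, 0.050)
```

For this geometry the two expressions agree to well under one percent, while the exact form correctly tends toward 2π sr as the detector approaches full hemispherical coverage.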
Energy-dispersive detectors (Section 7.4.12) typically offer detection solid angle coverage of about Ω = 0.1–0.7 steradians (full coverage over 180° involves a solid angle of 2π sr out of 4π possible), and one can use two fluorescence detectors on either side of the illuminating beam to double the collected solid angle [De Samber 2012]. The 384-element MAIA detector [Ryan 2010] offers a solid angle of collection of up to 1.2 sr in the forward or backward direction, as shown in Fig. 9.14. One impressive alternative
approach [Szlachetko 2010] for transverse detection has been to use a polycapillary array or “Kumakhov lens” (Section 5.2.5) to collect a solid angle of 1.48 sr and deliver a highly parallel beam onto a crystal in a wavelength-dispersive detector (Section 7.4.11) with 4–40 eV energy resolution. However, in this case the polycapillary optic had an efficiency of only 14 percent, leading to an efficiency·solid angle product of 0.21 sr.

In order to allow for clearance of the detector front face relative to the specimen, as well as to minimize fluorescence self-absorption (Section 9.2.4), the specimen scan axis might be inclined by some angle θscan, as shown in Fig. 9.14, relative to the usual perpendicular-to-incident-beam direction. This must be accounted for in setting scan parameters and displaying the image.

Because one single detector system records the fluorescence signals from all detected elements during one pixel’s exposure time, the “maps” or images of elemental composition that one obtains are intrinsically registered with each other (that is, one does not need to do anything to align the different images to a common position). For reasonably thin specimens, it is useful to record the transmission signal simultaneously with the fluorescence signal “for free.” The detector might be a simple silicon photodiode for absorption contrast (which is often quite weak at the multi-keV x-ray energies used for SFXM; see Figs. 3.16 and 4.61). One can instead use a segmented transmission detector for differential phase contrast [Hornberger 2008]. If a pixelated area detector is used, one can use it to display a variety of contrast modes [Thibault 2009b], including ptychography [Schropp 2010a, Deng 2015b], as will be discussed in Section 10.4. Again, the transmission images are intrinsically registered to the fluorescence images.
Scanning fluorescence x-ray microscopy (SFXM) provides one of the most sensitive methods for imaging the distribution of elements present at low concentrations within inorganic and biological materials. Its comparison to other methods is discussed in Section 4.10.1.
9.2.1 Details of x-ray fluorescence spectra

When using an incident x-ray energy E in a SFXM system, one has the potential to excite x-ray fluorescence emission from all lower-energy fluorescence lines, with an intensity per line given by Eq. 3.10. The lines with energies closest to the incident energy will be excited with the greatest efficiency due to the jump ratio of Eq. 3.8 just above an absorption edge, but the lower-energy fluorescence lines will still be excited. As a result, in a SFXM experiment one acquires a spectrum like the example shown in Fig. 9.12. This spectrum contains a number of features that are worth considering in detail:

• In most cases an energy-dispersive detector is used, which might have an intrinsic energy resolution of about 150 eV, depending on the x-ray energy (Eq. 7.33). This means that spectral peaks appear to be much wider in the measured spectrum than one would have expected from their intrinsic width.

• At the incident x-ray energy E, one can observe a Rayleigh or elastic peak which is
[Figure: fluorescence spectrum (image-integrated photons per 50 ms per 19.3 eV bin, log scale) versus x-ray emission energy (1–12 keV), showing Kα and Kβ lines from Na through Zn and beyond (including Al, Si, P, S, Cl, Ar, K, Ca, Ti, Cr, Mn, Fe, Co, Ni, Cu, Zn, Ge, As, Br), Au L lines, the elastic and Compton peaks, the background and tail components, the data, and the overall fit.]

Figure 9.12 Example spectrum from a scanning fluorescence x-ray microscope (SFXM). This is from a 10 μm thick cryo section of bovine articular cartilage that was freeze-dried prior to x-ray fluorescence analysis. This spectrum was obtained by summation of all pixel spectra in an image. It contains contributions mainly from Kα and Kβ lines, plus the elastic scattering peak at the incident photon energy of 10.2 keV and a Compton scattering peak at a slightly lower energy. The spectrum fitting includes the effects of detector energy response as shown in Fig. 7.17. The weak fluorescence lines in the 11–12 keV region are excited by a small presence of 20.4 keV photons due to second-order diffraction from the beamline monochromator. From a study by Markus Wimmer, Rush University, with Olga Antipova, Argonne Lab. The spectrum fitting was carried out by Stefan Vogt, Argonne Lab, using his program MAPS [Vogt 2003a].
due to elastic scattering of a fraction of the incident photons into the fluorescence detector. As noted above, a small detector located at 90° to the incident beam in the horizontal plane will record a very small elastic peak due to the minimization of elastic scattering at that angle; as the solid angle of the detector is increased, more and more of the elastic peak will appear in the detected spectrum.

• At an energy of E − ΔECompton as given by Eq. 3.28, there will be a Compton peak due to inelastic x-ray scattering. The strength of this peak is proportional to the electron density in the specimen, so that one can use the Compton peak as a proxy for the projected or areal mass in the specimen along the incident beam direction.

• One will then see the various fluorescence peaks from chemical elements present in the specimen. The natural linewidth of fluorescence lines from the lighter elements is ∼1 eV [Krause 1979b], but they are broadened considerably by the resolution of the energy-dispersive detector (Eq. 7.33). One must therefore pay careful attention to nearby fluorescence peaks such as the L1, L2, and L3 peaks (see for example Fig. 3.8) when analyzing fluorescence spectra. One can also have near-overlaps of L lines from heavier elements with K lines from lighter elements, for example.

• When using energy-dispersive detectors, several complicating factors can arise in the observed spectra, including dead-time correction, pileup, escape peaks, and incomplete charge detection. The origins of these effects are described in more detail in Section 7.4.12, but their effects on observed spectra are as follows:

– Dead time: because the detector must collect the signal from electron–hole separation in the detector material (typically silicon, though sometimes germanium is used), there is a “dead time” tdead when the detector is insensitive to the arrival of another photon. At high count rates, this means one must apply a dead time correction to the observed intensity, as given by Eq. 7.45 and illustrated in Fig. 7.14.

– Pileup: when two fluorescent photons reach the detector within its pulse integration time (that is, within the dead time), the detector electronics might report them as a single photon with an energy given by the sum of the two.

– Incomplete charge detection: defects in the detector’s crystalline lattice can trap some of the signal from one photon, leading to a broad signal “floor” underneath the fluorescence peaks. This is shown in Fig. 7.17.

– Escape peak: with silicon-based energy-dispersive detectors, the electron generated by absorption of an incident photon of energy E can occasionally excite emission of an Si Kα fluorescent photon with an energy of E(Si, Kα) = 1.74 keV. If that fluorescent photon escapes, the remaining electron–hole charge separation will correspond to an energy of E − E(Si, Kα). If there is an especially strong Rayleigh or Compton peak in the spectrum, or even an overwhelmingly strong fluorescence line from one major constituent in the specimen, this means one can see an “echo” of this line at an energy 1.74 keV lower.

• At photon energies below about 2 keV, the ability to detect x-ray fluorescence lines begins to diminish. Energy-dispersive detectors are often equipped with thin, visible-light-opaque windows to separate the cold vacuum environment of the detector material from the specimen region, and these windows can preferentially absorb lower-energy fluorescence lines. The energy resolution of silicon-based detectors can also make it difficult to separate these low-energy lines. Finally, low fluorescence yield (Figs. 3.5 and 3.7) and self-absorption of low-energy fluorescence lines (Section 9.2.4) can also affect their detectability. These are merely challenges rather than roadblocks; several scanning x-ray microscopes have been used for successful fluorescence imaging at energies down to 280 eV [Kaulich 2009, Kaulich 2011, Hitchcock 2012].

Again, these effects can be seen in the experimental example of Fig. 9.12. In order to see these complicating factors in a more simplified example, Fig. 9.13 shows a simulation result for the fluorescence spectrum one might expect from Zn in a biological specimen. Simulations of this sort can be carried out using Monte Carlo methods [Schoonjans 2012, Golosio 2014], or a semi-analytical approach [Sun 2015].
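Several of the spectral features and artifacts listed above occur at energies one can compute directly. Here is a small sketch; the Compton expression is the standard inelastic scattering formula underlying Eq. 3.28, and the non-paralyzable dead-time form is one common choice, assumed here for illustration rather than quoted from Eq. 7.45:

```python
# Positions of common features and artifacts in an SFXM spectrum.
import math

ME_C2_KEV = 511.0    # electron rest energy (keV)
E_SI_KA = 1.74       # Si K-alpha energy (keV), sets the escape peak offset

def compton_peak(E_keV, theta_deg=90.0):
    """Energy of the Compton peak for incident energy E at scattering
    angle theta (standard Compton formula, cf. Eq. 3.28)."""
    shift = (E_keV / ME_C2_KEV) * (1 - math.cos(math.radians(theta_deg)))
    return E_keV / (1 + shift)

def escape_peak(E_keV):
    """Position of the Si escape-peak 'echo' of a strong line at E."""
    return E_keV - E_SI_KA

def pileup_peak(E1_keV, E2_keV):
    """Apparent energy when two photons arrive within one integration time."""
    return E1_keV + E2_keV

def deadtime_corrected(measured_rate, tau):
    """Non-paralyzable dead-time correction (assumed illustrative form)."""
    return measured_rate / (1 - measured_rate * tau)

compton = compton_peak(10.0)     # near 9.81 keV for 10 keV incident photons
zn_escape = escape_peak(8.64)    # Zn K-alpha echo near 6.90 keV
```

The 90° Compton peak position for a 10 keV incident beam lands just below the elastic peak, consistent with the 9.80 keV Compton background quoted in Fig. 9.13.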
[Figure: simulated intensity I(E)/I0(E0) (10⁻¹⁸–10⁻¹⁰, log scale) versus emission energy E (0–10 keV), showing O K, S Kα and Kβ, Zn L, Zn Kα and Kβ lines, the Compton and Rayleigh peaks, their sum, and the detector resolution.]

Figure 9.13 Simulated SFXM spectrum from a biological specimen consisting of a 20 nm protein layer with 0.01 percent Zn added, contained within an overall specimen thickness of 100 nm of amorphous ice. Shown here relative to an incident beam intensity of 1 are the fluorescence contributions of S and O within the model protein (Box 4.8), and various fluorescence lines from Zn, as well as the Rayleigh or elastic scattering background and the Compton background at 9.80 keV (Eq. 3.28). Signals at these x-ray energies are all affected by the energy-dispersive detector energy resolution and response as shown in Fig. 7.17, and a detector solid angle of 0.024 sr was assumed. Modified from [Sun 2015].
9.2.2 Fluorescence detector geometries

As was noted in Fig. 9.11, most SFXM systems at synchrotron light sources use a detector placed at 90° to the incident beam in the horizontal plane so as to minimize elastic or Rayleigh scattering. However, as the detector solid angle (Eq. 9.23) is increased, a detector at this position will nevertheless begin to collect elastically scattered photons. It is of course advantageous to increase the solid angle that the fluorescence detector subtends, but in this case one may begin to consider other detector geometries, such as those shown in Fig. 9.14.

One notable example of this is the MAIA detector [Ryan 2010, Kirkham 2010, Siddons 2014], developed between CSIRO in Australia and Brookhaven National Laboratory in the USA. In present versions, this detector is composed of 384 “pixels” of energy-dispersive detector elements, each with their own pulse-processing circuitry, so that the detector can handle a much higher aggregate fluorescence count rate. These pixels are also equipped with collimators pointing towards the expected specimen/beam focus position. Because of the large planar extent of this detector, it is not convenient
[Figure: three detector geometries, each showing the incident beam, specimen, fast scan direction inclined by θscan, and fluorescence detector: forward (0°), backward (180°), and transverse (90°).]

Figure 9.14 Geometries for locating energy-resolving detectors for SFXM. The most common geometry is the transverse geometry, where the detector is oriented at 90° to the x-ray beam in the horizontal plane. At this angle, one will detect a minimum of elastically scattered photons due to the horizontal polarization of x-ray beams from bending magnets and most undulators at synchrotron light sources; one must also incline the scan direction by some angle θscan relative to the usual so as to allow for clearance between the specimen and the detector, and to minimize self-absorption of the fluorescence signal within the specimen. As one increases the solid angle collected by the detector, the advantages of this geometry are muted, so that one may consider other geometries such as the forward (0°) or backward (180°) positions. In the case of the 180° position, the detector must have a hole through which the incident beam can pass; this is the case for the MAIA detector [Ryan 2010, Kirkham 2010, Siddons 2014]. Figure adapted from [Sun 2015].
to mount in the 90° geometry; instead, it is mounted in the backwards-emission or 180° geometry as shown in Fig. 9.14, and the detector has a hole in its center to allow for the passage of the incident x-ray beam. This detector also has another unique characteristic: it delivers data in the form of a list of individual photon events (tagging the time of the event, and the energy of the photon) rather than integrating all photon events over a preselected integration time.

The relative merits of various detector positions, specimen tilt angles, and solid angles can be considered using either Monte Carlo [Hodoroaba 2011] or semi-analytical [Sun 2015] approaches. As one example, one might worry that for very low element concentrations there could be a competition between different physical effects:

• As the detection solid angle is increased, one will collect more “signal” photons, which are emitted isotropically.

• As the detection solid angle is increased, more “background” photons are collected as one moves off of the polarization-produced zero of elastic scattering.

It turns out that signal collection dominates, so that a large solid angle is always preferred (see [Sun 2015, Fig. 16]). In addition, the relative geometric ease with which a planar detector at 180° collects large solid angles outweighs the elastic scattering minimum of the 90° geometry for nearly all specimens. An exception is for low-mass specimens that are only a few micrometers thick, such as for detecting low-concentration elements in biological specimens; in this case, a large solid angle detector in the transverse detector geometry offers advantages (see [Sun 2015, Fig. 14]).
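The solid angle collected by an annular backward-geometry detector with a central beam hole follows from the same integral as Eq. 9.23, applied between the two half-angles that the hole and the outer edge subtend. A sketch, with dimensions chosen for illustration and not those of the MAIA detector:

```python
# Solid angle of an annular detector at 180 degrees with a central hole
# for the incident beam: Omega = 2*pi*(cos(theta_in) - cos(theta_out)),
# from integrating sin(theta) d(theta) d(phi) between the two half-angles.
import math

def annular_solid_angle(r_hole, r_outer, z):
    """Solid angle (sr) of an annulus of inner radius r_hole and outer
    radius r_outer, a distance z from the specimen."""
    theta_in = math.atan2(r_hole, z)
    theta_out = math.atan2(r_outer, z)
    return 2 * math.pi * (math.cos(theta_in) - math.cos(theta_out))

# Illustrative: 1 mm radius beam hole, 10 mm outer radius, 10 mm standoff.
omega = annular_solid_angle(0.001, 0.010, 0.010)   # on the order of 1.8 sr
```

Even with the central hole, a close-mounted planar detector can collect well over 1 sr, which is why the backward geometry competes so well with the transverse one despite the elastic scattering minimum at 90°.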
Table 9.1 Parameters for estimating the main K shell fluorescent flux produced by zinc.

  K absorption edge                            EK = 9.659 keV
  Absorptive part of the oscillator strength   f2 = 3.694 at 10 keV
  Jump ratio (Eq. 3.8)                         rK = 7.543
  Fluorescence energies                        EKα1 = 8.637 keV, EKα2 = 8.614 keV
  Net K fluorescence yield                     ωK = 0.46937
  Fractional yields                            FKα1 = 0.57606, FKα2 = 0.29435
  Electron–hole transfer factor                T = 1

9.2.3 Elemental detection limits using x-ray fluorescence

Exact quantitation of trace element quantities using SFXM is a topic deep enough to fill entire books [Müller 1972, Russ 1984, Janssens 2000d], and complicated enough to warrant Monte Carlo methods for ab initio analysis [Vincze 1995b, Vincze 1995a, Vincze 1999, Schoonjans 2012, Schoonjans 2013, Golosio 2014]. In practice, most researchers carry out quantitative analysis by first recording the fluorescence emission from a “standard” sample which (ideally) has similar absorption and scattering characteristics to the main mass of the specimen under study, with fluorescing elements added at a known concentration that approximates what might be expected from the specimen under study. For example, one might start with a high-purity glass to which known quantities of trace elements are added, followed by thorough mixing when molten; the cooled sample should then contain a known concentration of trace elements with uniform distribution. By measuring the standard in the same apparatus as used for the unknown specimen, one can measure the concentration of a trace element from the ratio of emitted fluorescence signals. Originally the emitted fluorescence signal was measured by integrating the signal over an “energy window” set to incorporate most of the fluorescence line, but today the preferred practice is to measure the entire fluorescence spectrum delivered by the detector (an approach that was first used in electron [LeFurgey 1992] and proton [Ryan 1993] microprobes, and then introduced to SFXM [Vogt 2003b, Twining 2003]). Several computer codes are available to carry this out [Ryan 2000, Vogt 2003a, Solé 2007, Schoonjans 2012, Schoonjans 2013, Crawford 2019], and some of the general principles involved in full spectrum analysis are discussed in Section 9.3. Exact calculation of elemental detection limits requires detailed knowledge of effects including incomplete charge collection in the fluorescence detector (see Fig. 7.17).
However, if these background limits are sufficiently low then the sensitivity is dominated simply by the number of fluorescent photons detected from a trace element, as discussed in Section 4.8.2. In Section 3.1.1 we gave the following expression (Eq. 3.10) for the flux into one x-ray fluorescence line:
$$I_{K\alpha_1}(E) = \omega_K F_{K\alpha_1} T_{K\alpha_1}\left(1 - \frac{1}{r_K}\right)\left(1 - e^{-\mu_Z(E)\, t_Z}\right) I_0(E).$$
For elemental analysis it is more convenient to consider the areal mass density $\rho_Z$ and mass absorption coefficient $\mu_Z$ of Eq. 9.3 for element Z, leading to an alternative form
Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:07:34, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.010
for the fluorescence flux of
$$I_{K\alpha_1}(E) = \omega_K F_{K\alpha_1} T_{K\alpha_1}\left(1 - \frac{1}{r_K}\right)\left(1 - e^{-\rho_Z \mu_Z(E)}\right) I_0(E). \tag{9.25}$$
The product $\rho_Z \mu_Z(E)$ can be found from Eqs. 9.2 and 9.3 as
$$\rho_Z = \frac{m_Z}{\Delta_r^2} = \frac{N_Z A_Z}{\Delta_r^2 N_A} \quad\text{and}\quad \mu_Z = 2 r_e \lambda f_2 \frac{N_A}{A_Z},$$
thus
$$\rho_Z \cdot \mu_Z = \frac{2 N_Z}{\Delta_r^2}\, r_e \lambda f_2. \tag{9.26}$$
If we had NZ = 104 zinc atoms in an area of (Δr = 50 nm)2 , the numerical value of ρZ ·μZ using 10 keV incident X rays is 1.03 × 10−5 , given the parameters listed in Table 9.1. Thus we are well justiﬁed in making the approximation
$$\left(1 - e^{-\rho_Z \mu_Z(E)}\right) \approx \rho_Z \mu_Z(E), \tag{9.27}$$
which along with Eq. 9.25 and Eq. 3.7 lets us write the fraction of $K\alpha_1$ fluorescent photons per incident flux as
$$\frac{I_{K\alpha_1}(E)}{I_0(E)} \approx 2\,\omega_K F_{K\alpha_1} T_{K\alpha_1}\left(1 - \frac{1}{r_K}\right) r_e f_2 \frac{hc}{E} \frac{N_Z}{\Delta_r^2}, \tag{9.28}$$
with an obvious equivalent for the $K\alpha_2$ fluorescence line. Since the $K\alpha_1$ and $K\alpha_2$ fluorescence lines are only 23 eV apart, and most SFXM systems use silicon-based energy-dispersive detectors with an energy resolution of about 150 eV (Eq. 7.33), the detected signal will be the sum of the $K\alpha_1$ and $K\alpha_2$ lines. Thus in practice we are interested in the numerical result of
$$\frac{I_{K\alpha_1}(E) + I_{K\alpha_2}(E)}{I_0(E)} = (9.15 \times 10^{-25}\ \text{m}^2)\,\frac{N_Z}{\Delta_r^2},$$
which for $N_Z = 10^4$ atoms in an area of $(\Delta_r = 50\ \text{nm})^2$ gives
$$\frac{I_{K\alpha_1}(E) + I_{K\alpha_2}(E)}{I_0(E)} = 3.66 \times 10^{-6}.$$
If we have an incident flux of $I_0(E) = 10^9$ photons/s, an x-ray fluorescence experiment with a per-pixel dwell time of 0.1 s and a detector solid angle coverage of 0.2 sr will detect a total of
$$10^9 \cdot 0.1 \cdot 3.66\times 10^{-6} \cdot \frac{0.2}{4\pi} = 58\ \text{photons}$$
in a SFXM experiment. We found in Section 4.8.2 that one can have very low false positive and negative error rates for detecting an element even if only 10 photons are detected (assuming sufficiently small background normalized intensity $I_b$), so this small number of zinc atoms in a (50 nm)² area should be detectable. If the specimen is made of carbon with ρ = 2.26 g/cm³ and is t = 5 μm thick (an exceedingly simple model for a biological cell!), the sampled pixel has $\rho t \Delta_r^2 N_A/A = 1.42 \times 10^9$ carbon atoms, so the detection of 10⁴ Zn atoms represents a concentration of 7.06 parts per million
(ppm) and a detected mass of $N_Z A/N_A = 1.11 \times 10^{-18}$ grams, or 1.11 attograms. These numbers are representative of what is achieved at several synchrotron light source facilities [De Samber 2016]; for example, the absolute mass detection limit for ESRF beamline ID22NI with 0.3 ms pixel dwell time was reported to be about 2 attograms [Adams 2011].

With low photon counts in fluorescent signals, the spatial resolution is not necessarily given by the full probe resolution. A good approach to evaluate the achieved spatial resolution is to examine the power spectrum of the image, as illustrated in Fig. 4.19. Since one can use techniques like ptychography (Section 10.4) to measure the actual shape of the focused x-ray beam, it is natural to try to use deconvolution to obtain a sharper view of an x-ray fluorescence image [Vine 2012, Deng 2017b]. If deconvolution is combined with a Wiener filter as described in Section 4.4.8, one may well have different spatial resolution values for different fluorescence maps according to their signal strength, as shown in Fig. 4.49.

Since many incident photons are required to detect elements present at low concentration, it is useful to estimate the radiation dose imparted to the specimen in SFXM. Consider the above flux of 10⁹ photons/s and a pixel dwell or transit time of 0.1 s, so that one has 10⁸ photons incident per $(\Delta_r = 50\ \text{nm})^2$ pixel. One can estimate the radiation dose $D_C$ imparted to the carbon mass (with $\mu^{-1} = 2.14$ mm at 10 keV according to tabulations) using Eq. 4.281 as
$$D_C = \bar{n}\,\frac{E \mu}{\rho \Delta_r^2}
= (10^8\ \text{photons}) \cdot \frac{(10^4\ \text{eV/photon}) \cdot (1.602\times 10^{-19}\ \text{J/eV}) \cdot (2.14\times 10^{-3}\ \text{m})^{-1}}{(2.26\ \text{g/cm}^3) \cdot (10^{-3}\ \text{kg/g}) \cdot (10^2\ \text{cm/m})^3 \cdot (50\times 10^{-9}\ \text{m})^2}
= 1.3 \times 10^7\ \text{Gy},$$
or 13 MGy. This dose can cause significant changes in the chemical bonding state of organics (as will be discussed in Chapter 11). However, especially if cryogenic conditions are used so that molecular fragments do not "fly away" in solution or in a vacuum, the element in question might remain in place along with the overall mass of the specimen region. That overall mass provides contrast in transmission imaging (especially phase contrast at higher x-ray energies) at length scales much larger than atomic, so x-ray microscopes may not show very much change in image contrast, as demonstrated in Fig. 11.11.

As was noted in Section 4.10.1, SFXM represents one of the most sensitive methods for nondestructive detection of elements present at low concentrations; it has a sensitivity roughly 1000 times better than electron microprobes due to very small background signals relative to the desired fluorescence signal [Janssens 2000a]. However, for biological studies one must exercise care in specimen preparation, since it is known that chemical fixation can be associated with the loss of certain diffusible elements; rapid freezing followed either by freeze-drying or, better yet, imaging in the frozen hydrated state offers better chemical fidelity [Matsuyama 2010, Perrin 2015, Jin 2017, Jones 2017, De Samber 2018]. A recent paper summarizes the factors one must consider in
[Figure 9.15 plot: fluorescence self-absorption in 25% protein, 75% water; transmission (0–1.0) versus thickness (0–80 μm) for Zn (8.6 keV), Cu (8.0 keV), Fe (6.4 keV), Mn (5.9 keV), Ca (3.7 keV), K (3.3 keV), Cl (2.6 keV), S (2.3 keV), and P (2.0 keV).]
Figure 9.15 Self-absorption can give rise to errors and artifacts in fluorescence tomography (as well as incorrect ratios of elements) if not corrected for. Shown here is the self-absorption calculated for fluorescence from a number of biologically interesting elements due to varying thicknesses of a mix of 25 percent generic protein (Box 4.8) and 75 percent water to represent the cytosol of a typical cell [Fulton 1982, Luby-Phelps 2000].
sample preparation, experimental data collection, and analysis for the accurate detection of elements present at low concentration [Lemelle 2017].
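The arithmetic in the zinc detection example above can be cross-checked in a few lines. In this sketch, $f_2 \approx 3.7$ for zinc at 10 keV is an assumed stand-in for the value tabulated in Table 9.1; the remaining constants are standard:

```python
import math

# Worked numbers from the zinc detection-limit example (Section 9.2.3).
r_e = 2.818e-15          # classical electron radius (m)
N_A = 6.022e23           # Avogadro's number (1/mol)
hc = 1239.842e-9         # h*c in eV*m
E = 1.0e4                # incident photon energy (eV)
lam = hc / E             # wavelength (m), Eq. 3.7
f2_Zn = 3.7              # assumed f2 for Zn at 10 keV (stand-in for Table 9.1)
N_Z = 1e4                # number of Zn atoms in the pixel
dr = 50e-9               # pixel size (m)

# Eq. 9.26: rho_Z * mu_Z = (2 N_Z / dr^2) * r_e * lambda * f2
rho_mu = 2 * N_Z / dr**2 * r_e * lam * f2_Zn
print(f"rho_Z*mu_Z = {rho_mu:.3g}")   # ~1.03e-5, so 1 - exp(-x) ~ x holds

# Carbon matrix: concentration and detected mass
rho_C, t, A_C, A_Zn = 2.26, 5e-4, 12.011, 65.38   # g/cm^3, cm, g/mol, g/mol
n_C = rho_C * t * (dr * 100)**2 * N_A / A_C        # carbon atoms in the pixel
ppm = N_Z / n_C * 1e6                              # Zn concentration
mass_ag = N_Z * A_Zn / N_A * 1e18                  # attograms of Zn
print(f"{n_C:.3g} C atoms, {ppm:.2f} ppm Zn, {mass_ag:.2f} ag Zn")

# Eq. 4.281 dose estimate: D = n_bar * E * mu / (rho * dr^2)
n_bar = 1e9 * 0.1                                  # photons per pixel
mu = 1 / 2.14e-3                                   # 1/m at 10 keV in carbon
rho_SI = rho_C * 1e3                               # kg/m^3
dose = n_bar * E * 1.602e-19 * mu / (rho_SI * dr**2)
print(f"dose = {dose:.2g} Gy")                     # ~1.3e7 Gy = 13 MGy
```

The carbon-atom count (1.42 × 10⁹), concentration (7.06 ppm), and dose (13 MGy) reproduce the values quoted in the text.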
9.2.4 Fluorescence self-absorption

One of the advantages of x-ray microscopy is its ability to work with quite thick specimens. However, in fluorescence microscopy there arises a potential complicating factor: after an incident photon penetrates the specimen and is absorbed by an atom, the lower-energy x-ray fluorescence photon that might result has a chance of being absorbed within the specimen before it can escape and be detected. This is known as self-absorption, and of course it affects the detection of lighter elements more than of heavier elements (due to the $Z^2$ dependence of x-ray fluorescence emission energies; see Eq. 3.12). An example in the case of biological imaging is shown in Fig. 9.15. One can correct for this effect with assumptions about the nature of the specimen [Janssens 2000b], or by measuring the absorption in the transmitted beam, as discussed in the next section (Section 9.2.5).
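A minimal slab-model sketch of self-absorption, assuming uniform fluorescence generation along the beam path and a straight exit path to the detector (the Si Kα attenuation length in glass is the value quoted with Fig. 9.18; real geometries require the angle-dependent exit paths of the next sections):

```python
import math

def escape_fraction(t_um: float, att_len_um: float) -> float:
    """Average probability that a fluorescence photon generated uniformly
    along a path of length t escapes toward the detector, when each photon
    traverses the remaining material with attenuation length att_len.
    Averaging exp(-(t - x)/att_len) over generation depth x in [0, t]
    gives (1 - exp(-t/att_len)) * att_len / t  (a simple slab model)."""
    mu_t = t_um / att_len_um
    return (1.0 - math.exp(-mu_t)) / mu_t

# Si K-alpha in glass has an attenuation length of about 1.66 um (the value
# quoted in the Fig. 9.18 discussion), so even a 30 um glass wall lets only
# a few percent of the Si fluorescence escape on average:
print(escape_fraction(30.0, 1.66))   # ~0.055
```

In the thin-specimen limit the escape fraction approaches 1, which is why self-absorption can often be neglected for heavier elements (higher fluorescence energies, longer attenuation lengths) but not for light ones.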
9.2.5 Fluorescence tomography

Fluorescence tomography provides a way to view the 3D distribution of elements in a specimen, as shown in Fig. 9.16. As the beam in a SFXM is scanned across one object slice (see Fig. 8.1), the transmission detector records information on the net absorption along the beam direction, and the fluorescence detector records information on
[Figure 9.16 schematic: incident x-ray beam, XY scan and tomographic rotation of the specimen, a transmitted x-ray beam detector downstream, and an energy-resolving x-ray fluorescence detector to the side.]
Figure 9.16 Schematic of ﬂuorescence tomography data collection in SFXM. At each scan position, the transmission detector records the integrated absorption along the beam column as required by the pure projection assumption in standard tomography. The ﬂuorescence detector records the integrated signal from each ﬂuorescence emission line in much the same way, also satisfying the projection assumption. Figure adapted from [de Jonge 2010b].
the net fluorescence signal along that same beam direction. One can therefore use an energy-dispersive detector to record what is equivalent to a pure projection image for each fluorescence line, and carry out a standard tomographic reconstruction of the object [Boisseau 1986, Boisseau 1987, Cesareo 1989]. This approach has been extended to sub-micrometer-resolution 3D imaging of low-concentration elemental distributions using SFXM [de Jonge 2010a], as will be shown in Fig. 12.3.

An alternative approach is to use a detector that is sensitive to only one plane transverse to the x-ray beam direction, as shown in Fig. 9.17. This is called confocal x-ray fluorescence, even though the optical arrangement is very different from that in visible-light confocal microscopes. The confocal optic should be non-dispersive, so single capillaries (Section 5.2.3) or polycapillary optics (Section 5.2.5) are usually used, typically with a depth resolution of several micrometers. If a wavelength-dispersive confocal optic is used, it is possible to deliver a collimated beam to a wavelength-dispersive detector such as a flat crystal spectrometer. The confocal approach was proposed [Gibson 1992] and then demonstrated [Ding 2000, Kanngießer 2003] for imaging a selected depth within a thicker specimen, after which scanning the specimen along the x-ray beam direction led to 3D imaging [Vincze 2004].
[Figure 9.17 schematic: incident x-ray beam entering from above, a capillary optic viewing the specimen from the side, the confocal overlap defining a detection depth plane, and the collimated exit beam sent to the fluorescence detector.]

Figure 9.17 Confocal method for x-ray fluorescence imaging of a selected depth plane in a 3D specimen. In this example, the incident x-ray beam comes from above to stimulate x-ray fluorescence emission along its path through the specimen, while a non-wavelength-dispersive optic (such as a capillary, or a polycapillary lens as shown here) orthogonal to the beam limits fluorescence signal detection to one depth plane in the specimen. If one is using a wavelength-dispersive detector (Section 7.4.11), it is advantageous to deliver a mostly collimated fluorescence signal beam to the detector (in fact, rays can diverge from the exit of the capillary optic by as much as $\pm\theta_c$, the grazing incidence critical angle, as given by Eq. 3.115 and discussed in Section 5.2.5). Note that the confocal arrangement is very different from what is used in visible-light confocal microscopes, where the illuminating and detecting lenses lie on the same optical axis (and in epi-illumination schemes one lens serves both functions).

The self-absorption problem in x-ray fluorescence described in the previous section (Section 9.2.4) becomes even more pressing as one goes to the thickness of 3D specimens. This can lead to severe artifacts in x-ray fluorescence tomography reconstructions, such that the fluorescent elements in the interior of the object simply do not appear in the reconstruction; this is illustrated in Fig. 9.18. However, the 3D information of tomography can also be put to advantage.

If one could tune the x-ray beam energy to match each of the fluorescence line emission energies, a transmission tomogram could be taken at each energy and used to calculate the exact self-absorption characteristics of the specimen and thus correct for it (as long as self-absorption merely reduced but did not eliminate detection of fluorescence from that element) [Hogan 1991]. However, if one is trying to detect multiple elements in the specimen, the number of datasets required can become overwhelming and difficult to acquire (for example, some of the x-ray fluorescence energies might be below the lowest incident photon energy available on a particular light source beamline). Other approaches are usually used, sometimes following upon developments made for radionuclide emission tomography [Chang 1978, Nuyts 1999, Zaidi 2003]. In the case of objects with uniform absorption, and illumination at a single x-ray energy, analytical approaches have been developed [La Rivière 2004, Miqueles 2010] and these have been shown [Miqueles 2011] to provide a good starting point for iterative methods. One iterative approach has been to use transmission tomography data at a single x-ray energy to estimate the absorption at all x-ray fluorescence energies (using the fact that, in the absence of x-ray absorption edges, x-ray absorption scales with x-ray energy as shown in Fig. 3.20), and thereby correct for self-absorption [Schroer 2001, La Rivière 2006, Yang 2014]. One can also add the Compton scattered signal as another measurement of overall specimen electron density, and use the tabulated absorption coefficients $\mu_{e,E}$ of all elements $e$ at each fluorescence energy $E$ [Golosio 2003, De Samber 2016]. Other approaches classify the specimen as being composed of a finite number of material phases for the calculation of self-absorption [Vekemans 2004]. The above methods have usually used specific energy windows for fluorescence analysis, but more recently full-spectrum fluorescence analysis has been combined with transmission tomography to correct for self-absorption of x-ray fluorescence in an optimization approach [Di 2017].

[Figure 9.18 panels: a sample slice containing glass, W, and Au features; Si fluorescence sinograms (rotation θ versus position) for a solid cylinder without self-absorption, a solid cylinder with self-absorption, and a hollow cylinder with self-absorption; and the corresponding transmission sinogram.]

Figure 9.18 Illustration of self-absorption in x-ray tomography, in a simulation. At left is shown an object slice of a 200 μm diameter simulated glass rod surrounded by two 10 μm diameter wires, one of tungsten (W) and one of gold (Au); in the bottom row, the glass rod is assumed to be hollow with a wall thickness of 30 μm. Sinograms (Fig. 8.2) from the simulated 12.1 keV transmission images clearly show the difference between the solid and hollow glass rod, since the absorption length $\mu^{-1}$ of 12.1 keV x rays in the glass is 150 μm. However, even the hollow glass rod is large compared to the 1.66 μm absorption length $\mu^{-1}$ of Si $K\alpha_1$ x rays in the glass, so Si fluorescence is only detected from the side of the glass cylinder facing the fluorescence detector. If there were no self-absorption of the Si fluorescence signal, one would obtain a Si XRF sinogram as shown in the top row, where the incident x-ray beam is partially absorbed in the small W and Au wires as they rotate into positions where they intercept the incident beam before it reaches the glass cylinder. However, when self-absorption is included, the sinograms for the solid and hollow glass rods are nearly indistinguishable, as shown in the middle and bottom images, respectively. By combining information from the fluorescence (XRF) and transmission (XRT) sinograms, one can in principle obtain a better reconstructed image of the specimen in the case of strong fluorescence self-absorption [Schroer 2001, Golosio 2003, Di 2017]. This simulation figure is adapted from [Di 2017], while [De Samber 2012, Fig. 3] provides a nice experimental demonstration.
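The interior-suppression effect illustrated in Fig. 9.18 can be reproduced with a toy forward model in a few lines. The attenuation coefficients here are invented for illustration (not tabulated values), and only a single projection of a centered disk is computed:

```python
import numpy as np

# Toy 2D model of fluorescence self-absorption: a uniform fluorescing disk,
# incident beam along +x, fluorescence detector on the +y side.
n = 64
y, x = np.mgrid[0:n, 0:n]
disk = ((x - n / 2)**2 + (y - n / 2)**2 < (n / 3)**2).astype(float)

mu_inc = 0.01    # assumed attenuation per pixel at the incident energy
mu_fl = 0.15     # assumed (much stronger) attenuation of the fluorescence line

# Incident beam attenuation: cumulative material along +x for each row
path_in = np.cumsum(disk, axis=1) * mu_inc
# Fluorescence exit attenuation: material along +y toward the detector
path_out = np.cumsum(disk[::-1, :], axis=0)[::-1, :] * mu_fl

emission = disk * np.exp(-path_in)        # fluorescence generated per pixel
detected = emission * np.exp(-path_out)   # after self-absorption on exit

# One projection (all rotation angles are equivalent for a centered disk):
proj_no_abs = emission.sum(axis=0)
proj_abs = detected.sum(axis=0)

# Self-absorption suppresses signal from the deep (far) side of the object,
# so the detected projection is much weaker than the ideal one:
print(proj_abs.sum() / proj_no_abs.sum())
```

Extending this model over rotation angles produces sinograms like those of Fig. 9.18, where the solid and hollow objects become nearly indistinguishable once `mu_fl` is large compared to the object size.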
9.3 Matrix mathematics and multivariate statistical methods

The tradition among experts in x-ray absorption or fluorescence spectroscopy has been to carefully acquire one high-statistics spectrum from a specimen, and then carry out
exhaustive analysis that might include density functional theory calculations so as to understand the physical origins of all features in the spectrum. This detailed level of understanding is of course to be celebrated, but it becomes impractical in spectromicroscopy, where one has not one or a few but thousands or even millions of spectra in one dataset. We therefore turn to the consideration of matrix mathematics and multivariate statistical analysis methods for x-ray spectromicroscopy. The notation in what follows is centered on x-ray absorption spectroscopy, but of course the mathematics is the same if one considers x-ray fluorescence.

In x-ray absorption spectroscopy, one measures the specimen transmission $I(E)$ as a function of energy, which of course decreases exponentially with increasing specimen thickness $t$ as given by the Lambert–Beer law (Eq. 3.76). To obtain a representation that is linear with the projected thickness, this is normalized relative to the incident spectrum $I_0(E)$, leading to an optical density of
$$D(E) = -\ln[I(E)/I_0(E)] = \mu(E)\, t, \tag{9.29}$$
as given by Eq. 3.83. In 2D spectromicroscopy, the optical density is spatially resolved, so in fact one has a 3D dataset $D(x, y, E)$ (and with tomography one would have $D(x, y, z, E)$). However, when it comes to analyzing the set of spectra obtained, we do not care about their arrangement in an image, so we instead consider this dataset to be a 2D array $D_{N\times P}$, where we use $n = 1, \ldots, N$ to index the set of photon energies of the spectra, and $p = 1, \ldots, P$ to index the pixels according to
$$p = i_{\rm col} + (i_{\rm row} - 1) \cdot n_{\rm rows}, \tag{9.30}$$
where $i_{\rm col}$ and $i_{\rm row}$ are both indexed from a starting value of 1. Now since Eq. 9.29 gives $D = \mu t$, the optical density matrix $D_{N\times P}$ should be the product of a set $S$ of absorption spectra $\mu_{N\times S}$ and a matched set of thickness maps $t_{S\times P}$, leading to a matrix equation for the optical density of
$$D_{N\times P} = \mu_{N\times S}\, t_{S\times P}. \tag{9.31}$$
In principle there could be a unique spectrum at each pixel, in which case $S = P$, but we will hope that there will be some common spectroscopic signatures $S$ among at least some of the pixels $P$, so that $S < P$. Therefore we will refer to $S$ as the set of chemical components of the specimen, and in fact it is this set that we are trying to deduce from the data. With that understanding, we can write out Eq. 9.31 as
$$\begin{pmatrix} D_{11} & \cdots & D_{1P} \\ \vdots & & \vdots \\ D_{N1} & \cdots & D_{NP} \end{pmatrix} = \begin{pmatrix} \mu_{11} & \cdots & \mu_{1S} \\ \vdots & & \vdots \\ \mu_{N1} & \cdots & \mu_{NS} \end{pmatrix} \cdot \begin{pmatrix} t_{11} & \cdots & t_{1P} \\ \vdots & & \vdots \\ t_{S1} & \cdots & t_{SP} \end{pmatrix} \tag{9.32}$$
(rows index spectra, while columns index pixels in $D$ and $t$, and components in $\mu$) to make more explicit the meaning of the rows and columns in all three matrices.

In some cases we might in fact know the exact set of absorption spectra $\mu_{N\times S}$ for all the chemical components $S$ that, combined together, make up our specimen. With x-ray fluorescence, we may know the set of elements present in the specimen, and the spectral response of the detector, so that we can again represent any measured spectrum as a linear combination of the spectra produced by all fluorescence lines plus Rayleigh, Compton, and incomplete charge collection backgrounds [Ryan 1993]. In these cases, we can simply use the rules of linear algebra to calculate directly the "thickness maps," or images of the thickness of each chemical component at each pixel, as
$$t_{S\times P} = (\mu_{N\times S})^{+} D_{N\times P} = \mu^{+}_{S\times N} D_{N\times P}, \tag{9.33}$$
where $\mu^{+}_{S\times N}$ is a pseudoinverse of $\mu_{N\times S}$. Numerical matrix pseudoinversion can be done by a number of methods, but the most common approach involves singular value decomposition (SVD). Based on the Eckart–Young theorem of linear algebra, it states that an array $A_{N\times S}$ with $N \geq S$ can be decomposed into
$$A_{N\times S} = U_{N\times S} \cdot W_{S\times S} \cdot V^{T}_{S\times S}, \tag{9.34}$$
where the matrix $U_{N\times S}$ has orthogonal columns, the matrix $W_{S\times S}$ is zero everywhere except for its diagonal elements, which are all zero or positive (these diagonal elements are called the singular values), and the matrix $V_{S\times S}$ has orthonormal rows. That is, these matrices have the properties that $U^{T}_{S\times N} \cdot U_{N\times S} = 1_{S\times S}$ and $V^{T}_{S\times S} \cdot V_{S\times S} = 1_{S\times S}$. The singular value decomposition algorithm [Press 2007] or SVD routine that is present in many linear algebra subroutine libraries can be used to numerically construct these arrays. With them, one can find the pseudoinverse of $A_{N\times S}$ as
$$A^{+}_{S\times N} = V_{S\times S} \cdot W^{-1}_{S\times S} \cdot U^{T}_{S\times N}, \tag{9.35}$$
where the inverted matrix $W^{-1}_{S\times S}$ is again a diagonal matrix with elements $W^{-1}_{i,i}$ that are the inverse of the singular values $W_{i,i}$, or zero when $W_{i,i} = 0$. Use of SVD for x-ray spectromicroscopy analysis with a set of known spectra has proven to be useful in soft x-ray XANES [Zhang 1996, Koprinarov 2002]; it is particularly useful in the analysis of immiscible polymers (Fig. 9.4), where one indeed has only a few known spectra that can be determined in advance by spectroscopy on pure thin films.
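In code, Eqs. 9.31–9.35 amount to a single least-squares solve. A minimal sketch with synthetic spectra and maps (numpy's `pinv` implements exactly the SVD pseudoinverse of Eq. 9.35, zeroing the reciprocals of negligible singular values):

```python
import numpy as np

# Sketch of Eq. 9.33: recover per-pixel thickness maps from an optical
# density stack using known component spectra. All data here are synthetic.
rng = np.random.default_rng(0)

N, S, ny, nx = 40, 3, 16, 16           # energies, components, image size
P = ny * nx

mu = rng.random((N, S)) + 0.5          # stand-in component spectra (N x S)
t_true = rng.random((S, P))            # true thickness maps (S x P)

# Eq. 9.31: D = mu @ t, after flattening D(x, y, E) to an N x P array
# (Eq. 9.30 is just this row-major flattening of the pixel indices).
D = mu @ t_true + 1e-6 * rng.standard_normal((N, P))

# Eq. 9.33 via Eq. 9.35: pseudoinverse through SVD
t_fit = np.linalg.pinv(mu) @ D

print(np.abs(t_fit - t_true).max())    # small: least-squares recovery works
t_maps = t_fit.reshape(S, ny, nx)      # one thickness image per component
```

For real data the same two lines apply, with `mu` filled from reference spectra measured on pure thin films.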
9.3.1 Principal component analysis

In many cases (particularly in biology or environmental science), the specimen cannot be assumed to be made up of a simple combination of a limited number of components for which reference spectra are known a priori. One approach to handle these cases involves the use of principal component analysis (PCA) to characterize the dataset in terms of its most significant variations without prior knowledge of their characteristics. There is a long tradition of using PCA in the social sciences and in chemistry [Malinowski 1991], and its first use in x-ray microscopy was in connection with photoelectron-based imaging [King 1989], with scanning transmission x-ray microscopy applications coming later [Osanna 2000].

The goal in PCA is to describe the specimen by a set of $s = 1, \ldots, S_{\rm abstract}$ abstract components (where $S_{\rm abstract} \leq N$). These abstract components describe the main spectroscopic signatures in the data; these signatures may in fact arise from a linear combination of several different chemical species, so that there is not a simple, direct relationship between one particular abstract component and one particular chemical component of the specimen. While one can use SVD to carry out a PCA calculation, in many cases the numerical implementation involves creating square arrays of dimension $N \times N$ or $P \times P$, whichever is larger, which is of course inefficient if $N \ll P$ (in soft x-ray XANES spectromicroscopy, one might have $N = 100$ energy points, and $P = 512^2$ pixels). A more efficient approach is to first calculate the spectral covariance from the optical density data times its transpose, or
$$Z_{N\times N} = D_{N\times P} \cdot D^{T}_{P\times N}, \tag{9.36}$$
which measures the correlation between images at various energies. Because the correlation of the image at energy $n_1$ with the image at energy $n_2$ is the same as the correlation of $n_2$ with $n_1$, the covariance matrix $Z_{N\times N}$ is symmetric. One can then use an eigenvalue routine from a linear algebra subroutine library to find a matrix of eigenvectors (which we will henceforth call eigenspectra $C_{N\times S_{\rm abstract}}$) and eigenvalues $\lambda(s)$ that fully span the covariance matrix:
$$Z_{N\times N} \cdot C_{N\times S_{\rm abstract}} = C_{N\times S_{\rm abstract}} \cdot \Lambda_{S_{\rm abstract}\times S_{\rm abstract}}, \tag{9.37}$$
where $S_{\rm abstract} = N$, and $\Lambda_{S_{\rm abstract}\times S_{\rm abstract}}$ is a diagonal matrix whose diagonal elements are given by the eigenvalues $\lambda(s)$ for $s = 1, \ldots, N$. We can also find a corresponding matrix (which we will henceforth call the eigenimage matrix $R_{S_{\rm abstract}\times P}$) from
$$R_{S_{\rm abstract}\times P} = C^{T}_{S_{\rm abstract}\times N} \cdot D_{N\times P}, \tag{9.38}$$
where we have used the fact that $C$ is orthogonal (being composed of eigenvectors) so that its inverse is its transpose, or $C^{-1} = C^{T}$. Finally, it is obvious that we can also rewrite Eq. 9.38 as
$$D_{N\times P} = C_{N\times S_{\rm abstract}}\, R_{S_{\rm abstract}\times P}, \tag{9.39}$$
so that we can represent the full dataset with the matrix product of the eigenspectra times the eigenimages, and the most significant information as
$$D_{N\times P} \approx C_{N\times \bar{S}_{\rm abstract}}\, R_{\bar{S}_{\rm abstract}\times P}, \tag{9.40}$$
where the expression of Eq. 9.40 takes advantage of data size reduction and noise suppression, as we will now discuss.

To understand the power of PCA, it is helpful to look at the example shown in Fig. 9.19. The dataset shown [Lerotic 2004] is one with lutetium serving as a stand-in for americium in interactions with ferrihydrite and humic acids, the latter of which are organics in soil that can affect the transport of radionuclides [Dardenne 2002]. An oxygen K edge XANES spectromicroscopy dataset was acquired, and a set of eigenvalues, eigenspectra, and eigenimages were calculated using Eqs. 9.36–9.38. As one can see in Fig. 9.19, these quantities have the following characteristics:
[Figure 9.19 panels: (a) eigenvalues $\lambda(S_{\rm abstract})$ versus component index 1–50 on a logarithmic scale ($10^0$–$10^6$); (b) eigenspectra 1–6 versus photon energy 525–550 eV, with spectra 5 and 6 centered on zero; (c) eigenimages 1–6.]
Figure 9.19 Example use of principal component analysis (PCA) in x-ray spectromicroscopy. Shown here is an oxygen K edge XANES dataset [Lerotic 2004] of a specimen with lutetium used as a stand-in for americium in a study of radionuclide transport in groundwater [Dardenne 2002]. The first four eigenvalues $\lambda(S_{\rm abstract})$ contain most of the significant variation in the eigenspectrum/eigenimage representation of the data, as indicated by the fact that they have much stronger values than all other eigenvalues even when displayed on a logarithmic scale. The importance of the first four eigenvalues is also seen in the fact that the first four (or the reduced set $\bar{S}_{\rm abstract} = 4$) eigenspectra show XANES-spectrum-like features, while the spectra for $S_{\rm abstract} = 5$ and above show mostly uncorrelated noise with low eigenvalue weightings. The eigenimages tell a similar story; the first four (or $\bar{S}_{\rm abstract} = 4$) show recognizable image features, while eigenimages $S_{\rm abstract} = 5$ and beyond show only the "salt and pepper" appearance of random noise (weakly visible here because images 1–6 are shown on the same intensity scale). However, the eigenspectra and eigenimages beyond the first one have both positive and negative values, with the negative values in the eigenimages shown here on a red instead of grey color scale. Because both the eigenspectra and eigenimages show negative values that resemble successive orthogonal differences from the first eigenspectrum and eigenimage, it is difficult to interpret these higher eigenspectra and eigenimages on their own; it is only in linear combination that they reproduce measured x-ray absorption spectra at various image pixels.
• They are obtained from the data "as is," with no a priori biases on the nature of the data.

• The eigenvalues $\lambda(s)$ decrease rapidly on a logarithmic scale. Because the eigenvalues represent overall weightings of eigenspectra and eigenimages in the dataset, they indicate that most of the significant variations in the data can be represented by using only the first four components. This represents a tremendous degree of data compression: in essence one can represent most of the significant variations in the data with a reduced set of $\bar{S}_{\rm abstract} = 4$ abstract components, rather than (in this case) $N = 120$ photon energies. The separation of the $\bar{S}_{\rm abstract} = 4$ components from the full set of $S_{\rm abstract} = N$ components is not always so clear as is shown in this example, so one should also examine the eigenspectra and eigenimages as described next.

• Among the set of eigenspectra $C_{N\times S_{\rm abstract}}$, the first eigenspectrum shows what is effectively an average of all optical density spectra present in the dataset, with all positive values as one would expect for an optical density (negative values would represent negative x-ray absorption $\mu t$, or the addition rather than removal of energy from the x-ray beam, which is clearly unphysical!). However, the subsequent eigenspectra have a mean value of zero, with excursions to both positive and negative values. In effect, the subsequent eigenspectra represent successive differences from the first eigenspectrum as required to represent all of the observed variations in the dataset. This means that it is very difficult to interpret the spectroscopic signatures of individual eigenspectra; instead, one only matches observed spectra when forming an appropriate linear combination of the reduced number $\bar{S}_{\rm abstract}$ of eigenspectra. The eigenspectra beyond $\bar{S}_{\rm abstract} = 4$ in this example show mostly stochastic variations that are characteristic of noise (plus, in this case, a slight nonlinearity of response in the strongly absorbing energy range of 540–543 eV, which is likely due to incomplete suppression of higher monochromator orders as discussed near Eq. 5.45 and illustrated in Fig. 9.9).

• Among the set of eigenimages $R_{S_{\rm abstract}\times P}$, the average over all individual optical density images is shown in the $S_{\rm abstract} = 1$ eigenimage, and then successive images show different positive and negative value variations. As with the eigenspectra $C_{N\times S_{\rm abstract}}$, it is only through a linear combination of the $\bar{S}_{\rm abstract} = 4$ most significant eigenimages that one can represent the individual optical density images acquired at the various photon energies $N$. The eigenimages for $S_{\rm abstract} = 5$ and beyond show the "salt and pepper" appearance of random noise, except for a slight shadow in the most strongly absorbing region near the center due to nonlinearities (the same nonlinearity as shown in the 540–543 eV energy range in eigenspectrum $S_{\rm abstract} = 5$).

On the positive side, PCA gives us a way to significantly compress the dataset for subsequent analysis (by reducing the dimensionality from $N \times P$ to $\bar{S}_{\rm abstract} \times P$, where in this case $\bar{S}_{\rm abstract} = 4$ is much smaller than $N = 120$), and the compressed data are delivered as a set of orthogonal, successive-difference eigenspectra and eigenimages. On the negative side, because the eigenspectra and eigenimages beyond $S_{\rm abstract} = 1$ show successive differences with positive and negative values, they are difficult to interpret on their own. In addition, one can spoof image classifiers that use PCA as a data pretreatment step by deliberately adding in weak image features designed to "tickle" weak eigenimages (see [Sharif 2016] for one amusing example).
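The covariance-plus-eigenanalysis route of Eqs. 9.36–9.40 takes only a few lines of numpy. A sketch on a synthetic two-component stack (all data invented for illustration):

```python
import numpy as np

# PCA of a spectromicroscopy stack via the N x N covariance (Eqs. 9.36-9.39).
rng = np.random.default_rng(1)
N, P = 60, 32 * 32                                 # energies, pixels

# Synthetic data: two Gaussian "component spectra" mixed by random maps
energies = np.linspace(0.0, 1.0, N)
spectra = np.stack([np.exp(-(energies - c)**2 / 0.01) for c in (0.3, 0.7)])
maps = rng.random((2, P))
D = spectra.T @ maps + 0.01 * rng.standard_normal((N, P))

Z = D @ D.T                                        # Eq. 9.36: N x N covariance
evals, C = np.linalg.eigh(Z)                       # symmetric eigendecomposition
order = np.argsort(evals)[::-1]                    # strongest components first
evals, C = evals[order], C[:, order]

R = C.T @ D                                        # Eq. 9.38: eigenimages
D_reduced = C[:, :2] @ R[:2, :]                    # Eq. 9.40 with S_bar = 2

# Two eigenvalues dominate, and the rank-2 reconstruction captures the data
# while suppressing most of the added noise:
print(evals[:4])
print(np.abs(D_reduced - D).max())
```

The rapid eigenvalue falloff after the true number of components is exactly the behavior shown in panel (a) of Fig. 9.19.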
9.3.2 Cluster analysis and optimization methods

The downsides of PCA listed above mean that it is usually used not as a final step in analysis, but as a pretreatment step for a variety of follow-on methods. The first approach demonstrated in x-ray spectromicroscopy was cluster analysis [Lerotic 2004],
Downloaded from https://www.cambridge.org/core. Simon Fraser University Library, on 04 Nov 2019 at 02:07:34, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781139924542.010
Figure 9.20 Scatterplots of the weightings R_{S_abstract×P} (Eq. 9.38) of pixels P for two components S_abstract compared against each other: S_abstract = 3 versus 4 at left, and 2 versus 4 at right. These weightings show the relative strength of the associated eigenspectra present in a pixel, and one can only show pairwise comparisons in an individual 2D plot. By using a k-means clustering algorithm on the Euclidean distance between pixels over the full set of S_abstract dimensions, one can classify pixels as belonging predominantly to one eigenspectrum or another. Figure adapted from [Lerotic 2004].
where one seeks dense groupings of pixels in an S_abstract-dimensional search space. Because each pixel P has a weighting R_{S_abstract×P} for the set S_abstract of components as given by Eq. 9.38, one can use a k-means clustering algorithm to classify pixels based on their spectral similarity, as shown in Fig. 9.20. With a “hard” clustering method which classifies each pixel to one and only one group, one then obtains a classification map as shown in Fig. 9.21(a). One can then average the spectra of the pixels in each cluster together to obtain a set of spectra μ_{S×N,cluster} from which a pseudoinverse can be obtained and used in Eq. 9.33 to obtain thickness or weighting maps corresponding to these spectra. This provides a good first start to analysis, even though one can still arrive at non-physical negative values for the reconstructed optical density for reasons shown in [Mak 2014, Fig. 2]. One can improve upon the basic cluster analysis approach described above in a number of ways, including by using “soft” clustering methods in which each pixel is given a weighting of how strongly it belongs in one cluster or another [Ward 2013], or by using an angle rather than Euclidean distance measure [Lerotic 2005]. Additional classification approaches have been developed for spectrum imaging in transmission electron microscopy [Bonnet 1999] and x-ray fluorescence analysis in electron microprobes [Kotula 2003]. In fact, one can do much more. What we have done in Eq. 9.31 as well as in Eq. 9.40 is to write our data analysis problem as one of solving a simple matrix equation of the form y = Ax
Figure 9.21 Cluster analysis of the data of Fig. 9.19. By assigning pixels exclusively to one cluster or another as shown in Fig. 9.20, one obtains a simple map (a) of regions with similar spectroscopic response, and (b) the corresponding spectra. A “genetic relationship” plot of the similarity of the cluster spectra and their merge points is shown in (c). From the set of spectra μN×S shown in (b), one can obtain the pseudoinverse and use Eq. 9.33 to obtain thickness or weighting maps as are shown in (d). The negative (unphysical) values that can appear in these thickness maps point out a shortcoming in this approach, but it can still provide a good starting solution which can then be reﬁned using optimization methods. Figure adapted from [Lerotic 2004].
which, to the alert reader, should ring a bell: it is Eq. 8.9, the basic equation for the class of numerical optimization problems discussed in Section 8.2.1 in the context of iterative tomography reconstruction algorithms. This means that all of the approaches discussed in terms of image reconstruction, including compressive sensing, are in principle applicable to spectromicroscopy data analysis. Inspired by the method of non-negative matrix factorization [Lee 1999], which exploits the fact that both the absorption spectra μ_{N×S} and thickness maps t_{S×P} should have only positive values, one can use optimization approaches where the basic cost function is given by matching the spectra and thicknesses to the data, or

C_0 = D_{N×P} − μ_{N×S} t_{S×P}   (9.41)
so that one wishes to minimize the two-norm (Eq. 8.13) of this cost. That is, one wishes to find min ‖C_0‖_2. One can add regularizers such as the one-norm (Eq. 8.12) ‖t_{S×P}‖_1 to seek a “sparse” solution (that is, to seek a set of spectra that make the thickness maps as different as possible from each other, so that a pixel is more likely to be dominated by one of the set of spectra μ_{N×S} rather than a more even weighting of all the spectra). One can also add a positivity constraint on μ_{N×S}. This approach has been used with
success in XANES spectromicroscopy [Mak 2014, Mak 2016], and it is implemented in a software package available for download [Lerotic 2014]. There’s one important difference between tomography reconstructions based on p_θ = W_θ f (Eq. 8.7) as well as x-ray fluorescence analysis, versus x-ray absorption spectroscopy based on D_{N×P} = μ_{N×S} t_{S×P} (Eq. 9.31):

• In tomography one usually knows the set of angles θ from which projection images were obtained, and in x-ray fluorescence analysis one knows the set of x-ray fluorescence lines from all 92 stable elements.

• In XANES spectromicroscopy, however, one may not know ahead of time the set S of chemical components with distinguishably different absorption spectra. Principal component analysis can be used to obtain the number S̄_abstract of distinguishable spectra (Eq. 9.40) with no prior knowledge, after which one can carry out cluster analysis followed by non-negative and sparse refinement [Mak 2014, Mak 2016] to “discover” what these S̄_abstract spectra look like.

That is, in XANES spectromicroscopy one can recover the “hidden” organizing factors S̄_abstract present in the data. This approach is also used in single-particle imaging methods in electron microscopy (subject of the 2017 Nobel Prize in Chemistry), where one seeks to recover the set θ of viewing directions from which projection images of identical macromolecules were obtained [Frank 1975a, Frank 1988]. Electron microscopists can recover additional “hidden” organizing factors from a dataset, such as molecular conformation states [Scheres 2007, Spahn 2009]. We will see these single-particle methods again in Section 10.6 when we discuss coherent diffraction imaging with free-electron lasers.
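The non-negative factorization idea can be sketched with the classic multiplicative updates of [Lee 1999] (an illustrative example on synthetic data; the actual refinement of [Mak 2014, Mak 2016] adds sparsity terms and cluster-analysis initialization not shown here):

```python
import numpy as np

# Factor D (N energies x P pixels) as mu @ t with mu, t >= 0 throughout.
# Synthetic, exactly factorable data for illustration only.
rng = np.random.default_rng(2)
N, P, S = 60, 200, 3
D = rng.random((N, S)) @ rng.random((S, P))

mu = rng.random((N, S)) + 0.1    # spectra estimate, kept positive
t = rng.random((S, P)) + 0.1     # thickness-map estimate, kept positive
eps = 1e-9                       # guard against division by zero
for _ in range(500):
    # Multiplicative updates: ratios of non-negative terms stay non-negative,
    # so the positivity constraint is enforced automatically.
    t *= (mu.T @ D) / (mu.T @ mu @ t + eps)
    mu *= (D @ t.T) / (mu @ t @ t.T + eps)
```

After a few hundred iterations the product mu @ t reproduces D closely while both factors remain non-negative, which is the essential contrast with the PCA eigenspectra and their unphysical negative excursions.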
9.4 Concluding limerick

Adding spectroscopy to x-ray microscopy to make spectromicroscopy lets us understand what we are seeing, ranging from trace element distributions to chemical binding states in complex materials. It’s all cool enough to inspire a limerick!

X-rays have spectra with wiggles
from electrons; they shakes and they jiggles
I could be confused
but instead am amused
Such images give me the giggles!
10 Coherent imaging
Janos Kirz contributed to this chapter. As discussed in Chapter 5, there have been considerable advances in the development of x-ray lenses using diffraction, reflection, and refraction. However, all of these x-ray lenses have their limitations: they typically have numerical apertures (N.A.) of 0.005 or less, and efficiencies ranging from 60 percent to as low as single-digit percentages. Compare that with what is available in objective lenses for visible-light microscopes, where oil immersion objectives go up to N.A. = 1.4 with efficiencies near 100 percent and excellent fields of view and achromatic properties. In electron microscopy, high-end microscopes have spherical and even chromatic aberration correctors, an efficiency limited only by scatter-limiting apertures and detectors, and a spatial resolution as low as 50 picometers [Erni 2009]. Among these techniques, x-ray microscopy suffers the most from lens-imposed limitations, so there is an especially strong motivation to consider lensless x-ray imaging methods such as holography, coherent diffraction imaging (CDI; sometimes called diffraction microscopy), and ptychography. Besides the images shown in this chapter, x-ray ptychography images are shown later on in 2D (Fig. 12.2) and 3D (Fig. 12.6), while an example of a Bragg CDI image is shown in Fig. 12.8.
10.1 Diffraction: crystals, and otherwise

X rays have been used for atomic resolution imaging, without lenses, for more than a century via the method of x-ray crystallography (from the Greek word κρύσταλλος or krystallos for “ice”). As discussed in Section 2.1, von Laue had realized in 1912 that the regular spacing of atoms in crystals could provide evidence of the wavelength of X rays. Later that year, Lawrence Bragg and his father William worked out their eponymous law describing crystalline diffraction (Eq. 4.33), and used it to determine the structure of several simple cubic lattice salts, including NaCl and KCl [Bragg 1913a, Bragg 1913b]. Another key advance was provided in 1934 by Arthur Lindo Patterson, who realized that the autocorrelation map A of a diffraction pattern gives direct information on interatomic distances in crystals, in an approach that is now called the Patterson map [Patterson 1934b, Patterson 1934a]. Recall from Eq. 4.83 that we can describe convolution as a shift–multiply–add operation, as illustrated in Fig. 4.18. By carrying out a shift–multiply–add sequence on a crystal’s diffraction pattern, one gets a reinforcement of the double-slit-like grating patterns arising from any two point sources in the
Figure 10.1 Patterson map of a three-atom simulated crystal. At top left is shown the object, consisting of three spheres, all with the same electron density; the line segments separating these spheres are shown in color. The diffraction pattern at top right has rings similar to an Airy pattern, as shown in Fig. 4.22 (except that the projected thickness of a sphere is different than a uniform circular disk); the pattern is then modulated by interference grating patterns from the interatomic spacings. This can be seen in the inset in the diffraction pattern, which shows a region near the optical axis zoomed in. By squaring the diffraction magnitudes and taking the Fourier transform, one obtains the Patterson or autocorrelation map A(x′, y′, z′) (Eq. 10.1) at bottom left, which shows all of the interatomic distance vectors contained in the unit cell at bottom right. A Patterson map of a non-crystalline object is shown in Fig. 10.16.
object. Since the electron density of an atom acts like a point source scatterer, the self-convolution or autocorrelation of the diffraction pattern then produces strong signals at positions corresponding to the interatomic distances as shown in Fig. 10.1. Crystallographers label the diffraction spots by Miller indices hkl, as was shown in Fig. 4.11, so in their notation the diffraction amplitudes are written as F_hkl. Thus for a crystallographer the autocorrelation of the diffraction pattern magnitudes—the Patterson map A(x′, y′, z′), where {x′, y′, z′} represent positions within the unit cell of dimension {a, b, c}—is written as [Patterson 1934b, Eq. 6]

A(x′, y′, z′) = Σ_{hkl} |F_{h,k,l}|² exp[i 2π(h x′/a + k y′/b + l z′/c)],   (10.1)

which is the inverse Fourier transform of diffraction intensities rather than of the properly phased diffraction amplitudes. As shown in Fig. 10.1, the Patterson map A(x′, y′, z′) provides direct information on the vector distance between atoms in the unit cell. Aided by this and other advances, by the 1930s crystallography was being applied to understand biological structures, and today the method has advanced so far that many labs obtain crystallographic structures by using airborne shipment of newly made crystals
Figure 10.2 Structural heterogeneity versus size in biology. Crystallography relies on a very low degree of heterogeneity among the many molecules in their unit cells. As one gets to larger structures, one can ﬁnd a greater range of conformational variations in macromolecules; one can sort for a number of these variations in single particle electron microscopy [Leschziner 2007, Scheres 2010, Spahn 2009, Cossio 2013]. Individual variations are seen in the interiors of larger viruses (see for example [Xiao 2005, Xiao 2009, Okamoto 2017]), and by the time one reaches the length scale of bacteria one has a large degree of structural heterogeneity, even before reaching the variability of eukaryotic cells. Figure adapted from [Jacobsen 2016a].
to synchrotron light sources for near-automatic, remote control structure determination. These developments in x-ray crystallography are so significant in science that they deserve a full telling of their history and conceptual development, which others already provide [Finch 2008, Authier 2013, Jaskolski 2014]. As powerful as it is, crystallography requires crystals, where many identical structures have been persuaded to line up in an orderly lattice. Simple molecules have mostly identical structures, though subunits can twist and swing during chemical reactions (a classic example on a larger, biologically important molecule is the cis–trans conformational change of rhodopsin [Palczewski 2006]). Even so, large macromolecules—such as subunits of the ribosome [Yonath 1980]—can be persuaded to form crystals. However, the interaction with nearest neighbors can slightly affect the positioning of the outermost atoms in a unit cell, and large molecules can have multiple conformational states (which can be sorted out in single-particle electron microscopy [Leschziner 2007, Scheres 2010, Spahn 2009, Cossio 2013]). Virus capsids can also be crystallized, though their interiors can show considerable individual variation (see for example [Xiao 2005, Xiao 2009, Okamoto 2017]). By the time one reaches the size of bacteria, all bets are off with crystallization, and eukaryotic cells show even more variation. Indeed, in biology there is a trend of increasing structural heterogeneity as one goes to larger objects (Fig. 10.2). The consequence is that one loses the ability to form crystalline arrays, as well as the required degree of having identical copies of the same structure. Therefore even though X rays have excellent properties for imaging micrometer-sized and larger specimens (Fig. 4.82), crystallography is no longer relevant for these larger individual objects.
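As a numerical aside (an invented three-point example, not the book's own code), the Patterson map of Eq. 10.1 is simply the inverse Fourier transform of the diffraction intensities:

```python
import numpy as np

# Three point "atoms" on a periodic grid stand in for a simple unit cell.
rho = np.zeros((64, 64))
rho[20, 20] = rho[20, 40] = rho[44, 28] = 1.0

F = np.fft.fft2(rho)                    # diffraction amplitudes
A = np.fft.ifft2(np.abs(F) ** 2).real   # Patterson (autocorrelation) map

# A peaks at the origin (with weight sum(rho**2) = 3) and at each
# interatomic difference vector, e.g. (0, 20) between the first two atoms.
```

The peaks of A away from the origin reproduce the interatomic distance vectors, exactly as in the unit-cell diagram of Fig. 10.1, without any knowledge of the diffraction phases.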
We are then left with a dilemma: there can be situations where we want to see things beyond the limit of what x-ray lenses can deliver, yet where we do not have enough
Figure 10.3 X-ray diffraction pattern from a single object, versus various periodic arrays of that object. The “object” here is a photograph of David Sayre (1924–2012), a cherished colleague of ours who provided key insights into the inversion of diffraction patterns [Sayre 1952a]. At left is shown the real space image, while the corresponding diffraction intensity is shown at right (with an inset showing a zoomed-in view of a subregion). The diffraction pattern from a single object is quite continuous, with a characteristic size of “speckles” corresponding to the size of the object as given by Eq. 10.5. As more copies of the object are placed in a regular, periodic array, the diffraction intensity begins to be concentrated into Bragg spots, with the continuous diffraction pattern (what crystallographers call “diffuse scattering”) becoming more obscured. Note that the 1 × 2 case shown at upper right produces something like a double-slit interference modulation of the single object diffraction pattern.
structural regularity to use the methods of crystallography to obtain an “image” of a unit cell using Bragg diffraction spots. What about the diffraction pattern from a non-crystalline object? An example of such a far-field diffraction pattern is shown in Fig. 10.3, in a simulation where one goes from one isolated object, to an increasing number of copies of the same object in a regular array. As this figure shows, the single object diffraction pattern is much different than the set of Bragg spots that result from a large crystal: it is a much more continuous function, with information accessible across an entire far-field detector array rather than just at certain pixels that coincide with Bragg spots. Since the far-field diffraction patterns of Fig. 10.3 can be calculated by taking a Fourier transform of the object’s complex transmittance (see Box 10.2) g(x, y) = exp{∫_z k[iδ(x, y, z) − β(x, y, z)] dz} (Eq. 4.76) and then squaring the result to obtain the intensity, can we not simply take the square root of the measured diffraction intensity and inverse Fourier transform it to recover the object? The answer is a resounding “No!” Diffraction at one angle will correspond to a certain grating structure that is part of the object in the Fourier decomposition picture discussed in Section 4.4.7. However, without knowledge of the phase we have no way of determining the shift of that grating with respect to other gratings with other periodicities (recall from the shift theorem of Fourier transforms of Eq. 4.86 that a sideways shift of the grating in real space corresponds to a phase ramp in Fourier space). It’s as if we have the box of parts, but no instructions on how to assemble them correctly. This is demonstrated in Fig. 10.4, which shows that it is the Fourier space phases rather than the magnitudes that provide the correct relative shifts of the component gratings to yield a recognizable real-space image—even if the magnitudes are all wrong! This illustrates the well-known “phase problem” (Box 10.1) in the recovery of images from far-field intensity recordings, and solutions to this problem are what we address in this chapter.

Box 10.1 The phase problem in crystallography

One early statement of the phase problem in crystallography comes from William Duane at Harvard in 1925. Drawing upon Duane’s earlier work on a momentum-based picture of quanta (particles, or photons) interacting with crystals [Duane 1923], Epstein and Ehrenfest had shown that one can calculate the intensities present in Bragg peaks from the density distribution of the diffracting material [Epstein 1924]. Duane then stated [Duane 1925]: “If we reverse the line of thought, and attempt to deduce the density, ρ(x, y, z), of the diffracting power (or the density of the electron distribution) in a crystal from the measured intensities of the various reflected beams by adding together the corresponding terms in the Fourier series, we find that these intensities do not determine the phase angles, δ. In other words, an indefinitely large number of distributions of diffracting power will produce beams of rays of precisely the same intensities in the same directions.” A simple example of this is shown from the shift theorem of Fourier transforms given by Eq. 4.86; all sideways translations of a specific Fourier component in a specimen’s density distribution produce diffraction at the same angle but with varying phase; yet the phase is not detectable from isolated diffraction intensities.
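The shift theorem invoked here is easy to check numerically (an illustrative NumPy sketch, not from the book): translating an object leaves the far-field intensities untouched, and only adds a linear phase ramp.

```python
import numpy as np

rng = np.random.default_rng(3)
g = rng.random((32, 32))                          # a stand-in object
g_shift = np.roll(g, shift=(5, 7), axis=(0, 1))   # translate by (5, 7) pixels

G = np.fft.fft2(g)
G_shift = np.fft.fft2(g_shift)

# Intensities |G|^2 are identical; only the phases differ, by the ramp
# exp[-i 2 pi (5 u_x + 7 u_y)] with u in cycles per pixel.
u_x = np.fft.fftfreq(32)[:, None]
u_y = np.fft.fftfreq(32)[None, :]
ramp = np.exp(-2j * np.pi * (5 * u_x + 7 * u_y))
```

This is exactly why measured intensities alone cannot pin down the relative positions of the object's Fourier components.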
10.2 Holography

In crystallography, Lawrence Bragg explored one novel approach to phasing diffraction patterns so that they could be directly inverted: he made a scaled, visible-light replica of a crystal diffraction pattern, and applied refractive phase shifts to selected diffraction spots in order to obtain an optical reconstruction of a unit cell [Bragg 1929]. Further development of this idea led to a pair of papers entitled “A new type of ‘x-ray microscope’” [Bragg 1939] and “The x-ray microscope” [Bragg 1942]. However, Bragg soon became
Figure 10.4 Illustration of the phase problem in far-field diffraction. At left is shown an object g(x, y) which has been digitally vignetted as described below Eq. 4.95. (The object here is a photograph of our colleague Malcolm Howells, taken while skiing at La Clusaz, France). This image was then Fourier transformed to yield the far-field or Fraunhofer amplitude G(u_x, u_y) = F{g(x, y)} in Fourier space. In an x-ray experiment, one can only measure the intensity I(u_x, u_y) = G†(u_x, u_y) G(u_x, u_y). One can then recover the measured Fourier space magnitudes as F(u_x, u_y) = √I(u_x, u_y); since the phase is unknown, one can at least try setting it to be a uniform value of zero at all pixels in Fourier space. When an inverse Fourier transform is done to attempt to reconstruct the image as g_recon,mag = F⁻¹{√I e^{i·0}}, one can see from the middle image that this is not a very successful approach! If, however, one were able to record not the intensity but the phase ϕ_G = tan⁻¹(Im[G]/Re[G]) of the far-field diffraction pattern and invert it to obtain g_recon,phase = F⁻¹{1 · e^{iϕ_G}} using uniform magnitudes in Fourier space, one obtains a very recognizable image of the object as shown at right. This demonstrates how the Fourier space (or reciprocal space; see Box 4.2) phases ϕ_G are essential for reconstructing objects from their diffraction patterns, yet the Fourier magnitudes F = √I are all that we can record in experiments.
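One can replicate the Fig. 10.4 experiment in a few lines (an illustrative sketch with a random stand-in image, not the actual figure code):

```python
import numpy as np

rng = np.random.default_rng(4)
g = rng.random((64, 64))               # stand-in for the real image

G = np.fft.fft2(g)
mag, phase = np.abs(G), np.angle(G)

# Magnitudes only (phases zeroed), versus phases only (magnitudes set to 1).
g_from_mag = np.fft.ifft2(mag).real
g_from_phase = np.fft.ifft2(np.exp(1j * phase)).real

corr_mag = np.corrcoef(g.ravel(), g_from_mag.ravel())[0, 1]
corr_phase = np.corrcoef(g.ravel(), g_from_phase.ravel())[0, 1]
# The phase-only reconstruction correlates far better with g.
```

Even for this featureless random "image," the phase-only reconstruction tracks the original closely while the magnitude-only one collapses toward a spike at the origin, which is the same lesson the figure teaches.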
immersed in activities in aid of Britain during World War II [Phillips 1979], and the development of heavy-atom substitution and computational methods for crystallographic reconstruction eventually swept past this particular proposal. While it did not turn out to be practical, Bragg’s proposal inspired a Hungarian-born British scientist named Dennis Gabor. Gabor had been seeking a way to improve the resolution of transmission electron microscopes [Gabor 1942], where Scherzer had shown that chromatic and spherical aberrations were intrinsic to the lenses used [Scherzer 1936] and that these aberrations set fundamental limits to the resolution that could be obtained [Scherzer 1949]. (These limitations have been pushed back in the last two decades by the development of aberration-correcting optics in electron microscopes, as summarized recently [Hawkes 2009, Hawkes 2015].) While awaiting his turn for a game of tennis during the Easter holiday in 1947, a new approach [Gabor 1948] came to Gabor: find a way to record the electron wavefield, and then recreate a magnified version of it optically, where one can use visible-light optics to correct for the electron optics aberrations. Of course Gabor realized that his approach had broader applications, so he chose [Gabor 1949] a concise, descriptive word for it: holography (from the Greek words ὅλος or hólos for “whole” and γραφή or graphé for “drawing”), since a hologram captures the whole information of a wavefield by representing its magnitude and phase.
Box 10.2 Exit waves in coherent imaging

The discussion of coherent imaging methods in this chapter is limited to the case of reconstructing an exit wave from the object, which represents the input wave ψ_0 modulated by the object’s complex transmittance exp{∫_z k[iδ(x, y, z) − β(x, y, z)] dz} as was described in Eq. 3.71. That is, we assume that there is no diffraction within the object, so that one can simply calculate the modulation on the input wave ψ_0 based on the projected thickness t(x, y) through the object. (If there are many materials with differing values of the x-ray refractive index values δ and β, each with their own projected thickness, one can calculate their combined effect as described in Section 3.3.5). At very high resolution one must begin to consider the curvature of the Ewald sphere (Fig. 4.16), which gives an effective depth of field limit of λ/N.A.² (Eq. 4.60), where N.A. is now the largest specimen scattering angle that effectively contributes to the reconstructed image. Fresnel diffraction effects also begin to come into play, as was illustrated for example in Fig. 4.63 and as will be shown below in Fig. 10.19. For these reasons, thicker specimens can begin to violate the simple interpretation of coherent diffraction images [Thibault 2006], as will be discussed in Section 10.5.
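A minimal sketch of this projection approximation for a single material (the wavelength, δ, and β values below are made up for illustration rather than taken from tabulations):

```python
import numpy as np

# Exit wave in the projection approximation: psi = psi0 * exp[k(i*delta - beta)*t]
# for a single material of projected thickness t(x, y).
wavelength = 2.48e-9                 # ~500 eV soft x rays, in meters
k = 2 * np.pi / wavelength
delta, beta = 1.0e-3, 2.0e-4         # illustrative refractive index decrements

t = np.zeros((32, 32))
t[8:24, 8:24] = 200e-9               # a 200 nm thick square feature

psi0 = 1.0                           # unit-amplitude plane wave
psi_exit = psi0 * np.exp(k * (1j * delta - beta) * t)

# |psi_exit|^2 = exp(-2*k*beta*t) is the familiar absorption factor,
# while the phase advance through the feature is k*delta*t radians.
```

Outside the feature the wave passes unmodified; inside, the intensity follows the Beer–Lambert factor and the phase is shifted by k δ t, which is all that the coherent imaging methods of this chapter need from the specimen in this approximation.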
10.2.1 Inline or Gabor holography

The geometry for Gabor’s original approach to holography (also called inline holography) is shown in Fig. 10.5. A small object is illuminated with a coherent reference wave ψ_r. A fraction of this reference wave is modulated by the specimen ψ_s, and these two wavefields ψ_r and ψ_s then propagate to a detector. The detector records an intensity distribution I of

I = |ψ_s|² + |ψ_r|² + ψ_s ψ_r† + ψ_r ψ_s†,   (10.2)
where |ψ_s|² and |ψ_r|² are the coherent diffraction patterns produced by the specimen and reference waves, respectively, and ψ_s ψ_r† and ψ_r ψ_s† are the holographic interference terms. In Gabor’s original conception, the detector is photographic film which is then processed to yield a transparency. When this transparency is illuminated by the same reference wave ψ_r, the reference wave is modulated to produce two waves: a converging real-image wave, and a diverging virtual-image wave which, at the plane of the real image, appears like an out-of-focus twin of the real image, so it is called a twin image wave. If the reference wave is a plane wave, the inline hologram of a point object produces something rather familiar: a zone plate pattern! This was first pointed out by Rogers [Rogers 1950], and it gives some immediate insight into the characteristics of inline holography. The most important of these is that the resolution of an inline hologram is limited to the resolution of the hologram detector: the outermost fringes of the hologram are like the outermost zones of a Fresnel zone plate with the same maximum angle collected from the specimen (the N.A. of the hologram). Just as the spatial resolution of a Fresnel zone plate is limited by the outermost zone width dr_N (Eq. 5.28), the spatial
Figure 10.5 Schematic representation of Gabor, or inline, holography with plane wave illumination. A coherent reference wave illuminates an object as well as continuing mostly unobstructed to a detector, where it interferes with a wave scattered by the object. The detector (photographic film in this schematic) records the intensity of this interference pattern, thus encoding the phase. The same reference wave then illuminates a film transparency (processed from the film recording) which modulates the wavefield’s magnitude according to the recorded intensity; this produces both a converging real image of the object, and its conjugate, which is a diverging virtual image known as the twin image when viewed at the plane of the real image. Because the inline hologram produced with plane wave illumination resembles a Fresnel zone plate, the spatial resolution of the image is limited by the spatial resolution of the detector, just as the spatial resolution of a zone plate is limited by the finest, outermost zones. This can be seen from the large angle θ between the reference wave and the largest-scattering-angle object wave.
resolution of an inline hologram is limited by the width of the finest fringe half-period that can be recorded on the detector. In addition, one has the same spatial and spectral coherence requirements as with an equivalent zone plate. Finally, as one curious side note, one can use an x-ray holographic approach with higher diffraction orders to replicate a Fresnel zone plate pattern except with finer zone width, in a method called spatial frequency multiplication [Yun 1987a, Yun 1988, Jacobsen 1992c]. Though not intended as a holography experiment (because holography had not been “discovered” yet!), the first experimental demonstration of what can now be considered an inline x-ray hologram was obtained in 1932 by using a sufficiently coherent x-ray beam to illuminate a wire [Kellström 1932]. An optical reconstruction was obtained in 1952 [El-Sum 1952], soon after Albert Baez first discussed the possibilities of x-ray holography [Baez 1952a, Baez 1952b]. Baez’s paper discussed the detector resolution limit noted above, and also considered x-ray microscopy with Fresnel zone plates as an alternative to holography. While a more complete history of the early efforts is presented elsewhere [Jacobsen 1990a], the first real successes in x-ray holography were obtained by Aoki and Kikuta [Aoki 1972, Aoki 1974] using synchrotron radiation and fine-grain photographic film. The sub-100 nm resolution barrier was broken by using poly(methyl methacrylate) (PMMA; commonly used as a photoresist in electr