Lightwave Communications

Lightwave Communications This pioneering, course-tested text combines communications theory with the physics of optical communications. Comprehensive and rigorous, it brings together an in-depth treatment of the physical characteristics of the guided lightwave channel with the study of modern methods of algorithm-based communication in time and space. The many different levels at which a lightwave communication signal can be described (ray, wave, photon, or quantum state) are integrated to provide a unified explanation of how a commonplace bit stream is transformed into a physical lightwave, how that lightwave travels through an optical fiber, and how it is then transformed back into the bit stream. Background fundamentals such as linear systems and electromagnetics are explained in relation to modern topics such as channel models, encoding, modulation, and interference, and end-of-chapter problems are provided throughout. This is an essential text for graduate students and senior undergraduates taking courses on optical communications, and for professionals working in the area. George C. Papen is a Professor in the Department of Electrical and Computer Engineering at the University of California at San Diego. Richard E. Blahut is the Emeritus Henry Magnuski Professor in the Department of Electrical and Computer Engineering at the University of Illinois, having served as the Department Head from 2001 to 2008. He is a member of the US National Academy of Engineering.

Lightwave Communications G E O R G E C . PA P E N University of California, San Diego

RICHARD E. BLAHUT University of Illinois, Urbana-Champaign

University Printing House, Cambridge CB2 8BS, United Kingdom One Liberty Plaza, 20th Floor, New York, NY 10006, USA 477 Williamstown Road, Port Melbourne, VIC 3207, Australia 314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025, India 79 Anson Road, #06–04/06, Singapore 079906 Cambridge University Press is part of the University of Cambridge. It furthers the University’s mission by disseminating knowledge in the pursuit of education, learning, and research at the highest international levels of excellence. www.cambridge.org Information on this title: www.cambridge.org/9781108427562 DOI: 10.1017/9781108551748

© Cambridge University Press 2019

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press. First published 2019 Printed in the United Kingdom by TJ International Ltd. Padstow Cornwall A catalogue record for this publication is available from the British Library. Library of Congress Cataloging-in-Publication Data Names: Papen, George C., author. | Blahut, Richard E., author. Title: Lightwave Communications / George C. Papen, University of California, San Diego, Richard E. Blahut, University of Illinois, Urbana-Champaign. Description: Cambridge, United Kingdom; New York, NY, USA : Cambridge University Press, 2019. | Includes bibliographical references. Identifiers: LCCN 2018027190 | ISBN 9781108427562 (hardback) Subjects: LCSH: Optical communications. Classification: LCC TK5103.59 .P374 2019 | DDC 621.382/7–dc23 LC record available at https://lccn.loc.gov/2018027190 ISBN 978-1-108-42756-2 Hardback Additional resources for this publication at www.cambridge.org/papen Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

To Theresa, Kate, and Alex

If you want to find the secrets of the universe, think in terms of energy, frequency, and vibration.
Nikola Tesla

Contents

Preface
Acknowledgements
Notation
Primary Symbols
List of Symbols

1 Introduction
  1.1 Digital Lightwave Communication Systems
    1.1.1 Channel Coding
    1.1.2 Modulation
    1.1.3 Types of Lightwave Channels
    1.1.4 Demodulation
    1.1.5 Detection and Decoding
    1.1.6 Error Probabilities
  1.2 Lightwave Signal Models
    1.2.1 Relationship of Wave Optics to Photon Optics
    1.2.2 Choosing a Signal Model
  1.3 Modulation and Demodulation
    1.3.1 Phase-Synchronous Systems
    1.3.2 Phase-Asynchronous Systems
  1.4 Codes and Coded-Modulation
  1.5 Multiplexing
  1.6 Communication Channels
    1.6.1 Common Wave-Optics Communication Channels
    1.6.2 Channel Capacity
  1.7 References
  1.8 Historical Notes
  1.9 Problems

2 Background
  2.1 Linear Systems
    2.1.1 Bandwidth and Timewidth
    2.1.2 Passband and Complex-Baseband Signals
    2.1.3 Signal Space
  2.2 Random Signals
    2.2.1 Probability Distribution Functions
    2.2.2 Random Processes
  2.3 Electromagnetics
    2.3.1 Material Properties
    2.3.2 The Wave Equation
    2.3.3 Geometrical Optics
    2.3.4 Polarization
    2.3.5 Random Lightwave Fields
  2.4 References
  2.5 Problems

3 The Guided Lightwave Channel
  3.1 Characteristics of an Optical Fiber
    3.1.1 Fiber Structure
    3.1.2 Optical Fiber Attenuation
  3.2 Guided Signal Propagation
    3.2.1 Guided Rays
    3.2.2 Guided Waves
    3.2.3 Guided Photon Streams
  3.3 Waveguide Geometries
    3.3.1 Modes in a Slab Waveguide
    3.3.2 Modes in a Step-Index Fiber
    3.3.3 Modes in a Graded-Index Fiber
  3.4 Mode Coupling
    3.4.1 Derivation of the Coupled Equations
    3.4.2 Solution of the Coupled Equations
  3.5 References
  3.6 Historical Notes
  3.7 Problems

4 The Linear Lightwave Channel
  4.1 Ray Dispersion
  4.2 Wave Dispersion
    4.2.1 Dispersion in a Slab Waveguide
    4.2.2 Dispersion for a Linearly Polarized Mode
  4.3 Narrowband-Signal Dispersion
    4.3.1 Narrowband Dispersion
    4.3.2 Material Dispersion
    4.3.3 Narrowband Signal Propagation
  4.4 Group Delay
    4.4.1 Mode-Groups in a Step-Index Fiber
    4.4.2 Mode-Groups in a Graded-Index Fiber
    4.4.3 Step-Index Multimode Fiber
    4.4.4 Wavelength-Dependent Group Delay
  4.5 Linear Distortion
    4.5.1 Distortion from Mode-Dependent Group Delay
    4.5.2 Distortion from Wavelength-Dependent Group Delay
    4.5.3 Dispersion-Controlled Optical Fiber
    4.5.4 Independent Sources of Distortion
  4.6 Polarization-Mode Dispersion
    4.6.1 Jones Representation
    4.6.2 Stokes Representation
    4.6.3 Distortion from Polarization-Dependent Group Delay
    4.6.4 Distortion from Polarization-Dependent Loss
  4.7 References
  4.8 Historical Notes
  4.9 Problems

5 The Nonlinear Lightwave Channel
  5.1 Anharmonic Material Response
    5.1.1 Wave-Optics Description
    5.1.2 Photon-Optics Description
  5.2 Kinds of Nonlinearities
    5.2.1 The Kerr Nonlinearity
    5.2.2 Raman Scattering
    5.2.3 Brillouin Scattering
  5.3 Signal Propagation in a Nonlinear Fiber
    5.3.1 Phase Matching
    5.3.2 Intensity-Dependent Index Change
    5.3.3 Nonlinear Propagation Constant
    5.3.4 Characteristic Lengths
    5.3.5 Classification of Nonlinear Channels
  5.4 Single-Carrier Nonlinear Schrödinger Equation
    5.4.1 Nonlinear Narrowband Signal Propagation
    5.4.2 Nonlinear Distortion for a Single Pulse
  5.5 Interference in a Nonlinear Fiber
    5.5.1 Cross-Phase Modulation
    5.5.2 Nonlinear Schrödinger Equation for Multiple Subcarriers
    5.5.3 Nonlinear Interference
  5.6 Computational Methods
  5.7 References
  5.8 Historical Notes
  5.9 Problems

6 Random Signals
  6.1 The Physics of Randomness and Noise
    6.1.1 Randomness and Entropy
    6.1.2 Photon–Matter Interactions
    6.1.3 Expected Energy
  6.2 Probability Distribution Functions
    6.2.1 Thermal Noise
    6.2.2 Spontaneous Emission Noise
    6.2.3 Photon Noise
  6.3 The Poisson Transform
    6.3.1 The Direct Poisson Transform
    6.3.2 The Inverse Poisson Transform
    6.3.3 Forms of Uncertainty
    6.3.4 The Gordon Distribution
  6.4 Power Density Spectra
    6.4.1 Power Density Spectrum of the Lightwave Noise Power
    6.4.2 Power Density Spectrum of the Photodetected Signal with Additive Noise
    6.4.3 Power Density Spectrum of the Photodetected Signal with Shot Noise
  6.5 Direct Photodetection with Gaussian Noise
    6.5.1 Continuous Probability Density Functions
    6.5.2 Discrete Probability Mass Functions
  6.6 Balanced Photodetection with Gaussian Noise
    6.6.1 Orthogonal Expansion
    6.6.2 Special Cases
    6.6.3 Direct Photodetection
    6.6.4 Spatially Correlated Modes
  6.7 Bandlimited Shot Noise
    6.7.1 Approximate Analysis
    6.7.2 General Analysis
  6.8 References
  6.9 Historical Notes
  6.10 Problems

7 Lightwave Components
  7.1 Passive Lightwave Components
    7.1.1 Lightwave Couplers
    7.1.2 Delay-Line Interferometers
    7.1.3 Multipath Interference
    7.1.4 Optical Filters
  7.2 Semiconductors
  7.3 Lightwave Receivers
    7.3.1 Photodetectors
    7.3.2 Lightwave Demodulators
  7.4 Lightwave Amplifiers
    7.4.1 Doped-Fiber Lightwave Amplifiers
    7.4.2 Gain in a Doped-Fiber Amplifier
    7.4.3 Semiconductor Lightwave Amplifiers
    7.4.4 Wavelength Dependence of the Gain
    7.4.5 Noise from Multiple Amplifiers
  7.5 Lightwave Transmitters
    7.5.1 Light-Emitting Diodes
    7.5.2 Laser Diodes
    7.5.3 External Modulators
  7.6 Noise in Lightwave Receivers
    7.6.1 Dark-Current Noise
    7.6.2 Internal-Gain Noise
  7.7 Noise in Lightwave Amplifiers
    7.7.1 Power Density Spectrum
    7.7.2 Probability Distribution Functions
    7.7.3 Noise Figure
    7.7.4 Nonlinear Phase Noise
  7.8 Noise in Laser Transmitters
    7.8.1 Power Density Spectra
    7.8.2 Probability Density Functions
  7.9 References
  7.10 Historical Notes
  7.11 Problems

8 The Electrical Channel
  8.1 The Lightwave Channel
    8.1.1 Linear Single-Input Single-Output Lightwave Channels
    8.1.2 Multiplex Channels
    8.1.3 Multi-input Multi-output Channels
    8.1.4 Channel Statistics in Time and Space
  8.2 Lightwave Demodulation
    8.2.1 Demodulation of the Lightwave Complex Amplitude
    8.2.2 Demodulation of the Lightwave Intensity
    8.2.3 Demodulation of Pulse Intensity in Multiple Modes
    8.2.4 Demodulation of Pulse Intensity in a Single Mode
    8.2.5 Cumulative Electrical Power Density Spectrum
  8.3 Discrete-Time Electrical Channels
    8.3.1 Interpolation and Sampling
    8.3.2 Conventional Discrete-Time Channels
  8.4 References
  8.5 Historical References
  8.6 Problems

9 The Information Channel
  9.1 Prior and Posterior Distributions
  9.2 Methods of Modulation
    9.2.1 Signal Constellations
    9.2.2 Nyquist Pulses
    9.2.3 Detection
    9.2.4 Partial-Response Signaling
    9.2.5 Sampler Response
  9.3 Methods of Reception
  9.4 Detection Filters
    9.4.1 Linear Detection Filters
    9.4.2 Detection Filters for Additive White Noise
    9.4.3 Detection Filters for Signal-Dependent Noise
    9.4.4 Detection Filters for General Noise
  9.5 Detection of a Binary Signal
    9.5.1 Detection of a Binary Wave-Optics Signal
    9.5.2 Detection of a Binary Photon-Optics Signal
    9.5.3 Binary Detection for a Dispersive Channel
    9.5.4 Displacement Detection of a Binary Signal
  9.6 Detection of a Multilevel Signal
    9.6.1 Detection of a Multilevel Wave-Optics Signal
    9.6.2 Detection of a Multilevel Photon-Optics Signal
  9.7 Noise Models for Intensity Detection
    9.7.1 Additive Electrical-Noise Model
    9.7.2 Signal-Dependent Shot-Noise Model
    9.7.3 Signal–Noise Mixing Model
  9.8 References
  9.9 Historical Notes
  9.10 Problems

10 Modulation and Demodulation
  10.1 Modulation Formats
    10.1.1 Complex Signal Constellations
    10.1.2 Complex-Baseband Modulation
    10.1.3 Binary Modulation Formats
    10.1.4 Multisymbol Modulation Formats
    10.1.5 Efficiency of Modulation Formats
  10.2 Phase-Synchronous Demodulation
    10.2.1 Demodulation of Binary Formats
    10.2.2 Demodulation of Multilevel Real-Valued Formats
    10.2.3 Detection of Multilevel Complex-Valued Formats
    10.2.4 Demodulation with Phase Noise
    10.2.5 Demodulation with Shot Noise
  10.3 Dual-Polarization Signaling
    10.3.1 Constellations in Four-Dimensional Signal Space
    10.3.2 Dual-Polarization Modulation and Demodulation
  10.4 Constellations in Signal Space
    10.4.1 General Signal Constellations
    10.4.2 Constellations on the Complex Plane
    10.4.3 Orthogonal Constellations
    10.4.4 Nonorthogonal Constellations
  10.5 Noncoherent Demodulation
    10.5.1 Detection of Noncoherent Orthogonal Signals
    10.5.2 Detection of Differential-Phase-Shift-Keyed Signals
    10.5.3 Detection of Noncoherent On–Off-Keyed Intensity Signals
  10.6 Energy Demodulation
    10.6.1 Sample Statistic
  10.7 References
  10.8 Historical Notes
  10.9 Problems

11 Interference
  11.1 Intersymbol Interference
  11.2 Equalization
    11.2.1 Zero-Forcing Equalization
    11.2.2 Matched Filter Equalization
    11.2.3 Minimum-Error Linear Equalizer
    11.2.4 Detection Filters for Additive White Noise
    11.2.5 Decision Feedback
    11.2.6 Prefiltering and Precoding
  11.3 Sequence Detection
    11.3.1 Trellis Diagrams
    11.3.2 Minimum-Distance Sequence Detection
    11.3.3 Maximum-Likelihood Sequence Detection
    11.3.4 Maximum-Posterior Sequence Detection
  11.4 Interchannel Interference
    11.4.1 Uncompensated Linear Interchannel Interference
    11.4.2 Uncompensated Nonlinear Interchannel Interference
    11.4.3 Linear Equalization of Polarization Interference
    11.4.4 Linear Equalization of Interchannel Interference
  11.5 Equalization of Intensity Modulation
    11.5.1 Intensity Interference
    11.5.2 Intensity Equalization with Shot Noise
  11.6 Interference in Nonlinear Channels
    11.6.1 Sequence Detection for a Nonlinear Channel
    11.6.2 Equalization of a Nonlinear Channel
  11.7 References
  11.8 Historical Notes
  11.9 Problems

12 Channel Estimation
  12.1 Channel Parameters
  12.2 Carrier-Phase Estimation
    12.2.1 Maximum-Likelihood Phase Estimation
    12.2.2 Phase-Locked Loops
    12.2.3 Phase Estimation of a Data-Modulated Waveform
    12.2.4 Generalized Likelihood Function
  12.3 Clock-Phase Estimation
  12.4 Frame Synchronization
  12.5 Channel-State Estimation
    12.5.1 Impulse Response Estimation
    12.5.2 Detection-Filter Estimation
    12.5.3 Constant-Modulus Objective Function
    12.5.4 Adaptive Estimation
  12.6 Polarization-State Estimation
  12.7 Estimation of Spatial Modes
    12.7.1 Channel Matrix Estimation for Multiple Spatial Modes
    12.7.2 Modal Detection-Filter Estimation
  12.8 References
  12.9 Historical Notes
  12.10 Problems

13 Channel Codes
  13.1 Code Structure and Code Rate
    13.1.1 Decoding
    13.1.2 Classes of Codes
    13.1.3 Nesting of Codes
  13.2 Algebraic Block Codes
    13.2.1 Galois Fields
    13.2.2 Linear Codes
    13.2.3 Matrix Description of Linear Codes
    13.2.4 Binary Block Codes
    13.2.5 Nonbinary Block Codes
    13.2.6 Spherical Decoding
    13.2.7 Performance Analysis
    13.2.8 Descriptions of Linear Codes as Graphs
    13.2.9 Limits of Spherical Decoding
  13.3 Convolutional Codes
    13.3.1 Convolutional Encoders
    13.3.2 Decoding on a Trellis
    13.3.3 Sequential Decoding
    13.3.4 Performance Analysis
  13.4 Cutoff Rate and Critical Rate
  13.5 Composite Codes
    13.5.1 Componentwise Marginalization
    13.5.2 Berrou Codes
    13.5.3 Turbo Decoding
    13.5.4 Gallager Codes
    13.5.5 Message-Passing Decoders
  13.6 Trellis-Coded Modulation
  13.7 Modulation Codes
    13.7.1 Runlength-Limited Codes
    13.7.2 Spectral-Notch Codes
    13.7.3 Partial-Response Codes
  13.8 References
  13.9 Historical Notes
  13.10 Problems

14 The Information Capacity of a Lightwave Channel
  14.1 Entropy, Mutual Information, and Channel Capacity
    14.1.1 Types of Information Channels
    14.1.2 Entropy
    14.1.3 Mutual Information
    14.1.4 Fano Inequality
    14.1.5 Channel Capacity
    14.1.6 Signal and Channel Models
  14.2 Photon-Optics Capacity
    14.2.1 The Discrete Memoryless Photon-Optics Channel
    14.2.2 The Continuous Photon-Optics Channel
    14.2.3 The Ideal Photon-Optics Channel
    14.2.4 The Additive-Noise-Limited Photon-Optics Channel
  14.3 Wave-Optics Capacity
    14.3.1 Capacities and Priors for Waves and Photons
    14.3.2 Soft-Decision Capacity and Hard-Decision Capacity Using Wave Optics
    14.3.3 Intensity Modulation
    14.3.4 Phase Modulation
  14.4 Capacity of a Product Channel
    14.4.1 Capacity of a Gaussian MIMO Channel
    14.4.2 Capacity of a Random MIMO Channel
    14.4.3 Capacity of a Bandlimited Wave-Optics Channel
    14.4.4 Capacity of a Bandlimited Photon-Optics Channel
  14.5 Spectral Rate Efficiency
    14.5.1 Wave-Optics Spectral Rate Efficiency
    14.5.2 Photon-Optics Spectral Rate Efficiency
    14.5.3 Spectral Rate Efficiency for Constrained Modulation Formats
  14.6 Nonlinear Lightwave Channels
    14.6.1 The Full Kerr Lightwave Channel
    14.6.2 The Memoryless Kerr Information Channel
    14.6.3 Dispersionless Channel
    14.6.4 Kerr Wavelength-Multiplex Information Channel
    14.6.5 Kerr Wavelength MIMO Channel
    14.6.6 The Capacity Using the Envelope Method
  14.7 References
  14.8 Historical Notes
  14.9 Problems

15 The Quantum-Optics Model
  15.1 An Operational View of Quantum Optics
    15.1.1 Lightwave Signal States
    15.1.2 Modulation
    15.1.3 Measurements
    15.1.4 State Detection
    15.1.5 Gaussian Signal States
  15.2 A Formal View of Quantum Optics
    15.2.1 Signal States
    15.2.2 Operators
    15.2.3 Time Dependence
    15.2.4 Quantum Wave Functions
    15.2.5 Measurements
  15.3 Coherent States
    15.3.1 Operators for Coherent States
    15.3.2 Canonical Commutation Relationship
    15.3.3 Position–Momentum Representation
    15.3.4 Minimum-Uncertainty Coherent States
    15.3.5 The Coherent-State Operator
    15.3.6 Representation of a Coherent State
    15.3.7 The Pairwise Nonorthogonality of Coherent States
    15.3.8 Antipodal Coherent States
  15.4 Statistical Quantum Optics
    15.4.1 Derivation of the Density Matrix
    15.4.2 Representation of a Density Matrix
    15.4.3 Decoherence
    15.4.4 Quantum Entropy
    15.4.5 Measurements on Density Matrices
  15.5 Classical Methods for Quantum-Lightwave Signals
    15.5.1 Lightwave Couplers for Coherent States
    15.5.2 Homodyne Demodulation to Real Baseband
    15.5.3 Joint Demodulation
  15.6 Quantum-Lightwave Signal Distributions
    15.6.1 The P Quasi-probability Distribution
    15.6.2 The Wigner Quasi-probability Distribution
    15.6.3 The Husimi Quasi-probability Distribution
    15.6.4 Representations of Classical Signals
    15.6.5 Gaussian Signal States
  15.7 References
  15.8 Historical Notes
  15.9 Problems

16 The Quantum-Lightwave Channel
  16.1 Methods of Quantum-Optics State Detection
    16.1.1 Classical Channels and Detection
    16.1.2 Quantum-Lightwave Channels and State Detection
    16.1.3 Detection Operators
    16.1.4 Detection for Pure Symbol States
  16.2 State Detection for Binary Modulation Formats
    16.2.1 Detection for Binary Mixed-Signal-State Modulation
    16.2.2 Detection for Binary Pure-State Modulation
    16.2.3 Detection for Antipodal Coherent-State Modulation
    16.2.4 Detection for On–Off-Keyed Coherent-State Modulation
    16.2.5 Other Methods of State Detection
    16.2.6 Binary State Detection in Additive Noise
  16.3 State Detection for Multilevel Modulation Formats
    16.3.1 Square-Root Detection Basis
  16.4 Quantum-Lightwave Information Channels
    16.4.1 Signal-Dependent Information Channels
    16.4.2 Component-Symbol State Preparation and Detection
    16.4.3 Block-Symbol-State Preparation and Detection
  16.5 Classical Channel Capacity of a Quantum-Lightwave Channel
    16.5.1 An Ideal Classical Channel
    16.5.2 Holevo Information
    16.5.3 A Noiseless Product-State Channel
    16.5.4 A Noisy Product-State Channel
    16.5.5 The General Quantum-Lightwave Channel
  16.6 The Ideal Quantum-Lightwave Channel
    16.6.1 Capacity for Conventional Binary Modulation Formats
    16.6.2 Capacity using Component-Symbol-State Detection
    16.6.3 Capacity Using Block-Symbol-State Detection
  16.7 The Gaussian Quantum-Lightwave Information Channel
    16.7.1 Gaussian Channels
    16.7.2 Phase-Insensitive Gaussian Channels
    16.7.3 Capacity for a Phase-Insensitive Gaussian Channel
  16.8 References
  16.9 Historical Notes
  16.10 Problems

Bibliography
Index

Preface

A lightwave communication signal can be described at many levels: a stream of photons, a bundle of rays, an electromagnetic wave, a quantum state, a modulated waveform, or a stream of information bits. Each description has its own vocabulary, traditions, and notation. The goal and challenge of this book is to seamlessly integrate these levels into a unified treatment of an information-bearing lightwave traveling in an optical fiber. To the user of the communication system, however, such a discussion of the science of guided lightwave signal propagation is irrelevant. The user sees only the reliable transmission of a bit stream and is unaware that at the deepest level of the system, the bits have long lost their individual identity, buried deeply within the system and spread across time, wavelength, polarization, and space so as to create this reliable communication channel. Thus, the second goal and challenge of the book is to explain the many steps through which a commonplace bit stream is transformed into a physical lightwave and then back again into that bit stream.

This text combines a rigorous foundation of the physical characteristics of the guided lightwave channel with the study of modern methods of algorithm-based communication in time and space. Our view is that a fiber, together with the lightwave it conducts, is only one component of the larger lightwave communication system. The integration of this material into a single text that is accessible to readers from a wide range of backgrounds is necessary to facilitate the design of future lightwave communication systems.

The physics of a guided lightwave, as such, is a well-developed subject based on quantum theory and its two offspring, wave optics and photon optics. Within this richer environment, this book studies many traditional topics of digital communications extended to the theory of guided lightwaves. At these frequencies, the carrier has a dual wave/particle nature. Often the conventional theory of these communication topics must be enriched for this purpose. As a consequence, most chapters of the book contain some material that has been repurposed or reinterpreted. These individual chapters, in return, provide new insights into how digital information can be conveyed. We believe that some readers who are not interested in guided lightwave communications will find that this treatment adds to their understanding of conventional topics of digital communication. This book is addressed to them as well.

Early textbook treatments of lightwave communication systems have emphasized the physical characteristics of the lightwave channel and the components used to generate, amplify, and photodetect lightwave signals. This emphasis contrasts with the usual treatments of other communication systems at lower frequencies. In the more traditional lower-frequency communication systems, the exponential increase in the available processing power has enabled complex coding and detection algorithms that use many samples per modulation interval and many bits per modulation interval. This is due to the remarkable power and flexibility of modern methods of coded modulation. Existing techniques operate close to the fundamental theoretical limit. These techniques include real-time estimation of the communication channel and may perform thousands of computational operations per symbol interval to yield data rates in excess of ten bits per second for every hertz of bandwidth, or an energy per bit smaller than the noise power density spectrum (a short calculation at the end of this preface illustrates these limits).

Historically, lightwave communication systems have used simpler methods of modulation and relied on wavelength-multiplexed systems to scale the capacity of a single fiber. However, the current rate of growth for data that can be transmitted on a single wavelength channel is now less than the rate of growth of the network traffic and is also less than the rate of growth of processing power. As a consequence, many of the communication techniques and algorithms developed for other multi-input multi-output communication systems that exploit both time and space are beginning to be applied to lightwave communication systems. These techniques will increasingly be used to expand the information rate carried in a single fiber. A concurrent trend is the rapid evolution from single-wavelength point-to-point links into multiple-wavelength networks in which many network functions, such as multiplexing and switching, may be implemented in the optical domain. Understanding and designing modern lightwave communication systems that reflect these trends requires a unifying system-level approach that is accessible to readers from a physics-oriented background, who have been the traditional developers of lightwave communication systems, as well as to readers with a background in other types of communication systems and networks.

From a systems-level perspective, there are significant differences between current lightwave communication systems and other kinds of communication systems that operate at lower frequencies. In addition to the significantly higher data rates, the principal differences are (1) the physical sources that generate noise and interference, (2) the signal impairments in multiple-wavelength and long-distance systems caused by nonlinearities in the fiber channel, which must be mitigated, and (3) the large energy of a lightwave photon compared with the mean thermal energy of the environment. The last difference brings elements of the quantum nature of a lightwave signal to the forefront.

In contrast to lower-frequency communication systems, for which the assumption of additive thermal noise is usually a valid starting point, the physical sources of noise for lightwave sources, lightwave amplifiers, and some photodetectors are different, with the statistics associated with some of these processes being nongaussian and signal-dependent. The signal-dependent nature of both the gain and the noise makes the system-level analysis more involved.

At the network level, the use of lightwave signaling has a profound effect on the type of functionality that can be implemented. Currently, it is difficult to rapidly switch a large number of lightwave signals without first converting each lightwave signal into an electrical signal, then reconverting the electrical signal, after switching, back into a lightwave signal. It is also practically impossible to store photons for meaningful periods of time because photons, as such, do not exist unless they are propagating in a medium or a vacuum. The design of future lightwave networks is facilitated by an understanding of how these basic aspects of lightwave communication systems differ from lower-frequency communication systems.

The material in this book is suitable for an advanced undergraduate course or a first-year graduate course. It is complementary to most introductory undergraduate communication courses. Extensive background and reference material on linear systems, random signals, and electromagnetics is provided in Chapter 2 for review as needed. Chapter 2 also serves to introduce the notation and terminology used throughout the text.

Our journey describing the elements of a lightwave communication system starts with a description of guided lightwaves in Chapter 3 and ends with a description of channel codes in Chapter 13, but the study does not end there. These eleven core chapters develop the theory in depth, but are directed towards the application rather than towards the science. Chapters 3 and 4 discuss the linear guided lightwave channel and dispersion, while Chapter 5 considers the nonlinear lightwave channel. Fundamental noise concepts, as well as the Poisson transform as a practical proxy for much of the relevant nature of quantum optics, are addressed in Chapter 6. Concepts from these early chapters are applied to discuss aspects of the components used for lightwave systems as described in Chapter 7. The electrical channel, which surrounds the lightwave channel, is developed in Chapter 8. The information channel, which surrounds the electrical channel, is developed in Chapter 9. These abstracted channel models include the encoding and modulation process at the transmitter and the demodulation, detection, and decoding process at the receiver. Modulation formats are discussed in Chapter 10. Interference is discussed in Chapter 11, with techniques that estimate the channel response parameters described in Chapter 12. Channel coding for lightwave channels is discussed in Chapter 13.

The last three chapters, Chapters 14, 15, and 16, discuss lightwave communications at a deeper and more abstract level directed more towards the science of the topic. Chapter 14 provides a formal treatment of information theory, which is applied to determine the capacity of different types of lightwave channels. This novel chapter approaches the topic of information theory from the unique perspective of the lightwave, thereby unifying particle and wave descriptions of the same information channel. Chapter 15 is an interlude that presents background material for a quantum-optics signal model and discusses how this model is related to other signal models. The final chapter, Chapter 16, applies this quantum-optics signal model to study quantum-lightwave communication systems, a formal model whose presence is often seen in the background throughout the book. Chapters 15 and 16 present standard ideas of quantum information theory in an original way that is appropriate to our treatment of lightwave communications.

The draft manuscript of the book was developed and tested over many years in the classrooms at the University of Illinois, the University of California at San Diego, and the University of Pennsylvania. The core material in the book naturally divides into Chapters 3–7, which describe the physical aspects of a lightwave channel, and Chapters 8–13, which consider a lightwave communication system. Accordingly, several courses can be constructed, depending on the background of the students. Students with a strong electromagnetics background may prefer to bypass the early chapters, using them only for reference. A one-semester course has been taught multiple times covering primarily Chapters 1–10. A one-quarter (ten-week) graduate-level course has been taught covering material from Chapters 8–11, augmenting several topics with material from earlier chapters. A one-semester (fifteen-week) graduate-level course might include additional material from Chapters 12–14 or a range of topics from Chapters 1–7, depending on the instructor. Moreover, individual chapters on various topics may provide supplementary material not found elsewhere for courses on those topics. An advanced graduate-level course on quantum-lightwave communications and quantum information theory can be constructed using Sections 6.1–6.3, part or all of Chapter 14, and Chapters 15 and 16. This material, with its unique perspective, provides a thorough introduction to quantum-lightwave communication systems that convey classical information, using the earlier parts of the book as a reference. A one-semester undergraduate-level course that uses linear system concepts and gaussian noise models can be constructed from the following core material: Sections 3.1–3.3, 4.1–4.4, 5.1, 5.2, 6.1.2–6.2, 7.1–7.4, 8.2–8.2.4, 9.5–9.5.2, 10.1–10.4, 11.1, and 11.2, with the depth and specific topics depending on the instructor.

Finally, it should be mentioned that, although the chapters are integrated with a common underlying treatment, most chapters can be read independently without too much difficulty. This means that students of other subjects, such as physical optics, photonic devices, information theory, and communication systems, will find original material in the appropriate chapters.
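The two performance figures quoted earlier in this preface can be checked against the Shannon limit for the additive white gaussian noise channel, a result developed formally in Chapter 14. The short sketch below is only an illustrative consistency check, not part of the book's own development: the relation Eb/N0 >= (2^eta - 1)/eta for a spectral efficiency of eta bits per second per hertz is the standard limit for that channel. It shows that ten bits per second per hertz requires an Eb/N0 of roughly 20 dB, while as eta approaches zero the required Eb/N0 falls to ln 2, about -1.6 dB, so an energy per bit smaller than the noise power density spectrum N0 is indeed possible.

    # Minimum Eb/N0 on an additive white gaussian noise channel at a
    # spectral efficiency of eta bits/s/Hz: Eb/N0 >= (2**eta - 1) / eta.
    import math

    for eta in (0.1, 1.0, 2.0, 10.0):
        ebn0 = (2.0**eta - 1.0) / eta
        print(f"eta = {eta:5.1f} b/s/Hz  ->  Eb/N0 >= {ebn0:8.3f}"
              f"  ({10.0 * math.log10(ebn0):6.2f} dB)")

    # As eta -> 0 the bound approaches ln 2 = 0.693 (-1.59 dB), which is an
    # energy per bit below the noise power density spectrum N0.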

Acknowledgements

A book of this scope would not have been possible without the help of many people drawn from diverse research communities. Courses based on early versions of this book were taught at the University of Illinois by Andy Singer and Peter Dragic. At the University of California at San Diego, our colleagues Nikola Alić, Joseph Ford, Massimo Franceschetti, Shayan Mookherjea, Robert Lugannani, Stojan Radić, and Alexander Vardy provided comments and read parts of the book specific to their research interests. The chapter on lightwave components was reviewed by James Coleman and Joseph Campbell. Many students provided useful comments, including Dumitru Ionescu, Patrick Ling, Vahid Ataie, and Zeinab Taghavi. John Proakis read an early complete draft of the book and made many suggestions in the organization and the core material. Magnus Karlsson also provided steady invaluable input throughout the project. The chapters on quantum-lightwave communication were guided by early conversations with Paul Kwiat at the University of Illinois and detailed comments by Lu Sham at the University of California at San Diego. Comments on later revisions were provided by Saikat Guha, Raúl García-Patrón Sánchez, and Alexander Holevo. Later versions of the book were reviewed in detail by Don Snyder and Frank Kschischang. Their detailed comments and careful reading caught many mistakes and added clarity and rigor to the final draft. Additional comments, some brief, some extensive, for later versions of the draft were provided by Alex Alvarado, Erik Agrell, Ian Blake, Dan Costello, Dave Forney, Gerhard Kramer, Gottfried Ungerboeck, and Peter Winzer. Our special thanks to Joe Kahn for sharing some of his problems, which were adapted and adopted for this book. At Cambridge University Press, Julie Lancashire provided a deft combination of encouragement and patience, with Steven Holt having just the right touch for copy editing. We would also like to thank the rest of the publication team for their dedication and attention to detail. And lastly, we thank our wives Theresa and Barbara for their patience and support.

Notation

The choice of notation in a book that is designed to bridge several communities, each of which has its own well-established conventions, is challenging. Moreover, the book makes use of four separate signal models, often requiring different notation. A continuous signal model based on wave optics is used whenever the noise sources can be accurately described using continuous quantities. A ray-optics signal model is used when the guiding structure is large compared with the wavelength. A discrete-energy model based on photon optics is used when some aspects of the quantum nature of the lightwave signal are evident. Finally, a quantum-optics signal model is developed and compared with the other signal models.

Fields that propagate in the +z direction are expressed in complex notation using the form exp[i(ωt − βz)]. Quantum-optics signals use the form exp[i(βz − ωt)]. The orientation of polarization (i.e., left-handed or right-handed) is defined with respect to the field propagating towards the observer. Vectors and sequences are denoted using boldface. Matrices and linear transformations represented as matrices are defined using blackboard letters such as M. Linear transformations for the quantum-optics signal are denoted with a caret, as in x̂. Random variables x and random sequences n are denoted by an underbar. This underbar is omitted for a random process n(t). Statistical expectations use angle brackets ⟨·⟩. Temporal averages use an overbar. Sans serif symbols such as n, m, and N are typically used for discrete quantities or expected values of discrete quantities. The amplitude of a cosinusoidal signal s(t) is defined using a peak amplitude, with the root-mean-squared (rms) amplitude being a factor of √2 smaller. The spectral characteristics are expressed using frequency f (Hz). The amplitude of a field or an envelope of a field that has both temporal and spatial dependence is defined using an rms amplitude, with the spectral characteristics expressed using an angular frequency ω = 2πf with units of radians/second.

Signals are defined in the optical domain or the electrical domain with the units depending on the signal model. The square-law characteristic of direct photodetection produces different scaling factors in the optical domain and the electrical domain for each signal model. In the optical domain, the scaling factor of the photon energy hf relates the signal energy E in the continuous wave-optics model to the expected number of signal photons E = ⟨m⟩ in the discrete photon-optics signal model. In the electrical domain, after the lightwave signal has been photodetected, the units for each signal model and the scaling factors relating the two signal models are different. For wave optics, the directly photodetected lightwave energy W has units of charge and is called the photocharge. Dividing the photocharge by the electron charge e produces the mean number of discrete photodetection events W caused by the detection of photons. These detected photon events are called photoelectrons. The directly photodetected lightwave power has units of current and is called the photocurrent. Dividing the photocurrent by the electron charge e produces the mean photoelectron arrival rate μ. The relationships between these quantities are summarized in Table 6.2.

The units for the power density spectrum follow the same convention. For example, the lightwave power density spectrum of the spontaneous emission Nsp(f) defined in (6.2.18) is expressed in units of energy or watts/Hz. The equivalent term Nsp(f) in the photon-optics signal model is scaled by the photon energy hf and is expressed as the expected number of noise photons. The corresponding power density spectrum Nopt = ℛNsp of the spontaneous emission generated by direct photodetection has units of charge when wave optics is used, where ℛ is a scaling constant defined in (6.2.23) called the responsivity. For photon optics, the directly photodetected spontaneous emission ηNsp has units of photoelectrons, where η is the probability that a photon will generate a photoelectron, called the quantum efficiency. For wave optics, the electrical power density spectrum is typically expressed in units of amperes-squared per hertz, which is the power density spectrum per unit resistance R. For photon optics, the electrical power density spectrum has units of hertz.
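These domain conversions can be summarized in a few lines of code. The sketch below is a minimal numerical illustration: the wavelength, power, and quantum efficiency values are chosen only for the example, and the expression ℛ = ηe/hf is the standard form of the responsivity consistent with the description above (the book's own definition appears in (6.2.23)).

    # Conversions between optical-domain and electrical-domain quantities.
    h = 6.62607e-34        # Planck's constant (J s)
    e = 1.60218e-19        # electron charge (C)
    c0 = 2.99792458e8      # speed of light in vacuum (m/s)

    wavelength = 1550e-9   # illustrative carrier wavelength (m)
    P = 1e-3               # illustrative lightwave power (W)
    eta = 0.8              # illustrative quantum efficiency

    f = c0 / wavelength                # carrier frequency (Hz)
    E_photon = h * f                   # photon energy hf (J)
    R_arrival = P / E_photon           # expected photon arrival rate (1/s)

    responsivity = eta * e / E_photon  # responsivity (A/W), assumed standard form
    i = responsivity * P               # photocurrent (A)
    mu = i / e                         # mean photoelectron arrival rate (1/s)

    print(f"photon arrival rate  {R_arrival:.3e} per second")
    print(f"photocurrent         {i:.3e} A")
    print(f"photoelectron rate   {mu:.3e} per second (= eta * photon rate)")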

Primary Symbols

Symbol    Description    Equation

Lightwave Signal Terms
s(t)    Lightwave signal    (8.1.1)
P(t)    Lightwave power    (1.2.5)
E    Lightwave energy    (1.2.1)
Γ(t)    Photon arrival process    (6.2.19)
R(t)    Photon arrival rate    (1.2.5)
m(t)    Photon counting process    (6.2.20)
E    Expected number of signal photons    (6.2.22)

Lightwave Noise Terms
Nsp    Spontaneous emission power density spectrum    (7.7.8)
Pn    Spontaneous emission noise power    (6.2.18)
N0    Expected number of noise photons    (6.1.7)
Nsp    Expected number of additive-noise photons from spontaneous emission    (7.7.7)

Electrical Signal Terms
p(t)    Electrical pulse    (8.2.24)
r(t)    Noiseless electrical waveform    (9.2.5)
q(t)    Target electrical pulse used for sampling    (9.2.4)

Electrical Noise Terms
N0    Electrical power density spectrum for an additive-noise source for phase-synchronous demodulation    (8.2.3)
Nshot    Power density spectrum from shot noise    (6.7.8)
NRIN    Power density spectrum from intensity noise    (7.8.10)

List of Symbols

a    Radius of fiber; constant
â    Coherent-state operator
ak    Expansion coefficient for spatial decorrelation
âI, âQ    In-phase and quadrature coherent-state operators
âN    Image mode vacuum-state operator
a(z, t)    Root-mean-squared lightwave signal complex envelope
ã(z, t)    Modulated complex envelope
A, A(t)    Amplitude of a signal
A    Area
Afiber    Area of a fiber core
Acoh    Coherence region
Aoverlap    Overlap region
Aeff    Effective area for a fiber
Âhomo    Homodyne demodulation operator
Âhetero    Heterodyne demodulation operator
Arms(t)    Root-mean-squared amplitude of a modulated signal
b    Normalized propagation constant
B    Magnetic flux density
B    Passband bandwidth
B    Matrix with diagonal elements that are the singular values of the channel matrix H
Bc    Number of coherence intervals per unit time
BN    Noise-equivalent bandwidth
c0; c    Phase velocity of light in vacuum (m/s); velocity of light in medium
cov    Covariance
C    Capacitance; channel spacing
C    Power efficiency; bandlimited capacity of a channel (bits/second)
Cmax    Maximum channel capacity (bits/second)
C    Single-letter capacity
C(f)    Single-letter capacity as a function of frequency
Cp    Single-letter capacity associated with the discrete particle nature of a lightwave
Cw    Single-letter capacity associated with the continuous wave nature of a lightwave
Cband    Capacity of a bandlimited channel (bits)
Cmimo    Capacity of a multi-input multi-output channel (bits)
Cx(f); Cx(ω)    Characteristic function of a random variable x
C(t1, t2)    Covariance function
C    Real covariance matrix
d    Distance; diameter
d    Hamming distance
d    Dataword
dij    Euclidean distance between two signals si(t) and sj(t)
dmin    Minimum euclidean distance
dmin    Minimum Hamming distance
D    Total group-velocity dispersion in wavelength units
D    Polarization transformation matrix
D̂    Linear dispersion operator in the time domain
D    Linear dispersion term in the frequency domain
D̂    Displacement operator for a coherent state
D̂    Detection operator; dispersion operator
D    Electric flux density (C/m²)
D    Complex electric flux density: D(r, t) = Re[D(r) e^(i2πft)]
Dλ    Material dispersion coefficient in units of wavelength
Ḋ    Derivative of the polarization transformation matrix
Dguide    Waveguide dispersion coefficient
e    Charge of an electron: 1.602 × 10⁻¹⁹ C; error
e    Normalized electric field vector
et    Normalized transverse electric field vector
e    Error pattern for a block code
e(t)    Error signal
ex    Extinction ratio
ê    Unit vector such that ê × ĥ = k̂
ê1, ê2    Orthogonal unit vectors used to represent the polarization state
êslow, êfast    Unit vectors for the slow and fast axes of a linear-birefringent material
E    Energy; electric field component of a monochromatic wave
Eb    Expected value of the energy in an uncoded bit
Ec    Expected value of the energy in a codebit
Es    Expected value of the energy in a symbol s
Eg    Energy gap
Etotal    Total signal energy
E    Expected number of signal photons
Eb    Expected number of signal photons in a bit
Es    Expected number of signal photons or signal photoelectrons in a symbol s
E    Energy spectral density; energy efficiency
E    Electric field vector
E    Complex electric field: E(r, t) = Re[E(r) e^(i2πft)]
Et(r)    Transverse component of the electric field
f    Frequency; probability density function
fc    Carrier frequency
fd    Damping rate for a laser diode; frequency difference from the carrier
f0    Relaxation oscillation frequency
fmax    Frequency of a photon with energy E (fmax = E/h); maximum frequency in a signal
fλ(λ)    Normalized power density spectrum in wavelength units
fIF    Intermediate frequency
fLO    Local oscillator frequency
fx(x)    Probability density function of the random variable x
F    SNR scaling constant: F = (A/2σ)²; excess noise factor for an avalanche photodetector
Fm    Proportion of the total power in a mode m
FN; FN(f)    Noise figure; spectral noise figure
FNP    Noise figure defined using the photon number
FT    Fourier transform
FWHM    Full-width at the half-maximum
F(x)    Cumulative probability density function
g    Mode-group index g = ν + 2m
g(t)    Photoelectron arrival process
G; G    Gain
G    Total number of mode-groups
GT(f)    Fourier transform of a sample function of the photon arrival process Γ(t) truncated to a finite time T
h    Planck's constant: h = 6.62607 × 10⁻³⁴ J-s
ℏ    Reduced Planck constant: ℏ = h/2π = 1.05457 × 10⁻³⁴ J-s
ĥ    Unit vector for the magnetic field
h(t)    Impulse response
h(t)    Time-domain channel matrix of a multi-input multi-output channel
hs(t)    Photodetector impulse response
helec(t)    Impulse response for the power using a noncoherent source
hm(t)    Impulse response for mode m
h    Normalized magnetic field vector
ht    Normalized transverse magnetic field vector
Ĥ    Hamiltonian
H    Magnetic field vector
H    Complex magnetic field: H(r, t) = Re[H(r) e^(i2πft)]
H(f), H(ω)    Transfer function
H(ω)    Transfer function of a multi-input multi-output channel
H(x)    Entropy of a random variable x
Hmode(x)    Entropy of a mode
Htotal(x)    Total entropy of a system
Hℓ    Hypothesis that the ℓth symbol or symbol-state was transmitted
i    √−1
i; i(t)    Directly photodetected signal; current
iLO    Directly photodetected lightwave local oscillator
ibias    Bias current
ith    Threshold current for a laser
Î    Identity operator
I    Identity matrix
I(t)    Optical intensity
I(r; s)    Mutual information
Im(·)    Modified Bessel function of the first kind of order m
Im    Imaginary part
J    Jones vector
Jm(·)    Bessel function of the first kind of order m
k    Wavenumber; Boltzmann's constant: 1.38 × 10⁻²³ J/K
k0    Free-space wavenumber
k    Wavevector
kr    Radial component of the wavevector
kx, ky, kz    Components of the wavevector k
K    Number of independent subsamples; spatial frequency of a ray in a graded-index fiber
Km(·)    Modified Bessel function of the second kind of order m
K    Gram matrix
L    Length; number of symbol values
LΛ(·)    Log-likelihood ratio
Lm(·)    Laguerre polynomial
Lmn(·)    Generalized Laguerre polynomial
Lc    Coherence length
Lpol    Polarization decorrelation length
Lcomp    Compensation length for a nonlinear equalizer
Leff    Effective fiber length
Lwo    Walk-off length in a fiber
LNL    Nonlinear length characterizing the fiber nonlinearity
LD    Dispersion length
m    Number of photons or photoelectrons
m(β)    Total number of modes with propagation constants larger than β
M    Signal constellation matrix
n    Index of refraction
nc    Complex index of refraction
n2    Nonlinear index of refraction coefficient
na    Index of refraction at the core/cladding interface for a graded-index fiber
nb    Background index of refraction
no, ne    Ordinary and extraordinary indices of refraction
n0    Maximum index of refraction of a graded-index fiber
nt+1    Number of error patterns with t + 1 errors
n(t)    Complex-baseband noise process
ne(t)    Electrical noise process
no(t)    Spontaneous emission noise process
nsp    Spontaneous emission noise factor for a lightwave amplifier
n    Average number of nearest neighbors in a signal constellation
N    Carrier density in a semiconductor; group index
N̂    Nonlinear term in the time domain; photon-number-state operator; nonlinear operator
N1    Number density of the lower-energy state
N2    Number density of the higher-energy state
Na    Power density spectrum at the amplifier input
Nc    Group index at the center of a graded-index fiber
Nin    Power density spectrum of the electrical noise at an input
Nsp    Power density spectrum of the lightwave power from spontaneous emission
Nshot    Power density spectrum of the photodetected shot noise
Nspe    Power density spectrum from spontaneous emission using phase-synchronous demodulation
N0    Total electrical power density spectrum
N0    Expected number of noise photons
Nopt    Photodetected lightwave noise power density spectrum or the expected photocharge from lightwave noise
NRIN    Power density spectrum from relative intensity noise
NA    Numerical aperture
OSNR    Optical signal-to-noise ratio
OSIR    Optical signal-to-interference ratio
OSNIR    Optical signal-to-noise-plus-interference ratio
p    Probability; photon momentum; decay rate for cladding solutions
p    Prior; input probability vector; input Jones vector
p̂    Momentum operator
pc    Probability of a correct detection event
pe    Probability of a detection error
p(t)    Demodulated electrical pulse
Pm    Projection matrix
P̂m    Projection operator
P    Lightwave power
P    Syndrome
Ps; Ps(t)    Lightwave signal power
Pn; Pn(t)    Lightwave noise power
Pin(t); Pout(t)    Input and output lightwave power in a pulse
Pcoh    Lightwave signal power in a spatial coherence region Acoh
Pe(t)    Electrical power
Pin; Pout    Input and output lightwave signal power
Pmax    Peak lightwave signal power
PL    Lightwave power from linear interference
PNL    Lightwave power from nonlinear interference
P(α)    P-representation of a quantum-optics signal
P(ω)    Polarization-dependent part of the multi-input multi-output channel matrix
P    Material polarization vector
P    Complex material polarization vector
q    Charge; spatial frequency for the waveguide solution in the cladding
q(t)    Target pulse after the detection filter
Q    Figure-of-merit for evaluating error probabilities (see (9.5.28))
Q    Channel transition matrix
Q(αI, αQ)    Husimi quasi-probability distribution
r    Position vector in space; radial coordinate in cylindrical coordinates; radius
r(t)    Received noisy complex waveform; filtered complex waveform
r(t)    Received noise-free complex waveform
r̃(t)    Received noisy passband waveform
rk    Noise-free sampled output after the detection filter
rk    Noisy sampled output after the detection filter
r̂, ψ̂, ẑ    Unit vectors in cylindrical coordinates
r    Ratio of the prior probabilities
R    Data rate; resistance; normalized radius of a fiber
R    Photon-optics signal arrival rate
Rc    Code rate
Rs    Sample rate
R    Covariance matrix at the channel output
RP(τ)    Autocorrelation function of the lightwave power
Rs(τ)    Autocorrelation function of the complex lightwave signal
Re(τ)    Autocorrelation function of the electrical signal
Rn(τ)    Autocorrelation function of a noise process
ℛ    Responsivity of photodetector; decision region; decision subspace region
Re    Real part
s    Stokes vector; signal point in signal space
s    Codeword; scaled signal point in signal space (|s|² = |s|²/ℏω)
s0, s1, s2, s3    Stokes parameters
sλ    The slope of the wavelength-dependent group delay at the carrier wavelength λc
s̃(t), s(t)    Modulated lightwave signal and the complex-baseband equivalent
S    Expected number of photons; syndrome
S(f), Sλ(f)    Power density spectrum in frequency and wavelength
Sdφ(f)    Power density spectrum for the derivative of the phase noise
Sn(f)    Power density spectrum of the noise
S(ρ̂)    von Neumann entropy
Save    Time-averaged Poynting vector
S̄ave    Average Poynting vector for a monochromatic wave
SNR    Signal-to-noise ratio
t    Temporal mean
T    Transmittance
T    Coupling matrix
T̂    Quantum-lightwave channel transformation
T0    Temperature
Trms    Temporal root-mean-squared width of a pulse
Ts    Sample time
TB    Timewidth–bandwidth product
u(t)    Unit-step function
u(r)    Ratio of posterior probability distributions
u(αI)    Quantum wave function for the in-phase signal component
U(αQ)    Quantum wave function for the quadrature signal component
Û    Unitary transformation
UT(f)    Fourier transform of a sample function uT(t)
Ũ(r, t)    Passband field amplitude
U(r, t)    Complex field envelope
v    Voltage; velocity
vg    Group velocity
V    Normalized frequency or the V-parameter
V    Complex covariance matrix
VN    Complex covariance matrix of the noise
Vs    Complex covariance matrix of the signal
V    Volume
wn    Photocharge from the noise at the input to a gain process
ws    Photocharge from the signal at the input to a gain process
W, W    Photocharge; mean number of photoelectrons (W = eW = ℛE)
Ws, Ws    Signal photocharge; mean number of signal photoelectrons
Wn, Wn    Noise photocharge; mean number of noise photoelectrons
Wb, Wb    Photocharge in a bit; mean number of photoelectrons in a bit
Wp, Wp    Photocharge in a pulse; mean number of photoelectrons in a pulse
Wdark    Mean number of photoelectrons from dark current
W    Baseband bandwidth
Wrms    Root-mean-squared baseband bandwidth
Wh    Half power or −3 dB bandwidth
W(q, p), W(αI, αQ)    Wigner quasi-probability distribution
x    Random variable
x(t)    Transmit pulse
x̂    Position operator
x̂(t)    Hilbert transform of x(t)
x̂, ŷ, ẑ    Unit vectors in cartesian coordinates
X(r, t)    Susceptibility of the material
y(t)    Detection filter
Y    Expected number of received photons
Ŷ    Generalized measurement operator
α    Index-of-refraction power-law profile for a graded-index fiber; linear interference constant
αe, αh    Ionization coefficients for an avalanche photodiode
αs    Scattering loss
αm    Absorption coefficient
αNL    Nonlinear interference power scaling coefficient
α    Glauber number
|α⟩    Coherent state specified by the Glauber number α
αI, αQ    Measured values of the in-phase and quadrature components of a coherent state
β, β, β̂    Propagation constant, propagation vector, and unit propagation vector
β2    Group velocity dispersion coefficient
δslow, δfast    Phase shift along the fast axis and the slow axis
δ(t)    Dirac impulse
δij    Kronecker impulse
Δ    Normalized index difference
Δτ    Differential group delay
Δτ    Polarization-mode dispersion vector
ε    Permittivity
ε0    Permittivity of free space: ε0 = 8.854 × 10⁻¹² C²/(N·m²)
εr    Relative permittivity
φ    Phase; angular coordinate in cylindrical coordinates
φe    Phase error
ϕ(τ); ϕI(τ)    Coherence function; intensity coherence function
φ(t), φ(ω)    Phase functions in time and frequency; phase-noise random process
φNL    Nonlinear phase
φSPM    Nonlinear phase shift from self-phase modulation
φXPM    Nonlinear phase shift from cross-phase modulation
Γ(t)    Photon arrival process
γ    Small-signal gain of a lightwave amplifier; separation constant; sample signal-to-noise ratio; fiber nonlinear coefficient; Euler's constant = 0.5772
µ    Net small-signal gain; phase-noise parameter
κ    Attenuation coefficient of a fiber; coupling coefficient; inner product of two signal states
κp    Polarization-dependent loss coefficient of a fiber
κblk    Inner product between two block-symbol states
κsym    Inner product between two component-symbol states
η0    Impedance of free space: η0 = √(μ0/ε0) = 377 Ω
η    Quantum efficiency; coupling efficiency; impedance
|ηk⟩    Sampling eigenstate
λ    Wavelength; likelihood ratio
λ0    Free-space wavelength
Λ    Wavelength of an acoustic wave; spatial period; likelihood function
μ(t)    Photoelectron arrival rate
μdark    Generation rate for dark current
μ0    Permeability of free space: μ0 = 4π × 10⁻⁷ N/A²
ρ    Crossover probability for a binary symmetric channel
ρ10    Correlation coefficient between two signals
ρ̂    Density matrix
ρ̂s    Density matrix of a signal state
ρ̂in, ρ̂out    Density matrix at the channel input; channel output
ρ̂blk    Density matrix of a block-symbol state
ρ̂sym    Density matrix of a component-symbol state
ρ̂vac    Density matrix of a vacuum state
σa    Absorption cross section
σe    Emission cross section
σs    Scattering cross section; root-mean-square value of the variance for the signal
σ²    Variance
σn    Channel state vector
σ²P    Variance of the lightwave signal power
σ²inter    Mean-squared width of the impulse response caused by mode-dependent intermodal dispersion
σ²intra    Mean-squared width of the impulse response caused by wavelength-dependent intramodal dispersion
σ²shot    Variance from shot noise
σ²ISI    Variance from intersymbol interference
σ²sp    Spontaneous emission noise power per quadrature component per polarization component
σ²λ    Variance of the power density spectrum Sλ(λ) for a modulated lightwave signal in wavelength units
τ    Delay; group delay
τ+; τ−    Group delay for the principal polarization states
τc    Coherence time
τm    Group delay for mode m
τmax    Maximum delay spread
τtotal    Total group delay in the presence of polarization-mode dispersion
χ    Holevo information; susceptibility; angle describing polarization state
χNL    Nonlinear susceptibility
θc    Critical angle
θmax    Maximum launch angle for a guided ray
Θ    Threshold used for detection
ω    Angular frequency in radians/second
ωc    Carrier frequency
Ω    Resistance; solid angle; Fourier transform variable
|ψ⟩    Signal state
ψ    Angle describing polarization state; loss factor for a channel
ξ    Number of photoelectrons; eigenvalue of H(ω)H(ω)†
ζ    Damping parameter for a phase-locked loop

1 Introduction

A vast network of optical fiber continues to expand beneath the surface of our planet. This fiber network forms the communications infrastructure upon which sit the omnipresent mobile devices that now bind society in ways never seen before. At the heart of this network is the seemingly simple and passive optical fiber, a strand of glass or plastic about the width of a human hair that runs for hundreds of kilometers in wide-area networks. Describing how this revolutionary, though largely passive, component is used to convey information will require the length of this book.

This introductory chapter presents an overview of digital guided-lightwave communication systems, emphasizing the differences between lightwave and other types of communication systems.

Communication systems convey information (voices, images, video, or data) from a source to a destination. For the purpose of transmission, modern communication systems first map information into electronic signals that can be either analog or digital. Analog systems map information into a continuous physical quantity. A digital system maps or encodes information into a sequence of discrete logical symbols or letters. If the original information source is analog (such as voice), it can be transmitted digitally by first sampling and quantizing¹ the continuous waveform to produce a sequence of sample values. The combination of sampling and quantization produces a sequence of digital values.

¹ The word quantization is also used to describe the discrete nature of lightwave signals.

The set of logical values for each transmitted symbol is called the channel input alphabet, with each discrete value being a letter from that alphabet. The most common digital symbol has two possible letter values and is called a bit. One letter is called one or high or mark, and the other letter is called zero or low or space. Symbols in other alphabets can have more than two letter values and can be represented by a group of bits. As an example, the keyboard character “$” is commonly mapped into an eight-bit symbol. This keyboard character could be sent as eight separate two-state letters (bits) or as a single letter drawn from an alphabet of 256 letters.

Every point-to-point digital communication system conveys data between a source and a destination. Starting with the source, this information is typically handled at multiple conceptual layers of functionality before being transmitted over a communication channel. The most basic of these hierarchical communication layers is known as the physical layer or the modulation layer, with each higher layer providing additional


communication functions, such as controlling traffic. The goal of modern physical-layer digital communication system engineering, which is the subject of this book, is to provide the physical layer with specific attributes such as information rate, reliability, and security, to achieve energy-efficient use of the communication channel, and to make the physical layer invisible to the user. All aspects of the communication system that relate to the point-to-point transmission of a sequence of symbols from a source to a destination are regarded as a part of the physical layer. The differences between lightwave and other types of communication systems are a result of the differing physical-layer transmission mechanisms for the generation, propagation, amplification, and detection of lightwaves.

1.1 Digital Lightwave Communication Systems

A general point-to-point guided lightwave communication system consists of a transmitter, a guided lightwave channel, and a receiver. Each of these system elements is shown in Figure 1.1.

Figure 1.1 (a) Simplified lightwave communication system: a transmitter, an optical fiber, and a receiver. (b) The main functional blocks in a digital lightwave communication system: the encoder, baseband modulation, frequency translation, the fiber channel, photodetection/demodulation, filtering/sampling, symbol detection, codeword detection, and the decoder. The associated signals are the user data d, the codeword s, the baseband signal s(t), the modulated lightwave signal s̃(t), the received lightwave signal r̃(t), the electrical signal r(t), the sample sequence rk, the senseword (or sensed codeword) r, the detected codeword ŝ, and the decoded user data.

1.1.1 Channel Coding

The input to the transmitter is a sequence of digital symbols called a dataword. The dataword is denoted by a sequence d of length k. To ensure reliable communication, modern digital systems use an encoder to modify the dataword to produce a longer sequence of logical symbols s of length n, called a codeword. When the transmitter and receiver use the same alphabet, the codeword blocklength is always larger than the dataword blocklength, with the ratio of the blocklengths known as the code rate. The replacement of the original dataword by a codeword with more symbols can be regarded as a controlled form of memory because the symbols of the codeword depend on the dataword. This deliberate form of memory creates redundancies, with



the information in the dataword “spread out” over the length of the codeword. These redundancies are used to control or correct symbol errors. The process of creating this form of controlled memory is called channel coding. The transmitted symbols of a long message are represented by a sequence {sj} of such codewords.
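These ideas can be made concrete with a toy encoder. The sketch below uses a rate-1/3 binary repetition code, chosen only because it is the simplest possible example of deliberate redundancy; it is not a code developed in this book, and the dataword values are arbitrary.

```python
# A rate-1/3 binary repetition code: each dataword bit is spread over
# three codeword bits, so a single channel error can be corrected.

def encode(dataword):
    """Repeat each dataword bit three times to form the codeword."""
    return [bit for bit in dataword for _ in range(3)]

def decode(senseword):
    """Majority vote over each group of three senseword bits."""
    return [int(sum(senseword[i:i + 3]) >= 2)
            for i in range(0, len(senseword), 3)]

d = [1, 0, 1, 1]            # dataword, blocklength k = 4
s = encode(d)               # codeword, blocklength n = 12
print(len(d) / len(s))      # code rate k/n = 1/3
s[5] ^= 1                   # one channel symbol error
print(decode(s) == d)       # True: the redundancy corrects the error
```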

1.1.2 Modulation

The second function implemented by the transmitter, which may also involve aspects of coding, is the mapping, in turn, of each codeword s, one or several symbols at a time, into a continuous physical quantity s(t) comprising the baseband signal waveform. This process is called modulation. The baseband signal waveform consists of a superposition of symbol pulses. The amplitude of each symbol pulse depends on the present symbol and perhaps on previous symbols. For a lightwave communication system, the baseband signal is transformed, or modulated, using frequency translation, into another waveform s̃(t) called a passband signal waveform. In general, the term “modulation” will be used interchangeably for the process of creating a baseband waveform from a sequence of pulses, for the process of creating a passband waveform from a baseband waveform by frequency translation, or for both of these operations together.

In many modern communication systems, the separation between modulation and coding need not be clear-cut, and we refer to the combination of the two broadly as coded-modulation. The coding of the sequence is used to control the dependences of transmitted symbols across the available independent degrees of signaling freedom. These degrees of freedom may include frequency, time, polarization, space, and the number of characteristic spatial patterns or modes that the guiding structure supports. Understanding the usable number of independent degrees of freedom is essential to designing an efficient communication system. Intentional interdependences generated by a code are used to control transmission errors or to manage the spectrum of the transmitted waveform. The modulation process controls the form of the waveform that represents the data symbols. These two aspects are combined to generate specific sequences of modulated symbols that lead to a small probability of a demodulation error. Various versions of this interdependent coded-modulation process are now in common use in modern communication systems.
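A minimal numerical sketch of this mapping is given below. The antipodal amplitude mapping, the rectangular symbol pulse, and the carrier frequency are illustrative assumptions, not choices prescribed by the text.

```python
# Baseband modulation as a superposition of amplitude-scaled symbol
# pulses, followed by frequency translation to a passband waveform.
import numpy as np

def baseband_waveform(bits, samples_per_symbol=8):
    pulse = np.ones(samples_per_symbol)       # rectangular symbol pulse
    amplitudes = 2 * np.asarray(bits) - 1     # bit 0 -> -1, bit 1 -> +1
    s = np.zeros(len(bits) * samples_per_symbol)
    for j, a in enumerate(amplitudes):
        start = j * samples_per_symbol
        s[start:start + samples_per_symbol] += a * pulse
    return s

s_t = baseband_waveform([1, 0, 1, 1])         # baseband signal s(t)
carrier = np.cos(2 * np.pi * 0.2 * np.arange(s_t.size))
passband = s_t * carrier                      # frequency-translated signal
```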

1.1.3 Types of Lightwave Channels

One or more modulated passband waveforms s̃(t) are transmitted over a lightwave channel. In this book, only a guided-lightwave channel based on an optical fiber is studied. Other lightwave channels such as free space or water have different propagation properties.² Each waveform in a fiber channel may use a separate guiding structure within the same fiber, a separate mode within a guiding structure, a separate wavelength λ, or a separate polarization mode within a spatial mode.

² See Karp, Gagliardi, Moran, and Stotts (2013).


Guided lightwave communication channels are commonly classified by information rate and transmission distance, called reach. For low to moderate data rates and short distances, systems often use multimode optical fibers and do not use lightwave amplification. For high data rates and long distances, the lightwave channel is a span of multiple connected segments of an optical fiber that conventionally supports only one spatial mode. Within each segment, the lightwave signal may be amplified to compensate for signal attenuation. This amplification process introduces noise.

In a practical lightwave channel, signal attenuation and other signal impairments affect the performance of the communication system. These impairments adversely redistribute the signal energy and distort the signal waveform as the signal propagates within the fiber. This redistribution or interference creates unintentional dependences between the transmitted symbol intervals, between transmitted polarization components, or between multiple datastreams within the same physical channel. Depending on the lightwave signal power P(t), the physical mechanisms that cause distortion may be linear, nonlinear, or a combination of both. The most significant linear distortion mechanism is linear dispersion, which arises because different frequency components or signaling modes have different propagation velocities. The most significant nonlinear distortion mechanism is an intensity-dependent modification of the propagation characteristics of the lightwave channel, which redistributes the signal energy both within a single datastream and among multiple datastreams within the same physical channel.

Signal propagation in a linear dispersive optical fiber is presented in Chapters 3 and 4. Signal propagation in a nonlinear dispersive optical fiber is presented in Chapter 5. Chapter 8 extends these topics to multiple datastreams for which there may be interchannel interference between the datastreams. The mitigation or accommodation of interference is discussed in Chapter 11, while the estimation of signal and channel parameters is considered in Chapter 12.

1.1.4 Demodulation

The received passband lightwave waveform r̃(t) consists of a combination of a distorted replica s̃out(t) of the input passband signal waveform and lightwave noise. Any part of the noise that does not depend on the signal is called signal-independent noise. Any part of the noise that depends on the signal is called signal-dependent noise. The origin and characterization of noise are discussed in Chapter 6.

The received modulated lightwave signal r̃(t) is converted into an electrical signal using a photodetector, and is then demodulated. For some lightwave communication systems, the combination of the conversion between the lightwave signal and the electrical signal along with the demodulation process produces a baseband electrical signal r(t) that is proportional to the received lightwave signal. For other lightwave communication systems, the conversion between the lightwave signal and the electrical signal has a square-law characteristic that generates an electrical baseband signal r(t) that is proportional to the received lightwave signal power P(t). In such a system, the photodetected electrical signal is proportional to the squared magnitude of the received lightwave signal.

1.1.5 Detection and Decoding

After the lightwave signal has been converted into an electrical signal, a detection and decoding process is used to determine the most likely sequence of symbols that comprises the transmitted codeword. The first part of this detection process consists of a signal transformation followed by sampling. The signal transformation on the received electrical waveform is typically implemented by a linear filter called a detection filter. This filter generates a filtered electrical waveform, also denoted r(t), which is used to generate a sequence {rk} of samples. Alternative signal transformations based on a nonlinear threshold operation are appropriate for the detection and counting of individual photons.

The sequence of sample values is used in the detection process to form the senseword, or the sensed noisy codeword, which is then sent to the decoder to determine the most likely transmitted codeword. One method of forming the senseword assigns a logical value to each symbol separately. The detection of the corresponding senseword of logical values is called hard-decision demodulation. Alternatively, the senseword can be formed using a quantized value of each sample of the received waveform. The quantized value is a discrete approximation of the analog value of that sample. This is called soft-decision demodulation. For either detection process, the estimate ŝ of the most likely transmitted codeword is then decoded or detected to recover the codeword or to reconstruct the corresponding dataword. The combination of the frequency translation and the recovery of the user dataword defines the demodulator and the decoder. The combination of the modulator and the demodulator, including the encoder and the decoder, is called the modem. Each of these functions will be fully explained at an appropriate place in the book.

When the total distance, or reach, of a span requires the use of multiple segments, each with a lightwave amplifier, one can choose to amplify-and-forward, remodulate-and-forward, or recode-and-forward. These choices differ in performance and cost. The first method simply amplifies both signal and noise. The second method demodulates, then remodulates each symbol to remove the noise, but sometimes makes symbol errors. The third method, called regeneration, decodes then recodes each block or frame of data to remove demodulation errors. Regeneration differs from the first two methods because it involves all of the receiver functions and all of the transmitter functions described above, and so is more complex. At a regenerator, the decoded dataword is used as the input to another transmitter, producing a new transmitted signal that is intended to be identical to the original transmitted signal at the source. In this way, noise and distortion are removed at each regenerator, though regeneration may (rarely) destroy an entire block of data if it is already too distorted to recover.
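The two senseword conventions can be illustrated with a short sketch. The signal levels, the noise strength, and the quantizer step used below are assumed values for illustration only.

```python
# Forming a hard-decision and a soft-decision senseword from the same
# filtered samples of a noisy binary waveform.
import numpy as np

rng = np.random.default_rng(7)
codeword = np.array([1, 0, 1, 1, 0, 1])
samples = (2 * codeword - 1) + 0.5 * rng.standard_normal(codeword.size)

hard_senseword = (samples > 0).astype(int)      # one logical value per symbol
soft_senseword = np.round(samples / 0.5) * 0.5  # quantized sample values
print(hard_senseword, soft_senseword)
```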

1.1.6 Error Probabilities

The ability of a communication system to reliably convey information, which depends on the method of detection, is measured by the appropriate error probability. If the detection is done on a symbol-by-symbol basis, as is often the case for common lightwave systems, then the probability of symbol error, denoted pe , is used to quantify the


performance. If the symbols are binary symbols, then the performance is expressed in terms of the probability of a bit error, usually called the bit error rate. The bit error rate is a fundamental physical-layer performance figure-of-merit for a binary lightwave communication system that uses symbol-by-symbol detection. If a more complex detection technique that processes a sequence of symbols in a codeword as a whole is used, or if the transmission medium has random propagation characteristics, then the errors tend to cluster and the bit error rate may be less useful. In this case, the term codeword error rate, block error rate, frame error rate, or message error rate may be more appropriate. Minimizing the probability of a detection error, either for a single bit or for a whole message, subject to a set of system constraints, such as the information rate or the transmitted power, is the fundamental design goal at the physical layer.

1.2 Lightwave Signal Models

The relevant properties of the lightwave signal are expressed using a signal model. For some systems, the lightwave signal is adequately modeled using continuous quantities based on geometrical rays or electromagnetic waves. For other systems, particularly at low signal levels, the discrete, quantum nature of a lightwave signal is evident. This unique dual nature of a lightwave signal has led to the development of several signal models, with each signal model being useful in appropriate circumstances. These signal models, which will be discussed in detail throughout the book, can be grouped into classical optics, photon optics, and quantum optics. Quantum optics is the most complete signal model. Both classical optics and photon optics are limiting forms of quantum optics and do not incorporate all the properties of a quantum-lightwave signal that could be used to convey information measured in bits.

Classical optics is based on continuous quantities and includes geometrical optics (also called ray optics) and wave optics. Wave optics models a lightwave signal as a continuous electromagnetic waveform with an amplitude, phase, and polarization that are described by Maxwell's equations. This signal model, which treats the lightwave energy as a continuous quantity, is a limiting form of quantum optics. The application of these equations to an optical waveguide results in characteristic spatial patterns called modes and a corresponding dispersion relationship that governs the linear propagation characteristics of a lightwave signal.

Within the wave-optics signal model, the coherence of the lightwave plays an important role in determining the appropriate form of analysis. An ideal coherent lightwave has a deterministic phase. Knowing the phase of a coherent lightwave at one time instant provides complete information about the phase of the lightwave at a different time instant regardless of how widely spaced the time instants are. An ideal noncoherent lightwave has a completely random phase. This means that knowing the phase of the lightwave at one time instant provides no information about the phase of the lightwave at a different time instant regardless of how closely spaced the time instants are.


The coherence properties of a lightwave, as well as other kinds of randomness such as noise and randomness in a datastream that conveys information, are treated within wave optics as forms of statistical uncertainty. This form of uncertainty arises from incomplete knowledge about one or more attributes of the system or the user datastream.

Ray optics models a lightwave as a ray of light. The simplicity of ray optics provides an elementary description of lightwave signal propagation and is the limiting form of wave optics whenever the relevant dimensions of the guiding structure are much larger than the wavelength λ of the light in the medium. When the dimensions of the structure are comparable to λ, or smaller than λ, ray theory is no longer adequate. Moreover, ray theory typically cannot be used to assess the phase of a lightwave or to incorporate noise.

Photon optics incorporates the granularity of the energy of a lightwave signal as expressed by the concept of a photon. Photon optics is a limiting form of quantum optics that excludes the phase of the lightwave, but does include the discrete-energy nature of a lightwave signal. Relative to communication systems that use lower frequencies, these discrete-energy effects are evident at lightwave frequencies because the quantum of energy of a lightwave photon is relatively large compared with the quantum of energy of other sources of randomness, as quantified later in this chapter.

Compared with wave optics, photon optics incorporates a distinct source of uncertainty that is fundamentally different from statistical uncertainty. This source of uncertainty is conventionally called photon noise and is attributed to the fact that photon arrival times are random. Within quantum optics, this source of randomness is a form of quantum uncertainty associated with a photon-counting measurement of a conventional lightwave source. This form of uncertainty is recognized throughout the book, and is analyzed in detail in Section 15.2.5. Whereas statistical uncertainty need not be present, quantum uncertainty is always present when a conventional lightwave is photodetected. Accordingly, the quantum uncertainty as expressed by photon noise is deemed fundamental. A pragmatic combination of wave optics and photon optics is usually adequate for most systems. This combination is called a semiclassical signal model for lightwaves.

Quantum optics incorporates additional properties of a lightwave signal that are not incorporated into photon optics or wave optics. One of the unique properties of a quantum-lightwave signal is a quantum correlation or quantum coherence structure. The description and use of quantum coherence is at odds with “common-sense” notions, based on macroscopic observations, of how signals interact. Accordingly, systems based on the quantum-optics signal model that use these properties can seem counterintuitive. The quantum-optics signal model is discussed in Chapter 15.

Relative to quantum optics, wave optics is incomplete because it does not incorporate the discrete nature of a lightwave signal, or photon noise, nor can it fully describe quantum coherence effects. Photon optics is incomplete because it is based solely on the discrete energy of a lightwave signal. This model does incorporate photon noise, but does not incorporate phase or modal structure.


1.2.1 Relationship of Wave Optics to Photon Optics

One of the challenges of this book is to smoothly transition between these signal models, presenting the relevant physical conditions when each signal model is appropriate. To that end, we provide a preliminary discussion of the relationship between the wave-optics signal model, which is based on a continuous-energy signal that has an amplitude and a phase, and the photon-optics signal model, which is based on a discrete-energy signal that does not incorporate the phase. Wave optics is necessary to describe propagation. Photon optics is necessary to describe the physics of lightwave generation and photodetection.

Wave optics models a lightwave signal as a continuous electromagnetic wave. The relationship between the continuous power P(t) and the continuous energy E(t) defined over a time interval of duration T is given by

E(t) = ∫_{t−T/2}^{t+T/2} P(τ) dτ.    (1.2.1)

Each of these quantities could have statistical uncertainty because of randomness in the lightwave source or in the subsequent continuous sources of noise.

Photon optics models a lightwave signal as a stream of photons. Each photon has an energy given by

E = hf = ℏω = hc/λ,    (1.2.2)

where h is Planck's constant (h = 6.626 × 10⁻³⁴ joule-seconds) and ℏ = h/2π is the reduced Planck constant, f is the frequency of the light expressed in hertz, ω = 2πf is the angular frequency of the light expressed in radians/second, λ is the wavelength of light in the medium, and c = fλ is the velocity of light in the medium. This simple relationship implicitly couples the particle concept of energy with the wave concept of frequency. Each photon has a momentum with magnitude p given by

p = h/λ = ℏk,    (1.2.3)

where k = 2π/λ is defined as the wavenumber. This simple relationship implicitly couples the particle concept of momentum with the wave concept of wavenumber. Moreover, because k = ω/c0 (cf. (2.3.23)) in free space, p = E /c0 in free space. This expression must be modified for a guided wave.

Photodetection of a Lightwave Signal

A lightwave signal interfaces with an electrical signal both at the transmitter and at the receiver. At the transmitter, a baseband electrical signal is converted into a modulated lightwave signal by a modulator. At the receiver, a lightwave signal is converted into an electrical signal by a photodetector. The term “photodetection” is used to describe the physical optical/electrical conversion process that transfers the modulation on the lightwave signal onto the electrical signal. The term “detection” is used to describe


the system-level algorithmic process that determines the most probable transmitted sequence of letters.

Within the wave-optics signal model, photodetectors are sensitive to the lightwave signal power P(t), which is the time-averaged square of a passband lightwave signal s̃(t) (cf. Section 1.3), where the time average is long with respect to the temporal period of the lightwave carrier but short compared with the information-bearing frequency components that are modulated onto the carrier. This signal is derived from a spatial integration over the aperture of the photodetector, with the lightwave power being the spatial integral of the lightwave intensity. The direct conversion of the continuous lightwave power P(t) into an electrical signal r(t) using a square-law photodetector is called direct photodetection. Within wave optics, the photodetector output r(t), called the photocurrent, is related to the lightwave power P(t) by

r(t) = R P(t),    (1.2.4)

where R is a constant called the responsivity of the photodetector with units of amperes per watt. Often, for pedagogical convenience, we will set R = 1. Other methods of photodetection convert the lightwave amplitude into an electrical amplitude. Therefore, the optical/electrical conversion process can be linear with respect to the lightwave signal amplitude or it can be linear with respect to the lightwave signal power. The linear conversion of the magnitude and the phase of a passband lightwave signal into an electrical signal r(t) requires the use of an additional reference lightwave signal that is added to the incident lightwave signal before photodetection. This type of optical/electrical conversion, called balanced photodetection, is introduced later in this chapter and is discussed in detail in Chapter 7.

Within the photon-optics signal model, a random number of photons is photodetected in any time interval T. This process is mathematically described as a Poisson counting process characterized by a photon arrival rate R(t). A complete description of this process is given in Section 6.2.3. The bridge between wave optics and photon optics is established by associating the continuous wave-optics power P(t), in watts, with the photon arrival rate R(t), in photons per second, as given by

R(t) ≐ P(t)/hf,    (1.2.5)

where h f is the energy E of a single photon. When the wave-optics lightwave power P is a constant, the photon arrival rate R is a constant. The resulting random number of photon counts measured in a time interval of duration T is described by a Poisson probability distribution with mean RT . This probability distribution describes the quantum uncertainty conventionally attributed to photon noise. The randomness caused by photon noise in the continuous photodetected electrical signal r (t ) is called shot noise.4 When the wave-optics lightwave power P is a random variable, there is additional statistical uncertainty overlaid on the photon noise to generate a composite form of 4 In other contexts, shot noise arises from the discreteness of the electron charge.

10

1 Introduction

randomness that is a mixture of quantum uncertainty and statistical uncertainty. This composite form of uncertainty is described in Section 6.3. The choice between a wave-optics model and a photon-optics model depends on the particular attributes under discussion. When the mean energy in the lightwave signal over an observation interval T is much larger than the energy of a single photon so that many photons are to be observed, the wave-optics signal model is typically used. When the mean energy over an observation interval is on the order of tens of photons or fewer, the discrete-energy property of a lightwave signal is evident so that the photon-optics signal model must be considered. Examples of the magnitude of the quantities involved are useful. A wavelength5 of 1500 nm corresponds to a frequency of f = c/λ = 2 × 1014 Hz. A photon at this wavelength has an energy of E = h f = 1.33 ×10−19 joules. A one-milliwatt (mW) lightwave source at this wavelength emits on average approximately P / h f = 7.5 × 1015 photons per second. This large number means that wave optics is an appropriate signal model for this signal level. In contrast, a one-nanowatt lightwave source at this wavelength emits on average approximately 7.5 photons per nanosecond. On this scale, the discreteenergy nature of a lightwave signal can be observed, and the use of a photon-optics signal model is needed for an accurate analysis. At a radio frequency of 2 GHz, the energy of a single photon is five orders of magnitude smaller than the photon lightwave energy used in the previous example and is over three orders of magnitude smaller than the mean thermal energy kT0 in the external environment at a normal temperature. Because the energy of a photon at a frequency of 2 GHz is over one thousand times smaller than the average energy of the external environment at a normal temperature, the discrete-energy property of a radio-frequency signal cannot be observed. Accordingly, a wave-optics signal model that ignores the discrete-energy nature of a photon is the conventional signal model used for lower-frequency systems.

1.2.2

Choosing a Signal Model

The use of one signal model does not reject the conclusions of the other signal model. For example, the use of photon optics, which is based on discrete energy, does not inform us about the phase of the signal or the structure of modes, which are features of wave optics. Accordingly, the conclusions drawn from both signal models, such as the discrete energy and the continuous phase, may be appropriate for the same system. This apparent incompatibility is a consequence of the incompleteness of both wave optics and photon optics, each of which fails to model different properties of a lightwave signal. The quantum-optics signal model provides a common framework that incorporates the complete set of properties of a lightwave signal and is discussed in Chapter 15. A combination of ray optics, wave optics, and photon optics will be used to analyze lightwave communication systems. Ray optics and wave optics are used to describe 5 This wavelength, defined in free space, is near the attenuation minimum of a conventional optical glass

fiber.

1.3 Modulation and Demodulation

11

signal propagation within an optical fiber. Photon optics is used to describe the photodetection process. This approach treats both the lightwave signal and the noise sources before photodetection as continuous quantities, and treats the signal within the photodetector as a discrete point process, which then includes the effect of photon noise. In turn, the external output of the photodetector is modeled as a continuous quantity that includes electrical shot noise generated from the photon noise, as well as other sources of electrical noise. The basic expressions, which are developed later in the book, can be used to outline, by simple examples, the extraordinary nature of a modern lightwave communication system to convey information. A typical sheet of glass used for a window pane transmits approximately 90% of the incident lightwave power over a distance of a fraction of a centimeter. A typical fiber used for a long-distance communication system transmits approximately 95% of the lightwave power over a distance of one kilometer. This means that one kilometer of optical fiber is more transparent than a typical window. Comparing the transmission loss for an optical fiber with the transmission loss for a typical electrical cable, the cable has a larger loss in one meter than the loss in one kilometer of a typical optical fiber.6 For these values, the guided lightwave signal can be transmitted approximately one thousand times further than a guided electrical signal for the same relative loss in power. The extraordinary transmission characteristics of an optical fiber are matched by the remarkable sensitivity of a lightwave receiver. An ideal photon-optics receiver that can count photons is studied later. When there are no sources of noise other than photon noise, the reliable detection of a single pulse requires the detection of about ten signal photons. We remarked earlier that a one-milliwatt lightwave signal corresponds to an average of 7 .5 × 10 15 photons per second at the receiver. If only ten detected photons are needed to reliably determine whether a binary symbol is transmitted using that pulse, then this corresponds to an information rate of 7 .5 × 1014 bits per second per milliwatt of received power. At this information rate and a propagation speed of 2 × 108 meters/second, the entire 35-million-book collection of the Library of Congress could be transmitted, in principle, across a continent in a fraction of a second. The combination of these unprecedented and remarkable properties has enabled the modern global information infrastructure.

1.3

Modulation and Demodulation A real-baseband electrical signal s (t ) may be a voltage or a current. In either case, the instantaneous power7 that this signal generates in a unit resistance is P (t ) = s 2 (t ).

(1.3.1)

6 The values were determined using typical transmission values for window glass, 0.2 dB/km loss for a

single-mode fiber, and 24 dB loss for 100 m of a Category-5 twisted-pair electrical cable.

7 A complete discussion of the units used for power and energy is presented in the section titled Notation.

12

1 Introduction

In many communication systems, the baseband signal s (t ) modulates both the amplitude A(t ) and the phase φ(t ) of a deterministic carrier signal, or carrier, cos(2π fc t ), corresponding to the carrier frequency fc . The carrier frequency is always larger – and usually much larger – than the bandwidth , which is a measure of the spectral width of the baseband signal. A bandlimited signal has a finite bandwidth . The properties of the lightwave source determine whether the carrier is coherent or noncoherent. When a baseband signal s (t ) is modulated onto the carrier, the resulting signal is called a passband signal, denoted as ± s (t ). In this way, the low-frequency baseband signal s (t ) is translated in frequency to a carrier frequency greater than the baseband bandwidth so that it can be efficiently transmitted as the passband signal ±s (t ). The modulated passband signal ±s (t ) is often represented as the real part of a complex signal A (t )ei(2π f c t +φ(t )) with the term s (t ) = A(t )ei φ(t ) called the complexbaseband signal, or, sometimes, the complex envelope when the signal depends on time and space. The complex function e i2π ft = cos(2π ft ) + i sin(2π ft ) is periodic with period T and frequency f , where f T = 1. Background material on real-baseband and complex-baseband signals is presented in Chapter 2. The mean power for a passband signal over a time interval that is long compared with the carrier period but short compared with the fastest time variation of the baseband signal – which is approximately 1/ – is

W

W

W

W

Pave(t ) = A2 (t )cos2 (2π f c t

+ φ(t )) µ = A2 (t ) 2 + 21 cos(4π f c t + 2φ(t )) ≈ 21 A2(t ) (1.3.2) ≈ 12 |s (t )|2 , where the overbar indicates a time average, and |s (t )| = A (t ) is the passband signal envelope, which is the magnitude of the complex-baseband signal representation s(t ) of the passband signal ± s (t ). The term cos (2π f c t + φ(t )) distinguishing a passband signal

´1

from a complex-baseband signal yields an average power in the passband signal that is a factor of two smaller than that in a baseband signal with the same amplitude. This is also true for the passband signal energy. This factor of two is ubiquitous in the theory of communication systems when using both baseband and passband signals.

1.3.1

Phase-Synchronous Systems

The passband signal for single-component phase-synchronous modulation is generated by multiplying, or mixing, the baseband signal with the carrier signal cos(2π f c t ). This modulation process produces the amplitude-modulated passband signal ± s(t ) centered at the carrier frequency f c . When the bandwidth of the baseband signal is much smaller than the carrier frequency f c , the passband signal is a narrowband signal with the baseband signal s (t ) varying slowly relative to the carrier frequency fc . Moreover, a time-varying phase φ(t ) could be imposed on the carrier with information encoded in the phase.

W

1.3 Modulation and Demodulation

13

In phase-synchronous demodulation, the carrier phase may be unknown but regarded as a constant over some time interval. The phase must be recovered by adjusting the phase of a second sinusoidal signal that is generated with frequency f LO at the receiver. This local signal is called the local oscillator. It provides a phase reference that is used to estimate the unknown phase of the incident carrier. The baseband signal is recovered by multiplying the passband signal ± s(t ) by the local oscillator signal. When the local oscillator signal is at the same frequency as the carrier so that f LO = f c , the demodulation process is called homodyne demodulation, with the demodulated signal at baseband. When the local oscillator signal is a frequency different than the carrier, the process is called heterodyne demodulation, with the frequency difference | f c − f LO | called the intermediate frequency f IF. The resulting signal after the multiplication with the local oscillator signal is locally time-averaged to remove the sum frequency terms generated by the multiplication. For homodyne demodulation of the passband waveform s (t )cos(2π fc t ) using a coherent carrier and a coherent local oscillator, f LO = f c and r (t ) = s (t )cos(2π f c t ) · cos(2π f c t )

= 21 s (t ),

(1.3.3)

where s (t ) is a real-baseband signal. For heterodyne demodulation, f LO

= fc −

f IF and

r (t ) = s(t )cos (2π f c t ) · cos (2π( f c − f IF )t )

= 12 s (t )cos(2π f t ), IF

(1.3.4)

where the time average is long with respect to the carrier frequency f c , but short with respect to the intermediate frequency f IF . In this case, modulation of the passband signal at frequency f c is transferred to another passband signal at f IF . These methods of demodulation are discussed in Chapter 8. Phase-synchronous demodulation relies on both the carrier and the local oscillator having stable frequencies and a stable phase difference ±φ(t ) = φc (t ) − φLO (t ), where φc (t ) is the phase of the carrier and φLO (t ) is the phase of the local oscillator. A carrier that exhibits these characteristics is called a coherent carrier. A coherent carrier with a well-defined phase can be readily generated well into the gigahertz range. As a consequence, many nonlightwave communication systems through the gigahertz frequency range are phase-synchronous. For lightwave communication systems with carrier frequencies on the order of 1014 Hz, producing a coherent carrier does require more effort because of fundamental forms of noise in the source. This kind of noisy lightwave source is discussed in Section 7.8. The resulting noisy carrier signal and noisy local oscillator signal are not pure sinusoids and have a nonzero spectral width. A carrier is considered coherent if its spectral width is much less than the bandwidth of the baseband signal that modulates the carrier. When this condition is not satisfied, the carrier is partially coherent or noncoherent, with additional amplitude and phase variations in the demodulated signal that are not present in the modulating baseband signal. These additional impairments can lead to additional errors during the detection process.

W

14

1 Introduction

1.3.2

Phase-Asynchronous Systems

When the lightwave source is noncoherent, the communication system is phaseasynchronous. For such systems, the phase-asynchronous demodulation process does not use knowledge about the carrier phase. A phase-asynchronous demodulator can be implemented by adding a constant to the real-baseband signal s(t ). The form of the asynchronously modulated signal is

±s (t ) = (1 + s (t ))cos(2π f c t + φ(t )), (1.3.5) where the magnitude s(t ) of the signal is constrained to be smaller than one, and φ(t ) is the phase noise causing the carrier to be noncoherent. The term 1 + s (t ) is the biased signal envelope. The requirement that the magnitude of s(t ) be smaller than one ensures

that the envelope is nonnegative. An example of phase-asynchronous modulation and demodulation is shown in Figure 1.2. The signal is converted to a nonnegative real-baseband signal as is shown in Figure 1.2(b). When modulated onto a carrier, it can be demodulated without a phase reference and does not need a local oscillator. The modulated signal shown in Figure 1.2(c) is first passed through a rectifier to remove the negative-amplitude components, yielding the magnitude |(1 + s (t )) cos(2π f c t + φ(t ))| . The resulting rectified signal is shown in Figure 1.2(d). The signal is then lowpass-filtered to remove the high-frequency components. The rectification and filtering do not require that the phase of the carrier be stable or known. Therefore, phase-asynchronous modulation and demodulation can use a noncoherent carrier. The resulting filtered signal 1 + s (t ) is shown in Figure 1.2(e). Removing the constant signal recovers the initial baseband signal s (t ) shown in Figure 1.2(f). This type of phase-asynchronous demodulation is referred to as envelope demodulation. The transmitted signal for a phase-asynchronous lightwave communication system is generated by modulating the power, or intensity, of the lightwave carrier without regard to the phase. At the receiver, direct photodetection implements a form of phase-asynchronous demodulation in which the received electrical signal r (t ) is proportional to the lightwave signal power P (t ) defined in (1.2.4). This type of Modulation

Real baseband signal

Baseband signal plus bias

s(t)

1 +s(t)

(a)

(b)

Demodulation

Modulated signal

(c)

Rectified signal

(d)

Lowpass signal

Signal with bias removed

1 +s(t)

s(t)

(e)

(f)

Figure 1.2 Phase-asynchronous modulation and demodulation using a noncoherent carrier. The

modulated signal is first rectified to remove the negative-amplitude components and then lowpass-filtered to remove the carrier.

1.5 Multiplexing

15

phase-asynchronous demodulation is called intensity-modulated direct-photodetection. Because of its simplicity, it is widely used in basic lightwave communication systems. Intensity modulation is widely used, but is inefficient because the unmodulated signal that was added to ensure a nonnegative baseband signal contains at least half the total transmitted power but conveys no information. Moreover, the addition of a constant signal or bias can increase nonlinear effects within an optical fiber, causing additional errors. These issues motivate the use of bias-free modulation formats that convey information using both the amplitude and the phase. Amplitude/phase modulation and intensity modulation are discussed in Chapter 10.

1.4

Codes and Coded-Modulation A major component of the physical layer consists of coding and coded-modulation. Coding has been used at least since the early telegraphs implemented the Morse code to map the characters of the English alphabet into a sequence of pulses – dots and dashes – in order to transmit the data. Coding in modern communication systems can be divided into source coding and channel coding. Source coding transforms raw data from an input source into a more concise form used for transmission. Source coding is usually placed with the application and not with the communication system. For this reason, it is not explored in detail in this book. Channel coding maps datawords, which usually consist of source-coded data, into codewords that are designed to control errors that arise within the channel or to adhere to constraints imposed by the channel. Coded-modulation refers to the integration of channel coding and modulation into a single entity designed for a specific channel. In modern practice, a modem not only includes the modulator and the demodulator, but also includes the coding and decoding, the detection, and the timing recovery. The system performance is measured by the bit error rate, the codeword error rate, or the message error rate. This aspect of communication system design has seen huge growth due to the explosive increase in available processing power. These techniques are considered in Chapters 11 through 13.

1.5

Multiplexing Multiplexing is the process of combining multiple, independent datastreams or subchannels in the same physical medium, but using a separate transmitter and receiver for each subchannel. A multi-input multi-output channel transmits and receives separate datastreams on subchannels of the same physical channel, but processes the set of subchannels as a single block at both the transmitter and receiver. A system that uses spatially distinct guiding structures is called a space-multiplex system. Space multiplexing may use separate optical fibers or spatially separated cores within a single optical fiber. A system that uses separate modes within a common core is called a mode-multiplex

16

1 Introduction

system. Because the modes share a common core, this kind of multiplexing is distinguished from a space-multiplex system that uses multiple cores. A system that uses two polarization modes per spatial mode is called a polarization-multiplex system. Multiplex systems and multi-input multi-output systems are discussed in Chapter 8. A further distinction is based on whether the datastreams are multiplexed in time or in frequency (wavelength). In frequency-division multiplexing, the available channel bandwidth is divided into subchannels, with each subchannel assigned a distinct subcarrier frequency and a baseband bandwidth around this subcarrier. Frequency-division multiplexing is typically called wavelength-division multiplexing (WDM) in the context of lightwave communication systems. Many lightwave systems use a combination of time-division multiplexing and wavelength-division multiplexing. Several lower-rate datastreams are first combined by time-division multiplexing (TDM) into a single datastream on a single-wavelength carrier. Then several such wavelength carriers carrying separate datastreams are wavelength-multiplexed as subcarriers into a single fiber. The advent of practical widebandwidth, high-gain lightwave amplifiers has led to the development and deployment of these systems. The bandwidth of the fiber is usually exploited by placing all the wavelength subchannels in a wavelength band at which there is low attenuation in the fiber, and at which practical lightwave amplifiers are available. A schematic representation of wavelength-division multiplexing is shown on the right side of Figure 1.3. A lightwave communication network is often composed of nodes where datastreams enter and leave the network. At each node, one or more new wavelengths can be inserted into a fiber by multiplexing and one or more wavelengths can be dropped by demultiplexing. The multiple-wavelength subcarriers produce signal-impairment mechanisms that do not occur in single-carrier systems. These impairments include linear interchannel interference called crosstalk, which arises in components that multiplex and demultiplex the signals, as well as through nonlinear interference between the datastreams on each wavelength carrier caused by intensity-dependent distortion effects within the fiber. N channels at R bit/s/channel

M channels at R N bits/s/wavelength

TDM

λ1

λ2

λ1

WDM

λ3

λ2 λ3

MRN bit/s/fiber Time-division multiplexing

Wavelength-division multiplexing

Figure 1.3 Long-distance lightwave communication systems commonly use a combination of time-division multiplexing and wavelength-division multiplexing.

1.6 Communication Channels

1.6

17

Communication Channels The goal of any communication system is the reliable transmission of information from a source to a destination. The discrete logical symbols that represent the information undergo several transformations that will be described at multiple nested levels. Each of these levels describes another view of the communication channel. Communication channels described by the propagation characteristics of a lightwave signal or an electrical signal are physical channels, and may be modeled using either wave optics or photon optics. These channels are discussed in Chapter 8. The channel described by the input and output probability distributions that represent information is called an information channel. This channel is discussed in Chapter 9. The relationship between these channels for a wave-optics signal model is shown in Figure 1.4. The lightwave channel is a physical channel whose input is the modulated lightwave signal coupled into the fiber. The output is the lightwave signal at the output face of the fiber. The same physical fiber medium, which defines the lightwave channel, can be treated in several different ways, using different signal models depending on the relevant aspect of the lightwave signal. When wave optics is used to model the lightwave signal, the resulting channel is a waveform channel. For this case, the channel input and the output channel are real or complex continuous-time waveforms. The electrical channel is a physical channel that surrounds the lightwave channel. An electrical channel based on wave optics may be treated as a waveform channel or as a discrete-time channel. As a waveform channel, the channel input is the baseband Transmitter Encoding

Channel

Baseband modulation

Frequency translation

Receiver Photodetection/ demodulation

Fiber channel

Filtering/ symbol detection

Codeword detection

Decoding

d Transmitted dataword

²d s Codeword

s(t) Baseband signal

s(t ) ±

±r(t)

Modulated lightwave signal

Received lightwave signal

r(t) Electrical signal

r Sensed codeword

²s

Detected codeword

Decoded dataword

Lightwave channel (waveform) Electrical channel (waveform or discrete-time) Information channel (discrete or continuous) Figure 1.4 Communication channels. A lightwave channel based on wave optics is a waveform

channel. An electrical channel based on wave optics may be either a waveform channel, as shown in the figure, or a discrete-time channel. A discrete information channel is based on a sequence of discrete numbers. A continuous information channel is based on a sequence of continuous numbers.

18

1 Introduction

electrical signal s (t ) before modulation. The channel output is the baseband electrical signal r (t ) after demodulation. Alternatively, when the electrical waveform channel is bandlimited, the sampling theorem can be used to construct an equivalent discrete-time channel. For this case, the channel input and the channel output are sequences of real or complex numbers. Electrical waveform channels and discrete-time electrical channels are discussed in Chapter 8. The information channel surrounds the electrical channel. The encoder prepares the user datastream to form a sequence of symbols that is the input to the information channel. The decoder processes the output sequence of numbers from the information channel to recover the user datastream, and rarely with errors. When the encoder output is a sequence of discrete numbers, a discrete information channel is generated. When the encoder output is a sequence of real or complex numbers, a continuous information channel is generated. When the encoder output is a real or complex waveform, a waveform information channel is generated. The encoder and decoder are studied in Chapter 13. Because the information channel surrounds the electrical channel, it is affected by all of the characteristics that define the electrical channel and the lightwave channel. The information channel depends on (i) the method of coded-modulation, which maps a sequence of encoded user data values into a physical quantity used for modulation; (ii) the propagation characteristics of the physical channel, which depend on the signal model; and (iii) the method of detection, which maps a sequence of samples derived from the received waveform into a sequence of detected symbols. This means that for the same physical channel, different information channels may be defined, and with different channel capacities. As an example, a system that uses a hard-decision detection process to produce a discrete value for each detected symbol leads to a different information channel than a system that uses a soft-decision detection process. The basic principles of information channels, studied under the term information theory, determine the optimal distribution of input symbols from the encoder, called the prior probability distribution, and the optimal detection process to achieve the maximum reliable rate of information flow. The theoretical capacity of an information channel is discussed in Chapter 14. The ultimate purpose of Chapter 14 is to understand how the physical characteristics of lightwave channels and electrical channels affect the form of the discrete information channel that conveys information.

1.6.1

Common Wave-Optics Communication Channels

The distinction between synchronous and asynchronous communication channels is illustrated by comparing two lightwave communication systems with a common phasesynchronous radio-frequency communication system shown in Figure 1.5(a). The passband radio-frequency signal, modeled as a continuous electromagnetic wave, is converted into an electrical signal by an antenna. This linear conversion process preserves both the amplitude and the phase of the incident electromagnetic wave within

1.6 Communication Channels

(a)

Received RF signal s(t)cos(2πfct)

Antenna (linear conversion)

Demodulation and filtering

Added noise

Demodulated noisy signal r (t) = 1 s(t ) + n e(t ) 2

+ cos(2 πfct )

(b)

Received noisy lightwave signal (s(t )

+ no (t )) cos(2πfct )

Square-law photodetection 2

|·|

19

n e(t)

Added noise

+

Demodulated noisy signal r(t ) =

Rs t 2

| ( ) +

no(t)| 2 + ne(t)

ne(t)

Figure 1.5 (a) A typical wireless communication receiver modeled using continuous electromagnetics. (b) A typical noncoherent intensity-modulated direct-photodetection lightwave receiver modeled using wave optics. The scaling constants are discussed in later chapters.

the received passband electrical signal. As a consequence, the electrical channel after reception can be described as a scaled form of the physical channel that defines the propagation of the electromagnetic wave between the transmitting antenna and the receiving antenna. Suppressing the scaling constants describing the physical channel, the resulting noisy demodulated electrical signal r (t ) can be written as r (t ) = r¯ (t ) + n e (t ),

(1.6.1)

where r¯ (t ) = 12 s (t ) is the noiseless received signal at the receiver, and ne (t ) is anadditive gaussian noise source. This noise model is fully developed in Chapter 6. The linearity of this channel allows sources of noise added at various stages of the system to be treated as a single equivalent noise source by proper scaling and summing. This type of channel model is an additive-noise channel model, with the noise being independent of the signal. The electrical channel for a phase-synchronous lightwave communication system is similar to the electrical channel for the phase-synchronous radio-frequency system shown in Figure 1.5(a). The linear conversion of the lightwave signal amplitude into an electrical signal amplitude, called balanced photodetection, is achieved by adding a local oscillator signal to the received lightwave signal before photodetection. The nonlinear square-law characteristic of the photodetector provides the mixing operation that translates the lightwave carrier frequency to a lower carrier frequency, thereby converting the lightwave signal to an electrical signal while preserving both the amplitude and phase of the lightwave signal. Functionally, the distinction between the electrical channel and the lightwave channel is simply a scaling constant associated with the optical/electrical conversion process. This type of demodulation is discussed in Chapter 8.

20

1 Introduction

By contrast, the most common phase-asynchronous lightwave communication system is based on intensity modulation and direct photodetection as shown in Figure 1.5(b), where a lightwave noise term no (t ) is included before direct photodetection to represent the noise introduced by the lightwave amplifiers. The characteristics of this noise source are discussed in Chapter 6. The use of direct photodetection leads to an electrical channel that is not a scaled version of the lightwave channel. Instead, the noisy electrical signal r (t ) generated by direct photodetection is proportional to the squared magnitude | s (t )+ no (t )|2 of the received noisy lightwave signal so that r (t ) =

R2 |s (t ) + no (t )|2 + n e (t ),

(1.6.2)

where an electrical noise n_e(t) is included after photodetection, and the scaling constant R is the responsivity (cf. (6.2.23)). For this direct-photodetection demodulator, the square-law characteristic of the photodetector mixes the received lightwave signal s(t) with the lightwave noise n_o(t), thereby producing a form of signal-dependent noise that depends on the lightwave signal s(t). This type of channel model is a signal-dependent-noise channel model.
The electrical channel model is simpler when there is no additive lightwave noise. In this case, n_o(t) = 0. The expression then becomes

r(t) = R P(t) + n_e(t),

where P(t) = |s(t)|²/2. When noncoherent intensity modulation is used, the transmitted lightwave signal power is proportional to the modulating electrical signal. This leads to an electrical channel that is linear in the lightwave power, as is discussed in Chapter 8.
The communication channels presented in this section are based on a continuous-energy, wave-optics signal model both for the signal source and for the noise source. If, instead, a discrete-energy, photon-optics signal model is used and the lightwave signal is converted into an electrical signal by counting the number of photoelectrons, then the electrical channel is modeled differently because the received signal and noise are modeled differently. This difference also affects the form of the discrete information channel generated after the detection process. We will devote a considerable effort to understanding how the physical channel properties, expressed using different signal models, affect the form of the resulting lightwave, electrical, and information channels, and the corresponding ability to convey information as expressed by the channel capacity.
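A minimal numerical sketch of these two channel models, written here in Python with illustrative values (unit responsivity and small gaussian noise terms are assumptions for this example, not values from the text), shows how the square law of (1.6.2) produces noise that depends on the signal:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
s = np.sqrt(2.0) * np.ones(n)      # constant lightwave amplitude, P = |s|^2/2 = 1
R = 1.0                            # responsivity (assumed unity here)

# Additive-noise channel of (1.6.1): the noise is independent of the signal.
n_e = 0.05 * rng.standard_normal(n)
r_additive = 0.5 * s + n_e

# Signal-dependent-noise channel of (1.6.2): the photodetector squares the
# signal plus lightwave noise, mixing them together.
n_o = 0.05 * rng.standard_normal(n)
r_direct = (R / 2) * np.abs(s + n_o) ** 2 + n_e

# The variance of the direct-photodetection output grows with the signal
# because of the 2*s*n_o cross term produced by the square law.
print("additive channel variance:  ", r_additive.var())
print("direct-detection variance:  ", r_direct.var())
```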

1.6.2 Channel Capacity

An important performance measure for any communication system is the maximum rate at which information can be reliably conveyed by that system. This rate depends on the maximum of the average information that can be conveyed per symbol, which is called the single-letter channel capacity. The number of symbols per second that can be transmitted over the channel depends on the symbol rate, which depends on the bandwidth of the channel. No method of modulation and coding can exceed the capacity, but any rate not larger than the capacity can be achieved in principle.


Any bandlimited channel has a maximum information rate (bits/s) that can be conveyed through the channel at an arbitrarily small probability of symbol error p_e. This maximum information rate is called the bandlimited channel capacity C. The mathematical statement of the channel capacity, called the information-theoretic capacity, depends on the form of the channel model, which depends on the interface between the modem and the physical channel, which in turn depends on the form of the lightwave signal model. To the extent that the channel model is an approximation of the true channel, the corresponding information-theoretic capacity is an approximation of the capacity of the actual channel. Because the information-theoretic capacity depends on how the channel is modeled, the information-theoretic capacity of a channel determined from the wave-optics signal model is not necessarily equal to the information-theoretic capacity of the same channel determined from the photon-optics signal model or from the quantum-optics signal model. Moreover, if the detection method is considered to be part of the channel and not part of the modem, then the capacity of the information channel that uses hard-decision detection is not equal to the capacity of the information channel that uses soft-decision detection.
Consider the additive gaussian-noise-limited channel model given in (1.6.1) based on wave optics. For this model, the channel has no distortion or signal-dependent noise, but independent gaussian noise n_e(t) is added to the signal within the channel. When the mean power P is constrained at the transmitter, the bandlimited capacity C depends solely on the bandwidth B of the transmission channel and on the average signal-to-noise ratio (SNR) at the receiver, which is the ratio of the signal power to the noise power over a bandwidth B. If the additive-noise term n_e(t) is white gaussian noise, which has a constant noise power per unit frequency, then the relationship between the bandlimited capacity C and the bandwidth B is called the Hartley–Shannon theorem

C = B log₂(1 + SNR)  bits/s,    (1.6.3)

for a frequency band that is an ideal rectangle of bandwidth B. This expression is the Shannon bound for the bandlimited capacity of an ideal, bandlimited wave-optics channel. The dependence of the channel capacity on the bandwidth B is linear for a fixed signal-to-noise ratio, and the dependence of the channel capacity on the signal-to-noise ratio is logarithmic. These relationships dictate the optimal distribution of the signal energy, as discussed in Chapter 14. For error-free communication, the information rate R in bits per second can never be larger than the channel capacity C. It can be made nearly equal to C by using a sophisticated coded-modulation format. When additional constraints or considerations are imposed on the channel, such as a peak-power constraint, then the capacity may be less than indicated by (1.6.3), but it cannot be larger. When the lightwave communication system is described using a photon-optics channel model, the information-theoretic capacity is somewhat different, and perhaps more informative about how the properties of lightwaves affect the channel capacity. For an ideal, bandlimited channel of bandwidth B with only photon noise, the bandlimited capacity is given by the elegant expression

C = B log₂(1 + R/B) + R log₂(1 + B/R),    (1.6.4)

where R is the photon arrival rate (cf. (1.2.5)) in photons per second used to convey the information. This expression is the Gordon formula for the bandlimited capacity of an ideal, bandlimited channel with only photon noise. The first term on the right corresponds to (1.6.3), which may be described as the part of the bandlimited capacity attributed to the wave-nature of a lightwave. The second term on the right then corresponds to the part of the bandlimited capacity associated with the particle-nature of a lightwave. These two terms are symmetric, with the roles of R and B reversed for the second term compared with the first term. This expression is discussed in detail in Chapter 14.
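The two capacity expressions are straightforward to evaluate numerically. The following Python sketch, with illustrative values for B and R (assumptions for this example only), computes (1.6.3) and (1.6.4); the Hartley–Shannon term is evaluated with SNR = R/B so that it can be compared directly with the first term of the Gordon formula:

```python
import numpy as np

def shannon_capacity(B, snr):
    """Hartley-Shannon bound (1.6.3) for an ideal bandlimited channel."""
    return B * np.log2(1.0 + snr)

def gordon_capacity(B, Rp):
    """Gordon formula (1.6.4); B in Hz, Rp in photons per second."""
    return B * np.log2(1.0 + Rp / B) + Rp * np.log2(1.0 + B / Rp)

B = 10e9      # bandwidth (illustrative value)
Rp = 1e10     # photon arrival rate (illustrative value)
print(shannon_capacity(B, snr=Rp / B))   # wave-like term only (bits/s)
print(gordon_capacity(B, Rp))            # wave plus particle terms (bits/s)
```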

Channel Efficiency

The efficient use of a channel can be described in several ways. The maximum spectral rate efficiency is defined as the ratio of the information capacity to the bandwidth, given by C/B, measured in bits per second per hertz. The spectral rate efficiency r is defined as the ratio of the information rate R to the bandwidth, given by R/B. Given that the information rate R can never be larger than the capacity for error-free communication, we have r = R/B ≤ C/B. The design goal for the coded-modulation format is to satisfy this inequality nearly with equality. Often, the bandwidth B (Hz) and the information rate R (bits/s) are used interchangeably, but this usage is imprecise. The channel capacity C in bits per second depends on B, but, without specifying the signal-to-noise ratio, one cannot determine the capacity from the bandwidth.
The efficient use of energy is another important design consideration and depends on the channel characteristics and on the modulation format. The energy per bit is defined as the ratio of the power P to the information rate R, given by E_b = P/R. In Chapter 14, it is shown that for gaussian noise, as R/B goes to zero, meaning that the bandwidth B goes to infinity, there is a minimum ratio of the bit energy E_b (joules) to the noise power density spectrum N₀ (watts/hertz) required for reliable communication within the wave-optics signal model. This least possible bit energy is the famous Shannon bound on the spectral rate efficiency.8 A different bound is obtained for a photon-optics signal model. These issues are considered in detail in Chapter 14. The design of a modern communication system that achieves a high information rate near the channel capacity by efficiently using both the energy and the bandwidth is one of the principal goals of modern communication engineering.
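For the additive gaussian-noise channel, the trade between spectral rate efficiency and bit energy can be tabulated directly. A small sketch, assuming R = C so that E_b/N₀ = (2^{C/B} − 1)/(C/B), shows the approach to the −1.6 dB limit quoted in the footnote as the spectral efficiency goes to zero:

```python
import numpy as np

# As C/B -> 0, (2**(C/B) - 1)/(C/B) -> log_e(2) = 0.69 (-1.6 dB).
for r in [4.0, 1.0, 0.1, 0.01]:      # spectral efficiency C/B in bits/s/Hz
    ebn0 = (2.0 ** r - 1.0) / r
    print(f"C/B = {r:5.2f}   Eb/N0 = {ebn0:.4f} ({10*np.log10(ebn0):+.2f} dB)")
```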

1.7 References

There are many available texts on communications. At the undergraduate level see Blahut (2010), Haykin (2001), Proakis and Salehi (2007), and Ziemer and Tranter (2015). At the graduate level, see Proakis (2001) and Benedetto and Biglieri (1999).

8 Remarkably, modern coded-modulation formats for some wireless and wireline systems are now close to the Shannon bound of E_b/N₀ > log_e 2 = 0.69 on a linear scale, or −1.6 dB on a decibel scale.


Lightwave communication systems are covered in Ross (1966), in Einarsson (1996), in Gagliardi and Karp (1976), and in Gowar (1984).

1.8 Historical Notes

Free-space lightwave communication systems predate all forms of electronic communication by many centuries because of our native ability to sense light. Signal towers were constructed in antiquity and signaling lamps are still used for some ship-to-ship communications. These systems were digital in a time when most systems were analog. The use of light for communication was more recently proposed by Alexander Graham Bell (1880), who made his “photophone” work over a distance of several hundred meters in the late 1800s. It was also realized in antiquity that practical free-space lightwave communication systems were limited because of the highly variable attenuation in the atmosphere from smoke, fog, rain, or dust.
The development of the technologies that enabled practical guided lightwave communication systems has a much shorter history, but it does now occupy more than a half century. Among these technology developments, three proved to be seminal. The first development was based on two independent analyses in 1966: one in Kao and Hockham (1966) and a second in Werts (1966). Kao argued that if it were possible to produce glass with sufficiently low loss, then optical fibers would be a viable alternative to electrical wires. Subsequent research led to the publication in Kapron, Keck, and Mauer (1970) of the first optical glass fiber with an attenuation of 20 decibels per kilometer.
The second breakthrough was the development of the semiconductor laser. Demonstrations of laser action in semiconductors were reported in Hall, Fenner, Kingsley, Soltys, and Carlson (1962), in Nathan, Dumke, Burns, Dill, and Lasher (1962), in Holonyak and Bevacqua (1962), and in Quist, Rediker, Keyes, Krag, Lax, McWhorter, and Zeigler (1962). All of the papers were independently submitted within a span of less than three months in 1962. The practical breakthrough of the first continuous-wave room-temperature semiconductor laser was reported by Alferov, Andreev, Portnoi, and Turkan (1970).
The third seminal technology was the development of a practical optical amplifier suitable for lightwave communications. This was reported in the same year by Mears, Reekie, Jauncey, and Payne (1987), and Desurvire, Simpson, and Becker (1987) and was based on doping the rare-earth element erbium into the core of an optical fiber. This development rapidly led to long-haul high-data-rate lightwave transmission systems based on wavelength-division multiplexing and erbium-doped fiber amplifiers, with the first commercial wavelength-division multiplexing system deployed in 1996 (Gowan 2012) and the first commercial lightwave modem based on a coherent source and phase-synchronous demodulation reported in Sun, Wu, and Roberts (2008).
The increased power in these wavelength-multiplexed systems led to studies of the nonlinear fiber communication channel. Early work in this area was reported in Waarts and Braun (1986) and in Inoue and Shibata (1989). It appears that the first study of the channel capacity of the nonlinear fiber channel was presented in Splett, Kurtzke,


and Petermann (1993), with subsequent work in Mitra and Stark (2001). The development of advanced coded-modulation formats designed for the nonlinear fiber channel that can maximize the system reach or the information rate is an active research area. The enormous demand for worldwide data-intensive services continues to drive lightwave communication technology forward through the demands of the marketplace. The chapters of this book provide much of the theory that underlies such future developments.

1.9 Problems

1. Spectrum of an amplitude-modulated signal
Let s(t) be a bandlimited baseband signal with a frequency spectrum S(f) given by 1 − |f|/f_max for |f| ≤ f_max, where f_max is the maximum frequency of the baseband signal. This baseband signal is multiplied by cos(2πf_c t) to produce an amplitude-modulated passband signal s̃(t) = s(t) cos(2πf_c t), where f_c is the carrier frequency and f_c is much larger than f_max.
(a) Sketch the frequency spectrum S(f) of the baseband signal.
(b) Sketch the frequency spectrum S̃(f) of the amplitude-modulated passband signal.
(c) The amplitude-modulated (AM) signal is demodulated by multiplying s̃(t) by a coherent signal of the form cos(2πf_c t). The signal is filtered by an ideal lowpass filter with a cutoff frequency f_max. Sketch the magnitude of the frequency spectrum of the demodulated signal.
u(t) =
  1    for t > 0
  1/2  for t = 0
  0    for t < 0.    (2.1.5)

A linear shift-invariant system is causal if its impulse response h (t ) satisfies h(t ) = h (t )u(t ) except at t = 0. For this case, the lower limit of the integral for the output signal given in (2.1.3) is equal to zero. A function related to the unit-step function is the signum function defined as

sgn(t) = 2u(t) − 1 =
  1     for t > 0
  0     for t = 0
  −1    for t < 0.    (2.1.6)


A system for which the output r (t ) depends on only the current value of s (t ) is called memoryless. The corresponding property in space is called local.

The Fourier Transform

The Fourier transform2 (or spectrum) S(f) of the temporal signal s(t) is defined, provided the integral exists, as

S(f) = ∫_{−∞}^{∞} s(t) e^{−i2πft} dt.    (2.1.7)

The Fourier transform formally exists for any signal whose energy3 E, given by

E = ∫_{−∞}^{∞} |s(t)|² dt,    (2.1.8)

is finite. Such signals are called finite-energy or square-integrable signals. The Fourier transform can be extended to include a large number of signals and generalized signals with infinite energy, but finite power, such as cos(2πf_c t) and e^{i2πf_c t}, by means of a limiting process that often can be expressed using the Dirac impulse δ(t). The signal s(t) can be recovered as an inverse Fourier transform

s(t) = ∫_{−∞}^{∞} S(f) e^{i2πft} df,    (2.1.9)

with s(t) ←→ S(f) denoting the Fourier-transform pair. To this end, two signals whose difference has zero energy are regarded as the same signal. Another way to say this is that the two signals are equal almost everywhere.
A Fourier transform can also be defined for spatial signals. For a one-dimensional spatial signal f(x), we have4

F(k) = ∫_{−∞}^{∞} f(x) e^{ikx} dx,    (2.1.10)

where the continuous variable k is the spatial frequency, which is the spatial equivalent of the temporal frequency ω = 2π f .

Properties of the Fourier Transform

Several properties of the Fourier transform used to analyze communication systems are listed below.

1. Scaling

s(at) ←→ (1/|a|) S(f/a)    (2.1.11)

2 Angular frequency ω = 2πf is also used to define a Fourier-transform pair, where S(ω) = ∫_{−∞}^{∞} s(t) e^{−iωt} dt and s(t) = (1/2π) ∫_{−∞}^{∞} S(ω) e^{iωt} dω. We will use this alternative notation for electromagnetics, where it is conventional.

3 The term energy here refers to a mathematical concept and does not necessarily correspond to physical energy.

4 The usual sign convention for the spatial Fourier transform is the opposite of the sign convention for the temporal Fourier transform, but this is a matter of preference.


for any nonzero real value a. This scaling property states that the width of a function in one domain scales inversely with the width of the function in the other domain.

2. Differentiation

(d/dt) s(t) ←→ i2πf S(f).    (2.1.12)

The dual property is

t s(t) ←→ −(1/i2π)(d/df) S(f).    (2.1.13)

3. Convolution

s(t) ∗ h(t) ←→ S(f) H(f).    (2.1.14)

Convolution in the time domain is equivalent to multiplication in the frequency domain. The dual property is

s(t) h(t) ←→ S(f) ∗ H(f).    (2.1.15)

4. Modulation

A special case of the convolution property occurs when h(t) = e^{i2πf_c t} and gives

s(t) e^{i2πf_c t} ←→ S(f − f_c)    (2.1.16)

for any real value f_c. Multiplication in the time domain by e^{i2πf_c t} translates the frequency origin of the baseband signal S(f) to the carrier frequency f_c. This can be written as S(f) ∗ δ(f − f_c) = S(f − f_c). The modulation process is linear with respect to s(t), but does contain frequency components that are not present in the original baseband signal. The dual property is

s(t − t₀) ←→ e^{−i2πf t₀} S(f)    (2.1.17)

for any real value t₀.

5. Parseval's relationship

Two signals s(t) and h(t) with finite energy satisfy

∫_{−∞}^{∞} s(t) h*(t) dt = ∫_{−∞}^{∞} S(f) H*(f) df.    (2.1.18)
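Parseval's relationship has a direct discrete analog that is easy to check numerically. A short Python sketch (using numpy's unnormalized FFT convention, in which a factor 1/N appears on the frequency-domain side) is:

```python
import numpy as np

# Discrete check of Parseval's relationship (2.1.18) for a random
# finite-energy complex sequence.
rng = np.random.default_rng(0)
N = 1024
s = rng.standard_normal(N) + 1j * rng.standard_normal(N)
S = np.fft.fft(s)
print(np.sum(np.abs(s)**2), np.sum(np.abs(S)**2) / N)   # the two values agree
```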

When h(t) = s(t), the two integrals express the energy in s(t) computed both in the time domain and in the frequency domain. These energy integrals are equal and finite.
If s(t) is real, then the following relationships hold for S(f) = S_R(f) + iS_I(f), where S_R(f) = Re[S(f)] is the real part and S_I(f) = Im[S(f)] is the imaginary part of the Fourier transform:

S(f) = S*(−f)    S_R(f) = S_R(−f)    S_I(f) = −S_I(−f)    (2.1.19)


|S(f)| = |S(−f)|    φ(f) = −φ(−f),

where |S(f)| = √(S_R(f)² + S_I(f)²) is the magnitude5 of the Fourier transform, and

φ(f) = arg S(f) = tan⁻¹(S_I(f)/S_R(f))

is the phase of the Fourier transform. A consequence of these properties is that the Fourier transform of a real signal s (t ) is conjugate symmetric, meaning that the negative-frequency part of the Fourier transform contains the same information as the positive-frequency part. This observation allows us to construct an equivalent representation of the real signal that consists only of the nonnegative-frequency components. To do so, define

Z(f) =
  2S(f)   for f > 0
  S(0)    for f = 0
  0       for f < 0.    (2.1.20)

The function Z(f) is equal to twice the positive part of S(f) for positive frequencies, has a value equal to S(0) at the zero-frequency component,6 and contains no negative-frequency components. The inverse Fourier transform of Z(f) is called the analytic signal z(t) corresponding to s(t) with z(t) ←→ Z(f). The analytic signal z(t) is complex. Similarly, the real signal s(t) is related to z(t) by

s(t) = ½(z(t) + z*(t))    (2.1.21a)
     = Re[z(t)],    (2.1.21b)

where S(f) = S*(−f) has been used because s(t) is real. The analytic signal z(t) is directly related to the real signal s(t) by z(t) = s(t) + i ŝ(t), where ŝ(t) is the Hilbert transform of s(t) defined as

ŝ(t) = (1/π) ∫_{−∞}^{∞} s(τ)/(t − τ) dτ.    (2.1.22)

The Hilbert transform is formally the convolution of s(t) and (πt)⁻¹. For example, if s(t) = cos(2πft), then the analytic signal is z(t) = cos(2πft) + i sin(2πft) = e^{i2πft}.
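The construction of Z(f) in (2.1.20) translates directly into a discrete-time procedure. The following Python sketch (a simple FFT-based construction for an even-length sequence; the function name is our own) recovers the analytic signal of a cosine:

```python
import numpy as np

def analytic_signal(s):
    """Build z(t) from a real s(t) by zeroing negative frequencies, per (2.1.20)."""
    N = len(s)
    S = np.fft.fft(s)
    Z = np.zeros(N, dtype=complex)
    Z[0] = S[0]                      # keep the zero-frequency term as S(0)
    Z[1:N//2] = 2.0 * S[1:N//2]      # double the positive frequencies
    Z[N//2] = S[N//2]                # Nyquist bin (even N), kept once
    return np.fft.ifft(Z)

t = np.arange(256) / 256.0
s = np.cos(2 * np.pi * 8 * t)
z = analytic_signal(s)
# For s(t) = cos(2*pi*f*t), z(t) should equal exp(i*2*pi*f*t).
print(np.allclose(z, np.exp(1j * 2 * np.pi * 8 * t), atol=1e-9))
```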

The Hilbert transform relates a real function of time to a complex function of time with a one-sided function of frequency Z(f). A counterpart of the Hilbert transform, called the Kramers–Kronig transform (or the Kramers–Kronig relationship), relates a function of frequency to a real-valued one-sided (causal) function of time. The inverse of this transform relates a real-valued one-sided function of time to a function of frequency. Let s(t) be a real-valued causal function with Fourier transform S(ω) = S_R(ω) + iS_I(ω),

5 The word modulus is sometimes used for the magnitude of a complex number.
6 The value at zero frequency is often called the DC value (direct current) even if the signal does not represent current.

here conventionally stated using the angular frequency ω in place of f. The functions S_R(ω) and S_I(ω) are related by

S_I(ω) = (1/π) ∫_{−∞}^{∞} S_R(Ω)/(ω − Ω) dΩ.    (2.1.23)

An alternative form of the Kramers–Kronig transform expresses S_R(ω) in terms of S_I(ω).

Modes of a Linear Time-Invariant System

Let the input to a linear time-invariant system characterized by an impulse response h(t) consist of a single complex frequency component given by s(t) = e^{i2πft}. Using the commutative property of the convolution operation, the output is

r(t) = ∫_{−∞}^{∞} h(τ) e^{i2πf(t−τ)} dτ
     = e^{i2πft} ∫_{−∞}^{∞} h(τ) e^{−i2πfτ} dτ
     = H(f) e^{i2πft},    (2.1.24)

where H(f), called the transfer function, is the Fourier transform of h(t). For any frequency f₀, the output H(f₀)e^{i2πf₀t} is a scaled version of the input e^{i2πf₀t}. Therefore, the function e^{i2πf₀t} is an eigenfunction,7 eigenmode, or simply a mode of a linear, shift-invariant system, with the value H(f₀) being the eigenvalue. Given that a linear, shift-invariant system can only scale the function e^{i2πf₀t} by a complex number H(f₀), a linear shift-invariant system cannot create new frequency components.
Any s(t) at the input to h(t) will have an output described by the convolution r(t) = s(t) ∗ h(t). Using the convolution property of the Fourier transform given in (2.1.14), the output signal R(f) in the frequency domain is given by R(f) = S(f)H(f), where S(f) is the Fourier transform of s(t). The inverse Fourier transform (cf. (2.1.9)) of R(f) yields the output signal r(t),

r(t) = ∫_{−∞}^{∞} S(f) H(f) e^{i2πft} df.    (2.1.25)
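The eigenfunction property in (2.1.24) can be illustrated numerically. The sketch below (Python, with a circular convolution so that the discrete example is exact; the filter taps are arbitrary choices) shows that filtering a complex exponential only scales it by H(f₀):

```python
import numpy as np

# A complex exponential is a mode of any LTI system: filtering it only
# scales it by H(f0). The frequency is chosen on the DFT grid so that the
# circular convolution reproduces (2.1.24) exactly.
N = 64
f0 = 5                                   # cycles per block
t = np.arange(N)
s = np.exp(1j * 2 * np.pi * f0 * t / N)
h = np.array([0.5, 0.3, 0.2])            # arbitrary impulse response

r = np.fft.ifft(np.fft.fft(s) * np.fft.fft(h, N))   # circular convolution
H_f0 = np.sum(h * np.exp(-1j * 2 * np.pi * f0 * np.arange(len(h)) / N))
print(np.allclose(r, H_f0 * s))          # True: the output is a scaled input
```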

The relationship between the input and the output of a linear shift-invariant system both in the time domain and in the frequency domain is shown in Figure 2.2, where the two-way arrows represent the Fourier-transform relationship.

Figure 2.2 Time and frequency representation of a time-invariant linear system: r(t) = s(t) ∗ h(t) and R(f) = S(f)H(f).

7 In general, an eigenfunction of a linear transformation is any function that is unchanged by that transformation except for a scalar multiplier called an eigenvalue.


2.1.1 Bandwidth and Timewidth

Signals used in communication systems are constructed from finite-energy pulses. A signal s(t) of finite energy must have most of its energy in some finite region of the time axis, and its Fourier transform S(f) must have most of its energy in some finite region of the frequency axis. The timewidth is a measure of the width of the signal s(t). The bandwidth is a measure of the width of the spectrum S(f). The root-mean-squared bandwidth W_rms of a signal s(t) with nonzero energy is defined by

W_rms² = (1/E) ∫_{−∞}^{∞} (f − f̄)² |S(f)|² df,    (2.1.26)

where S(f) is the Fourier transform of s(t), where

f̄ = (1/E) ∫_{−∞}^{∞} f |S(f)|² df    (2.1.27)

is the centroid or the mean of the term |S(f)|²/E, and where E = ∫_{−∞}^{∞} |S(f)|² df is the energy in the pulse s(t). Expanding the square of (2.1.26) and simplifying yields the alternative form

W_rms² = (1/E) ∫_{−∞}^{∞} f² |S(f)|² df − f̄²
       = f̄² − f̄²  written as  W_rms² = \overline{f²} − (f̄)²,    (2.1.28)

where

\overline{f²} = (1/E) ∫_{−∞}^{∞} f² |S(f)|² df    (2.1.29)

is defined as the mean-squared frequency. The root-mean-squared timewidth T_rms is defined in an analogous fashion to (2.1.28) as

T_rms² = \overline{t²} − (t̄)²,    (2.1.30)

where t̄ = (1/E) ∫_{−∞}^{∞} t |s(t)|² dt and \overline{t²} = (1/E) ∫_{−∞}^{∞} t² |s(t)|² dt. The value T_rms defines the root-mean-squared timewidth of the pulse s(t).

The relationship between T_rms and W_rms for a baseband pulse is shown in Figure 2.3(a). For a passband pulse, as shown in Figure 2.3(b), the root-mean-squared bandwidth is a one-sided measure of the bandwidth. The passband bandwidth B is twice the baseband bandwidth because it occupies twice the frequency range.
Other measures of the bandwidth and timewidth are common. The ideal bandwidth is the smallest value of W such that S(f) = S(f) rect(f/W). The three-decibel bandwidth or half-power bandwidth of a signal s(t) whose power density |S(f)|² is unimodal is denoted by W_h. It is defined as the frequency at which |S(f)|² is half (or −3 dB) of the power density at the maximum of |S(f)|². The effective real timewidth T_real of a nonnegative real pulse s(t) is defined as

Figure 2.3 (a) A baseband pulse and its spectrum. (b) A passband pulse and its spectrum.

T_real = (1/E) (∫_{−∞}^{∞} s(t) dt)² = (∫_{−∞}^{∞} s(t) dt)² / ∫_{−∞}^{∞} s²(t) dt.    (2.1.31)

In contrast, the effective complex timewidth T_complex of a complex pulse s(t) is defined differently in terms of the instantaneous power P(t) = |s(t)|² as

T_complex = (∫_{−∞}^{∞} P(t) dt)² / ∫_{−∞}^{∞} P²(t) dt = (∫_{−∞}^{∞} |s(t)|² dt)² / ∫_{−∞}^{∞} |s(t)|⁴ dt.    (2.1.32)

These two definitions are similar, but are not the same.

Timewidth–Bandwidth Product

The Schwarz inequality, discussed in Section 2.1.3 (cf. (2.1.72)), can be used to determine a lower bound on the timewidth–bandwidth product8 of the root-mean-squared timewidth T_rms of the signal s(t) and the root-mean-squared bandwidth W_rms of the corresponding spectrum S(f). A pulse s(t) with a mean time t̄ and a mean frequency f̄ has the same timewidth and bandwidth as the pulse s(t − t̄)e^{i2πf̄t}. Therefore it suffices to consider s(t) with both means, t̄ and f̄, equal to zero. Normalize the energy so that ∫_{−∞}^{∞} |s(t)|² dt = ∫_{−∞}^{∞} |S(f)|² df = 1 (cf. (2.1.18)). The derivation then follows from the expression

(d/dt)(t|s(t)|²) = |s(t)|² + t s(t) (ds*(t)/dt) + t s*(t) (ds(t)/dt)
                 = |s(t)|² + 2 Re[t s(t) (ds*(t)/dt)].

Integrate both sides from −∞ to ∞,

t|s(t)|² |_{−∞}^{∞} = ∫_{−∞}^{∞} |s(t)|² dt + 2 Re[∫_{−∞}^{∞} t s(t) (ds*(t)/dt) dt].    (2.1.33)

8 This is also called the time–bandwidth product.


The left side is zero because the power |s(t)|² in a finite-energy pulse must go to zero faster than 1/|t| as |t| goes to infinity. Therefore, the squared magnitudes of the terms on the right are equal so that

|∫_{−∞}^{∞} |s(t)|² dt|² = 4 |Re[∫_{−∞}^{∞} t s(t) (ds*(t)/dt) dt]|².    (2.1.34)

Setting the left side to E² and applying the Schwarz inequality given in (2.1.72) to the right gives

E² ≤ 4 ∫_{−∞}^{∞} |t s(t)|² dt ∫_{−∞}^{∞} |ds*(t)/dt|² dt.    (2.1.35)

The first integral on the right equals E T_rms² (cf. (2.1.30)) because t̄ = 0. Using the differentiation property of the temporal Fourier transform (cf. (2.1.12)) and Parseval's relationship (cf. (2.1.18)), the second integral can be written as (cf. (2.1.26))

∫_{−∞}^{∞} |ds*(t)/dt|² dt = ∫_{−∞}^{∞} |i2πf S*(f)|² df = 4π² E W_rms².

With these expressions, (2.1.35) now leads to the following inequality for the timewidth–bandwidth product:9

T_rms W_rms ≥ 1/(4π).    (2.1.36)

As an example, the Fourier transform S(f) of a gaussian pulse s(t) = e^{−πt²} in time is a gaussian pulse in frequency, with the transform pair given by

e^{−πt²} ←→ e^{−πf²}.    (2.1.37)

Each of these pulses, expressed in the standard form e^{−x²/2σ²}, is characterized by σ² = 1/2π. For a gaussian pulse, because T_rms² is defined using |s(t)|² and W_rms² is defined using |S(f)|², T_rms² = W_rms² = 1/4π, so that T_rms W_rms = 1/4π, which satisfies (2.1.36) with equality. When W_rms is expressed using angular frequency, T_rms W_rms = 1/2. This means that a gaussian pulse, perhaps time-shifted or frequency-shifted, produces the minimum value of the timewidth–bandwidth product.
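A numerical check of this equality case is straightforward. The following Python sketch (grid spacing and span are illustrative choices) estimates T_rms and W_rms for s(t) = e^{−πt²} and compares their product with 1/4π:

```python
import numpy as np

# Numerical check of the timewidth-bandwidth product (2.1.36) for the
# gaussian pulse s(t) = exp(-pi t^2), for which Trms*Wrms = 1/(4*pi).
dt = 1e-3
t = np.arange(-10, 10, dt)
s = np.exp(-np.pi * t**2)
E = np.sum(np.abs(s)**2) * dt
Trms = np.sqrt(np.sum(t**2 * np.abs(s)**2) * dt / E)    # mean time is zero

f = np.fft.fftfreq(len(t), d=dt)
S = np.fft.fft(s) * dt
Wrms = np.sqrt(np.sum(f**2 * np.abs(S)**2) / np.sum(np.abs(S)**2))

print(Trms * Wrms, 1 / (4 * np.pi))     # both near 0.0796
```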

Communication Pulses

Several basic pulses are conventionally used in the study of communication systems. A rectangular pulse of unit height and unit width centered at the origin is the rect pulse defined as

rect(t) =
  1    for |t| ≤ 1/2
  0    for |t| > 1/2.    (2.1.38)

9 This is called the Heisenberg uncertainty relationship in other contexts.


The Fourier transform of this rectangular pulse is

S(f) = ∫_{−∞}^{∞} s(t) e^{−i2πft} dt = sin(πf)/(πf) = sinc(f),    (2.1.39)

where the sinc pulse is defined as

sinc(t) = sin(πt)/(πt).    (2.1.40)

The sinc pulse has its zeros on the nonzero integers and is equal to one for t equal to zero. The Fourier-transform pairs

rect(t) ←→ sinc(f),    (2.1.41a)
sinc(t) ←→ rect(f)    (2.1.41b)

are duals. The scaling property of the Fourier transform gives the pair

(1/T) rect(t/T) ←→ sinc(fT).    (2.1.42)

In the limit as T goes to zero, the left side approaches a Dirac impulse and the right side approaches the constant one. In the sense of this limit, the Fourier-transform pair and its dual

δ(t) ←→ 1    (2.1.43)
1 ←→ δ(f)    (2.1.44)

can be defined. Another useful Fourier-transform pair that is defined using a limiting process is the Fourier transform of an infinite series of Dirac impulses. This transform pair is given by10

Σ_{j=−∞}^{∞} δ(t − j) ←→ Σ_{j=−∞}^{∞} δ(f − j).    (2.1.45)

This Fourier-transform pair is abbreviated as comb(t) ←→ comb(f). The transform pair

Σ_{j=−∞}^{∞} δ(t − jT_s) ←→ (1/T_s) Σ_{j=−∞}^{∞} δ(f − j/T_s)    (2.1.46)

then follows from the scaling property of the Fourier transform. Another useful pulse is the gaussian pulse,

s(t) = e^{−t²/2σ²}.    (2.1.47)

10 This pair is whimsically called the “picket-fence miracle.” A companion statement that avoids the use of impulses is the Poisson summation formula.


Table 2.1 Table of Fourier-transform pairs

s(t)                              S(f)
1                                 δ(f)
δ(t)                              1
e^{i2πf_c t}                      δ(f − f_c)
rect(t)                           sinc(f)
sinc(t)                           rect(f)
e^{−πt²}                          e^{−πf²}
e^{−t²/2σ²}                       √(2π) σ e^{−2π²σ²f²}
e^{−iπt²}                         e^{−iπ/4} e^{iπf²}
√π e^{−2π|t|}                     1/(1 + f²)
Σ_{j=−∞}^{∞} δ(t − j)             Σ_{j=−∞}^{∞} δ(f − j)
comb(t)                           comb(f)
Σ_{j=−K}^{K} δ(t − j)             sin((2K + 1)πf)/sin(πf)

The transform pair e^{−πt²} ←→ e^{−πf²} given in (2.1.37) becomes

e^{−t²/2σ²} ←→ √(2π) σ e^{−2π²σ²f²} = √(2π) σ e^{−σ²ω²/2},    (2.1.48)

by using the scaling property of the Fourier transform given in (2.1.11).
Inserting the mathematical symbol i into the argument of a gaussian pulse gives another pulse, called a quadratic phase pulse, a chirp pulse, or an imaginary gaussian pulse. Because e^{−iπt²} has infinite energy, it does not conform to the requirements of the formal definition of a Fourier-transform pair. Therefore, the Fourier transform must be defined by a limiting process and is given by

e^{−iπt²} ←→ e^{−iπ/4} e^{iπf²} = e^{−iπ/4} e^{iω²/4π}.    (2.1.49a)

The duality property of the Fourier transform gives

e^{−iπ/4} e^{iπt²} ←→ e^{−iπf²}.    (2.1.50)

Another transform pair is

√π e^{−2π|t|} ←→ 1/(1 + f²),    (2.1.51)

with the pulse waveform in the frequency domain called a lorentzian pulse. For this pulse T_rms exists, but W_rms does not (or is infinite). A list of Fourier-transform pairs is provided for reference in Table 2.1.

2.1.2 Passband and Complex-Baseband Signals

A passband signal is a signal of the form

s̃(t) = A(t) cos(2πf_c t + φ(t)),    (2.1.52)

where A(t) is the amplitude, φ(t) is the phase, and both A(t) and φ(t) vary slowly compared with the carrier or reference frequency f_c. Lightwave signals of the form of (2.1.52) are passband signals because of the high frequency (f_c ≈ 10¹⁴ Hz) of the lightwave carrier compared with the baseband modulation bandwidth on a single lightwave carrier, which is not more than one terahertz (10¹² Hz). An equivalent representation for a passband signal can be derived using the trigonometric identity cos(A + B) = cos A cos B − sin A sin B to yield

Q

I

Q

the quadrature component. A passband signal can also be written as the real part of a complex signal,

ÀÁ Â Ã ¿s (t ) = Re A(t )eiφ(t ) ei2π f t À( Ã ) = Re s (t ) + is (t ) ei2π f t À Ã = Re s (t )ei2π f t ,

(2.1.54a)

c

I

(2.1.54b)

c

Q

(2.1.54c)

c

where s (t ) = s I (t ) + is Q (t ) is the complex-baseband signal11 that represents the passband signal ¿ s (t ). The amplitude of a passband signal can be written in terms of the root-mean-squared amplitude A rms (t ), defined by

. Arms (t ) =

IJ

IJ

Å

1 2 A (t ), 2 T T (2.1.55) µ where 0T cos2(2π f c t )dt = 1/ 2 has been used, and the integration time T is large compared with 1 / fc and small compared with any temporal variation of A (t ). The amplitude-modulated waveform shown in Figure 1.2 is an example of a passband signal. If the complex-baseband signal is a function of both time and space, then the signal is called the complex signal envelope a(z , t ), and is often expressed using a root-mean-squared amplitude. The Fourier transform of the passband signal ¿ s (t ) is

¿s (t )2 dt =

¿S ( f ) =

²∞

−∞

( A(t )cos(2π f c t + φ(t )))

¿s (t )e−i2π f t dt

2

=

²∞

−∞

À

dt



Ã

Re s (t )ei2π f c t e−i2π f t dt .

11 The real and imaginary components of a complex-baseband signal are conventionally subscripted I and

Q coming from the in-phase and quadrature designations of the components of the passband signal.

40

2 Background

(z + z ∗ ) gives ²∞Á  ¿S( f ) = 1 s (t )ei2π f t + s ∗ (t )e−i2π f t e−i2π f t dt . 2

Using the identity Re[z ] =

1 2

c

−∞

c

Applying the modulation property of the Fourier transform yields

¿S ( f ) = 1 ( S( f − f c ) + S ∗ (− f − f c )) , 2

(2.1.56)

where S( f ) is the Fourier transform of the complex-baseband signal s (t ). The notion of a passband signal implies that S ( f − f c ) and S( f + f c ) have essentially no overlap. A passband impulse response ¿ h(t ) has a passband transfer function of the form

¿( f ) = H ( f H

− f c ) + H ∗ (− f − f c ),

(2.1.57)

which has the same functional form as (2.1.56), but without the factor of 1/ 2. Provided the terms H ( f − f c ) and H ( f + f c ) do not overlap,

ºº ¿ ºº2 H ( f ) = | H ( f − f c )|2 + | H (− f − f c )|2 .

(2.1.58)

To define the baseband equivalent of the passband system, the complex-baseband transfer function H ( f ) is defined as

» H¿( f + f ) c H( f ) = 0

f f

< fc > fc.

Using the modulation property of the Fourier transform given in (2.1.16) and (2.1.57), the real passband impulse response can be written as

¿h (t ) = h (t )ei2π f t + h ∗(t )e−i2π f t = 2 Re[h (t )ei2π f t ], c

c

c

(2.1.59)

where h(t ) is the complex-baseband impulse response, which is the inverse Fourier transform of the complex-baseband transfer function H ( f ). The output passband signal r (t ) has a Fourier transform given by

¿( f ) = ¿S( f ) H¿( f ) R ( = 21 R( f − fc ) + R∗ (− f

) − fc) , (2.1.60) ¿( f ), noting that the two cross which can be verified using the definitions of ¿ S( f ) and H ∗ ∗ terms S ( f − f c ) H (− f − f c ) and S (− f − f c ) H ( f − f c ) are zero under the same set of constraints as was used to derive (2.1.58). In summary, the output of a passband linear system ¿ h(t ) for a passband signal ¿ s (t ) at the input can be determined in either the time domain or the frequency domain using

¿r (t ) = ¿s (t ) ± ¿h(t ), ¿( f ) = ¿S( f ) H¿( f ). R

2.1 Linear Systems

41

With these translated to complex baseband using the same frequency reference, the output of the corresponding complex-baseband linear system h(t ) with an input complex-baseband signal s (t ) is r (t ) = s (t ) ± h(t ),

R ( f ) = S( f ) H ( f ). 2.1.3

Signal Space

The set of all complex functions of finite energy on an interval [0, T ] or [−T /2, T /2] defines the signal space over that interval. The elements of the signal space are called signals. Two elements within the signal space are deemed to be equivalent, or equal, if the energy of the difference of the two signals is zero. Any element of signal space can be multiplied by a complex number to produce another element in signal space. Two elements of signal space can be added to produce another element of signal space. Accordingly, signal space is a vector space. Its elements can also be called vectors. A countable set of signals {ψn (t )} of a signal space spans that signal space if every element of the signal space can be expressed as a linear combination of the ψn (t ). This means that every s(t ) can be written as s (t ) =

∞ ¼ n=1

sn ψn (t ),

(2.1.61)

in the sense that the difference between the left side and the right side has zero energy. The set {ψn (t )} is called a basis if the elements of the set are linearly independent and span the signal space. Every basis for a signal space is countably infinite. An orthonormal basis {ψn (t )} for signal space is a countably infinite basis that satisfies the additional requirement that

²T 0

ψm (t )ψn∗ (t )dt =. δmn ,

(2.1.62)

for all m and n, where δ mn is the Kronecker impulse. This means that each basis function satisfies

²T 0

and

²T 0

|ψn (t )|2 dt = 1

ψm(t )ψn∗ (t )dt = 0

(m ±= n).

(2.1.63a)

(2.1.63b)

For an orthonormal basis, the coefficient sn of the expansion in (2.1.61) is given by sn

=.

²T 0

s(t )ψn∗ (t )dt .

(2.1.64)

A set of basis functions must span the entire signal space, which implies that the number of functions in any basis for signal space is infinite. A basis must be infinite, but

42

2 Background

not every infinite orthonormal set is a basis for the set of square-integrable functions. Two examples, described below, are the Fourier basis and the Nyquist basis. A linear transformation on a signal space is a mapping from the space onto itself that satisfies the linearity properties. With respect to a fixed basis, a linear transformation can be described by a matrix, called the transformation matrix.

Inner Product For any orthonormal basis {ψm (t )} a signal s (t ) in the signal space over [0, T ] is completely determined by an infinite sequence of complex components sn , which are the coefficients of the expansion given in (2.1.61). These coefficients may be regarded as forming an infinitely long vector s called a signal vector. Given a signal vector s with complex components, define the conjugate transpose vector as the vector s† whose components are the complex conjugates of the corresponding components of the vector s. If s is a column vector, then s† is a row vector. ∑ ∑ Using a (t ) = m am ψm (t ) (cf. (2.1.61)) and b(t ) = n bn ψn (t ), define the inner product as

. a·b= = = =

²T 0

a (t )b∗ (t )dt

¼¼

m n ¼ ¼ m ¼ m

n

am b∗

²

n

T

0

ψm (t )ψn∗(t )dt

am b∗n δmn

am b∗m ,

(2.1.65)

where (2.1.62) is used in going from the second line to the third line. Setting a(t ) equal to b(t ) in (2.1.65) immediately gives the energy statement

²T 0

|a(t )|2 dt =

¼ m

|am |2 .

(2.1.66)

For a finite-energy signal a(t ), this implies that |am |2 goes to zero as m goes to infinity. An arbitrarily small amount of energy is discarded by including only N terms for some integer N . The term am is a component of the (column) signal vector a. The term bm∗ is the component of a (row) signal vector b† . Therefore, a · b = b† a,

(2.1.67)

where b †a is the matrix product of a 1 × N matrix and an N × 1 matrix. Using (2.1.66), the energy in the signal s (t ) is E

=

²T 0

|s (t )|2 dt =

¼ n

|sn |2 = |s|2 .

(2.1.68)

2.1 Linear Systems

Similarly, the component sn in (2.1.64) is determined using a (t ) ψn (t ), sn

=

²T 0

s (t )ψ n∗ (t )dt

43

= s (t ) and b(t ) =

=. s · ψ n .

(2.1.69)

This expression defines the projection of s (t ) onto ψn (t ). The vector ψ n has the nth component equal to one and all other components equal to zero. It is a basis vector that corresponds to the basis function ψn (t ) defined in (2.1.61). The set of all linear combinations of basis vectors along with an inner product is an instance of a Hilbert space.

Outer Product . In contrast to the inner product operation a · b = b †a defined in (2.1.65), which produces a scalar, the outer product operation of two vectors produces a transformation. Just as the inner product operation is invariant with respect to a change in basis, the outer product, as an operation, is invariant with respect to a change in basis. However, the matrix representation M of an outer product with elements m i j = ai b∗j does depend on the basis. The outer product of two column vectors is defined as

.

a ⊗ b = ab† .

(2.1.70)

The outer product is also called the dyadic product or the tensor product of the two vectors. The outer product distributes over both addition and scalar multiplication so that

(αa + β b) ⊗ (γ c + δd) = αγ (a ⊗ c) + αδ(a ⊗ d) + βγ (b ⊗ c) + βδ(b ⊗ d). Schwarz Inequality Any two signal vectors satisfy

|s1 · s2|2 ≤ |s1|2 |s2 |2 , which is known as the Schwarz inequality. For the signal space of complex functions with finite energy on the interval [0, T ], the Schwarz inequality can be written using (2.1.65) as

ºº² T º2 ² ² ºº s1(t )s2∗ (t )dt ººº ≤ T |s1(t )|2 dt T |s2 (t )|2 dt , 0 0 0

whereas for the set of square-integrable functions on the infinite line, it is

(2.1.71)

ºº² ∞ º2 ² ∞ ²∞ 2 ºº s1 (t )s2∗ (t )dt ººº ≤ |s1(t )| dt |s2(t )|2 dt . (2.1.72) −∞ −∞ −∞ In either case, the equality holds if and only if s1(t ) = ks2 (t ) for some constant k, possibly complex.

44

2 Background

Distance in a Signal Space 2 Using (2.1.65) and (2.1.68), the squared euclidean distance d12 between two signals s1 (t ) and s2 (t ), or, equivalently, between two signal vectors s1 and s2, is defined as 2 d12

=.

²T

|s1(t ) − s2(t )|2 dt ²0 T Á Â = |s1 (t )|2 + |s2 (t )|2 − s1 (t )s2∗ (t ) − s1∗ (t )s2(t ) dt 0 = E1 + E2 − 2 Re[s1 · s2 ].

(2.1.73)

This expression states that the squared euclidean distance between two signals depends on the energy of each signal as well as their inner product. Now consider an infinite-duration passband signal

¿s (t ) = A(t )cos(2π f c t + φ(t )) À Ã = Re s(t )ei2π f t . c

Using Parseval’s relationship and the modulation property of the Fourier transform, the energy in this passband signal is

²∞

²∞

¿s (t )dt = ¿S2 ( f )d f −∞ ²−∞ º2 ∞ ºº = º 21 S( f − f c ) + 12 S ( f + f c )ºº d f −∞ ² 1 ∞ =2 |S( f )|2 d f = 12 Es ,

E¿s =

2

−∞

(2.1.74)

provided that S ( f − f c ) S( f + fc ) = 0. Expression (2.1.74) states that, under this condition, the energy in a complex-baseband signal s(t ) is twice the energy in the passband signal ¿ s (t ). Similarly, under this condition, the euclidean distance between two complex-baseband signals is twice the distance between the equivalent passband signals, so that di2j (complex-baseband signal) = 2di2j (passband signal).

(2.1.75)

Using the same line of reasoning, the cosine and sine components are orthogonal with

²∞

−∞

sI (t )cos (2π f c t )s Q (t )sin(2π f c t )dt

= 0.

(2.1.76)

For a narrowband signal, both A (t ) and φ(t ) vary slowly compared with the carrier frequency f c . Therefore, over a finite time interval T much greater than 1/ f c , the energy in the signal is well approximated by (2.1.74), with the cosine and sine components being nearly orthogonal.

Fourier Series The set of all complex exponentials ei2π f t whose frequency f is an integer multiple of 1/ T is a basis for a signal space on the interval [0, T ]. A function in this signal

2.1 Linear Systems

45

space can be expanded in a Fourier series using the Fourier basis {ein 2π f t }. The Fourier coefficients sn are the expansion coefficients. Then s (t ) = where sn

=

µT 0

∞ ¼ n=−∞

sn ein2π t / T ,

(2.1.77)

s (t )e−in 2π t / T .

Nyquist–Shannon Series A deterministic baseband waveform s (t ) whose spectrum S ( f ) is zero for | f | larger than is called a bandlimited waveform. Because (− , ) defines an interval on the frequency axis, the set of functions on this interval is a signal space and is spanned by a countable set of basis functions. One such set of basis functions is the Nyquist basis {sinc ((t − T )/ T )} as is given by the Nyquist–Shannon sampling theorem. The sampling theorem can be described by setting = 1/2 and multiplying s (t ) by comb(t ) (cf. (2.1.45)). This produces a sampled waveform with the samples s( j ) spaced by the sampling interval Ts = 1. Two sampled waveforms are shown in Figure 2.4 for two different sampling rates. Using comb (t ) ←→ comb( f ), together with its dual (cf. Table 2.1) and the convolution property of the Fourier transform (cf. (2.1.15)) gives

W

WW

W

s (t )comb(t ) ←→ S( f ) ± comb( f ),

(2.1.78a)

(s (t )comb(t )) ± sinc(t ) ←→ ( S( f ) ± comb( f )) rect( f ). Combs in time

(a) ...

(b) ...

Spectra of sampled signals

...

t

Baseband signals

Sampled signals

(2.1.78b)

... t

t

t

t

t

Rect function

f

f

Figure 2.4 (a) Sampling at greater than the Nyquist rate. (b) Sampling at less than the Nyquist rate, showing the effect of aliasing.

46

2 Background

The left side of (2.1.78a) is an infinite sequence of impulses with the area of the kth impulse equal to the kth sample value. The right side is the spectrum of the sampled waveform S( f ) ± comb( f ) and is shown at the bottom of Figure 2.4 for two different sampling rates. For the left set of curves in Figure 2.4(a), multiplying the right by rect( f ), which is shown as a dashed line, recovers S( f ) because the support of S( f ) is [−1/2, 1/2], and thus the images of the original spectrum S( f ) do not overlap in S( f ) ± comb( f ). Multiplication by rect ( f ) in frequency corresponds to a convolution in time with sinc (t ). Because [S ( f ) ± comb( f )]rect( f ) = S ( f ), the convolution of sinc (t ) with the left side of (2.1.78b) recovers s (t ) so that s (t ) = sinc(t ) ± [s(t )comb(t )]

=

∞ ¼

j =−∞

s ( j )sinc(t

− j ).

(2.1.79)

This expression states that a waveform s (t ) bandlimited to [−1/2, 1/ 2] can be expanded using a set of orthogonal basis functions {sinc (t − j )} with the coefficients simply being samples s( j ) of the bandlimited waveform s (t ) for a sampling interval Ts = 1. The expression for an arbitrary sampling interval Ts can be determined by applying the scaling property of the Fourier transform. It is s( t ) =

W

∞ ¼ j =−∞

s ( j Ts )sinc(2

W t − j),

(2.1.80)

where Ts = 1/2 . In this way, the sequence of sinc functions is seen as the sequence of interpolating functions for a bandlimited signal. The images of the original signal spectrum S ( f ) shown in Figure 2.4 are offset by the sampling rate Rs = 1/ Ts . When Rs ≥ 2 , the images do not overlap. The minimum sampling rate R s = 2 that generates nonoverlapping images of the signal spectrum is called the Nyquist rate. When the sampling rate is greater than or equal to the Nyquist rate, the images do not overlap and the original signal s (t ) can be reconstructed as given by (2.1.80). The curves in Figure 2.4(a) show a spectrum for a signal that is sampled at a rate larger than the Nyquist rate. When the sampling rate is smaller than the Nyquist rate, the images of the original signal spectrum S( f ) overlap. This effect is called aliasing, and is shown in the set of curves in Figure 2.4(b). In this case, any filter that reconstructs all of the frequency components of the original signal must also pass some frequency components from one or more images. Aliasing is a form of signal distortion that replicates frequency components in the original signal at other frequencies in the reconstructed signal.

W

Matrices

W

A matrix A is a doubly indexed set {ai j , i = 1, . . . , n , j = 1, . . . , m } of real or complex numbers conventionally interpreted as a two-dimensional array A = [ai j ]. A matrix is square if n is equal to m. The conjugate of A is the matrix A∗ = [ai∗j ]. The transpose of A is the matrix AT = [a ji ]. The conjugate transpose of A is the matrix

2.1 Linear Systems

47

A† = [a∗ji ]. A symmetric matrix satisfies A = AT . A hermitian matrix satisfies A = A† . Symmetric and hermitian matrices must be square.

Matrix Multiplication The product of two matrices A and B, written C = AB, is defined by ci j

=

² ¼

k =1

aik bk j ,

i

= 1, . . . , n,

j

= 1, . . . , m ,

(2.1.81)

provided A is an n ײ matrix and B is an ²× m matrix. For square matrices, both AB and BA are defined, but, in general, AB is not equal to BA. Two matrices for which AB = BA are said to commute.

Eigenvectors and Eigenvalues The product of a matrix A with a vector s, written as r = As, is defined by ri

=

² ¼ k =1

aik sk .

(2.1.82)

A vector s satisfying

As = λ s (2.1.83) for some scalar λ is called an eigenvector of A, and λ is called an eigenvalue of A.

A matrix with only nonnegative eigenvalues is called a nonnegative-definite matrix. A matrix with only positive eigenvalues is called a positive-definite matrix.

Trace of a Matrix ∑ The trace of a square matrix A is defined as the sum n A nn of the diagonal elements of A.12 The trace is an inherent property of the transformation represented by the matrix and is independent of the choice of basis. The trace equals the sum of the eigenvalues of the matrix. The trace operation has the following properties: trace (cA) = c trace A,

trace (A + B) = trace A + trace B, trace (AB) = trace (BA) .

(2.1.84a) (2.1.84b) (2.1.84c)

The trace of a square matrix that can be expressed as an outer product of two vectors xy† is equal to the inner product of the same two vectors y†x. This is given by trace(xy†) = y† x.

(2.1.85)

The proof of this statement is asked for as an end-of-chapter exercise. 12 The term “trace” will also be used more generally to denote the integral of the “diagonal” of a bivariate

function A( x , y ) given by trace A =

µ∞

−∞ A (x , x )dx.

48

2 Background

Determinant of a Matrix The determinant of a square matrix A, denoted det A, is a real or complex number defined by the Laplace recursion formula. Let Ai j be the (n − 1) × (n − 1) matrix obtained from the n × n matrix A by striking out the ith row and the jth column. Then, for any fixed i , det A =

n ¼

j =1

(−1)i + j ai j det Ai j ,

(2.1.86)

where ai j is the element of A indexed by i and j . The determinant is an inherent property of the transformation represented by a matrix and is independent of the choice of the basis. Hadamard’s inequality states that the determinant of a square positive-definite matrix is not larger than the product of the eigenvalues with equality for a diagonal matrix. A matrix whose determinant is nonzero is a full-rank matrix. The rank of a matrix is defined as the maximum number of linearly independent row vectors in the matrix. Equivalently, the rank of a matrix is the maximum number of linearly independent column vectors in the matrix. The rank of a matrix is equal to the size of the largest square submatrix with a nonzero determinant. The determinant of an n × n matrix has the following properties: det AB = det A det B,

det(c A) = c det A, n

det A =

1 ( ), det A−1

(2.1.87a) (2.1.87b) (2.1.87c)

provided that det A is nonzero, where A−1 is called the inverse matrix. The inverse matrix satisfies A−1A = I, where I is the identity matrix, which is a matrix with all diagonal elements equal to one and all other elements zero. The trace and the determinant are the two important invariant scalar metrics describing the characteristics of a square matrix.

Unitary Matrix A matrix A that satisfies AA† = A†A = I is a unitary matrix representing a unitary transformation. The inverse of a unitary matrix satisfies A−1 = A†. The determinant of the unitary matrix A satisfies |det A| = 1. Multiplication of a vector by a unitary matrix preserves length and can be regarded as a generalized rotation in the space spanned by the eigenvectors of A. Given a column vector v in signal space, that coordinate system can be “rotated” using the unitary matrix A by the expression v² = Av and is called a change of basis. This can be regarded as a rotation of v. The set of all such rotations of v, denoted |v³, given by

|v³ = {Av : A is a unitary matrix}

(2.1.88)

is a basis-free representation of v, called a ket. The conjugate transpose of |v³, called a bra and denoted ´v|, consists of the corresponding abstract row vector, which may be regarded as the set of all possible representations of the row vector. Then the inner product ´v1|v2 ³ is invariant with respect to A.

2.1 Linear Systems

49

Gram Matrix Given a set of n vectors {v1, v2 , . . . , vn } in a vector space, the Gram matrix of this set is the n × n matrix with the inner product vi · v j as the i j th element. A Gram matrix is a nonnegative-definite hermitian matrix. Hermitian Matrix A square matrix that is invariant under the conjugate transpose satisfies A† = A, and is called a hermitian matrix. The transformation it represents is called a self-adjoint transformation. Every hermitian matrix can be diagonalized by a change of basis and has only real eigenvalues. However, not every matrix with real eigenvalues is a hermitian matrix. The eigenvectors of a hermitian matrix are orthogonal. Upon normalizing the eigene j = 1 for every j , and the set of normalized vectors · e j , the inner products satisfy · e†j· eigenvectors, called an eigenbasis, forms an orthonormal basis. Every hermitian matrix A has the following decomposition:

A = UDU† ,

(2.1.89)

where D is a diagonal matrix with elements that are the eigenvalues of A and U is a unitary matrix whose columns are the eigenvectors of A. Multiplying on the left of each side of (2.1.89) by U† and multiplying on the right by U gives the diagonal matrix D,

D = U† AU, in terms of A, where U†U = U−1 U = I for a unitary matrix.

(2.1.90)

Singular-Value Decomposition A transformation described by a matrix need not have real eigenvalues and the eigenvectors need not be orthogonal as would be the case for a hermitian matrix. A useful decomposition of the matrix A, called the singular-value decomposition, is

A = U M V †.

(2.1.91)

The matrices U and V are each unitary. The columns of U are the eigenvectors of AA† , whereas the columns of V are the eigenvectors of A† A. The only nonzero elements of the matrix M are on the diagonal, whose elements are denoted m k . These elements √ are called the singular values of A. They are the nonnegative square roots ξk of the eigenvalues ξk of the real symmetric matrix AA† so that ξ k = | m k | 2. For a hermitian matrix, because A† is equal to A, U is equal to V, and the matrix is diagonalized with the orthogonal eigenvectors of A.

Matrix Exponentials A square matrix A can be exponentiated as defined by the power series eA

=

∞ Ak ¼ = I + A + 21 A2 + · · · . k! k =0

Two matrices A and B satisfy eA eB

= eA+B if and only if AB equals BA.

(2.1.92)

50

2 Background

Discrete Linear Transformations

A discrete linear transformation, when represented by the transformation matrix A, can be expressed as a vector–matrix product r = As .

When the matrix A is a square matrix, the length of the output vector r is equal to the length of the input vector s. A discrete linear transformation corresponding to a matrix A is characterized by a finite set of eigenvectors {e j } and a corresponding finite set of eigenvalues {λ j } (possibly complex if A is nonhermitian), such that

Ae j = λ j e j .

For a hermitian matrix, the eigenvectors are always orthogonal when the eigenvalues are distinct. The eigenvectors can be selected to be orthogonal even when the eigenvalues are not distinct because then the dimension of the signal subspace corresponding to that eigenvalue is equal to the multiplicity of that eigenvalue. The eigenvectors then form a basis. The eigenvalues are the zeros of det(A − λI) = 0, which is a polynomial in λ of degree n.

Projections The set of orthonormal eigenvectors {· e j } of a hermitian matrix representation of a selfadjoint transformation can be used as a set of orthonormal basis vectors. The inner product of an orthonormal basis vector with itself is equal to one. The outer product of an orthonormal basis vector with itself is a matrix

P j = ·e j·e†j

(2.1.93)

with only a single nonzero element equal to one, which is on the diagonal of the matrix. This matrix is referred to as a projection matrix P j , where the subscript denotes that the projection is onto the jth orthonormal basis vector · e j . Because there is only a single nonzero element equal to one on the diagonal, the matrix product P2j is equal to P j . For a given basis, the sum of all projection matrices P j equals the identity matrix I in the signal space spanned by that basis. Thus

I=

¼ j

Pj =

¼ j

·e j·e†j .

(2.1.94)

The transformation A expressed by a hermitian matrix can be re-expressed in terms of its eigenvalues λ j and its projection matrices P j . This is a diagonal matrix D given by

D = diag (λ1, . . . , λ J ) =

¼ j

λjPj =

¼ j

λ j·e j·e†j .

(2.1.95)

Using this decomposition, a matrix function f that acts on a transformation A described by a hermitian matrix can be expressed by the action of the function on the eigenvalues λ j of that matrix as given by f (A) = f (D) =

¼ j

f (λ j )P j .

(2.1.96)

2.1 Linear Systems

In particular, eA

51

= ∑∞j =1 eλ P j , where λ j are the eigenvalues of A. j

A transformation A can be re-expressed using yet another basis {· xm }. Each eigen∑ vector · e j in the new basis is given as · ej = xn . The matrix X in the new n a n· basis is

¼

λ j·e j·e†j j ¼ ¼ = λ j an a∗m·xn·x†m m ,n ¼ j = Xmn·xn·x†m , (2.1.97) m ,n ∑ λ a a ∗ . This expression states that any where the matrix X has elements X mn = j j n m X=

hermitian matrix can be expressed as a linear combination of the outer products of the basis vectors that are used to represent A. If the basis consists of the set of eigenvectors of A, then (2.1.97) reduces to (2.1.95).

Commuting Transformations Two square matrices A and B of the same size commute if AB is equal to BA. The order in which the commuting transformations are applied, AB or BA, does not affect the outcome. Two square matrices with a common set of eigenvectors commute. A qualified converse statement is also true. If AB = BA and either of the matrices, say A, has distinct eigenvalues, then the eigenbasis for A is also an eigenbasis for B. This means that two hermitian matrices with the same eigenvectors can be diagonalized by the same unitary transformation and are diagonal in the same basis. When the eigenvalues are not distinct, an eigenbasis for A need not be an eigenbasis for B. A matrix A that commutes with A† is called a normal matrix. Then AA† = A† A. Every unitary matrix is a normal matrix. Every hermitian matrix is a normal matrix. A normal matrix can be diagonalized by a unitary matrix whose columns are the eigenvectors of the normal matrix. Two transformations that are represented by square matrices A and B of the same size comprise a relationship known as a commutator defined as

.

[A, B] = AB − BA.

(2.1.98)

Two such matrices that do not commute do not have a common set of eigenvectors, the commutator [ A, B] is nonzero, and the order in which the transformations are applied does affect the outcome. Two square matrices A and B that do not commute can always be embedded in two larger square matrices that do commute by appending additional rows and columns. The proof of this statement is asked for as an exercise at the end of the chapter.

Kronecker product The Kronecker product or outer product, denoted A ⊗ B, is of matrix A and matrix B as given by

52

2 Background

⎡a B 11 ⎢ A ⊗ B = ⎣ ...

··· .. .. . . an1B an 2B · · · a12B

⎤ .. ⎥⎦ , .

a1m B

anm B

(2.1.99)

The Kronecker product does not commute. That is, A ⊗ B is not equal to B ⊗ A in general. For a hermitian matrix A with distinct eigenvalues λi for i = 1, . . . , N and a second hermitian matrix B with distinct eigenvalues µ j for j = 1, . . . , N , the eigenvalues ζ k of A ⊗ B are given by13

ζk = λ i µ j

for i

= 1, . . . , N and j = 1, . . . , N .

(2.1.100)

This property states that the eigenvalues of a Kronecker product are the pairwise products of the eigenvalues of the component matrices. This property extends to a Kronecker product with more than two component matrices.

2.2

Random Signals Probability and statistics are important topics for the subject of communications because the information that is conveyed is always random and a communications channel is always noisy. To introduce a quantitative discussion of a random variable, consider the random thermally generated voltage across a resistor as a function of time. Because the voltage is random, the specific measured waveform is only one of an infinite number of waveforms that could have been measured. Each possible waveform is called a sample function, or a realization. The collection of all possible realizations defines a random process.14 Four sample functions of the voltage waveform are shown in Figure 2.5. A random process is described in terms of its amplitude structure and its time structure. In Section 2.2.1, the amplitude structure of a random process at one time instant is described using a probability distribution function. In Section 2.2.2, the temporal (or spatial) structure of a random process is described using correlation functions and power density spectra. Slice at time t1

Sample functions (a)

fv(v)

v (b)

Figure 2.5 (a) Four realizations of a random voltage waveform. The slice across all possible

realizations at a time t1 defines the random variable v(t1 ) with a corresponding probability density function f v(v) shown in (b). 13 See Chapter 4 of Horn and Johnson (1994) for details. 14 This is also called a stochastic process.

2.2 Random Signals

2.2.1

53

Probability Distribution Functions

Consider a slice at a fixed time t1 through all possible realizations of the random voltage shown in Figure 2.5(a). The voltage at time t1 is a random variable, and as such is denoted by v . A specific instance, or realization, of the random variable v is denoted by v. If the sample function v(t ) represents a complex random function of time, such as a complex-baseband signal, then the random variable v(t 1) defined at time t1 is a complex random variable v consisting of a real part and an imaginary part. Associated with any discrete random variable is a probability distribution function called a probability mass function denoted by px ( x ) or p(x ).15 Associated with any real continuous random variable x is a cumulative probability distribution function (cdf), Fx (x ), and a probability density function, f x (x ). The cumulative probability distribution function is defined as

.

Fx (x ) = Pr {x

≤ x },

where Pr{x < x } is the probability that the random variable x is less than x. Every cumulative probability density function is nonnegative, monotonically increasing, and goes to one as x goes to infinity. The probability density function f x (x ) and the cumulative probability density function Fx (x ) are related by

.

f x (x ) =

d Fx (x ), dx

(2.2.1)

provided the derivative exists. The probability density function f x (x ) is a nonnegative real function that integrates to one. The definite integral Pr{ x1 < x

< x2 } =

²x

2

x1

f x (x )dx

is the probability that the random variable x lies in the interval between x1 and x2 . The statistical expectation, or the expected value, of the random variable x is defined as

´x ³ =.

²∞

−∞

x f x (x )dx .

(2.2.2)

The expectation of x is also called the mean or the first moment of the random variable x. In a similar way, the expected value of a function g(x ) of the random variable x is

´g (x )³ =

²∞

−∞

g(x ) f x (x )dx ,

(2.2.3)

provided the integral exists. The nth moment of x is defined as

´x ³ =. n

²∞

−∞

x n f x (x )dx ,

15 The underlined subscript on p denoting the random variable is sometimes omitted for brevity.

(2.2.4)

54

2 Background

provided the integral exists. In order for the nth moment to exist, the probability density function must decrease faster than x −n as x goes to infinity. The variance σx2 of the random variable x is defined as (cf. (2.1.30))

σx2 =. ´(x − ´x ³)2³ = ´x 2 ³ − 2´x ³´x ³ + ´x ³2 = ´ x 2 ³ − ´x ³ 2 .

(2.2.5)

The variance measures the spread of the random variable x about the mean ´x ³. The square root σ x of the variance of the random variable x is called the standard deviation of x. This is the root-mean-squared value (rms) of x if x has zero mean. As an example, consider the gaussian probability density function (cf. (2.2.18)), which will be discussed in detail later in this section. An important property for a gaussian random variable is that all moments of the probability density function of an order larger than two can be expressed in terms of the first-order and second-order moments.16 Therefore, the gaussian distribution is completely characterized by its mean and variance. As another example, consider the probability density function f x (x ) =

» λx −(λ+1) 0

x x

≥1 < 1,

(2.2.6)

which is called the Pareto probability density function with a Pareto index λ that is a positive number. The reader is asked as a problem at the end of the chapter to show that the mean is λ/(λ − 1) for λ ≥ 1 and otherwise is infinite, and to show that the variance is λ/[(λ − 1)2(λ − 2)] for λ ≥ 2 and otherwise is infinite.

Joint Probability Distributions A probability density function can be defined for more than one random variable. The probability density function for two random variables f x , y (x , y) is called a bivariate probability density function or a joint probability density function. The probability that the joint event {x 1 < x < x 2} ∪ { y1 < y < y2 } occurs is equal to the volume under f x ,y (x , y) over the rectangle supported by the two corners (x 1, y1 ) and (x2 , y2 ). Several other probability density functions can be defined in terms of the joint probability density function f x , y ( x , y ). Every joint probability density function f x ,y (x , y ) is associated with marginal probability density functions and conditional probability density functions. The marginal probability density functions f x (x ) and f y ( y ) are determined by integrating f x , y ( x , y ) over the range of the other variable, a process called marginalization. Thus, f y( y ) = f x (x ) =

²∞

²−∞ ∞ −∞

f x , y ( x , y )dx

(2.2.7a)

f x , y ( x , y )dy .

(2.2.7b)

16 This statement is called the Isserlis theorem. See Reed (1962).

2.2 Random Signals

55

Substituting the marginal (2.2.7a) into (2.2.2) gives

´ y³ =

² ∞² ∞

−∞ −∞

y f x ,y (x , y)dx dy

as the mean for y. The probability density function of y, given that the event {x = x } has occurred, is called the conditional probability density function. It is denoted by f y | x (y |x ) and is given by f y | x (y | x ) =

fx, y( x , y ) f x (x )

,

where the notation y |x indicates that the probability density function of y depends on, or is conditioned by, the event { x = x }. Likewise, the conditional probability density function, denoted by f x | y ( x | y ), is the probability density function of x given that the event { y = y} has occurred. The joint probability density function of the events {x = x } and {y = y } is equal to the probability density function of the event {x = x } multiplied by the conditional probability density function of the event { y = y| x = x }, so that f x ,y (x , y) = f x (x ) f y | x (y | x ).

(2.2.8)

Similarly, f x , y (x , y) = f y ( y) f x | y (x | y ). Equating these two expressions gives f x | y ( x | y) =

f x (x ) f y |x ( y |x ) f y ( y)

,

(2.2.9)

which is a form of Bayes’ rule.17 When x represents the value of the transmitted signal and y represents the value of the received signal, the term on the left is called the posterior probability density function. It is the probability density that x was transmitted given that y is received. The marginal probability density function f y ( y) can be expressed in terms of the conditional density function f y |x ( y| x ) and the other marginal probability density function f x (x ) by integrating both sides of (2.2.8) with respect to x and using (2.2.7) to give f y (y ) =

²∞

−∞

f x ( x ) f y | x ( y |x )dx .

(2.2.10)

Correlation and Independence The random variables x and y are independent and f x , y (x , y ) is called a product distribution if the joint probability density function f x ,y (x , y) can be written as f x ,y (x , y ) = f x (x ) f y (y ). If two random variables are independent, then knowing the realization of one random variable does not affect the probability density function of the other random variable. This means that f y| x (y | x ) = f (y ) for independent random variables x and y. 17 Bayes’ rule is an immediate consequence of the definition of marginal and conditional distributions.

56

2 Background

If x and y are independent, then the probability density function f z (z) of the sum = x + y of the two random variables is ²∞ f z ( z) = f x (z − y ) f y (y )dy = f x (z ) ± f y (z), (2.2.11) −∞ where ± is the convolution operator defined in (2.1.3). The derivation of this expression

z

is discussed in an exercise at the end of the chapter. The correlation of two real random variables x and y is the expected value of their product. Then

´x y ³ =

² ∞² ∞

−∞ −∞

The covariance cov is defined as

x y fx ,y (x , y )dx dy.

.

cov = ´x y ³ − ´x ³´ y ³.

(2.2.12)

(2.2.13)

If at least one of the two random variables has zero mean, then cov = ´x y ³. Two random variables x and y are uncorrelated if ´x y ³ = ´x ³´ y ³ in (2.2.12) or if cov = 0 in (2.2.13). If random variables x and y are uncorrelated and at least one has a zero mean, then ´x y ³ = cov = 0.

Characteristic Functions The expected value of the function eiω x of the random variable x that has a probability density function f (x ) is called the characteristic function C x (ω), C x (ω) = ´eiωx ³

= =

²∞

f x (x )ei ωx dx

−∞ ∞ ¼ (iω)n n= 0

n!

´x n ³,

(2.2.14)

provided the moments exist, where the power-series expansion for e x has been used in the last expression. For a discrete random variable with a probability mass function p(k ) on the integers, the characteristic function is C k (ω) =

∞ ¼

k =−∞

eiωk p(k).

(2.2.15)

Using the convolution property of the Fourier transform given in (2.1.14), the characteristic function C z (ω) of the probability density function f z (z ) for the sum of two independent random variables x and y (cf. (2.2.11)) is equal to the product C x (ω)C y (ω) of the characteristic functions of the two probability density functions for the two random variables. The inverse Fourier transform of this product yields the desired probability density function

²

∞ 1 C x (ω)C y (ω)e−iωz dω. (2.2.16) 2π −∞ This expression is readily extended to multiple independent random variables. f z ( z) =

2.2 Random Signals

57

The nth moment of a probability density function, if it exists, can be determined by differentiation of the characteristic function

º

º 1 dn C x (ω)ºº . n n i dω ω=0

´x ³ = n

(2.2.17)

The derivation of this expression is assigned as a problem at the end of the chapter.

Probability Density Functions Used in Communication Theory

Probability density functions and probability mass functions are regularly encountered in the analysis of communication systems. The most relevant probability density functions for the analysis of lightwave communication systems are reviewed in this section.

Gaussian Probability Density Function A gaussian random variable has a gaussian probability density function defined by

. √ 1 e−(x −m) /2σ . 2πσ

f x (x ) =

2

2

(2.2.18)

It is easy to verify that the mean of x is m and the variance is σ 2. A unique property of gaussian random variables is that any weighted sum of multiple gaussian random variables, whether independent or dependent, is also a gaussian random variable. The probability that a unit-variance gaussian random variable exceeds a value z, Pr {x > z }, is expressed in terms of the complementary error function, which is denoted by erfc and defined as 18 1





²∞

2 1 e−x /2 dx = erfc

2

z

³z´ √ , 2

(2.2.19)

where erfc(z) = 1 − erf(z), with erf(z ) being the error function, defined as

. √2 π

erf(z ) =

²z 0

e−s ds. 2

For large arguments, the complementary error function can be approximated by erfc(x ) ≈

1 −x √ e . x π 2

(2.2.20)

A multivariate gaussian probability density function is a joint probability density function for a block x of real random variables with components x i given by fx (x) =



1

(2π)

N

det C

1 T −1 e− 2 (x−´ x³) C (x−´x³) ,

(2.2.21)

where C is the real covariance matrix defined for any multivariate probability density function as

C=

Æ(

x − ´x³

) (x − ´x³) Ç. T

√ ( ) =. 21 erfc(z/ 2). However, the

18 This probability is also expressed using the equivalent function Q x

Q

(2.2.22)

symbol is used in another context in this book as is conventional in treatments of lightwave communication systems, and so Q( x ) is not used herein.

58

2 Background

The square symmetric matrix C has a determinant det C. The diagonal matrix element C ii is the variance of the random variable x i . The off-diagonal matrix element C i j is the covariance (cf. (2.2.13)) of the two random variables x i and x j . These two elements are uncorrelated if Ci j equals zero. In general, this need not be a strong statement, but, for jointly gaussian random variables, it means that they are independent. It is possible to have a joint probability density function such that each marginal density function is a gaussian probability density function, yet the joint probability density function is not jointly gaussian, and so is not given by (2.2.21). This means that knowing that each marginal probability density function is gaussian is not sufficient to infer that the joint probability density function is jointly gaussian. This is discussed in an end-of-chapter exercise. A zero-mean bivariate gaussian random variable consists of two random, zero-mean gaussian components x and y, which may be correlated. The covariance matrix given in (2.2.22) is

¸

¹ 2 σ ρ x y σx σ y x C= ρ σ σ , σ y2 xy x y

where

(2.2.23)

ρx y =. ´x y ³/σx σ y

(2.2.24)

is defined as the correlation coefficient. An example of a two-dimensional joint gaussian probability density function is shown in plan view in Figure 2.6. If σ x = σ y = σ , then (2.2.21) reduces to f x , y (x , y ) =

È

1

±

2πσ 2 1 − ρ 2x y

x 2 − 2 ρ x y x y + y2 exp − 2σ 2 (1 − ρx2y )

É

.

(2.2.25)

Moreover, if ρ x y = 0, then f x ,y (x , y ) is a product distribution in the chosen coordinate system. For this case, the bivariate gaussian density function is called a circularly symmetric density function, with the bivariate gaussian random variable called a circularly symmetric gaussian random variable. The joint gaussian probability density y

x′

y′

x

(a)

(b)

(, )

Figure 2.6 Contours of the joint gaussian probability density function f x , y x y as a function of

the correlation coefficient ρ x y : (a) ρx y

= 0, (b) ρx y = 0.5.

2.2 Random Signals

59

function f x , y (x , y), now also including a nonzero mean for each component, can then be written as fx, y( x , y ) =

¸



1

2πσ

2 2 e−( x −´x ³) / 2σ

¹¸



1

2πσ

2 2 e−( y−´ y ³) /2σ

¹

,

(2.2.26)

where the probability density function of each component is written in the form of (2.2.18). Therefore, uncorrelated jointly gaussian random variables are independent. For a set of N independent real gaussian random variables with C = σ 2 IN , the joint probability density function is f x (x) =

¶ 12 (2πσ )

N

2 2 e−(x−´x³) /2σ ,

(2.2.27)

which factors as a product of the N single-variable gaussian densities. A complex version of (2.2.27) is given in (2.2.29). A real covariance matrix is a symmetric matrix and can be diagonalized by a change of basis. Therefore, any multivariate gaussian probability density function has a basis for which the probability density function expressed in this basis is a product distribution. The resulting marginal gaussian random variables in this basis are independent, but need not have the same mean and variance. For example, consider the two-dimensional gaussian probability density function given in (2.2.25) with diagonal elements σ x2 = σ y2 = σ 2 and off-diagonal elements σ 2 ρx y . Define a new basis (x ² , y² ) that is a rotation of the original basis ( x , y ). The components in the new basis for this example can be expressed by a unitary transformation R of components in the original basis as given by

¸

x² y²

¹

=R

¸x¹ y

.

The matrix R is generated from the normalized eigenvectors of the covariance matrix C given in (2.2.23) and satisfies the matrix equation (cf. (2.1.90))

RTCR = D, where D is a diagonal matrix with diagonal elements given by the eigenvalues of C. Using these eigenvalues, the variances of the uncorrelated gaussian random variables in this new basis are σ x2² = σ 2 (1 + ρx y ) and σ y2² = σ 2 (1 −ρ x y ), which can be equal only if ρx y = 0. Using the normalized eigenvectors of C, the components of the new basis are x ² = √1 (x + y) and y ² = √1 (x − y). The joint gaussian probability density function in 2

2

the new basis is a product distribution given by

⎤ ⎡ ² 1 f x ² , y² (x ² , y² ) = ⎣ ± e−x / 2σ ( 1+ρ ) ⎦ 2πσ 2 (1 + ρx y ) ⎡ ⎤ 1 ² × ⎣± e− y /2σ (1−ρ ) ⎦ , 2πσ 2(1 − ρ x y ) 2

2

2

xy

2

xy

60

2 Background

which is written to show that each marginal probability density function in the new basis is an independent gaussian probability density function of the form of (2.2.18) with each distribution having a different variance.

Complex Circularly Symmetric Gaussian Random Variables and Vectors A complex gaussian random variable z = x + iy has components x and y described by a real bivariate gaussian random variable (x , y ). A complex gaussian random vector, denoted as z = x + iy, has components z k described by a real bivariate gaussian random variable (x k , yk ). A complex gaussian random variable z = x + iy with independent, zero-mean components x and y of equal variance is a circularly symmetric gaussian random variable. The corresponding probability density function is called a complex circularly symmetric gaussian probability density function. A complex circularly symmetric gaussian random variable has the property that eiθ z has the same probability density function for all θ . Generalizing, a complex, jointly gaussian random vector z = x + iy is circularly symmetric when the vector ei θ z has the same multivariate probability density function for all θ . Such a probability density function must have a zero mean. The multivariate probability density function for a complex circularly symmetric gaussian random vector z is f z ( z) =

π

N

† −1 1 e−z V z . det V

(2.2.28a)

Using the properties of determinants, the leading term (π N det V)−1 in (2.2.28a) can be written as det(π V)−1 . The term V in (2.2.28a) is the complex covariance matrix given by

V =. ´zz† ³,

(2.2.28b)

with † denoting the complex conjugate transpose. Because z is complex, this matrix is hermitian. In contrast to the covariance matrix, the pseudocovariance matrix is defined as ´zzT ³, where zT is the transpose of z, unconjugated. The pseudocovariance matrix of a circularly symmetric matrix is always zero.19 As an example, the covariance matrix of a set of N uncorrelated jointly complex circularly symmetric gaussian random variables has the form V = ´ zz† ³ = 2σ 2I N , where I N is the N × N identity matrix. Using det(2σ 2I N ) = (2σ 2)N , the multivariate probability density function given in (2.2.28a) reduces to f z (z) =

1

(2πσ ) 2

N

2 2 e−|z| /2σ ,

(2.2.29)

which is a product distribution composed of N terms. When the mean value ´z³ is not equal to zero, the random vector z is not circularly symmetric. For this case, the multivariate complex gaussian probability density function is given by 19 See Gallager (2013).

2.2 Random Signals

f z (z) = with

π

N

† −1 1 e−(z−´z³) V (z−´z³ ), det V

61

(2.2.30a)

V =. ´(z − ´z³)(z − ´z³)† ³

(2.2.30b)

being the complex covariance matrix.

Covariance Matrices for Real and Complex Random Gaussian Vectors To relate the covariance matrix V for a complex random vector to the covariance matrix C for a real random vector, define z as a vector of N complex circularly symmetric gaussian random variables with a complex covariance matrix V given in (2.2.30b). Define x as a real column vector of length 2N ½ that consists of the real part Re[z] and¾T the imaginary part Im[z] in the order x = Re[z 1], . . . , Re[ z N ], Im[ z1 ], . . . , Im[ z N ] . The real 2N × 2N covariance matrix C given in (2.2.22) is expressed in block form as 1 2

C= in terms of the N

¸

Re V Im V

−Im V ¹ ,

(2.2.31)

Re V

× N complex covariance matrix V.

Rayleigh and Ricean Probability Density Functions Several useful random variables and their associated probability±density functions are generated from the square root of the sum of the squares r = x 2 + y 2 of independent gaussian random variables x and y, usually with equal variances and possibly with nonzero means. The random amplitude r has a geometrical interpretation as the random length of a random vector with components x and y that are gaussian-distributed. The probability density function of the amplitude can be determined by transforming the joint probability density function f x y (x , y ) from rectangular coordinates, as given in (2.2.26), into the corresponding joint probability density function f r φ (r, φ) in polar coordinates. The marginal probability density function fr (r ) is generated by integrating out the functional of φ . ± 2dependence 2 Let A = ´x ³ + ´ y ³ be the mean amplitude, where ´x ³2 and ´ y³2 are the squares of the means of the independent gaussian random variables x and y, both with variance ¶ σ 2 . Now make the change of variables, ´x ³ = A cos θ , ´ y³ = A sin θ , r = x 2 + y2 , x = r cos φ , y = r sin φ , and dx dy = dr r dθ . Then redefine φ with respect to the phase θ of the constant-amplitude signal so that θ − φ is replaced by φ . Using these substitutions and standard trigonometric identities, (2.2.26) becomes fr φ (r, φ) =

r

2πσ

2

2 2 2 e−(r −2 Ar cos φ+ A )/2σ .

The marginal probability density function fr (r ) is fr (r ) =

r

2πσ

2

2 2 2 e−(r + A )/2σ

² 2π 0

e Ar cos φ/σ dφ. 2

(2.2.32)

62

2 Background

The integral can be expressed in terms of the modified Bessel function of the first kind of order zero.20 Using the change of variable x = Ar /σ 2 , the probability density function of the amplitude r is fr (r ) =

r −(r 2+ A 2)/2σ 2 e I0 2

σ

³A r ´ σ2 ,

r

≥ 0.

(2.2.33)

This probability density function is known as the ricean probability density function and characterizes a ricean random variable. As the ratio of A /σ becomes large, the ricean probability density function begins to resemble a gaussian probability density function with the same mean and variance. For A = 0, the probability density function reduces to fr (r ) =

r −r 2 /2σ 2 ,

σ2 e

r

≥ 0,

(2.2.34)

which is known as a Rayleigh probability density function that characterizes a Rayleigh √ random variable. This probability density function has mean σ π/2 and variance σ 2 (2 − π/2). Plots of the ricean and the Rayleigh probability density functions are shown in Figure 2.7(a). The marginal probability density function of the phase φ √ can be obtained by inte² 2 2 grating √ (2.2.32) over r with the variables changed as r = r / 2σ , F = A /2σ , and B = F cos φ. Completing the square in the exponent, and factoring, produces (a)

fr (r)

f r(r)

f r(r)

r

(b)

r

fφ( φ)

−π

−π 2

0

r

f φ(φ)

π 2

π

−π

−π 2

f φ(φ)

π 2

0

π −π

−π 2

0

()

π 2

π

Figure 2.7 (a) The marginal probability density function of the amplitude fr r given in (2.2.33).

The expected value A increases from left to right. (b) The marginal probability density functions for the phase f φ (φ) given in (2.2.35). The leftmost set of plots is for A = 0.

ν

20 The modified Bessel function of the first kind of order is defined as π e x cos θ cos d . The order can be an integer or half of an integer. I x 1 2

µ

. ν ( ) = ( / π) −π

(νθ) θ

ν

2.2 Random Signals

f φ (φ) =

1 ( B 2 −F ) e

π

²∞ 0

r ² e−(r − B) dr ² . ²

2

To evaluate the integral, use the second change of variables R

²∞

2 ² r ² e−(r −B ) dr ² =

63

²∞

= r ² − B to yield

( R + B)e− R dR 0 −B  √ 1 Á −B = 2 e + B π (1 + erf( B)) . Substituting this expression back into the expression for f φ (φ) gives Á Á√  1 Á −F √ fφ (φ) = π F cos φ e−F sin φ 1 + erf F cos φ . e + 2π 2

2

2

(2.2.35)

The function fφ (φ) is a zero-mean, even, periodic function with a period T = 2π that integrates to one over one period for any value of F . If F = 0, then f φ (φ) = 1/ 2π and the marginal probability density function of the phase is a uniform probability density function over [−π, π) . If F ± = 0, then f φ (φ) becomes “peaked” about φ = 0 with the width of the probability density function inversely related to F. These effects are shown in Figure 2.7(b). √ For F µ 1 and φ ≈ 0, the approximation erf ( F cos φ) ≈ 1 holds. Using this expression, setting cos φ ≈ 1, sin2φ ≈ φ 2, and neglecting the first term in (2.2.35) as compared with the second term gives f φ (φ) ≈

Å

F −Fφ 2 e .

π

(2.2.36)

This is a zero-mean gaussian probability density function with variance 1 /2F = σ 2/ A2 . This form is evident in the lower right plot of Figure 2.7(b). While this approximation is defined only over −π ≤ φ < π , it is sometimes a mathematically expedient approximation when the variance is small to extend the range to −∞ ≤ φ < ∞ because then the value of f φ (φ) is negligible outside the interval −π ≤ φ < π .

Noncentral and Central Chi-Square Probability Density Functions A central chi-square random variable with two degrees of freedom is defined as the sum of the squared amplitudes z = x 21 + x 22 , where x 1 and x 2 are independent gaussian random variables with zero means and equal variances. A noncentral chi-square random variable with two degrees of freedom has the Ë Êform, Ë where x 1 and x 2 are Ê same independent gaussian random variables with means x 1 and x 2 . The central chi-square probability density function with one degree of freedom (N = 1) is 2 1 z−1/ 2e−z / 2σ , z ≥ 0, (2.2.37) f ( z) = √ 2 2πσ and has mean σ 2 and variance 2 σ 4. The noncentral chi-square probability density function with two degrees of freedom (N = 2)

64

2 Background

³ √´ f (z ) = (2.2.38) 2σ 2 σ 2 , z ≥ 0, is determined from the ricean probability density function Ê Ëin2 (2.2.33) using the . Ê Ë2 given transformation z = r 2 and dz = 2r dr , where A2 = x + x . The mean and the 1

2 2 A z e−( z + A )/2σ I0

1

variance are

For A

2

´ z ³ = A 2 + 2 σ 2, σz2 = 4σ 2 A2 + 4σ 4 .

= 0, f (z ) reduces to

1

e−z / 2σ

(2.2.39)

, z ≥ 0, (2.2.40) 2σ which is a central chi-square probability density function with two degrees of freedom. This probability density function is the same as an exponential probability density function. The mean 2σ 2 and the variance 4 σ 2 can be determined by setting A = 0 in (2.2.39). These density functions can be extended to N independent degrees of freedom. For ∑ this purpose, the random variable z is replaced by z = Nk=1 x 2k , where each random variable x k is an independent, identically distributed, gaussian random variable, and ∑ A2 = kN=1 ´x k ³2 is the sum of the squares of the means for each degree of freedom. If A is nonzero, then a noncentral chi-square probability density function with N degrees of freedom given by f ( z) =

f (z ) =

1

2σ 2

Á z Â( A2

N − 2)/ 4

2

2

2 2 e−(z + A )/2σ I( N /2)−1

³ A √z ´ σ2 ,

z

≥ 0,

(2.2.41)

is generated. The mean and the variance are

´z³ = A2 + N σ 2, σ z2 = 4σ 2 A2 + 2N σ 4.

When N

= 2, (2.2.42) reduces to (2.2.39). The characteristic function is ³ iω A2 ´ 1 Cz (ω) = exp . (1 − i2σ 2ω) / 1 − i2ωσ 2 N 2

(2.2.42)

(2.2.43)

For A = 0, this is a central chi-square probability density function with N degrees of freedom 1 2 z N /2−1e−z / 2σ , z ≥ 0, (2.2.44) f ( z) = 2 N / 2 (2σ ) ³( N /2) where

³(k ) =.

²∞ 0

x k −1 e−x dx

(2.2.45)



is the gamma function. When N = 1 (2.2.44) reduces to (2.2.37) where ³( 1/2) = π. This expression is not easy to obtain by setting A = 0 in (2.2.41). Instead, set A = 0 in (2.2.43) to yield the characteristic function C z (ω), C z (ω) =

1 (1 − i2σ 2 ω)N /2 .

(2.2.46)

2.2 Random Signals

65

Table 2.2 Comparison of the chi and chi-square random variables

=

±

x 21

+ · · · + x 2n

Form of the random variable

z

Zero mean Nonzero mean

Central chi Noncentral chi

z = x 21 + · · · + x 2n Central chi-square Noncentral chi-square

The inverse Fourier transform of this function then produces the aforementioned probability density function. For N = 2, C z (ω) is the Fourier transform of an exponential probability density function with mean 2σ 2 .

Noncentral and Central Chi Probability Density Functions The generalization of the ricean probability density function to N degrees of freedom leads to a noncentral chi random variable with N degrees of freedom. The noncentral chi probability density function with N = 2 is the ricean probability density function given in (2.2.33). For A = 0, this is called a central chi random variable. The central chi probability density function with one degree of freedom ( N = 1) is determined from the central chi-square probability density function using the transformation z = x 2 , where x is constrained to be positive. Equating the infinitesimal probabilities so that f z (z )dz = f x (x )dx and dz = 2x dx, the probability density function of x can be written as f (x ) =

1 1 2 2 √ e−x /2σ 2 2πσ 2

for x

> 0.

(2.2.47)

This is a gaussian probability density function that is constrained to have only positive arguments. The central chi probability density function with two degrees of freedom (N = 2) is a Rayleigh probability density function. The central chi probability density function with three degrees of freedom (N = 3) is a maxwellian probability density function defined as

Å

2 x 2 −x 2 /2σ 2 e , 3

x ≥ 0, (2.2.48) πσ √ with mean 2 σ 2/π and variance σ 2 (3π − 8)/π . This maxwellian distribution characf (x ) =

terizes a maxwellian random variable. See Table 2.2 for a comparison of chi and chi-square random variables.

Exponential and Gamma Probability Density Functions The central chi-square random variable with two degrees of freedom is known as an exponential random variable with an associated exponential probability density function. The general form of an exponential probability density function is f (z ) = µe−µz ,

(2.2.49)

66

2 Background

(a)

1.0 (µ = 1, k = 1)

0.8

(µ = 1, k = 5)

0.12 0.10 ) x( p

(µ = 2, k = 2)

) x(p

0.6 0.4

(µ = 1, k = 30)

0.08

(µ = 1, k = 50)

0.06 0.04

0.2 0.0

(b)

0.14

0.02 0

1

2

x

3

4

0.00

5

0

10

20

30

40

50

60

70

x

µ = k = 1 and a gamma probability density function with µ = k = 2. (b) For a fixed value of µ , as k increases, the gamma probability density function approaches a gaussian probability density function.

Figure 2.8 (a) Plot of an exponential probability density function with

with mean µ −1 and variance µ−2 . The sum of k independent, identically distributed exponential random variables, each with a mean µ −1 , is a gamma random variable with a probability density function given by f (z) = µ³( k)−1 (µz )k −1 e−µz ,

(2.2.50)

where k > 0 and ³(k) is the gamma function (cf. (2.2.45)). The mean of the gamma probability density function is k µ−1 and the variance is k µ −2. Using the substitutions k = N /2 and µ = 1/2σ 2, the gamma probability density function is equal to a central chi-square probability density function with N degrees of freedom. Plots of the gamma probability density function of several pairs (µ, k ) are shown in Figure 2.8.

Laguerre Random Variable A Laguerre random variable m is a discrete random variable parameterized by K , a, and x. It has a probability mass function given by

.

p(m; K , a, x ) =

am −ax K −1 (1 + a)m+ K e Lm (x )

for m

= 0, 1, 2, . . . ,

(2.2.51)

where L Km (x ) is the generalized Laguerre polynomial defined for the integers K and m as

³ K + m´ x k Lm (x ) = (−1) K − m k! , k =0 K

()

m ¼

k

(2.2.52)

with n² denoting the binomial coefficient. A Laguerre polynomial corresponds to the special case when K = 1.

The Central Limit Theorem A random variable that describes an event may arise as the sum of many independent random variables x i for repeated instances of some underlying constituent event. If the variances of the random variables x i are finite√and equal, then the probability density ∑ function px (x ) for the normalized sum x = (1/ N ) iN (x i −´ x ³) as N goes to infinity

2.2 Random Signals

67

φi

A

Many independent contributions

Sum of independent random vectors

)erutardauq( sixA yranigamI

)erutardauq( sixA yranigamI

Ai

θ

A Joint gaussian pdf θ

Real Axis (in-phase)

Real Axis (in-phase)

Figure 2.9 The limit of the sum of many independent random vectors superimposed on a constant

vector is a circularly symmetric gaussian probability density function centered on the constant Aeiθ .

will usually tend towards a gaussian probability density function with mean ´x ³ irrespective of the functional form of the probability density functions px i (x i ) of the individual constituent events. When the density functions are all the same, the formal statement is called the central limit theorem. The central limit theorem explains why the gaussian probability density function and its variants are ubiquitous in statistical analysis. As an example of the application of the central limit theorem, consider the probability density function of the normalized sum of N independent and identically distributed (IID) complex random variables Ai eiφi added to a constant Ae iθ , where Ai and φ i are zero-mean, independent, and identically distributed random variables, and the probability density function of φ i is uniform over [0, 2π). The resulting normalized sum is a complex random variable written as S=

√1

N Á ¼

N i =1

A i eiφi

 + Aeiθ .

√ ∑ = x + i y, where x = (1/ N ) iN=1( A i cos φ i + A cos θ) and √ ∑ y = (1/ N ) kN=i ( A i sin φi + A sin θ). In the limit as N goes to infinity, asserting the central limit theorem yields a joint probability density function f x , y (x , y) for S that is

This can be written as S

a circularly symmetric gaussian probability density function centered on the constant Aei θ . This is shown schematically in Figure 2.9. Although the central limit theorem is quite powerful, the convergence to a gaussian distribution is not complete for any finite number of summand random variables. This means that calculations of small probabilities of events by using the central limit theorem to validate the use of a gaussian distribution may not be sound.

2.2.2

Random Processes

Returning to the random voltage measurement shown in Figure 2.5, the probability density function of the random voltage defined at a single time instant is a first-order

68

2 Background

probability density function. To study the time structure of the random process at two time instants, consider a joint probability density function f (v1, v2 ; t1 , t2 ) that describes the joint probability that the voltage v1 is measured at time t1 and the voltage v2 is measured at time t2 . This probability density function is called a second-order probability density function because it relates two, possibly complex, random variables at two different times. The correlation of two continuous random variables defined from the same random process at two different times t 1 and t2 defines the autocorrelation function

. R (t 1, t2 ) = ´v 1 v ∗2³ =

²∞²∞

−∞ −∞

v1 v2∗ f (v1 , v2; t 1, t2 )dv1 dv2 .

(2.2.53)

The covariance function is defined as C (t1 , t 2) = R (t1 , t2 ) − ´v 1 ³´v 2 ³ (cf. (2.2.13)). Similar expressions in time and space are defined for lightwave fields in Section 2.3.5. There, the temporal properties of a random lightwave field are characterized by a temporal coherence function that is an extension of the autocorrelation function to lightwave fields. Higher-order joint probability density functions can be defined in a similar way using n different times. The collection, or ensemble, of all sample functions that could be measured, which are shown schematically in Figure 2.5, along with all of the nth-order probability density functions for all values of n, completely specifies a general random process. A gaussian random process is a random process for which every nth-order probability density function is jointly gaussian. If a gaussian random process is transformed or filtered by a linear system, then the output at any time t is a weighted superposition of gaussian random variables and hence is also a gaussian random variable. Accordingly, filtering a gaussian random process produces another gaussian random process, with the filtering affecting the mean, the variance, and the correlation properties, but not the gaussian form of the random process.

Stationarity and Ergodicity

The analysis of a random process is simplified whenever some or all of the probability density functions are shift-invariant. When the first-order and the second-order probability density functions are time-invariant, then the mean is independent of the time t1 , and the autocorrelation function depends only on the time difference τ = t 2 − t1 so that R (t1 , t 2) = R (t2 − t1 , 0) = R (τ, 0). When the autocorrelation function is time-invariant, it is written as R (τ). Random processes for which the first-order and second-order probability density functions are time-invariant are called stationary in the wide sense. In this case, the subscripts are dropped because the mean and the mean-squared value defined by (2.2.2) and (2.2.5) are now independent of time. If all probability density functions that can be defined for a random process are time-invariant, then the process is called strict-sense stationary. Often, every sample function of a stationary random process contains the same statistical information as every other sample function. If the statistical moments of the

2.2 Random Signals

69

ensemble can be constructed from the temporal moments of a single sample function, then the random process is called ergodic. In particular, the expectation can be replaced with a time average v over a single sample function v(t ), 1 ´v³ = v =. Tlim →∞ 2

² T /2

−T /2

v(t )dt

for an ergodic random process.

Power Density Spectrum

The power density spectrum S ( f ) can be regarded as the density of the power per infinitesimal frequency interval at the frequency f . For a wide-sense stationary random process s (t ), the power density spectrum can be derived by defining a temporal func. tion s T (t ) with finite support given by sT (t ) = s (t )rect(t / T ) with a Fourier transform UT ( f ). Convolving s T (t ) and sT∗(t ) and using the convolution property of the Fourier transform gives

²∞

sT (τ)sT∗ (t

−∞

− τ) dτ ←→ |U ( f )|2 . T

Take the expectation of each side and divide by T . Then, because the expectation is linear, we can write

²

1 ∞Ê sT (τ)sT∗ (t T −∞

.

Æ Ç Ë − τ) dτ ←→ T1 |U ( f )|2 . T

Now define RT (τ) = ´ sT (τ)sT∗(t − τ)³ as the autocorrelation function of the finiteduration random process s T (t ). In the limit as T goes to infinity, this becomes

²

1 ∞ R T (τ)dτ lim T →∞ T −∞

Ç 1Æ ←→ Tlim | U ( f )| 2 . →∞ T T

(2.2.54)

The Wiener–Khintchine theorem states that the left side is the autocorrelation function R (τ) of s (t ), with the right side defined as the two-sided power density spectrum. This means that the autocorrelation function R (τ) and the power density spectrum S ( f ) are a Fourier-transform pair given by

S( f ) = R (τ) =

²∞

²−∞ ∞ −∞

R (τ)e−i2π f τ dτ,

(2.2.55a)

S ( f )ei2π f τ d f .

(2.2.55b)

If the autocorrelation function R (τ) is a real and even function, then, from the properties of the Fourier transform (cf. (2.1.19)), the power density spectrum is real and even as well. For this case, a one-sided power density spectrum can be defined for positive f with a value that is twice that of S ( f ). The relationship between the signal power and the two-sided power density spectrum is given by P

=

²∞

−∞

S ( f )d f .

(2.2.56)

70

2 Background

It is conventional in lightwave communication systems to specify the power density spectrum Sλ (λ) in units of wavelength. This form of the power density spectrum is related to S ( f ) by the differential relationship

S λ (λ)dλ = S ( f )d f . Using λ f

(2.2.57)

= c or f = λ−1c, the differential factor d f /dλ is df dλ

= − λc2 ,

(2.2.58)

where the negative sign indicates that an increase in the wavelength corresponds to a decrease in the frequency.

Coherence Timewidth

The coherence timewidth τc defined as

τc =.

1

|R(0)|2

²∞ −∞

|R(τ)|2 dτ

(2.2.59)

is a measure of the width of the magnitude of the autocorrelation function of a stationary random process. For a random lightwave field, the coherence timewidth is the width of the temporal coherence function. This is discussed in Section 2.3.5. Within a coherence interval, defined as any interval of duration τc , the values of the random process are highly correlated. This means that within a coherence interval, the random process can be approximated as unchanging and described by a single random variable. A random process can then be approximated as a sequence of random variables with each random variable defined over a coherence interval. The reciprocal quantity Bc = 1/τc of the coherence timewidth is the effective bandwidth. It quantifies the number of coherence intervals per unit time. The effective bandwidth can be written as

ººµ ∞ º S ( f ) d f º2 | R (0)|2 −∞ Bc = µ ∞ = µ∞ , 2 2 −∞ | R (τ)| dτ −∞ |S ( f )| d f

(2.2.60)

as a consequence of Parseval’s relationship. The concepts of the autocorrelation function and the power density spectrum are illustrated by determining the autocorrelation function and power density spectrum for an example of a random process called the random telegraph signal. This consists of a sequence of random and independent nonoverlapping pulses, each transmitted within a time interval T and of duration T . For each pulse interval, it is equiprobable that amplitude 0 or amplitude A is transmitted. This random process is written as s (t ) =

∞ ¼ n=−∞

A n rect

³ t − nT − j ´ T

,

where j is an offset time described by a uniformly distributed random variable over [0, T ] . Three realizations with different random offset times are shown in Figure 2.10.

2.2 Random Signals

71

j1

j2

j3 Figure 2.10 Three possible realizations of a random binary waveform consisting of a random

sequence of marks of amplitude A and spaces of amplitude zero offset by a random variable j.

For any realization s(t ) and for τ > T , the pulses are independent because they are generated from independent bits. Therefore, the autocorrelation function R (τ) in this region for equally likely pulses with p = 1/2 is R (τ) = ´s (t )s (t + τ)³

= ´s (t )³´s(t + τ)³ ³ A ´ ³ A ´ A2 = 2 2 = 4

for τ

> T.

This is the expected power in the random signal with mean amplitude of A /2. For τ < T , there are two possibilities depending on the value of the offset j for each sample function. If j > T −τ , then the random variables s (t ) and s (t +τ) are defined in different time intervals. Thus ´s (t )s (t + τ)³ = A 2 /4 because each term has amplitude A with a probability of one-half and otherwise has amplitude zero. If j < T − τ , then s (t ) and s (t + τ) are defined in the same time interval. If a mark is transmitted, then s (t )s (t + τ) = A 2. If a space is transmitted, then s (t )s (t + τ) = 0. Because marks and spaces are equally probable, ´s (t )s (t + τ)³ = A 2/2. The results for j < T − τ and j > T − τ are combined by recalling that the offset j is described by a uniform probability density function, f ( j ) = 1/ T . The resulting autocorrelation function is

´s(t )s (t + τ)³ = =

² T −τ 0

³

A2 dj 2T

τ A2 2− 4 T

´

+

The same analysis holds for negative values of function is ⎧ A2 ⎪⎨ ³4 ´ R (τ) = ⎪⎩⎪ A2 2 − |τ| 4 T

²T

A2 dj T −τ 4T

(0 < τ < T ). τ.

Therefore, the autocorrelation

|τ | > T |τ| ≤ T .

(2.2.61)

A plot of the autocorrelation function is shown in Figure 2.11(a). The corresponding power density spectrum S ( f ) is the Fourier transform of R (τ). The Fourier transform of the constant mean power A2 /4 is a Dirac impulse ( A 2 /4)δ(t ). The Fourier transform

72

2 Background

R(τ)

(a)

0

(b) )Bd( rewoP dezilamroN

A2 2

−10 Modulated −20

A2 4 ±T

Zero-frequency component

T

signal component

−30 −4

τ

−2

0

2

4

Normalized Frequency (f T)

Figure 2.11 (a) Autocorrelation function of a binary sequence of marks and spaces. (b) Power

density spectrum.

of the triangular function is a sinc-squared function. The total power density spectrum is A2 T A2 sinc2 ( f T ), (2.2.62) S ( f ) = δ( f ) + 4 4 and is shown in Figure 2.11(b). For this waveform, half of the signal power is carried in the zero-frequency component and conveys no information. As a second example, consider the random process x (t ) =

M ¼

1



M m =1

A m e− i2π fm t ,

where { A m } is again a set of identically distributed random variables indexed by m, and f m is a known frequency for each m. We want to determine whether this random process is wide-sense stationary and whether it is ergodic. For the process to be wide-sense stationary, the expected value must be independent of time and the autocorrelation function can depend only on the time difference. The expected value is M ¼

´x (t )³ = √1

M m =1

´ Am ³e−i2π f t . m

For the process to be stationary, the expected value cannot depend on t. Examining Ê A Ë = 0. Therefore, the probability density m function of the amplitude must have zero mean. The autocorrelation function is

´x (t )³, this condition is satisfied only if R (t , t

+ τ) = =

M ¼ M ¼ m =1 ²=1 M ¼ M ¼

m =1 ²=1

´ Am A∗²³e−i2π f

mt

´ Am A∗²³e−i2π( f

m

ei2π f ² (t +τ)

− f² )t ei2π f ² τ .

In order for the autocorrelation function to depend only on τ and not on t , the expected value ´ A m A∗² ³ must vanish when m ± = ². If m = ², then ´ Am A∗² ³ = ´| A m | 2³. Thus the requirement for the process to be stationary is that ´ Am A∗² ³ = ´| A m |2³δ m ² ,

2.2 Random Signals

73

where δ m ² is the Kronecker impulse. This condition implies that the random variables are uncorrelated. Combining the two observations, we conclude that in order for the process to be stationary, the random variables must have mean zero and be uncorrelated. To test for ergodicity, the temporal moments of a single sample function must be equal to the statistical moments of the ensemble. However, each term in the summation for a sample function is of the form Am e−i2π fm t , and is sinusoidal. Therefore, each sample function is a deterministic function with individual temporal sections correlated over any time interval. It follows that the random process is not ergodic.

Noise Processes

Conventional noise processes, both electrical and optical, can be accurately modeled as stationary gaussian random processes with a constant power density spectrum over a limited frequency range. This will be shown in Chapter 6. Let Sn ( f ) = N0 /2 be a constant (two-sided) power density spectrum of a stationary, zero-mean noise process n(t ), possibly gaussian. This noise process has an equal contribution to the power density spectrum from every frequency component and is called a white-noise process.21 Because Sn ( f ) = N0 /2 is a constant, the autocorrelation function Rn (τ) of a white-noise process is a scaled Dirac impulse. When a stationary random process with a power density spectrum S ² ( f ) is the input to a linear time-invariant system with a causal impulse response and a baseband transfer function H ( f ), the power density spectrum S ( f ) of the random process at the output is related to the power density spectrum S ² ( f ) at the input by the expression

S ( f ) = S ² ( f )| H ( f )|2 .

(2.2.63)

S ( f ) = 21 N 0| H ( f )|2 .

(2.2.64)

For white noise, S ² ( f ) = N0 /2, and

Using (2.2.56), and the fact that the stationary noise process has zero mean, the output noise power Pn at time t is equal to the variance σ 2 given by (2.2.5) so that

= σ2 =

Pn

=

²

N0 ∞ | H ( f )|2 d f 2 −∞ ² N0 ∞ |h (t )|2 dt , 2 −∞

(2.2.65a) (2.2.65b)

where the second line follows from Parseval’s relationship (2.1.18). In general, the power in the filtered noise process can be determined using (2.2.63) and (2.2.65a) along with the convolution property of the Fourier transform given by (2.1.14), Pn

=

²∞

−∞

Sn ( f )| H ( f )|2 d f

21 A photodetected electrical signal has units of current so that N is sometimes expressed using an 0 equivalent power density spectrum per unit resistance with units of A2 Hz. A discussion of units is given

in the book section titled Notation.

/

74

2 Background

ºº = Sn ( f )| H ( f )| e d f ºº −∞ ºº τ=0 ∗ = Rn (τ) ± h(τ) ± h (−τ) τ =0, ²∞

2 i2π f τ

(2.2.66)

where R n (τ) is the noise autocorrelation function defined in (2.2.55b) corresponding to Sn ( f ). A noise source whose power density spectrum varies with frequency is called a colored noise source. The noise power σ 2 can be written as

σ 2 = R(0) = G N0 B ,

(2.2.67)

N

where B N is defined as the noise-equivalent bandwidth BN

1 =. 2G

²∞

²−∞ ∞ 1

= 2G

0

|H ( f )|2 d f

(2.2.68a)

|h(t )|2 dt ,

(2.2.68b)

with the normalization constant G = max| H ( f )|2 equal to the maximum power gain of the bandlimited system that defines the noise power. The concept of noise-equivalent bandwidth regards the total noise power as equivalent to a power density spectrum that is flat to a frequency B N and zero thereafter. For example, the noise-equivalent bandwidth of filtered white noise using a baseband system described by H( f ) = is BN

=

1 | H (0)|2

²∞ 0

1 1 + i f /Wh 1

1 + ( f /Wh )

2

df

= π2 Wh ,

where max | H ( f )|2 = | H (0)| 2 = G = 1, and Wh is the half-power or 3-dB bandwidth of the baseband spectrum. The expression | H ( f )| 2 and the corresponding noise-equivalent bandwidth are plotted in Figure 2.12(a). The passband noise-equivalent bandwidth B measures the width of a passband spectrum (cf. Figure 2.3). It is twice the bandwidth W of the baseband signal because it occupies twice the frequency range. The corresponding noise-equivalent bandwidth of a passband system is plotted in Figure 2.12(b). To determine the noise-equivalent bandwidth of a system described by the causal impulse response h(t ) = A rect((t / T )−1/2), use (2.2.68b) with the upper limit set to T . The Fourier transform of a rectangular pulse is a sinc function, which has its maximum ºµ ∞ º2 value at f = 0. Therefore G = max| H ( f )| 2 = |H (0)|2 = º 0 h(t )dt º = ( AT )2 . Then BN

=

1 2G

²T 0

A2 dt

=

A2 T 2G

= 2T1 .

(2.2.69)

2.2 Random Signals

(a)

75

(b) BN = π 2

0

BN = π

fc

Frequency

Figure 2.12 (a) Noise-equivalent bandwidth of a baseband transfer function. (b) Noise-equivalent

bandwidth of a passband transfer function that is symmetric about the carrier frequency fc .

If A = 1, then the filter integrates the noise over a time T , and the noise power is T 2 N0 2T

σ 2 = G N0 B = N

=

N0 T , 2

(2.2.70)

as given by (2.2.67). The effect of the noise is quantified by the signal-to-noise ratio, which is defined differently in the electrical domain and in the optical domain. In the electrical domain, the electrical signal-to-noise ratio is given by SNR =

expected electrical signal power . expected electrical noise power

(2.2.71a)

In the optical domain, the optical signal-to-noise ratio is given by expected lightwave signal power . expected lightwave noise power

OSNR =

(2.2.71b)

The relationship between these two quantities depends on the method of photodetection, and the coherence of the lightwave carrier. Accordingly, a general relationship between the electrical signal-to-noise ratio and the optical signal-to-noise ratio cannot be stated.

Passband Noise

When the passband bandwidth B is much smaller than the passband center frequency fc , the bandlimited noise process is called a passband noise process. A passband noise process ¿ n(t ) can be written in several equivalent forms as

¿n(t ) = n (t )cos(2π f c t ) − n (t )sin(2π f c t ) = A(t )cos (2π f c t + φ(t )) À Ã = Re A(t )ei(2π f t +φ(t )) À Ã = Re n(t )ei2π f t , I

Q

c

c

(2.2.72)

where n(t ) = n I (t ) + in Q(t ) = A(t )ei φ(t ) is the complex-baseband noise process. The term n I (t ) = A (t )cosφ(t ) is the in-phase component of the noise, while

76

2 Background

n Q (t ) = A (t )sinφ(t ) is the quadrature component of the noise. The autocorrelation function of ¿ n(t ) is R¿ n(t )¿ n (t n (t ) (τ) = ´¿

=

+ τ)³ Ã À ÃÇ n (t )ei2π f t Re n(t + τ)ei( 2π f ( t +τ)) .

Æ À Re

Using the identity Re[ z1 ]Re[z2 ] independent phase gives

c

=

1 2

c

Re[z1 z 2] +

1 2

Re[z ∗1 z 2] and the fact that

θ

is an

À Ã À Ã + τ)³ei[2π f (2t +τ)] + 12 Re ´n(t )∗ n(t + τ)³ei2 π f τ À Ã = 21 Re ´n(t )∗ n(t + τ)³ei2π f τ À Ã = 21 Re Rn (τ)ei2π f τ , (2.2.73)

R¿n (τ) =

1 2

Re ´n(t )n(t

c

c

c

c

where the rapidly oscillating first term is neglected compared with the second term. The term Rn (τ) = ´n(t )n (t + τ)³ is the autocorrelation function of the complex-baseband noise process. The passband power density spectrum is obtained by taking the Fourier transform of (2.2.73),

S¿n ( f ) =

1 4

(S ( f − f ) + S ∗(− f − f )) , n c c n

(2.2.74)

where the modulation property of the Fourier transform given in (2.1.16) has been used. The passband power density spectrum of the noise is shown in Figure 2.13(a). The power density spectrum of the complex-baseband noise process is obtained by substituting (2.1.58) into (2.2.64) and equating terms in the resulting expression for (2.2.74). This gives N0 /2

(a) –f c

f

fc

2N0

(b) –f c

f

fc N0 Passband noise bandwidth

(c) –fc

BN

fc

( )

Figure 2.13 (a) Power density spectrum of the real passband noise process S¿ n f . (b) The complex-baseband power density spectrum Sn f . (c) Power density spectrum of the complex-baseband noise components, Sn I f and Sn Q f , along with the passband

noise-equivalent bandwidth.

( )

( )

( )

2.2 Random Signals

Sn ( f ) = 2N 0| H ( f )| 2 = 4S¿n ( f + f c ),

f

> 0,

77

(2.2.75)

where only the positive-frequency part of the passband noise power density spectrum S¿n ( f ) is used to determine the complex-baseband power density spectrum Sn ( f ), and H ( f ) is the complex-baseband transfer function defined in (2.1.58). This power density spectrum is shown in Figure 2.13(b). The power density spectrum for each complex-baseband noise component of a passband noise process with a passband bandwidth B is determined using the same steps that were used to derive (2.2.74), with

Sn I ( f ) = S n Q ( f ) = S¿n ( f − f c ) + S¿n ( f

+ f c ).

This spectrum is shown in Figure 2.13(c). Using (2.2.73), the noise power σ 2 in the passband noise process ¿ n(t ) is

²

N 0 ∞ ºº ¿ ºº2 H( f ) d f 2 −∞ R¿n (0)

σ = = Ê Ë = 12 |n(t )|2 . 2

(2.2.76)

A complex-baseband signal has twice as much signal energy as the corresponding passband signal because it excludes the average of a cosine-squared term, which is equal to one-half (cf. (2.2.73)). Accordingly, the complex-baseband noise-equivalent bandwidth BN is defined to be twice as large as the baseband noise-equivalent bandwidth given in (2.2.68a) so that ² 1 ∞ . BN = |H ( f )|2 d f . (2.2.77) G −∞ For the idealized case in which a constant noise power density spectrum N0 is filtered by an ideal lowpass filter, the noise-equivalent bandwidth B N is equal to the effective bandwidth Bc defined in (2.2.60). To show this, let H ( f ) = rect( f / B ) so that the filtered noise process is S n ( f ) = N0 rect( f / B ). The effective bandwidth of this noise process is

(µ ∞ S ( f )d f )2 n Bc = µ−∞ ∞ S 2( f )d f )2 (µ ∞−∞ n N 0 rect( f / B )d f −∞ = B. = µ∞ 2 2 N rect ( f / B )d f −∞

0

The complex-baseband noise-equivalent bandwidth is given by (2.2.77), BN

= =

²

1 ∞ |H ( f )|2 d f G −∞ ²∞ 1 N 2 rect2 ( f / B )d f N 02 −∞ 0

= B.

(2.2.78)

78

2 Background

Therefore, B N = B c = B. For this case, the coherence timewidth defined in (2.2.59) is the exact reciprocal of the noise-equivalent bandwidth BN . For other filters, this equality need not hold, and the number of coherence intervals per unit time Bc does not equal the noise-equivalent bandwidth B N . When the in-phase n I (t ) and quadrature n Q (t ) noise components of a passband noise process ¿ n(t ) are independent, and have the same statistics, the autocorrelation for each noise component is the same, with R n I (τ) equal to R nQ (τ). The corresponding autocorrelation function Rn (τ) of the complex-baseband process is Rn (τ) = 2R n I (τ) = 2R n Q (τ).

(2.2.79)

The power in each quadrature component can be determined using (2.2.76) and (2.2.79),

Ê Ë Ê Ë Ê Ë Êº º Ë σ 2 = ¿n2(t ) = n2(t ) = n2 (t ) = 21 ºn(t )º2 . I

Q

(2.2.80)

When the complex process is a gaussian random process, then the two-dimensional gaussian probability density function of the in-phase and quadrature components at any time t describes a circularly symmetric gaussian random variable, with the process being a circularly symmetric gaussian random process.

Noise Figure The amount of noise added to a signal in a linear system can be quantified by the concept of the noise figure FN , which is defined in terms of the signal power and the noise power over a bandwidth B . A spectral noise figure FN ( f ) can be defined by constraining the bandwidth B, centered at f , to be sufficiently narrow that the signal power and the noise power are independent of frequency over bandwidth B. The spectral noise figure FN ( f ) is defined as the ratio of the total noise power in the system to the noise power Pin ( f ) at the input FN ( f ) =

Pin ( f ) + Pa ( f ) Pin( f )

= 1 + PPa (( ff )) , in

(2.2.81)

where Pa ( f ) is an additional uncorrelated noise contribution, typically from an amplifier, that is added to the input noise Pin ( f ). The noise powers are referenced to the output of the system with a transfer function H ( f ) defined over the bandwidth B, given that the noise and signal are frequency-independent. The input power Sin ( f ) for the signal is filtered by the system to produce the output signal power Sout( f ) = Sin ( f )| H ( f )|2 . The spectral noise figure can be expressed in terms of the signal-to-noise ratio by multiplying the numerator and denominator of (2.2.81) by Sout ( f ) = Sin ( f )| H ( f )| 2 to obtain FN ( f ) =

Sin ( f )| H ( f )|2 ( Pin ( f ) + Pa ( f )) . Sin ( f )| H ( f )|2 Pin ( f )

(2.2.82)

Sin ( f ) Pout ( f ) , Sout ( f ) Pin ( f )

(2.2.83)

Then FN ( f ) =

where Pout( f ) = | H ( f )|2 ( Pa ( f ) + Pin ( f )) is the output noise power.

2.3 Electromagnetics

79

The frequency-independent form of the noise figure F N is FN

SNRin = SSin PPout = SNR , out in

out

(2.2.84)

where the signal power and the noise power need not be frequency-independent over the bandwidth B .

2.3

Electromagnetics The physical quantity used for lightwave signaling in the continuous wave-optics model is a time-varying electromagnetic field described by Maxwell’s equations. These equations comprise a system of partial differential equations having a rich set of solutions that depend both on the geometry and on the medium. This section provides a summary review of Maxwell’s equations as used for analyzing lightwave signal propagation. The electromagnetic field in free space can be described by two vector functions of space r and time t . These functions are the electric field vector E (r, t ) and the magnetic field vector H(r, t ). When an electromagnetic field interacts with a material, two additional material-dependent vector quantities, D(r, t ) and B(r, t ), are needed to describe the interaction of the electromagnetic field with the material. In the most basic case, these quantities are scaled forms of E (r, t ) and H(r, t ), with D(r, t ) = εE (r, t ) and B(r, t ) = µH(r, t ), where ε and µ are material-dependent constants. These constants convert the field quantities E (r, t ) and H(r, t ) into the material-dependent quantities D(r, t ) and B(r, t ). Suppressing the arguments of the vector functions, Maxwell’s set of partial differential equations for a material with no free charge is given by

∇ × E = − ∂∂Bt , ∇ × H = ∂∂Dt , ∇ · B = 0, ∇ · D = 0.

(2.3.1a) (2.3.1b) (2.3.1c) (2.3.1d)

The operator ∇· is the scalar divergence differential operator, and the operator ∇× is the vector curl differential operator. Integrating the electric field vector E along a line produces the potential difference between the end points of the line. Integrating the magnetic field vector H around a closed curve yields the “displacement current” that passes through the surface enclosed by the closed curve. Each material-dependent quantity, D or B , has units of field per unit area and is called a flux density. The term D, called the electric flux density, accounts for the bound charge within the dielectric material, and so can generate an electric field. The term B , called the magnetic flux density, accounts for any closed current paths that are induced within the material, and so can generate a magnetic field.

80

2 Background

The application of Maxwell’s equations to lightwave propagation at the single angular frequency ω can be described as a monochromatic electric field vector E (r, t ) = Re[E(r, ω)eiωt ]. Using a set of unit vectors {· x, · y,· z} defined for cartesian coordinates, the expression E (r, ω) = E x (r, ω)· x + E y (r, ω)· y + E z (r, ω)· z

(2.3.2)

is the complex-baseband vector field22 at a position r and frequency ω of the monochromatic component given by eiωt . The general electric field is then a superposition over all appropriate ω. As will be explained in Section 2.3.2, the function E (r, ω) satisfies the vector Helmholtz equation (cf. (2.3.15)) given by

∇ 2 E(r, ω) + k02 n2(r, ω)E(r, ω) = 0, where k0 = ω/c0 is the free-space wavenumber, and n(r, ω) is the index of refraction as a function of the position r and the frequency ω. Given the boundary conditions

determined by the geometry, the solution to this equation describes the propagation of a monochromatic lightwave field, which is a field that has a single temporal frequency.

2.3.1

Material Properties

The two quantities D and B account for the effect of the material on the field. In free space, the relationship between the electric field vector E and the electric flux density D is given by D = ε0 E , where ε0 is a constant known as the permittivity of free space. Similarly, in free space, the magnetic flux density B is related to the magnetic field vector by B = µ0 H, where µ 0 is a constant known as the permeability of free space. The materials used for the fabrication of optical communication fibers are glass or plastic. These materials have no free charge. Materials with no free charge are called dielectric materials. To guide light, the materials are spatially modified to create two or more regions with different dielectric properties. The fields in each region are constrained by the boundary conditions at the interface between the regions. The boundary conditions can be derived from the integral formulation of Maxwell’s equations, where either a line integral or a surface integral spans the discontinuous interface. Applying a limiting operation results in the expressions for the boundary conditions for the differential form of Maxwell’s equations. These conditions state that for a dielectric material, the tangential components of the fields E and H and the normal components of the flux densities D and B must be continuous across the interface. When an electric field E is applied to a dielectric material, the bound charge separates slightly to create dipoles. These separated charges produce an additional field, called the material polarization23 P , that is added to the original field to produce the electric flux 22 The term “complex-baseband vector field” is not generally used in electromagnetics. Its use here is to

emphasize the relationship between electromagnetic fields and the communication signals derived from electromagnetic fields. 23 Two distinctly different properties are referred to using the word “polarization” – the field polarization and the material polarization within a material.

P

2.3 Electromagnetics

81

density D. Accordingly, in a dielectric material, the flux densities are related to the fields by

D = ε0 E + P , B = µ0H ,

(2.3.3a) (2.3.3b)

which are known as the constitutive relationships. The relationship between the applied electric field E and the resulting material polarization P is the material susceptibility, denoted X. When the susceptibility is linear and does not depend on time or space, it appears as the simple expression

P (t ) = ε0 X E (t ).

(2.3.4)

In general, the susceptibility may be a scalar function of space X (r), a scalar function of time X (t ), or a scalar function X (r, t ) of both, or even a tensor. These are each described in turn in the following paragraphs.

Homogeneous and Inhomogeneous Media A material whose properties do not depend on the position r is homogeneous. If some material properties do depend on r, then the material is inhomogeneous. A common form of an inhomogeneous material is a material for which the susceptibility depends on r, and so the index of refraction, denoted n (r), depends on r. The relationship between the index of refraction and the susceptibility is derived later in this section. Isotropic and Anisotropic Materials Materials whose properties do not depend on the orientation of the electric field within the material are called isotropic materials. Materials for which the material properties depend on the orientation of the electric field are called anisotropic materials. The response of an anisotropic material varies depending on the orientation of the electric field vector E with respect to a set of preferred directions, called principal axes, which are a consequence of the internal structure of the material. An optically anisotropic material may exhibit birefringence, which describes a material with an index that is different for two orthogonal polarization states. For a birefringent material, the material polarization component Pi in a direction i, for i = 1, 2, 3 representing directions x, y, z, depends on the electric field component Ei in that direction as well as the components E j in directions for which j is not equal to i . For a material that is both linear and memoryless, this dependence can be written as

Pi

= ε0

3 ¼ j =1

XijEj,

(2.3.5)

where X i j is an element of a 3 × 3 matrix X called the susceptibility tensor. The susceptibility tensor represents the anisotropic nature of a birefringent material. Birefringent materials are discussed later in this section.

82

2 Background

Linear Causal Media Many materials exhibit a causal, linear response in time, and a local response in space for which the material polarization at each position r depends on the incident electric field only at that position and not at other positions. In this case, suppressing the spatial dependence on r from the notation, the polarization (t ) for an isotropic material can be written, in general, as a temporal convolution

P

E

P

P ( t ) = ε0

²∞ 0

X (t

− τ)E (τ)dτ,

(2.3.6)

where X (t ) is called the temporal susceptibility of the material. The temporal susceptibility describes the real, causal impulse response of the material to the electric field at each position r. Accordingly, the temporal susceptibility represents the temporal memory of the material to the applied field, as expressed by the temporal convolution in (2.3.6).

E

Linear Material Dispersion

The susceptibility X (t ) may have a narrow timewidth compared with the relevant time scale. Then the susceptibility is modeled as instantaneous, with X (t ) = X δ(t ), where X is a real constant. Materials that have no memory on the relevant time scale are called nondispersive materials. Using (2.1.2) in (2.3.6), the material polarization at a single point in space for a nondispersive material is given by (2.3.4). Substituting that expression into (2.3.3a) gives

D(t ) = ε0 (1 + X ) E(t ) = ε0εr E (t ),

. where εr = 1 + X is the relative permittivity of the nondispersive material.

(2.3.7)

Materials that do exhibit memory on the relevant time scale are called dispersive materials. For a dispersive material, the material polarization at a single point of the dielectric material often has the linear restoring force of a simple harmonic oscillator, in which case the material is linear. The dielectric material consists of a volume of bound charge with a density N in units of coulombs per cubic centimeter, possibly depending on r. Each bound charge with a mass m and a total charge q has a linear restoring force that models the “stiffness” of the material response. The restoring force can also be nonlinear, leading to a nonlinear relationship between and . Nonlinear material response is discussed in Chapter 5. For a linear isotropic dielectric dispersive material, the material polarization (t ) at each point r is in the same direction as the applied electric field (t ), with the response of a single component of (t ) to a single component of an applied electric field (t ) described by the second-order differential equation

P

E

E

P

d2 P (t ) dt 2

P

+ σω dPdt(t ) + ω20P (t ) =

N q 2 E (t ) , m

P

E

(2.3.8)

where ω02 = K / m is the resonance frequency of the material response and σω is the spectral width of the resonance.

2.3 Electromagnetics

83

For this linear material, the response to a monochromatic electric field E (t ) = Re[ E (ω)eiωt ] is another monochromatic field, P (t ) = Re[ P (ω)eiωt ]. Substituting these forms into the differential equation (2.3.8), the complex spectral susceptibility, denoted as χ(ω), is

ω20 (ω) = χ . (2.3.9) χ(ω) = ε1 EP (ω) 0 2 ω0 − ω 2 + iσω ω 0 Because the spectral susceptibility χ(ω) is the temporal Fourier transform of the temporal susceptibility X (t ), denoted as X (t ) ←→ χ(ω) , the temporal susceptibility X (t ) follows from (2.3.9). As ω goes to zero, χ(ω) goes to χ0, where χ0 = N q 2/ m ε0ω02 is the low-frequency susceptibility. As ω goes to infinity, χ(ω) goes to zero. Because the temporal susceptibility X (t ) is a real, causal, linear function of time, there is an inherent relationship between the real part χ (ω) and the imaginary part χ (ω) of the complex spectral susceptibility χ(ω) given by the Kramers–Kronig transform (cf. (2.1.23)). A similar relationship exists between the magnitude of χ(ω) and the phase of χ(ω). The consequences of this relationship are discussed in SecR

I

tion 4.3.2.

2.3.2

The Wave Equation

Modulated waveforms within the wave-optics signal model are electromagnetic waves. The wave equation for electromagnetics can be derived by applying the vector curl operation to both sides of (2.3.1a). Substituting (2.3.1b) and the constitutive relationships (2.3.3a) and (2.3.3b) for a nondispersive material into the resulting equation yields 2 2 ∇ × ∇ × E (r, t ) + 12 ∂ E∂(t r2, t ) = −µ0 ∂ P∂ t(2r, t ) , (2.3.10) c0 √ where c0 = 1/ ε0 µ0 is the speed of light in vacuum. Substituting the vector identity ∇ × ∇ × E (r, t ) = ∇ (∇ · E (r, t )) − ∇ 2E (r, t )

and (2.3.4) into (2.3.10) gives 2 2 ∇ 2 E (r, t ) − ∇ (∇ · E (r, t )) − n2 ∂ E∂(t r2, t ) = 0, c0

where n2

=. 1 + X = εr

(2.3.11)

(2.3.12)

defines the index of refraction n for a nondispersive homogeneous material, conventionally referred to simply as the index. More generally, for an inhomogeneous material, the index depends on r. For a dispersive material, the index depends on time (or frequency). The index n = c0 /c relates the speed of light c in the material to the speed of light c0 in vacuum, with c = c0 / n, as will be shown later in this section. For silica glass, the index is approximately 1.5.

84

2 Background

For a nondispersive inhomogeneous material for which the index n (r) varies as a function of r slowly in comparison with the variations of E (r, t ) as a function of r, the term ∇(∇ · E (r, t )) in (2.3.11) is zero or can be neglected. Then the wave equation is written as 2 2 ∇ 2E (r, t ) − n (2r) ∂ E∂(t r2, t ) = 0. (2.3.13) c0 The specific conditions that need to be satisfied for (2.3.13) to be valid are discussed in an end-of-chapter problem.

Complex Representations

The analysis of Maxwell’s equations in a dispersive medium can often be simplified by using a complex representation based on Fourier analysis. For a time-invariant system, the temporal dependence of the electric field vector E (r, t ) can be written as an inverse temporal Fourier transform (cf. (2.1.9))

E ( r, t ) =

²

∞ 1 E(r, ω) eiωt dω, 2π −∞

(2.3.14)

showing the temporal dependence of E (r, t ) expressed as a superposition of complex frequency components E(r, ω)eiωt . Substituting this form into (2.3.13), and using ∂ 2 (E(r, ω)eiωt )/∂ t 2 = −ω2 E(r, ω)eiωt , the complex representation of the wave equation for a dispersive material is described by the vector Helmholtz equation

∇ 2 E(r, ω) + n 2(r, ω)k02E(r, ω) = 0,

(2.3.15)

where E(r, ω) is the complex electric vector field at position r and frequency ω (cf. (2.3.2), where k0 = 2π/λ0 is the free-space wavenumber, with λ0 being the free-space wavelength, and where n(r, ω) is the index at that position and frequency.24 A similar expression governs the complex vector magnetic field H(r, ω) . The square n2 (r, ω) of the frequency-dependent index appearing in (2.3.15) is equal to 1 + χ(r, ω) (cf. (2.3.12)). The complex spectral susceptibility χ(r, ω) (cf. (2.3.9)) is the temporal Fourier transform of the real temporal susceptibility X (r, t ) (cf. (2.3.6)) and characterizes the dispersive properties of the material. For a dielectric material, a basic expression for χ( r, ω) is given by (2.3.9). In a homogeneous, nondispersive dielectric material, n (r, ω) reduces to a constant n, and the vector Helmholtz equation reduces to

.

∇2 E(r) + k2 E(r) = 0,

(2.3.16)

where k = nk0 is defined as the wavenumber. When all vector components of the E field and the H field have the same functional form, then wave propagation can be analyzed using a scalar form of the Helmholtz equation given by

( , ω) can be formally derived by introducing (2.3.6)

24 The expression for the frequency-dependent index n r

into (2.3.11).

2.3 Electromagnetics

∇ 2U (r, ω) + n 2(r, ω)k02U (r, ω) = 0,

85

(2.3.17)

where U (r, ω) is a scalar function representing one component of either the electric field E (r, ω) or the magnetic field H (r, ω) normalized so that |U (r, ω)|2 represents a spatial power density. This normalization is discussed later (cf. (2.3.29)). For a constant index, the scalar Helmholtz equation simplifies to (cf. (2.3.16))

∇ 2U (r) + k2 U (r) = 0,

(2.3.18)

with k being the wavenumber. This equation is used to develop geometrical optics later in this section.

Plane Waves in Unbounded Media

Solving the vector Helmholtz equation for a specified geometry and medium requires applying the specified boundary conditions in an appropriate coordinate system. The most basic geometry consisting of an unbounded, lossless, linear, isotropic, homogeneous medium, not necessarily free space, allows the simplest solution of the Helmholtz equation. To this point, the fields E and H of the form E(r) = E 0e−i β·r · e,

H(r) = H0e−i β·r · h,

(2.3.19a) (2.3.19b)

satisfy (2.3.15) for such a medium, where · e and · h are orthogonal unit vectors. This is · a plane wave. The cross product · e × h defines the real propagation vector β± = βx· x + ± βy·y + βz·z with the unit vector β =. β /β pointing in the direction of propagation of the plane wave. The magnitude β = | β | is called the propagation constant. In general, a wave for which both the electric field and the magnetic field are transverse to the direction of propagation is called a transverse electromagnetic wave or a TEM wave. The complex amplitudes E 0 and H0 for the field given in (2.3.19) for the plane wave √ are related by H0 = E 0 /η, where the material-dependent quantity η = µ0 /ε is the impedance, which may depend on the frequency. The spatial phase term e−iβ ·r depends on the inner product of the propagation vector β and the position vector r = x · x + y· y+ z· z. Using E 0 = | E 0| eiφ , the real electric field for a plane wave is given by

À

E (r, t ) = Re E(r)eiωt

Ã

= | E0 |cos (ωt − β · r + φ)·e. (2.3.20) The cosine function repeats in time whenever ωt = m2π , where m is an integer. This condition defines the temporal period T = 2π/ω = 1/ f . The cosine function repeats in space whenever β · r = m2π . This corresponds to a spatial period or wavelength λ = c/ f that is defined as the distance along the β± direction between two consecutive

phase fronts in the medium. For a plane wave, the propagation vector given by β

β

is also called the wavevector k, and is

= k =. k0 n(ω)β±,

(2.3.21)

86

2 Background

.

where the magnitude |k | = k = n(ω)k0 is the wavenumber. Setting the propagation constant β equal to the wavenumber k gives k λ = 2π or

λ = 2π/ k . The phase velocity c is defined by

. λ = λ(ω) ω = ω , T 2π β(ω)

c=

(2.3.22)

where the notation shows the dependence of the propagation constant β on the frequency ω, and thereby the dependence of the wavelength λ on the frequency ω .

Dispersion Relationship Substituting the planes waves of (2.3.19) into the vector Helmholtz equation in (2.3.15) and noting that the spatial operator ∇2 reduces to a multiplication by −β 2 for a planewave field of the form of (2.3.19), the Helmholtz equation has a solution when β(ω) satisfies β(ω) = k0n (ω) = ω nc(ω) = 2πλn(ω) . (2.3.23) 0 0 Using c0 = ω/ k0 = λ0 f , the index of refraction n(ω) = c0 /c(ω) is the ratio of the phase velocity in free space to the phase velocity in the medium. The functional dependence β(ω) of the propagation constant β on the frequency ω is called the dispersion relationship. Values of ω and β that satisfy (2.3.23) produce plane-wave solutions to the Helmholtz equation. As the field propagates a distance L in ±, the complex amplitude of the field is multiplied a lossless medium in the direction of β − iβ(ω)L by e , which is a distance-dependent phase shift. This phase shift does not change the functional form of the solution. Each plane-wave solution is called an eigenfunction, or eigenmode, of the unbounded, lossless, linear, homogeneous medium, with the phase shift e−iβ(ω) L being the eigenvalue defined at a distance L . For other geometries, such as waveguiding structures, the dispersion relationship is different than (2.3.23) because of the presence of boundaries. This dispersion relationship must be derived for a given geometry, starting with the vector Helmholtz equation given in (2.3.15), and applying the geometry-dependent boundary conditions. This means that different waveguiding geometries have different dispersion relationships.

Poynting Vector

The cross product of the time-varying vector fields E (r, t ) and H(r, t ), which are not necessarily plane waves, nor orthogonal, is defined as the Poynting vector S (r, t ),

S(r, t ) =. E (r, t ) × H (r, t ).

The Poynting vector is a directional spatial power density with units of power per unit area. For a guided mode, the Poynting vector may have components that are not in the direction of propagation.

2.3 Electromagnetics

87

Narrowband fields can be written as E (r, t ) = Re[E(r, t )eiωc t ] and H(r, t ) = Re[H(r, t )ei ωc t ], where the complex-baseband electromagnetic fields25 E(r, t ) and H(r, t ) are slowly varying as compared with the carrier frequency ωc . The time-average power density S ave (r, t ) for the complex-baseband field is expressed as

S ave (r, t ) = Re

À1

Ã

∗ 2 E(r, t ) × H (r, t )

= Re [Save (r, t )] , where S ave(r, t ) = 21 E(r, t ) × H∗ (r, t ) is the complex Poynting vector. The intensity of the field I (r, t ) is defined as the magnitude of Save (r, t ), . I (r, t ) = |S ave(r, t )| .

(2.3.24a) (2.3.24b)

(2.3.25)

When both field vectors are transverse to the direction of propagation, the wave is a transverse TEM wave with the Poynting vector along the direction of propagation. When, instead, one of the two fields has an axial vector component along the direction of propagation, as in a dielectric waveguide, the Poynting vector has a component along the direction of propagation and a component transverse to the direction of propagation. This is discussed at the end of Section 3.3.1.

Lightwave Power

The lightwave power P (z, t ) flowing along the z axis through a cross-sectional region A transverse to the z axis at a distance z is given by P ( z, t ) =

²

A

S ave(r, t ) · ·z dA ,

(2.3.26)

where · z is a unit vector in the z direction and dA is the differential of the area. For a TEM wave propagating in the z direction, the Poynting vector Save (z, t ) is in the z direction, and P (z , t ) =

²

A

I (r, t )dA .

(2.3.27)

When the intensity is constant over the transverse region A , the power P (z , t ) and the intensity I (z , t ) differ only by the constant scaling factor equal to the area of the region A. As an example, the complex Poynting vector given in (2.3.24a) for the plane wave described by (2.3.19) is S ave(r, t ) = 21 E(r, t ) × H∗ (r, t )

Á Â Á Â = 12 E0e−i ·r ·e × H0∗ ei · r ·h 2 = | E2η0| β±, β

β

(, )

(2.3.28)

25 The same symbol E is used for the complex-baseband electric field E r t in the time domain and the

complex electric field E( r, ω) in the frequency domain. The context will resolve the ambiguity.

88

2 Background

± and H0 = E 0/η (cf. (2.3.19)). The corresponding intensity is I = where · e ×· h = β 2 |Save| = | E0 | /2η. Expression (2.3.28) states that the intensity of a TEM wave is proportional to the squared magnitude | E 0 |2 of the electric field, and therefore proportional to the squared magnitude of the magnetic field. For this case, it is convenient to define a complex envelope of the field U(r, t ) with the squared magnitude of U(r, t ) normalized so that I (r, t ) = |U(r, t )| 2,

(2.3.29)

where U(r, t ) represents either E(r, t ) or H(r, t ), normalized.

Orthogonality of Spatial Modes

Consider a waveguiding structure with a cross section in the (x , y) plane that does not vary along the z direction. Using the vector Helmholtz equation, given in (2.3.15), and applying the boundary conditions produces characteristic spatial solutions called eigenfunctions, eigenmodes, or simply modes. Expressions for these modes will be developed in Section 3.3. In general, a mode consists of axial electric and magnetic field components along the direction of propagation and transverse electric and magnetic field components perpendicular to the direction of propagation. This section discusses the orthogonality of modes and shows that in a medium for which the attenuation is the same for every mode, the modes are orthogonal. In contrast, in a medium for which the attenuation can vary from mode to mode, the modes need not be orthogonal. The second case is discussed in Section 4.6.4. The first case is discussed in this section for the special case of a lossless medium. The theory of self-adjoint differential operators, as applied to electromagnetics,26 shows that the transverse field components of a set of normalized modes for a lossless dielectric waveguide, which includes both discrete guided modes and a continuum of unguided modes, form an orthonormal basis. This orthogonality property is a consequence of the nature of the differential operator that defines wave propagation in a lossless medium as described by the vector Helmholtz equation. Given the solution for the transverse field components, the complete field, which includes axial field components, can be determined using Maxwell’s equations. For a lossless dielectric waveguide with an (x , y ) cross section that is independent of z, the spatial part of the complex electric field E j (r, ω) = E j (x , y , z , ω) for the j th propagating mode at frequency ω can be expressed in terms of a normalized vector field e j (x , y) that has unit power. The dependence of E j on ω is suppressed henceforth in this section. Similar statements hold for H j (x , y , z, ω) . Chapter 3 shows that for a lossless waveguide, the spatial dependence of the vector electric field E j (x , y , z ) for the j th mode is the product of two terms. The first term is a phase term e−iβ j z along the direction of propagation, which depends on the mode index j. The second term, e j (x , y ) = e jt (x , y )·t + e j z (x , y )· z,

26 See Friedman (1956) and Chew (1990).

(2.3.30)

2.3 Electromagnetics

89

describes the normalized vector field as a sum of a transverse field component e jt (x , y ) with · t a unit vector in the (x , y) plane and an axial field component e j z ( x , y ) oriented along the direction of propagation described by the unit vector · z. Each of these vector components depends on the transverse coordinates x and y. Combining the two terms gives E j (x , y , z ) = e j (x , y)e−iβ j z .

(2.3.31a)

A similar expression holds for H j (x , y, z ), H j (x , y , z ) = h j (x , y )e−iβ j z .

(2.3.31b)

The propagation constant β j that governs the z-dependent phase is determined by the geometry of the waveguide. The standard definition of the power orthogonality condition for the modes of an arbitrary waveguide is a generalization of the mean power density given in (2.3.24). Define the mean cross-power density S jk (r) between the j th and kth normalized modes as the real function

S jk (r) = 21 Re[S jk (r)] =

1 2

Re[e j (r) × h ∗k (r)],

(2.3.32)

where S jk (r) is the complex cross-power density, and the rightmost term is normalized to have unit power. To recover (2.3.24), which is the power density in a single mode, set j equal to k. For a waveguide constructed using a homogeneous material, the mean power density given in (2.3.24) and the orthogonality condition given in (2.3.32) can be expressed solely in terms of either e j (x , y) or h j ( x , y ). This is discussed further at the end of Section 3.3.1. The orthogonality condition for a lossless guided electromagnetic mode using the normalized fields e j (x , y) and hk (x , y ) can be written as 1 2

² ( A

e j (x , y) × h∗k (x , y)

) · · z dA = ¶δ jk ,

(2.3.33)

where A is the transverse region of the waveguide in the (x , y ) plane and the sign convention is given below by (2.3.35). (For the unguided radiation modes, the index j in (2.3.31) is a continuous variable and the Kronecker impulse δ jk , with both j and k being discrete indices in (2.3.33), is replaced by a Dirac impulse δ( j − k ), with both j and k now being continuous variables.) Expression (2.3.33) is a generalization to a set of orthogonal modes of the expression for the propagating power given as (2.3.26). Similarly to that expression, (2.3.33) states that only field components lying in a plane transverse to · z will yield a nonzero contribution to the density of power propagating in the z direction. Therefore, using “t ” to denote the transverse component, (2.3.33) can be rewritten in terms of the normalized transverse electric field component et j (x , y) = et j (x , y)· t given in (2.3.30) and the corresponding component h tk (x , y), so that 1 2

² ( A

)

et j (x , y) × h∗tk (x , y) dA

= ¶δ jk .

(2.3.34)

90

2 Background

Each guided mode supports a wave that can travel in either direction, with the propagation constant given by ¶|β j |. Given the cross-product relationship between et j (x , y), h tk (x , y ), and the direction of power flow, the mode index j is defined so that for j positive, the power propagates in the positive-z direction with a positive propagation constant β j , and for j negative, the power propagates in the negative-z direction with a negative propagation constant β j . Using this convention gives

β− j = −β j ,

ej

= e− j ,

h− j

= −h j .

(2.3.35)

An arbitrary transverse field component can be expressed in terms of a superposition of the modes. These modes provide an orthonormal basis for the transverse field components. For the guided modes, this is

¼

Et ( x , y ) =

j ¼

Ht ( x , y ) =

j

a j et j (x , y ),

(2.3.36a)

a j h t j (x , y).

(2.3.36b)

Because of symmetry and the linearity of Maxwell’s equations, both expansions have the same set of coefficients a j . The expansion coefficient a j for mode j is the cross product of the normalized transverse field component h∗tk (x , y) with each side of (2.3.36a) integrated over the cross-sectional area. Using the orthogonality condition given in (2.3.34), this gives

²

A

¼ ²

E t (x , y ) × h ∗tk (x , y )d A =

aj

j



¼ j

A

et j (x , y ) × h∗tk (x , y )dA

2a j δ jk

= ¶2ak .

(2.3.37)

The expansion coefficient a j is given by aj

= ¶ 21

²

A

Et (x , y ) × h∗t j (x , y )d A .

(2.3.38)

The axial component of the E field and the axial component of the H field follow from the monochromatic forms of Maxwell’s equations. Each has the same phase term e−i β j z . For mode j, let e j (x , y) be the vector sum of the transverse electric field component and the axial electric field component. In terms of the complete field, the expansion coefficient a j is given by aj

² = ¶ 21 (E(x , y) × h∗j (x , y)) · ·z dA. A

(2.3.39)

Using the expansion coefficient a j given in (2.3.39) and (2.3.31), the spatial dependence of an arbitrary complex field can be written as E(x , y, z ) = H( x , y , z ) =

¼

j ¼ j

a j e j (x , y )e−i β j z ,

(2.3.40a)

a j h j (x , y)e−iβ j z ,

(2.3.40b)

2.3 Electromagnetics

91

where the summation runs over both positive and negative values of j , with the signs of the fields and the propagation constants following the sign convention given in (2.3.35). The expressions (2.3.36) and (2.3.40) show that an arbitrary field within a lossless waveguide can be expressed in terms of the orthogonal modes of that waveguide. Given the propagation characteristics of each guided mode, the propagation of an arbitrary field within a waveguide can be determined using these expressions.

2.3.3

Geometrical Optics

For a guiding structure whose features are sufficiently larger than the wavelength of the light, geometrical optics, as embodied in the methods of ray tracing, accurately describes many aspects of lightwave signal propagation. Geometrical optics can describe the intersection of rays with arbitrary surfaces defined by two regions that have different indices. It can also describe propagation in a material with an index n (r) that varies with the position r, and so is inhomogeneous. Other properties of an electromagnetic field, such as polarization, can be incorporated by attaching additional properties to a ray. Using these additional properties, geometrical optics can be extended to analyze light propagation in an anisotropic material, and can be used to describe other effects such as polarization. The basic equations governing geometrical optics can be derived from wave optics for an isotropic medium, not necessarily homogeneous. For an isotropic medium, the material polarization P lies in the same direction as the electric field E . Therefore, wave propagation can be analyzed using a scalar Helmholtz equation (cf. (2.3.17)). For a single frequency component ω, the scalar Helmholtz equation is written as

∇ 2U (r) + k02 n2(r)U (r) = 0, (2.3.41) where U (r) is a scalar function representing either E (r) or H (r) normalized so that |U (r)|2 = I (r) (cf. (2.3.29)), n(r) is the index of refraction for an inhomogeneous medium, and k 0 is the free-space wavenumber. When the spatial dependence of the index n(r) varies slowly with respect to the wavelength λ , the solution to (2.3.41) is spatially narrowband and can be written as U (r) = u(r)e−ik0 S( r) ,

(2.3.42)

where the spatial envelope u(r) varies slowly compared with the spatial phase k0 S (r). Now substitute (2.3.42) into (2.3.41). Both the real part and the imaginary part of the resulting equation must be equal to zero. Setting the real part equal to zero gives

³ λ ´2 ∇ 2 u(r) , |∇ S (r)| = n (r) + 2π0 u (r) where λ0 = 2π/ k0 . As λ 0 goes to zero, the second term can be neglected, and |∇ S (r)|2 = n2 (r). 2

2

(2.3.43)

(2.3.44)

92

2 Background

Expression (2.3.44) is called the eikonal equation. It is one form of the governing equation for geometrical optics. 27 A spatial surface on which S(r) is a constant defines a geometrical wavefront. A geometrical ray is defined as a trajectory that is orthogonal to the geometrical wavefront at each point of the trajectory. To derive the governing equation for a ray trajectory, divide each side of (2.3.44) by n2 (r). This gives |∇ S(r)/n (r)|2 = 1, which implies that

·s(r) = ∇nS(r(r)) ,

(2.3.45)

k(r) = k0 ∇ S (r).

(2.3.46)

where · s(r) is a unit-amplitude vector that is normal to the geometrical wavefront defined by a constant value of S(r). The vector · s(r) defines the direction of a ray. Each ray can be interpreted as the propagation direction of a “local” plane wave with a local . wavevector k(r) given by k (r) = k0 n(r)· s(r) (cf. (2.3.21)), and a phase velocity given by c(r) = c0 / n(r). Using this definition of the direction of propagation, an equivalent form of (2.3.45) that is used to determine the propagation of a ray is

This expression states that for a structure for which ray optics is valid, each ray may be regarded as a “local” plane wave with the local wavevector k (r) given by (2.3.46).

Derivation of the Ray Trajectories for Small Angles

If all of the initial ray trajectories make angles that are small with respect to the z axis, ( )1/2 then the differential path length ds = dr 2 + dz 2 ≈ dz. For this paraxial approximation, the derivative dr/ ds of the ray vector along this path can be expressed in a cartesian coordinate system as dr ds

dy dz = dx · x+ · y+ · z ds ds ds dy ≈ dx · x+ · y +· z. dz dz

(2.3.47)

To use this expression to describe ray propagation, consider a point along the ray r(s ), where the position vector r of the point is a function of the distance s along the ray, so that dr/ds = · s. On substituting this expression into (2.3.45), the eikonal equation is rewritten as n ( r)

dr ds

= ∇ S(r).

(2.3.48)

Substituting ∇ S (r) = n(r)· s(r) from (2.3.45) into (2.3.48), taking the derivative with respect to the path length s, and using (d/ds )n(r)· s(r) = ∇ n(r) gives 27 The eikonal equation can be inferred from Fermat’s principle, which states that the path taken by a ray

between any two points is the minimum-time path.

2.3 Electromagnetics

d ds

³

dr n(r) ds

´

= ∇n (r).

93

(2.3.49)

This expression is the eikonal equation expressed in terms of dr/ ds and n (r). For a paraxial ray propagating in a medium such as an optical fiber for which n(r) depends only on the transverse coordinates, set n(r) = n(x , y ), and replace ds with dz so that ∇ =. ·x ∂/∂ x + ·y ∂/∂ y. With these changes, (2.3.49) separates into a pair of scalar differential equations for the ray trajectory (x (z ), y (z)) as a function of z, 1 ∂ n (x , y ) n (x , y ) ∂ y

2

= ddz y2 ,

1 ∂ n (x , y ) n( x , y ) ∂ x

2

= ddzx2 .

(2.3.50)

The ray-optics equations can be used to analyze signal propagation for a fiber with a guiding structure that is large compared with the wavelength.

2.3.4

Polarization

The polarization28 of a propagating electromagnetic field refers to the direction of the electric field vector as a function of time and space. For a monochromatic plane wave propagating in the z direction, the polarization state can be described using two orthogonal unit vectors, · x and · y, defined in a plane transverse to the direction of propagation. These unit vectors define a basis that can express an arbitrary state of polarization as a linear combination of two complex polarization components E(x , y) = E 1 (x , y )· x + E 2( x , y )· y,

(2.3.51)

= E · ·x = | E1|eiδ , E2 = E · · y = | E 2| eiδ

(2.3.52)

where each of E 1( x , y ) and E2 (x , y) is a complex function of x and y, which dependence is henceforth suppressed. Then E1

1

2

are the two complex projections of the complex amplitude of the electric field vector E onto the two orthogonal unit vectors, · x and · y, where | E 1| and | E 2 | are the magnitudes of each vector component, and where δ1 and δ 2 are the phases of each vector component. Each vector component of the electric field E (z , t ) is given by (2.3.20). The sum of these two vector components is the vector field E (z , t ), which is given by

E (z, t ) = | E 1| cos(ωt − kz + δ1)· x + | E 2| cos(ωt − kz + δ 2)· y

= | E1|cos(ψ + δ1)·x + | E2|cos(ψ + δ2)·y,

(2.3.53)

where ψ = ω t − kz is a phase term that describes the rotation of the plane-wave electric field vector in time and space. The two field components E1 (ψ) = | E 1| cos(ψ +δ 1) and E2 (ψ) = |E 2 |cos(ψ + δ2 ) are the parametric equations of an ellipse parameterized by ψ with the state of polarization completely described by |E1 |, | E2|, and δ = δ2 − δ1 . 28 Two distinctly different properties are referred to using the word “polarization” – the polarization of an

electric field and the polarization of a material.

94

2 Background

A linear state of polarization has δ = 0 or δ = ¶π . A circular state of polarization has |E1 | = | E2| and δ = ¶π/2. The three independent degrees of freedom that characterize the state of polarization are expressed in several alternative representations using either the complex field or the intensities of the polarization components. Two popular representations will be described: the Jones representation and the Stokes representation. The Jones vector representation expresses the polarization state as a complex column vector J = [x1 x2]T , where x1 and x2 are the complex components of the monochromatic lightwave vector field E at wavelength λ, normalized such that the magnitude of the vector | x1 |2 + | x2 |2 is unity. Two polarization states J1 and J2 for which J†1 J2 = 0 are orthogonal. For example, if | E 1 | = √ | E2| and δ2 − δ1 = δ = π/2 in (2.3.53), then J = √1 [1 i ]T, where the factor of 2 normalizes the amplitude. This Jones Vector 2

is called a right-handed circular polarization state. The orthogonal left-handed circular polarization state is √1 [ 1 −i]T . 2 In general, the state of polarization is a function of the wavelength. A wavelengthdependent polarization state leads to a form of dispersion called polarization-mode dispersion, as discussed in Section 4.6. The Stokes vector representation expresses the polarization in terms of a set of three real quantities called the Stokes parameters (s1 , s2 , s3 ). The first Stokes parameter s1 is the difference in the intensities of two linear polarization components oriented in the same two directions as the components of the Jones vector. The second Stokes parameter s2 is the difference in the intensities of linear polarization components for a set of axes rotated by π/4 with respect to the axes used to define s1. The third Stokes parameter s3 is the difference between the right-handed polarization component and the left-handed polarization component. For a well-defined state of polarization, the Stokes parameters satisfy s12 + s22 + s32 = 2 s0 , where s20 is the total intensity in the field, and is normalized to one to conform with the normalization convention used for the Jones representation. Light is unpolarized if the polarization changes randomly and rapidly compared with the time scale of interest. Light is partially polarized if the polarization changes at a rate comparable to the time scale of interest.29 The Stokes parameters can be determined from the two complex components (x 1, x2 ) of the normalized Jones vector J by the equations s1 = x1 x1∗ − x2 x2∗

= |x1 |2 − |x2 |2 , s2 = x1 x2∗ + x1∗ x2 = 2 Re[ x1 x2∗ ], ) ( s3 = i x1 x2∗ − x1∗ x2 = −2 Im[ x1 x2∗ ].

(2.3.54)

In general, the inverse function determining the two complex components (x 1, x2 ) of a Jones vector J from the three real components (s1, s2, s3) has an ambiguity with respect to the phase of the complex field. Nevertheless, the Jones representation and the 29 For partially polarized light, which is not considered, the intensities are defined using averages with s0 2 s1 2 s2 2 s3 2 . For unpolarized light, every polarization state is equally likely, so that s1 2 s2 2 s3 2 0. For details, see Shurcliff (1962).

´ ³ ≥´ ³ +´ ³ +´ ³ ´ ³ =´ ³ =´ ³ =

2.3 Electromagnetics

Circular

95

s3

Elliptical

s1

s2

Linear

Figure 2.14 A first octant of the Poincaré sphere. Linear states of polarization are confined to the

equatorial plane (s3 polarization states.

= 0). Circular polarization is at a pole. All other points are elliptical

Stokes representation each express a well-defined state of polarization in terms of three independent degrees of freedom. The Stokes parameters (s1, s2, s3) can be expressed as the cartesian coordinates of a Stokes vector ¸s , where the arrow above is included to distinguish the Stokes representation from the Jones representation. The tip of the Stokes vector lies on the surface of a sphere of unit radius called the Poincaré sphere. The first octant of the Poincaré sphere is shown in Figure 2.14. The Poincaré sphere is useful for visualizing the evolution of the polarization state of a propagating wave, which can be described as a trajectory on the surface of the Poincaré sphere. The components of the Stokes vector can also be defined using a pair of angles (ξ, χ) that are related to the standard angles (θ, φ) in a spherical coordinate system by the equations or

θ = π2 − 2χ,

φ = 2ξ

(2.3.55a)

χ = π4 − 2θ ,

ξ = φ2 . (2.3.55b) The angle θ is measured from the s3 axis. The angle φ is measured in the (s1, s2 ) plane from the s1 axis of the projection of the Stokes vector s¸ onto the (s1 , s2 ) plane. Using these relationships, the angle 2ξ with |ξ| ≤ π/2 can be regarded as the “longitude” on the Poincaré sphere, whereas 2 χ with |χ| ≤ π/4 can be regarded as the “latitude” on the Poincaré sphere. The normalized Stokes parameters expressed in terms of these two angles are

= cos (2χ)cos(2ξ), s2 = cos (2χ)sin(2ξ), s3 = sin(2χ). s1

If χ = 0, then θ = Poincaré sphere.

(2.3.56)

π/2 and the field is linearly polarized and lies on the equator of the

96

2 Background

A basis of orthogonal Jones vectors {J1, J2} can be stated in terms of the set of angles

(ξ, χ) as

¸ a ¹ ¸ b ¹ . . J1 = (2.3.57) −b ∗ , J 2 = a ∗ , where a = cos ξ cos χ − i sin ξ sin χ , b = − sin ξ cos χ + i cos ξ sin χ , and | a|2 +|b|2 = 1. For a Jones vector that satisfies J†1J2 = 0, the corresponding two Stokes vectors ¸s1 and ¸s2 are antipodes (opposite sides) of the Poincaré sphere, and define orthogonal polarization states in that representation. Given that (2χ, 2ξ) gives the latitude and longitude for one Stokes vector, the orthogonal state is given by (−2χ, 2ξ ¶ π) . Polarization Transformations

The state of polarization can change as a lightwave field propagates, and can vary as a function of frequency ω . A change in the state of polarization occurs in an anisotropic material that has a different index of refraction for each component of the electric field. This response is modeled by a linear transformation

= Tp1 p¸2 = S p¸1

(Jones representation),

p2

(2.3.58a)

(Stokes representation),

(2.3.58b)

where p 1 or p¸1 is the state of polarization at the input, p 2 or p¸2 is the state of polarization at the output, T is the polarization transformation expressed as a 2 × 2 matrix with complex elements in a Jones representation, and S is a 3 × 3 matrix with real elements in a Stokes representation. For a lossless polarization transformation, the transformation matrix U(ξ, χ) is a function of the set of angles (ξ, χ). In a Jones representation, each column of U(ξ, χ) is one of the orthogonal Jones vectors defined in (2.3.57) so that

U(ξ, χ) = [J1, J2] =

¸

a −b ∗

b a∗

¹

.

(2.3.59)

This matrix can be interpreted as a transition along an arc on the surface of the Poincaré sphere described by a change in the longitude 2ξ and latitude 2χ of the polarization state. It is a unitary matrix satisfying U−1 (ξ, χ) = U† (ξ, χ). The output Jones vector p2 in terms of the transmitted Jones vector p 1 is p2

= U(ξ, χ)p1 .

For initial and final polarization states that are linearly polarized, reduces to ¸ cos ξ −sin ξ ¹ U(ξ) = sin ξ cos ξ ,

(2.3.60)

χ = 0, and (2.3.59) (2.3.61)

so the rotation is along the equator of the Poincaré sphere. The representation of a general lossless polarization transformation T depends on the basis. If T is a polarization transformation in the first basis and W is a polarization transformation in a second basis, then

W (ξ, χ) = U(ξ, χ)TU† (ξ, χ),

where U(ξ, χ) is the transformation from the first basis to the second basis.

(2.3.62)

2.3 Electromagnetics

97

Linear Birefringence

A linear-birefringent material has an angularly dependent index of refraction that is different for two orthogonal linear polarization states. A circular-birefringent material has an index that is different for two orthogonal circular polarization states. Standard optical fiber exhibits linear birefringence. The index n is different for linearly polarized light along two preferred orthogonal axes. These axes are called the two optic axes. One axis, denoted · efast, is called the fast optic axis and the other axis, denoted · eslow , is called the slow optic axis. Using the two optic axes as a basis, (2.3.51) can be written as E = E fast· efast + E slow· eslow .

(2.3.63)

The polarization component E fast along the fast optic axis experiences an index n fast. The polarization component E slow along the slow optic axis experiences an index n slow , where n slow is larger than nfast. For plane-wave propagation, the phase shift along the fast optic axis is given by φfast = β fast · r = βfast L = k 0nfast L, whereas φslow = k0 nslow L is the phase shift along the slow optic axis, where k0 = 2π/λ0 is the free-space wavenumber. By writing the transformation matrix in the form

T = e−iφ0 D

(2.3.64)

the phase shifts along the two optic axes can be partitioned into a common term,

φ0 = 21 (n slow + n fast)k0 L , which is the same for both polarization components, and a symmetric differential matrix,

D= where

¸

e−i ´φ/2 0

0

ei´φ/2

´φ = (β slow − βfast) L

¹

,

(2.3.65)

(2.3.66)

is the differential phase shift between the linearly polarized components oriented along the two optic axes. When the differential phase ´φ depends on the frequency, the medium exhibits polarization-mode dispersion, which is discussed in Section 4.6.

2.3.5

Random Lightwave Fields

A noncoherent lightwave carrier can have significant random fluctuations in both time and space. The temporal properties of a stationary random signal were characterized in Section 2.2.2 by an autocorrelation function. This section extends that analysis to the random properties of a lightwave field by defining a joint coherence function for time and space together. The temporal randomness of a lightwave field is quantified by the temporal coherence function. For a single point in space, this function is an extension of the autocorrelation function defined in (2.2.53) that includes the vector nature of the field.

98

2 Background

The spatial randomness in a lightwave field is quantified by the spatial coherence function. In general, the term autocorrelation function is conventionally used when describing the statistical properties of electrical signals, whereas the term coherence function is conventionally used when describing the statistical properties of lightwave fields. The meaning is much the same. For a random lightwave field, the mean intensity I (r, t ) at a single time instant and a single point in space is defined as I (r, t ) = ´U(r, t ) · U∗ (r, t )³ = ´|U(r, t )|2 ³.

(2.3.67)

Extending this definition to two points r1 and r2 in space, one of which may be delayed in time by τ , the first-order mutual coherence function ϕ(r1 , r2 , τ) is defined as

ϕ(r1, r2, τ) =. ´U(r1 , t ) · U∗ (r2, t + τ)³,

where the expectation is over both time and space. The mutual coherence function may be viewed as the ability of a lightwave field at two times separated by τ to interfere at two points r1 and r2 in space. For r1 = r2 = r, the temporal coherence function describes the ability of a lightwave field to interfere with a time-delayed version of itself at the single point r. The width of the temporal coherence function, which can be defined in several ways, is called the coherence timewidth τc (cf. (2.2.59)). Likewise, for τ = 0, the spatial coherence function describes the ability of a lightwave field to interfere in space at a single time instant t . A coherence region coh is the spatial equivalent of the coherence timewidth τc and is a region for which the lightwave field is highly correlated in space. For a single point in space, an ergodic lightwave field is one that satisfies

A

²

T 1 ϕ(r, r, τ) = Tlim U(r, t ) · U∗ (r, t + τ) dt . →∞ 2T − T Informally, this expression states that a single realization of the random, ergodic timevarying lightwave field suffices to compute a property defined for the ensemble. In general, the lightwave field from an unmodulated source is ergodic. Likewise, the associated cross-coherence function for two complex lightwave fields is

ϕi j (r1 , r2 , τ) = ´Ui (r1, t ) · U∗j (r2 , t + τ)³.

(2.3.68)

The intensity autocoherence function is defined as

ϕ (r, r, τ) =. ´ I (r, t ) I (r, t + τ)³ = ´U (r, t )U ∗ (r, t )U (r, t + τ)U ∗ (r, t + τ)³. I

This is a fourth-order function of the complex field envelope at a single point in space.

Electrical Signals Generated from Random Lightwave Fields

R

The photodetected electrical signal is r (t ) = P (t ) (cf. (1.2.4)). When the photodetected electrical signal r (t ) is a stationary random process, the electrical autocorrelation function Re (τ), previously defined in (2.2.53), is

2.5 Problems

99

. + τ)³ 2 = R R (τ),

(2.3.69)

.

(2.3.70)

Re (τ) = ´r (t )r (t P

where

R P (τ) = ´ P (t ) P (t

+ τ)³

is the autocorrelation function of the lightwave power. The lightwave power autocorrelation function R P (τ) and the corresponding electrical autocorrelation function R e (τ) are used in Chapter 6 to characterize lightwave noise.

2.4

References Basic material on linear systems can be found in Kudeki and Munson (2009) and in Oppenheim and Willsky (2013). The subject of generalized functions was developed by Schwartz (1950) and is discussed in Kanwal (2004). Properties of matrices and Kronecker products are discussed in Horn and Johnson (1994). Background material on random processes is discussed in Helstrom (1991), in Stark and Woods (1994), in Gallager (2013), and in Hajek (2015). Probability distributions involving gaussian random variables are discussed in Simon (2007). Electromagnetic theory is covered in Harrington (1961), Kong (1990), and Chew (1990). Electromagnetics applied to optical frequencies is discussed in Born and Wolf (1980) and Saleh and Teich (1991). An introduction to modes in dielectric waveguides is given in Diament (1990). Polarization is covered in Shurcliff (1962). The Kramers– Kronig relations are discussed in Landau, Lifshitz, and Pitaevski˘ı (1984) and in Yariv (1989). Landmark papers in the field of polarization are compiled in Swindell (1975). Statistical optics is covered in Goodman (2015), and in Mandel and Wolf (1995). Polarization transformations used for lightwave communications are discussed in Huttner, Geiser, and Gisin (2000).

2.5

Problems 1 Linear systems Show that for any constants a and b, the definition of a linear system can be replaced by the single statement

a x 1(t ) + b x2 (t ) → a y1 (t ) + b y2 (t )

whenever x1(t ) → y1 (t ) and x2(t ) → y2 (t ).

2 Properties of the Fourier transform (a) Starting with the definition of the Fourier transform and its inverse, derive the primary properties of the Fourier transform listed in Section 2.1. (b) Using the modulation property of the Fourier transform and the trans∞ form pair 1 f , show that −∞ ei2π f1 t e−i2π f 2t dt f2 f1 ,

←→ δ( )

µ

= δ( − )

100

2 Background

thereby demonstrating that the set {e−i2π f j t } of time-harmonic functions is orthogonal. 3 Gram–Schmidt procedure The Gram–Schmidt procedure is a constructive method to create an orthonormal basis for a space spanned by a set of N signal vectors that are not necessarily linearly independent. Let xn t be a set of signal vectors. The procedure is as follows. (a) Set 1 t x1 t E1 , where E 1 is the signal energy. (b) Determine the component of x2 t that is linearly independent of 1 t by finding the projection of x2 t along 1 t . This component is given by x2 t 1 t 1 t , where the inner product is defined in (2.1.65). (c) Subtract this component from x2 t . (d) Normalize the difference. The resulting basis vector can be written as

{ ( )}√ ψ ( ) = ( )/

()

[ ( ) · ψ ( )]ψ ( )

()

ψ()

ψ()

()

ψ2 (t ) = |xx2((tt))−−[x[x2((tt))··ψψ1((tt))]]ψψ1((tt)|) . 2

2

1

1

(e) Repeat for each subsequent vector in the set, forming the normalized difference between the vector and the projection of the vector onto each of the basis vectors already determined. If the difference is zero, then the vector is linearly dependent on the previous vectors and does not constitute a new basis vector. (f) Continue until all vectors have been used. Using this procedure, determine the following bases. (a) An orthonormal basis for the space over the interval [0, 1] spanned by the functions x1 (t ) = 1, x2 (t ) = sin(2π t ) and x3 (t ) = cos2 (2π t ). (b) An orthonormal basis for the space over the interval [0, 1] spanned by the functions x1 (t ) = et , x 2(t ) = e−t , and x3 (t ) = 1. 4 Gaussian pulse 2 (a) Using the Fourier transform pair e−π t the Fourier transform, show that 2 2 e−t / 2σ

←→



←→ e−π f

2 2 2 2πσ e−2π σ f

=

2



and the scaling property of

2 2 2πσ e−σ ω / 2.

(b) Using an angular frequency ω , show that when the root-mean-squared timewidth is defined using the squared magnitude of the pulse and the rootmean-squared bandwidth is defined using the squared magnitude of the Fourier transform of the pulse, Trms rms = 1/2. (c) Derive the relationship between the root-mean-squared bandwidth rms for the signal power and the −3 dB or half-power bandwidth h for a pulse whose 2 2 power P (t ) is given by e−t /2σ P . (d) A lightwave pulse s (t ) modeled as a gaussian pulse with a root-mean-squared timewidth Trms is incident on a square-law photodetector with the electrical pulse p(t ) generated by direct photodetection given by |s (t )| 2/2. Determine the following. i. The root-mean-squared timewidth of p(t ) in terms of Trms .

W

W

W

2.5 Problems

101

ii. The root-mean-squared timewidth of the electrical power per unit resistance Pe (t ) = p(t )2 in terms of Trms . (e) Finally, rank-order the root-mean-squared timewidth of the lightwave pulse s (t ), the electrical pulse p(t ) generated by direct photodetection, and the electrical power pulse Pe (t ). Is this ordering valid for any pulse shape? 5 Pulse formats Derive relationships between the root-mean-squared width, the 3 dB width, and the full-width-half-maximum width both in the time domain and in the frequency domain for the following pulses. (a) A rectangular pulse defined as p t 1 for W 2 t W 2, and zero otherwise. (b) A triangular pulse defined as p t 1 t W for t W , and zero otherwise. (c) A lorentzian pulse defined as



() =

− / ≤ ≤ /

( ) = −| |/ p(t ) =

t2

| |≤



+ α2 ,

where α is a constant. 6 Pulse characterization The rectangular pulse p t defined in Problem 2.5 is used as the input to a timeinvariant linear system defined by h t p t so that the impulse response is equal to the input pulse. (a) Derive the full-width-half-maximum timewidth and the root-mean-squared 2 timewidth of the output y t h t p t and show explicitly that 2 p2 y where is the root-mean-squared timewidth. (b) Let the full-width-half-maximum width be denoted by F . Determine whether the relationship 2F p2 Fy2 holds for each pulse defined in Problem 2.5.

()

()= ( )

σ =σ

()= ()± ()

σ

=

7 Passband, baseband, analytic signals, and the Hilbert transform (a) Using

¿s (t ) = A(t ) cos (2π f c t + φ(t )) À Ã = Re (s (t ) + is (t ))ei2π f t = Re[ z(t )], I

Q

c

determine expressions for A(t ) and φ(t ) in terms of s I (t ) and s Q (t ). (b) Verify the following relationships: ½ ¾ i. s I (t ) = Re z(t )e−i2π fc t , ½ ¾ ii. s Q(t ) = Im z (t )e−i2π f c t , iii. A(t ) = | z (t½)|, ¾ iv. φ(t ) = arg z (t )e−i2π f c t . (c) Derive a relationship for the Hilbert transform · s (t ) in terms of the complexbaseband signal sI (t ) + is Q (t ) and the carrier frequency f c .

102

2 Background

(d) Given that s (t ) is a real causal function with the Fourier transform pair s (t ) ←→ S R(ω) + iSI (ω), use the conjugate-symmetry properties of SR (ω) and SI (ω) to show that the Kramers–Kronig transform can be written as

SR

²∞

ω S (±) d±, π ²0 ω − ±2 ∞ ± (ω) = π2 ±2 − ω2 S (±)d±. 0

SI (ω) =

2

R

2

I

8 Preservation of the commutator property Prove that the commutator property A B basis, where O is the zero matrix.

[ , ] = O is preserved under a change of

9 Properties of commutators Let A, B , and C be n n matrices. Show that the commutator satisfies the following identities. (a) A B C A C B C. (b) A BC A BC BA C. (c) ABA−1 B−1 I, if A B O. Identity (c) redefines the commutator using only the operation of matrix multiplication and its inverse. Addition and subtraction are not used. This definition of the commutator is used in group theory because it applies to algebraic groups with only a single operation.

×

[ + , ]=[ , ]+[ , ] [ , ]=[ , ] + [ , ] = [ , ]=

10 Matrix exponentiation (a) Do the matrices

A=

¸

1 0

0 0

¹

and B

=

¸

0 0

1 0

¹

commute? (b) Show explicitly that eA eB ± = eA +B for the matrices specified in part (a). (c) Let A be any square matrix satisfying A2 = I. Determine a simplified expression for eA . (d) By reference to the elementary expansion ex e y = ex + y , explain why one should expect that commutativity is necessary to write eA eB = eA+B . 11 Matrix logarithms The logarithm of a square matrix A is a matrix B such that A matrix A, express log A in terms of the eigenvalues of A.

= eB. For a symmetric

12 Trace of an outer product Using (2.1.97), a square matrix T can be written as a weighted sum of outer products

T=

¼ m ,n

Tmn· xn· x†m .

Using this expression and the properties of the trace operation, show that the trace of a square matrix expressed as an outer product of two vectors is equal to the inner product of the same two vectors.

2.5 Problems

13 Commutation in an enlarged signal space Let A and B be n n hermitian matrices for which (a) Prove that the two 2n 2n matrices

×

× ¸A B¹

[A, B] ± = O.

¸B A¹

and

B A

103

A B

do commute. This shows that operators that do not commute can be embedded in a larger signal space, called an ancilla embedding, in which they do commute. (b) Prove that trace (AB ) = trace (BA ) (cf. (2.1.84c)) even if [A, B] ± = O. 14 Probability density functions (a) Verify that the mean of the Rayleigh probability density function

r −r /2σ , r ≥ 0, σ2 e √ is σ π/2 and that the variance is σ 2 (2 − π/2). f (r ) =

2

2

(b) Show that as A becomes large, a ricean probability density function can be approximated by a gaussian probability density function. Why should this be expected? 15 Transformation of a function of a random variable A new probability density function f y y is generated when a random variable x with probability density function f x x is transformed by the functional relationship y T x , where T is invertible over the region where f x x is defined. (a) Using the fact that the transformation must preserve probabilities on intervals so that f y y dy f x x dx, show that

()

= ()

()

() = ()

½

f y ( y) = f x

()

Á

¾

T −1( y)

 ºº dx ºº ºº ºº . dy

(b) Using f y (y ) = f x T −1( y) | dx /dy |, show that for y = x 2/ 2 and f x (x ) = (x /σ 2)exp(− x 2/ 2σ 2 ), which is a Rayleigh probability density function, f y (y ) is an exponential probability density function with an expected value σ 2 . (c) Let x = G (w, z) and y = F (w, z ) be the inverse transformations that express the variables (x , y) in terms of the variables (w, z). For a bivariate probability density function f x y (x , y), the expression for fw z (w, z ) is f w z (w, z ) = f x y (G (w, z), F (w, z )) | J | , where | J | is the determinant of the jacobian matrix of this transformation, which is given by

⎡ ∂G ⎢⎢ ∂w ⎢⎣ ∂F ∂w

∂G ∂z ∂F ∂z

⎤ ⎥⎥ ⎥⎦ .

104

2 Background

= G (w, z ) = z − w , and y =

Using this expression and the transformation x F (w, z ) = w, show that

fw z (w, z ) = f x y (z − w, w).

(d) Using the result of part (c), show that f z (z ) =

²∞

f x y (z − y, y)dy.

−∞

(e) Show that if the two random variables x and y are independent, then f x y (z − y , y ) is a product distribution, and the probability density function f z (z) for z is given by f z (z ) =

²∞

−∞

f x (z − y) f y ( y )dy

= f x (z ) ± f y (z ),

which is (2.2.11). 16 Marginalization The bivariate gaussian probability density function has the form 2 2 px ,y (x , y ) = Ae −(ax +2bx y +cy ) .

(a) Express A in terms of a, b, and c. (b) Find the marginals, px (x ) and p y ( y), and the conditionals px | y (x | y) and py | x (y | x ). (c) Find the means ´x ³ and ´ y³, the variances σx2 and σ y2 , and the correlation ´x y ³. Hint: ax 2 + 2bx y + cy 2

= (a − b2 /c)x 2 + c(y + bx /c)2.

17 Number of required terms for the Fourier-series expansion of the phase function (requires numerics) Let f 1 be the exact expression for the marginal probability density function of the phase given in (2.2.35) and let f2 N be the series approximation using N terms of the Fourier series as given by

(φ)

( , φ)

1 2π

f φ (φ) =

+

∞ ¼

n=1

A n cos (n φ),

where F = A 2/ 2σ 2 , and where the zero-frequency component is 1/2π . The coefficients of the cosine Fourier series are given by30 An

=

1 2

Å

¸

F −F / 2 e I( n−1)/2

π

³F´ 2

+ I(n+1)/2

³ F ´¹

where In is the modified Bessel function of order n. Define the root-mean-squared error as

δ( N ) = 30 See Prabhu (1969).

Ä ²π 2

0

( f1 (φ) − f 2( N , φ))2 dφ.

2

,

2.5 Problems

105

(a) How many terms are required so that the error is less than 10−6 if F = 1? (b) How many terms are required when F = 10? (c) Discuss the results with respect to the number of terms required for a specified accuracy as a function of F . 18 Joint and marginal gaussian probability density functions The joint probability density function px ,y x y is given as

⎧ ⎪⎨ p x ,y ( x , y ) = ⎪⎩

(, ) ÉÍ Ì È 2 y2 1 1 x + 2σ 2 exp − 2πσx σ y 2 2σ x2 y 0

>0 if x y < 0. if x y

(a) Show that this function is a valid probability density function. (b) Sketch px , y (x , y) in plan view and in three dimensions. Is this joint probability density function jointly gaussian? (c) Find the marginal probability density functions px (x ) and py ( y ) and comment on this result. 19 Derivation of the noncentral chi-square probability density function Using the transformation z x 2 y 2, where x and y are independent gaussian random variables with the same mean A and variance 2 , show that the probability density function f z for z can be written as

+

=

()

f ( z) =

1

2σ 2

σ

³ √´ σ2

2 2 A z e−( z + A )/2σ I0

z

≥ 0,

which is (2.2.38). 20 Derivation of the eikonal equation Using a local plane-wave solution of the form

U (r) = u(r)e−ik0 S(r ) in the scalar Helmholtz equation

∇ 2U (r) + n2 (r, ω)k 20U (r) = 0, show that

À

Ã

k20 n(r)2 − |∇ S(r)| 2 u(r)

À Ã + ∇ 2 u(r) − ik0 2 ∇ S(r) · ∇u (r) + u(r)∇ 2 S(r) = 0.

The real part of this expression is (2.3.43). 21 Coherence function and the power density spectrum (a) Let R e−|τ | ei2π f c τ . Determine the one-sided power density spectra S f and Sλ . (b) A lightwave carrier has a power density spectrum Sλ given by

(τ) = (λ)

( )

π Sλ (λ) = (λ − λc )2 + π .

Determine the total lightwave signal power P.

(λ)

106

2 Background

(c) Determine the full-width-half-maximum width of the spectrum in part (b). (d) Estimate the coherence timewidth τc for the spectrum in part (b). 22 Autocorrelation and the power density spectrum of a random signal using sinusoidal pulses A binary waveform consists of a random and independent sequence of copies of the pulse 1 cos 2 t T rect t T with random amplitude A n for the nth term of the sequence. The start time j of the pulse sequence is a uniformly distributed random variable over 0 T . The symbols transmitted in each nonoverlapping interval of length T are independent. The probability of transmitting a mark with an amplitude A is 1 2. The probability of transmitting a space with an amplitude 0 is 1 2. (a) Determine the autocorrelation function of the signal. (b) Determine the power density spectrum of the signal.

( + ( π / ))

(/ )

[, ]

/

/

23 Covariance matrices Define z as a vector of N circularly symmetric gaussian random variables with a complex covariance matrix V given in (2.2.30b). Define x as a vector of length 2N that consists of the real part Re z and the imaginary part Im z in the order Re z N Im z 1 Im z N . x Re z1 Show that the real 2N 2N covariance matrix C given by (cf. (2.2.22))

[] = { [ ], . . . , [ ], [ ], . . . , [ ]} × Æ( )( )Ç C = x − ´x ³ x − ´x ³ ,

[]

T

where x is a random column vector formed by pairwise terms, can be expressed in block form in terms of the N × N complex covariance matrix V as

C=

1 2

¸

Re V Im V

−Im V ¹ . Re V

24 Pareto probability density function The Pareto probability density function (cf. (2.2.6)) is

f x (x ) =

» λx −(λ+1)

x x

0

≥1 < 1.

(a) Show that the mean is λ/(λ − 1) for λ ≥ 1 and otherwise is infinite. (b) Show that the variance is λ/[(λ− 1)2 (λ− 2)] for λ ≥ 2 and otherwise is infinite. 25 Diagonalizing a covariance matrix A real covariance matrix C of a bivariate gaussian random variable is given by

C=

¸

1 1

1 4

¹

.

(a) Determine a new coordinate system ( x ² , y ² ) such that the bivariate probability density function in that coordinate system is a product distribution. Express the bivariate probability density function in that new coordinate system as the product of two one-dimensional gaussian probability density functions.

107

2.5 Problems

(b) Plot this probability density function using a contour plot showing the original coordinates (x , y ) and the transformed coordinates (x ² , y ² ). (c) Determine the angle θ of rotation defined as the angle between the x axis and the x ² axis. 26 Sums of gaussian random variables A random variable G is formed from the sum of two independent gaussian random variables of equal variance 2 and expected values E 1 and E 1 1 as shown in the figure below.

σ

E1

( + δ)

E1 (1 + δ)

E1 – x σ

(a) Determine the probability that the value of G is less than E 1 − x σ in terms of x, σ , and δ , where x is a scaling parameter. (b) Show that if x is large, then, for a finite value of δ , the probability determined in part (a) is dominated by the random variable centered at E 1 and that the contribution to the probability from the random variable centered at E 1 + δ is negligible.

27 Pseudocovariance matrix In contrast to the covariance matrix of z, defined as V zz† , where z† denotes the complex conjugate transpose of z, the pseudocovariance matrix is defined as zzT , where zT denotes the transpose of z. Prove that z is a circularly symmetric random vector if and only if the pseudocovariance matrix equals zero.

=. ´ ³

28 Material impedance (a) By substituting the complex plane-wave fields at a frequency

E = E 0 e−iβ· r · e,

´ ³

ω

H = H0e −iβ· r · h into the two curl equations

∇ × E = − ∂∂Bt , ∇ × H = ∂∂Dt , ±, and using the constitutive relationships given in (2.3.3) along with · e ×· h=β √ show that H = E /η, where η = µ /ε is the material impedance. 0

(b) Starting with

0

0

.

n 2(r) = ε(r)/ε0

= εr (r) = 1 + X (r), show that the material impedance η(ω) as a function of ω can be written as 0 , η(ω) = nη(ω)

108

2 Background



where η0 = µ0 /ε0 is the impedance of free space. (The impedance of free space has a value of 377 ±.) 29 Wave equation This problem discusses the derivation of the wave equation. (a) Using (2.3.7), solve for the divergence term r in (2.3.11) in terms of r , where the time dependence is suppressed. The result consists of two terms, one of which contains the term r loge n2 r . (b) State the conditions under which the divergence term can be neglected so that (2.3.11) reduces to (2.3.13), which is the wave equation.

∇·E ( )

E( ) · ∇

D( )

()

30 Polarization ellipse (a) Show that the expressions

E x (ψ) = | E 1| cos(ψ

(b)

+ δ1), E y (ψ) = | E 2| cos(ψ + δ 2) define an ellipse in the ( E x , E y ) plane as ψ varies from 0 to 2π . Set | E 1| = | E 2| = 1 and plot the ellipse for each of the following values of δ = δ2 − δ1: (0, π/4, π/2, 3π/4, π). Comment on the results.

31 Transformation between polarization-state representations (a) Given two Stokes vectors that satisfy s2 s1, show that s1 is orthogonal to s2 . (b) Show that the Jones vectors

=−

J1

=.

¸ cos ξ cos χ − i sin ξ sin χ ¹ ¸ ¹ . −sin ξ cos χ + i cos ξ sin χ and J2 = sin ξ cos χ + i cos ξ sin χ cos ξ cos χ + i sin ξ sin χ

are orthogonal. 32 Electrical and optical signal-to-noise ratio Consider a binary on–off-keyed lightwave system. The transmitted lightwave signal power is P for a mark and 0 for a space. Marks occur with a probability p1 . Spaces occur with a probability p0 1 p1. The lightwave noise power variance is given by P2. (a) Derive the relationship between the mean lightwave signal power P and the mean electrical signal power Pe as a function of p1 and P2 . (b) Under what conditions is the mean electrical power equal to the square of the mean lightwave power? Does a lightwave communication system satisfy these conditions?

= −

σ

Ê Ë

Ê Ë

σ

33 Square-law photodetector A finite-energy lightwave field U r t is directly photodetected to produce an electrical waveform r t given by

(,)

()

r (t ) =

R P (z , t ) = R

=R

²

A

²

A

I (r, t )dA

|U(r, t )|2 dA,

2.5 Problems

109

where (1.2.4) has been used. (a) Show that when U(r, t ) is bandlimited to the frequency interval − W ≤ f ≤ W , the electrical waveform r (t ) is bandlimited to the interval −2W ≤ f ≤ 2W. (b) Given the coherence timewidth τc of the lightwave signal, estimate the coherence timewidth of the directly photodetected electrical signal. (c) Estimate the associated width of the lightwave signal power density spectrum and the width of the electrical power density spectrum.

3

The Guided Lightwave Channel

The information capacity of a modern lightwave communication system results from the remarkable transmission characteristics of an optical fiber. The transmission medium known as an optical fiber is a cylindrical waveguide constructed from a dielectric material such as glass or plastic. The cross-sectional structure of a fiber is shown on Figure 3.1(a). Early work in 1966 by Kao and Hockham and, in that same year, by Werts predicted that optical fibers for communication would be an attractive alternative to electrical wires if it were possible to develop a material with sufficiently low attenuation. Several such materials were soon developed. Even more noteworthy is the manufacturing technology developed to draw these materials into fibers of amazingly uniform diameter with a cross-sectional variability measured in fractions of a micron over a distance measured in tens of kilometers. This chapter studies the linear transmission characteristics of an optical fiber using continuous-signal models based on rays and waves, and a discrete-signal model based on photons. An elementary analysis of lightwave signal propagation is first given using ray optics. A more comprehensive analysis is provided using wave optics and is then compared with the photon-optics analysis. Wave optics introduces the structure of modes. This analysis is built slowly, starting with a simple noncylindrical waveguide geometry in the form of a rectangular slab, followed by the analysis of several cylindrical waveguide geometries. The chapter concludes by considering the coupling of energy between modes.

3.1

Characteristics of an Optical Fiber The structure of optical fibers used for lightwave communication varies with the application, but is based on common theory. This section studies the structure and its theory, as well as the dominant mechanisms that produce signal attenuation.

3.1.1

Fiber Structure

There are two distinct kinds of optical fiber structures used for lightwave communication systems: step-index fiber and graded-index fiber. More complicated fiber structures are variations of one of these two basic kinds. A step-index fiber consists of a cylindrical core with a constant index of refraction, n 1, surrounded by a cylindrical cladding

3.1 Characteristics of an Optical Fiber

(a)

111

Cladding Core Cladding

(b)

n0

(c) n2

n1

n2

na

n(r)

r 2a

na r

2a

(d)

Figure 3.1 (a) Geometry of a step-index fiber. (b) Cross-sectional profile of the index of

refraction for a step-index fiber. (c) Cross-sectional profile of the index of refraction for a graded-index fiber. (d) Geometry of a multicore fiber.

with a smaller constant index of refraction,1 n2 . The cross-sectional profile of the index of refraction for a step-index fiber is shown in Figure 3.1(b). A graded-index fiber is also a cylindrical structure with a cylindrical core, but the index of refraction in the core n1(r ) varies as a function of the radial distance r from the center of the fiber. The cross-sectional profile of the index of refraction for a graded-index fiber is shown in Figure 3.1(c). The material used for both the core and the cladding is typically silica glass (SiO2 ), though it is sometimes plastic or another kind of glass. The index of refraction (or simply the index) in the core is controlled by adding a small amount of an element such as germanium that increases the index. The index change between the core and the cladding is small, typically in the range from 0.1% to 0.2% for a conventional single-mode silica glass fiber. Two types of step-index fiber are currently in widespread use. A small-core step-index glass fiber with a core diameter of a few microns2 is used in long-distance or high-datarate communication systems because such a fiber supports only one propagation mode for each polarization. The small core size precludes the use of ray theory for analysis, so wave theory must be used. A step-index fiber with a larger core is used in short-distance systems and may employ glass or plastic for both the core and the cladding. This type of fiber supports multiple modes. A ray-optics analysis is often appropriate for this kind of large-core fiber. A graded-index fiber is a fiber that supports multiple modes. It has a core index that varies in the transverse direction from a maximum of n0 at the center of the fiber to a minimum of n a at the core/cladding interface, as shown in Figure 3.1(c). The diameter of the core for this kind of fiber is typically large compared with the wavelength of light. Therefore, the ray theory of signal propagation is often used for an initial analysis and to provide insight. Wave theory is then used to determine the spatial structure and behavior of the guided modes. 1 The cladding is surrounded by a casing, which plays virtually no role in this book. 2 Single-mode fibers have a cross-sectional area that varies from about 20 m 2 to 120

µ

µm2 .

112

3 The Guided Lightwave Channel

A more recent fiber structure is a multicore fiber, which uses multiple cores within the same cladding structure, as illustrated in Figure 3.1(d). The analysis of this kind of fiber structure includes the analysis of the potential coupling between lightwave signals in multiple cores. To a first approximation, the cores are far enough apart in the cladding to be considered individually. The unintentional coupling of power between modes in a single core, discussed in Section 3.4, also applies to the coupling between multiple cores.

3.1.2

Optical Fiber Attenuation

Attenuation is a fundamental limit on the performance of any communication system. Attenuation is the loss of signal energy during propagation through the fiber. At a fundamental level, attenuation converts or redistributes the signal energy into another form of energy – such as thermal energy – that is not recovered as an electrical signal at the lightwave receiver. This attenuation is caused by a combination of material absorption and scattering. Each of these mechanisms is considered in turn.

Material Absorption

The silica glass used in most optical fibers designed for communication is a noncrystalline material composed of silicon and oxygen atoms with trace amounts of other materials used to control the index, and also some unavoidable impurities. The constituents in a glass have a discrete set of electronic, vibrational, and rotational energy states that can be determined using quantum theory. Accordingly, its absorption can be separated into three categories: electronic, vibrational, and rotational. These three categories can be best understood using photon optics. Within photon optics, an absorption event can occur if a constituent material absorption site is in an allowed energy state and an incident photon has an energy equal to the difference between the energy of the occupied state and an unoccupied higher-energy state. For silica glass, the energy difference between electronic energy states corresponds to wavelengths that are in the ultraviolet region of the spectrum and are shown schematically as a dashed line in Figure 3.2. In contrast, the energy difference between vibrational/rotational energy states corresponds to wavelengths in the infrared region of the spectrum with wavelengths longer than approximately 1.6 microns. This region is also shown in Figure 3.2. Between the electronic absorption region and the vibrational/rotational absorption region there exists a transparency region. Here, depending on the composition of the glass, there is little material absorption by the primary materials in the fiber. The most significant absorption in the transparency region is caused by impurities, as is shown in Figure 3.2. The frequency dependence of the absorption due to the impurities can be modeled by defining an absorption cross section σa or σa (ω) when it is appropriate to display the frequency dependence. The absorption cross section is discussed in Section 4.3.2. The absorption cross section measures the ability of a single impurity site 3 within the glass to absorb light and has units of area. The area may be thought of as the effective 3 An impurity site is typically an atom or ion.

3.1 Characteristics of an Optical Fiber

113

Ultraviolet absorption Infrared absorption

~0.15 µm )mk/Bd( ssoL

Rayleigh scattering ~1.5 µm ~2 µm

Impurity absorption Wavelength (µm)

Figure 3.2 A notional representation of the attenuation of a typical glass. The transparent region is

defined by ultraviolet and infrared absorption bands. Within this region, the loss is dominated by Rayleigh scattering and impurity absorption.

“size” of the impurity as seen by the incident light. The number of absorbing impurity sites per unit volume, denoted by N , is called the volume density of the absorbing sites. Because impurity sites are sparse, an impurity site that can absorb a photon is not usually affected by other impurity sites and acts independently. The fractional loss in the power ± P/ P as a lightwave signal propagates a distance ±z is ± P / P = −αm ± z, where the absorption per unit length αm , called the absorption coefficient, is defined as Letting ± P This is

αm =. σa N .

(3.1.1)

→ dP results in the differential equation that governs the lightwave power. dP dz

= −αm P, which has a solution P (L ) at a distance z = L given by P ( L ) = P (0)e−α L , (3.1.2) where P (0) is the power at z = 0. This expression, called the Beer–Lambert law, is m

valid whenever the absorption at a site is independent of the absorption from other sites. The absorbed energy may be subsequently re-radiated as thermal energy at a longer wavelength. For this kind of absorption process, the re-radiated field is uncorrelated with the incident field, is not at the same wavelength, is not photodetected, and is not of further interest.

Scattering

The second significant attenuation mechanism in glass is scattering. Scattering is a short-time-scale interaction between the lightwave field and the material in which the scattered field is highly correlated with the incident field. This correlation is in contrast

114

3 The Guided Lightwave Channel

to an absorption process for which the re-radiated energy is uncorrelated with the incident field. This section discusses some qualitative aspects of scattering that pertain to signal propagation in an optical fiber. In a perfect crystalline material, which glass is not, the lattice sites would form a three-dimensional periodic structure. The mean distance between sites in a typical crystalline material is on the order of 1 Å (10 −10 m). This distance is much smaller than the wavelength of the lightwave signal, which is on the order of one micron (10 −6 m). For this large scale difference, the scattering from the periodically spaced sites within a crystalline material adds constructively in the direction of propagation and essentially cancels out in all other directions. In a noncrystalline material such as glass, there is a random variation in the locations of the constituents that comprise the material. This randomness affects the index in the material, which, in turn, causes scattered field components in directions other than the direction of the incident field. The directional dependence of the scattering is modeled using a directionally dependent scattering cross section σs . An elastic process is a process for which the wavelength of the scattered field is equal to the wavelength of the incident field. Elastic scattering is a linear process. An inelastic process is a process for which the wavelength of the scattered field is not equal to the wavelength of the incident field. Inelastic scattering is a nonlinear process, and is discussed in Chapter 5. If the elastic scattering is caused by small-scale random variations in the material density that are much smaller than the wavelength λ, then the scattering is called Rayleigh scattering. These random variations are an intrinsic aspect of the disordered nature of a glass. They arise from the fundamental thermally generated density variations that occur in the liquid glass as it cools into a solid. Once in solid form, these density variations are frozen into the glass. The small-scale variations that are the source of Rayleigh scattering can be modeled as a material polarization source term P in the wave equation given in (2.3.10). The source term generating the E field for this equation is proportional to ∂ 2 P (t )/∂ t 2 . This means that the frequency dependence of the scattered field amplitude E at ω is proportional to ω 2, with the scattered field intensity proportional to ω4 or 1/λ4 . Accordingly, the Rayleigh scattering from shorter wavelengths is stronger than the Rayleigh scattering from longer wavelengths as a direct consequence of (2.3.10).4 The attenuation per unit length from scattering is given by αs = σs N , where σs is the directionally dependent scattering cross section. The combined effect of absorption and scattering is quantified using an attenuation coefficient κ = αm + αs . Accordingly, the expression Pout

= Pin e−κ L = Pin T

(3.1.3)

relates the output lightwave power Pout at a distance L to the input power Pin at the beginning of the fiber, where T is the transmittance of a fiber segment of length L . 4 The strong wavelength-dependence of Rayleigh scattering is the well-known reason why the sky is blue.

3.2 Guided Signal Propagation

3.2

115

Guided Signal Propagation Guided signal propagation in an optical fiber is analyzed by using ray optics, wave optics, and photon optics. The choice of the signal model depends on the relevant features of the signal that are of interest. Because each model is an approximation, we are free to combine features from several signal models as needed, always respecting the accuracy and the appropriateness of the approximations. This section discusses the application of each of these signal models to guided signal propagation in a fiber. A ray-optics signal model is adequate whenever the relevant dimensions of the guiding structure are large in comparison with the wavelength. The ray-optics model is useful when appropriate because it is simple, because it is in accord with everyday intuition, and because it provides a basic understanding of propagation, dispersion, and some forms of attenuation. The wave-optics model is more accurate and more informative, but more difficult.

3.2.1

Guided Rays

This section analyzes ray propagation in a step-index fiber and in a graded-index fiber.

Ray Propagation in a Step-Index Fiber

The propagation of a guided ray in a step-index fiber is based on the reflection and refraction of a ray at an interface between two homogeneous regions that have different indices of refraction. Rays in homogeneous regions are straight lines. The laws governing reflection and refraction between homogeneous regions can be derived from Maxwell’s equations using the boundary conditions at the interface (cf. Section 2.3). The law of reflection states that at a boundary between two media, the angle of incidence θi is equal to the angle of reflection θr , where both angles are measured from the normal to the surface as shown in Figure 3.3(a). The law of refraction, known as Snell’s law, states the angular direction of a ray crossing a planar boundary between two dielectric media satisfies n1 sin θi

= n2 sin θt ,

(3.2.1)

where n 1 and n2 are the indices on the two sides of the boundary. A portion of the light incident on the planar surface in the first medium at angle θi , measured from the normal

n1 n2

θi

θr

θr

θi

n1 n2

θt

(a)

θr

θi

n1 n2 θt

(b)

= π/ 2 (c)

Figure 3.3 (a) Reflected and refracted rays for an angle of incidence smaller than the critical angle

θc , where n 1 is larger than n 2. (b) Reflected and refracted rays for an angle of incident equal to θc . (d) Total internal reflection when θi > θc .

116

3 The Guided Lightwave Channel

to the planar surface, reflects. A second portion of the light transmits through the planar surface into the second medium at an angle θt , also measured from the normal. If n 1 is greater than n2, then, because θt is constrained to be real, a solution for the refracted angle θt does not exist when (n1 / n2) sin θi > 1. The light cannot cross the boundary into the second medium; it is entirely reflected back into the first medium, and no light is lost by refraction into the second medium. The angle θc , measured from the normal, at which this occurs is n2 sin θc = , (3.2.2) n1

and is called the critical angle. The maximum grazing angle θg , measured from the interface ± between the two media, is the complement of the critical angle so that cos θc =

sin θg = n 21 − n22 / n1. For typical values of n2/ n 1 used in an optical fiber designed for communications, the critical angle is close to 90◦ so that θg is at most a few degrees. Any ray traveling through the fiber at a grazing angle smaller than θg is totally internally reflected. The total path length for a ray traveling by multiple reflections at a small grazing angle is only slightly larger than the path length for a ray traveling along the axis of the fiber.

Numerical Aperture

The amount of light that can be coupled into a fiber depends both on the size of the fiber and on the maximum angle at which a ray can enter the fiber and still be guided by total internal reflection. This angle is expressed using the concept of the numerical aperture. Referring to Figure 3.4, apply Snell’s law at the air/fiber interface at the incident face of the fiber to yield sin θmax

= n1 sin θ1 = n1 cos θc = n1

±

²

n21 − n 22 n 1

=

±

n 21 − n22 .

(3.2.3)

Accordingly, the numerical aperture, denoted NA, is defined as

± . NA = sin θmax = n 21 − n22 ,

(3.2.4)

where the reference medium defining the NA is air. The coupling of rays into an ideal cylindrical fiber is rotationally symmetric. Therefore, the angle θmax describes a cone of admissible ray angles at the input face to the fiber. Only rays within this cone are launched into the fiber. Some initial launch conditions within the admissible cone angles produce a ray trajectory that passes through the central axis of the fiber. This ray is called a meridional ray. A ray that does not pass through the central axis is called a skew ray. n2 θ1 θc

θmax

n1

2a

θ1

n2

θ

Figure 3.4 Geometry of a ray being guided in a fiber at the critical angle c . The maximum external angle max defines the numerical aperture.

θ

3.2 Guided Signal Propagation

117

Any ray within the cone specified by the numerical aperture will be guided by total internal reflection at the core/cladding interface as is shown in Figure 3.4. For such rays, the interface behaves as a perfect mirror. Any ray outside this cone will partially refract and partially reflect at the core/cladding interface. Even if only a minute amount of light is lost by refraction for each reflection, the light will be quickly lost into the cladding because there are so many partial reflections. These rays are called unguided rays. The inherent reversible nature of rays in a material such as glass implies that the largest acceptance angle defined by the numerical aperture also determines the maximum angle at which a ray can exit the fiber. Values of the numerical aperture range from 0.1 for small-core, step-index fibers to greater than 0.3 for multimode gradedindex fibers. This corresponds to an admissible cone of angles ranging from arcsin (0. 1), which is equal to 5.7◦ , to arcsin(0. 3), which is equal to 17.5◦. The numerical aperture is also expressed in terms of a normalized index difference ±, defined as ± = n1 n− n 2 ≈ n1 n− n2 . (3.2.5) 1 2 The normalized index difference is typically between 0.1% and 0.25% for most communication fibers. Because n1 is nearly the same as n2 , either index can be used in the denominator. The numerical aperture in terms of ± is

±

³ (n1 + n 2)(n 1 − n2 ) ³ ≈ 2n1 (n1 − n2 ) √ √ ≈ n 1 2± ≈ n 2 2±. (3.2.6) As an example, let ± = 0.3% and n 1 = 1.53, then NA ≈ 0.12. For this small numerical aperture, if ± is increased by a factor of four, then the numerical aperture is doubled, and NA =

n21 − n 22 =

the angle of the corresponding admissible cone is nearly doubled from approximately 7◦ to approximately 14 ◦.

Coupling Efficiency

Later in this chapter it is shown that a waveguide with a core that is large compared with the wavelength of light can support many characteristic spatial patterns called modes. For this kind of multimode waveguide, the proportion of the lightwave power that is coupled from a light source into the fiber is called the coupling efficiency. The coupling efficiency from one multimode lightwave component to another multimode lightwave component depends on the product of the area A of the lightwave component and on the corresponding solid angle ² measured in steradians, as defined by the numerical aperture of the lightwave component. The product A² of the solid angle and the area that contains the lightwave field is called the étendue and is proportional to the number of spatial modes that can be supported by a structure that is large compared with the wavelength. To efficiently couple light from a lightwave source into a guiding fiber, the étendue of the source must be less than or equal to the étendue of the guiding fiber.

118

3 The Guided Lightwave Channel

Lens Source

Fiber

φ

f (θ) Beam flooding the acceptance angle

Figure 3.5 Geometry of the coupling of a lightwave source with a rotationally symmetric angular

distribution f (θ) into a fiber with a numerical aperture NA

= sin θmax.

If a lightwave source has a conical power distribution f (θ) that is independent of the azimuthal angle φ (see Figure 3.5), then the total power coupled into the fiber is given by an integral over the solid angle

´

´ 2π ´ θ P= P (²) d² = dφ ² 0 0 ´ arcsin(NA) = 2π f (θ)sin θ dθ,

max

f (θ)sin θ dθ

0

where (3.2.4) has been used to express θmax in terms of the numerical aperture. For a lightwave source that has a conical power distribution and a small numerical aperture, sin θ ≈ θ and ² ≈ π(NA s )2. The As ²s product of the source is as2π 2(NA s )2 , where NAs is the numerical aperture of the lightwave source, and the area A of the lightwave source is the area of a circle of radius as . Similarly, for an optical fiber A f ² f = a2f π 2(NA f )2 , where a f is the radius of the fiber with a numerical aperture NA f . For a lightwave source coupled into a fiber, when A f ² f is greater than As ²s , nearly all the light can be coupled into the fiber and the coupling efficiency η is close to one. When As ²s is greater than A f ² f , η is less than one and can be estimated using the ratio of the A² products as given by

µ NA ¶2 f η ∝ AAf ²² f ≈ aaf NA s

s

s

s

(As ²s > A f ² f ),

(3.2.7)

where a f and NA f are, respectively, the radius and the numerical aperture of the fiber. As an example, consider coupling, end-to-end, the output of one fiber to the input of a second fiber where the first fiber has a core diameter of 50 microns and a numerical aperture of 0.2, and the second fiber has a core diameter of 25 microns and a numerical aperture of 0.1. Using (3.2.7), the ratio η of the A² products for the two fibers is approximately [(25 /2 × 0.1)/( 50/2 × 0. 2)]2 = 1/16. This ratio is much smaller than one because the second fiber supports fewer guided modes. The coupling efficiency is nearly unity when coupling from the smaller, lower-numerical-aperture fiber to the larger, higher-numerical-aperture fiber because the second fiber has a significantly larger A² product.

3.2 Guided Signal Propagation

119

n5 n4 n3 Δx n2 n1 Figure 3.6 Snell’s law for laminar layers of a width

± x.

Ray Propagation in a Graded-Index Fiber

The propagation of a guided ray in a graded-index fiber can be described using the laminar model sketched in Figure 3.6. The appropriate formal equations for the propagation of rays in the graded-index fiber can be derived formally using the paraxial approximation of the eikonal equation given in (2.3.47). An alternative, more direct approach replaces the cylindrical core by the laminar structure as shown in Figure 3.6. Using this expression, the discretized form of Snell’s law in the x direction can be written as n(x )sin θ( x ) = (n (x ) + ± n)sin(θ( x ) + ±θ).

(3.2.8)

Expanding the terms on the right of (3.2.8) and keeping only terms of order ± gives the differential form of Snell’s law in the limit as dn = −n cot θ. (3.2.9) dθ A conventional graded-index profile, known as a parabolic-index profile, is defined by

± ·

(

n ( x , y) = n 0 1 − K 2 x 2 + y 2

≈ n 0 1 − 21 K 2

·

)

x 2 + y2

¸¸

.

(3.2.10)

The approximation holds when 21 K 2a 2 ± 1, where a is the radius of the fiber core and x 2 + y2 ≤ a2 . This condition is satisfied for typical graded-index fibers. To determine an equation that governs the trajectory of the ray in the x direction in this medium, use the paraxial approximation to write cot θ ≈ dx /dz. Taking the derivative of this expression with respect to z gives d2 x dz 2

≈ dzd cot θ ≈−

1

dθ dz

= − ddzθ ,

θ θ and sin θ ≈ 1 have been used. Using the chain rule for sin

2

where d cot θ/ dθ = −1/ sin differentiation along with (3.2.9) and (3.2.10) gives 2

d2 x dz 2

2

dθ dn ≈ − dn × dx × dx dz ¹º»¼ ¹º»¼ ¹º»¼ tan θ/ n( x )

≈ −K x , 2

−n0 K 2x

cot θ

(3.2.11)

120

3 The Guided Lightwave Channel

x0 ,y0

x

θmax

a

y

x z

Λ (a)

(b)

Figure 3.7 (a) End view of the input face of a fiber showing the initial ray displacements. (b) A

side view with the transverse dimension greatly exaggerated, showing guided meridional rays for x 0 = y0 = θ y0 = 0.

where dn/ dx = −n0 K 2 x and n 0/ n (x ) ≈ 1 have been used. This equation can also be derived from (2.3.50). A similar expression for the y direction is d2 y dz 2

= − K 2 y.

The two solutions to these differential equations are

sin θx0 sin( K z ), K (3.2.12) sin θ y0 y(z ) = y0 cos( K z ) + sin( K z ). K Within this paraxial approximation, and for the parabolic index profile, the trajectory of the x component of the ray and the trajectory of the y component of the ray are sinusoidal with a spatial frequency K and a spatial period ³ = 2π/ K . The specific trajectory is defined by the initial displacements x 0 and y0, shown in Figure 3.7(a), and the initial angles θx0 and θy0 , defined inside the fiber. A meridional ray executes a cosinusoidal trajectory as a function of z in a plane defined by the initial ray trajectory and the z axis. A plot of several meridional ray trajectories in a graded-index fiber is shown in Figure 3.7(b). Without loss of generality, a meridional ray defined in the (x, z) plane with x0 = y0 = θy0 = 0 has a ray trajectory given by x (z ) = (sin θx0 / K )sin( K z ). The maximum value of x (z) occurs at a quarter of the spatial period, where z = π/2K . At this value of z, sin(K z ) = 1 and x = sin θx0 / K . The maximum value of x is equal to the radius a of the fiber core. Setting x = a gives x (z ) = x 0 cos( K z ) +

sin θmax . (3.2.13) K This expression defines the maximum launch angle θmax of a guided ray inside the fiber as a function of the core radius a and the spatial frequency K , and is shown in Fig. ure 3.7(b). Using (3.2.10) with x 2 + y 2 = a2 , the index n a = n (a ) at the core/cladding interface is a=

na

= n0

±

1 − sin2 θmax

= n 0 cos θmax ,

(3.2.14)

where n0 is the index at the center of the core. Using this expression, the numerical aperture for a graded-index fiber is defined as

3.2 Guided Signal Propagation

.

.

NA = n0 sin θmax

=

±

n20 − n 2a

√ = n0 2±,

121

(3.2.15)

where ± = (n 0 − n a )/n 0 for a graded-index fiber. The cladding index na of a gradedindex fiber plays the same role that the cladding index √ n 2 plays for√a step-index fiber. Using these expressions and (3.2.13), sin θmax = 2± , and K = 2±/a. Therefore, the constraint 12 K 2a2 ± 1 required for (3.2.10) to be a valid approximation is equivalent to ± ± 1. This is satisfied for typical graded-index fibers having a value of ± on the order of 1%. Using ± = 0.01 and a radius a equal to 25 microns, the spatial period 2π/ K of a ray trajectory is a few millimeters. Skew rays, which have trajectories that do not pass through the central axis of the fiber, can be generated from a different set of initial conditions. For example, with initial conditions x0 = sin θ y0 = 0 and y0 = sin θx0 / K , the trajectory is given by x (z) = y0 sin(K z ) and y(z ) = y0 cos (K z ). These are the parametric equations of a helix of radius y0. A helical skew ray with this set of initial conditions never passes through the central axis of the fiber.

3.2.2

Guided Waves

Wave optics must be used whenever the wavelength of the light is comparable to or larger than the dimensions of the guiding structure. Wave propagation for the complex field E(r, ω) is described by the vector Helmholtz equation as given in (2.3.15):

∇ 2 E(r, ω) + n 2(r, ω)k02E (r, ω) = 0,

(3.2.16)

where k0 = 2π/λ0 is the free-space wavenumber. Applying the boundary conditions to a waveguide geometry that does not vary in the z direction produces the characteristic spatial solutions for each ω called eigenmodes or simply modes. Expressions for these modes will be developed in Section 3.3. A guided mode is a mode for which the power is confined to a local cross-sectional region as defined by the waveguide geometry. An unguided mode is a solution for the same structure whose power is not confined to the waveguiding structure and is lost into the cladding. To determine the lightwave field in a waveguide at a distance L , the incident monochromatic field at the input to the waveguide is first decomposed into a linear combination of modes. The spatial characteristics of the guided modes are determined in this chapter. The field in each mode, indexed by j , is propagated using a mode-dependent propagation constant β j , which depends on the geometry of the waveguide. The temporal and propagation characteristics of the individual modes, which depend on β j as a function of ω, are determined in Chapter 4. The spatial period of the wave in the propagation direction is given by 2π/β j . The field in each guided mode at distance L is summed to produce the total field at that distance.

122

3 The Guided Lightwave Channel

3.2.3

Guided Photon Streams

Modeling lightwave signal propagation using a discrete photon-optics signal model based on photon5 streams is complementary to modeling the signal using a continuous signal model based on rays or waves. Each signal model is useful under appropriate circumstances and each signal model can be derived as an alternative approximation to the fundamental quantum-optics signal model, which is presented in Chapter 15. This section provides an introduction to the propagation of an information-bearing signal using photon optics. A single lightwave pulse within a photon-optics signal model can be regarded as a packet or a puff of photons with the time-varying photon arrival rate R(t ) proportional to the wave-optics power P (t ) (cf. (1.2.5)). A schematic notional representation of this photon puff is shown in Figure 3.8(a), with the random number of photon counts over a small time interval T superimposed on a graph of a wave-optics signal power P (t ). For each discrete time instant t , the number of photon counts in a time interval T centered at that time instant t is a Poisson random variable characterized by R(t ). One realization of this random variable for each time interval T is shown as a point in Figure 3.8(a). This inherent randomness is photon noise. Given this random puff, the spatial and temporal properties of the lightwave power within the wave-optics signal model are interpreted as averaged quantities that remove the effect of the photon noise but do not remove any other form of noise. A pulse that is deterministic within the wave-optics signal model is replaced by a random point process in the photon-optics signal model. The arrival rate R(t ) of this process is proportional to the lightwave power P (t ) (cf. (1.2.5)). This arrival rate is shown as the solid curve in Figure 3.8(a). The integral of the photon arrival rate over an interval T gives the mean number of photons E(t ). The actual number of photons m(t ) fluctuates about the mean value as shown in Figure 3.8(a). The probability mass function of the number of photons in the interval T is derived in Section 6.2.3. 60

60 (a) )s/snotohp( rewoP

)s/snotohp( rewoP

40 30 20 10 0

(b)

50

50

0

20

40

60

80

Time

100

40 30 20 10 0

0

20

40

60

80

100

Time

Figure 3.8 (a) The lightwave power for a deterministic pulse using the wave-optics signal model along with realizations of the corresponding number of photons, shown as points, using a photon-optics signal model. (b) A wave-optics pulse in noise. 5 A photon is directly photodetected only by extinguishing it. Nevertheless, it is convenient to refer to

photons in transit as if observed.

3.3 Waveguide Geometries

123

The correspondence given in (1.2.5) between the photon arrival rate R(t ) in the photon-optics signal model and the lightwave power P (t ) in the wave-optics signal model forms the basis for describing the propagation of a photon stream in an optical fiber. If the lightwave pulse within the wave-optics signal model is noisy, as is shown in Figure 3.8(b), then this randomness implies that the photon arrival rate R(t ) in the photon-optics signal model is also random. The puff of photons has additional randomness, as is suggested by Figure 3.8(b). The probability mass function Pm (m) of the number of photons m in a time interval T for a random wave-optics pulse that includes the effect of photon noise is developed in Chapter 6 and in Section 7.7. As the signal pulse propagates, the lightwave power is reduced because of attenuation and is redistributed because of dispersion. These propagation effects reduce and redistribute the photons within the stream and so change the photon arrival rate within each time modulation interval of duration T . Accordingly, the probability mass function pm (m) of the number of photons in each time interval T evolves as a function of the propagation distance L .

3.3

Waveguide Geometries The guided modes allowed by a fiber will be studied by considering several waveguide geometries, which are shown schematically in Figure 3.9. In this chapter, only lightwaves with a single frequency or wavelength, called monochromatic lightwaves, will be considered. The first geometry, shown in Figure 3.9(a), is a simple slab waveguide defined in a rectangular coordinate system. The modes of this structure provide insight into the nature of guided modes in other waveguides. The method of solution for this structure, which is based on a rectangular coordinate system, is then repeated for a cylindrical coordinate system to determine the modes for a step-index fiber. Two kinds of step-index fibers are considered. In the first geometry, shown in Figure 3.9(b), the normalized index difference ± is constrained to be small so that approximations can be made. In the second geometry, shown in Figure 3.9(c), the normalized index difference ± is arbitrary, and the solution is exact. Finally, the modes in a graded-index fiber, shown in Figure 3.9(d), are briefly considered. (a)

(c)

Step-index fiber (arbitrary Δ)

Slab waveguide (b)

(d)

Step-index fiber (small Δ)

Graded-index fiber

Figure 3.9 Waveguide geometries. (a) Slab waveguide. (b) Step-index fiber with a small index

difference ±. (c) Step-index fiber with an arbitrary index difference ±. (d) Graded-index fiber.

124

3 The Guided Lightwave Channel

The modes of a dielectric waveguide are determined in this section using the vector Helmholtz equation given in (2.3.15). Because the cross-sectional region of a waveguide does not change in the z direction, a mode maintains the same transverse spatial structure in a lossless medium in which it propagates. This means that the z dependence of the jth mode has a form e−iβ j z , where β j is a real propagation constant that depends on the geometry. For this kind of propagating solution in a dielectric waveguide, all of the transverse field components can be expressed in terms of the two axial field components E z and Hz . Equivalently, the two axial components can be expressed in terms of the transverse components. The proof of this statement is considered in a problem at the end of the chapter. One may ask whether a dielectric waveguide can support a transverse electromagnetic mode (TEM). This is a mode in which both the electric field vector E and the magnetic field vector H are transverse to the direction of propagation because both axial components, E z and Hz , are zero. While it is possible for a metallic waveguide to support such a transverse electromagnetic mode, this kind of mode requires at least two conducting surfaces, such as in a coaxial cable. A dielectric waveguide has no such metallic conductor, and cannot support such a mode. This means that at least one of E and H must have an axial field component along the direction of propagation. This can be said in another way. In a dielectric waveguide, all of the field components can be expressed as linear functions of the axial field components E z and Hz . Because these components are defined as zero for a TEM mode, all of the transverse components in a dielectric waveguide would also be zero for a TEM mode. Therefore, a propagating mode in a dielectric waveguide must have at least one axial component, which can be either E z or Hz or both. A mode with no E z component is called a transverse electric mode (TE mode). A mode with no Hz component is called a transverse magnetic mode (TM mode). A mode that has both an E z component and an Hz component is called a hybrid mode. Regarding this point, consider a TE mode given by E(x , y, z ) = E y (x , y)e−iβ z ½ y. This transverse field has a single electric field component in the y direction given by E y (x , y ), and undergoes a phase shift of e−iβ z in the z direction. The magnetic field for this TE mode is determined using (2.3.1a) along with B = µ0 H (cf. (2.3.3b)). Because the field is oriented in the y direction, the curl operator is given by ∇× = ½ z ∂/∂ x − ½x ∂/∂ z. For a monochromatic wave with temporal dependence eiωt , the magnetic field H(x , y, z ) is i

−iβ z ωµ0 ∇µ × E y (x , y)e ½y ¶ = ωµi iβ Ey (x , y )½x + ∂ Ey∂(xx , y ) ½z e−iβz ,

H( x , y , z ) =

0

(3.3.1)

which means that for a TE mode, the magnetic field has a component both in the transverse ½ x direction and in the axial ½ z direction, with the transverse component Hx (x ) being a scaled form of E y (x ).

3.3 Waveguide Geometries

125

For any lossless dielectric waveguide, the linear propagation of a lightwave field over a distance L will be determined using the following method. 1. The complex monochromatic field E(r) at a frequency ω at the input to the waveguide is decomposed into a superposition of the guided modes of the waveguide as given in (2.3.40). The j th expansion coefficient of this decomposition is the complex spatial amplitude s j of the j th mode. These amplitudes are determined using (2.3.39), and each can be regarded as the spatial projection of the incident field onto a guided mode. 2. For a lossless dielectric waveguide, the amplitude for the j th guided mode at a distance L is the product of the initial amplitude s j and a phase shift e−iβ j (ω) L that depends on the frequency ω and the distance L . The dependence of β j (ω) on ω causes linear dispersion, as studied in Chapter 4. 6 3. At a given distance L , the complex-valued amplitudes for all modes are summed to produce the complex monochromatic field for lightwave frequency ω. 4. An arbitrary temporal waveform is expressed as a superposition of monochromatic field components using a temporal Fourier decomposition of the incident waveform.

3.3.1

Modes in a Slab Waveguide

Although a practical fiber has a circular cross section, we start with the simpler geometry of a symmetric dielectric slab waveguide shown in Figure 3.9(a), where the width in the y direction is sufficiently large that it can be treated as infinite. Because a mode in a slab waveguide is only a function of one transverse coordinate, the form of the mode is easy to derive and understand. The insight obtained by using this method of solution will be applied to cylindrical waveguides in what follows. The cross-sectional geometry of an ideal slab waveguide is shown in Figure 3.10. This kind of waveguide consists of a core lying between two infinite parallel planes. The core has an index n1 and thickness 2a sandwiched between two semi-infinite claddings with an index n2 . Because the two indices are different, part of the power of a guided mode resides in the core and part of the power resides in the cladding. The combination of the geometry of the waveguide and the frequency-dependence of the index of refraction causes n2 2a

a

n1 n2

x -a

z

Figure 3.10 The geometry of a dielectric slab waveguide. The field propagates in the z direction

and there is no variation in the field in the y direction.

λ for a dispersive material. Because only a monochromatic field is considered in this chapter, the discussion of this dependence is deferred to the next chapter. Here the index is treated as a constant.

6 The index n is a function of the wavelength

126

3 The Guided Lightwave Channel

different modes and different wavelengths within each mode to have a different proportion of the power in the core and the cladding. This leads to linear dispersion. This effect is a frequency-dependent phase shift of the lightwave signal that redistributes the signal energy without loss. Linear dispersion is discussed in Chapter 4. In this chapter, dispersion does not arise because only monochromatic lightwaves are considered. The monochromatic electric field vector in cartesian coordinates can be written in terms of its projections onto unit vectors as E = E x (x , y , z)½ x + E y (x , y , z)½ y + E z (x , y, z )½ z. Substituting this form of the field into the vector Helmholtz equation given in (2.3.15) results in three uncoupled scalar Helmholtz equations – one equation for each component of the complex vector field E, with each component, in general, a function of x, y, and z. For a TE mode in the slab geometry shown in Figure 3.10, the axial electric field component E z is zero so the electric field vector is transverse to the direction of propagation. For a TM mode in the slab geometry shown in Figure 3.10, the axial magnetic field component Hz is zero so the magnetic field vector is transverse to the direction of propagation. For the TE mode, the electric field in the y direction given by E y (x , z ) does not depend on y, and is parallel to the interface between the two media. Because a dielectric waveguide must have at least one axial field component, this mode must have an Hz (x , z ) field component in the direction of propagation. Similarly, for the TM mode, the magnetic field in the y direction given by H y (x , z ) does not depend on y. This mode must have an E z (x , z) field component in the direction of propagation. The scalar Helmholtz equation for the TE-mode field component E y (x , z ) is

∇ 2 Ey (x , z) + n2 k20 Ey (x , z ) = 0.

(3.3.2)

To solve this equation, use the method of separation of variables. This method posits that the transverse electric field E y (x , z) is the product of two functions, namely f (x ), depending only on x, and g(z ), depending only on z. Substituting E y ( x , z ) = f (x )g(z ) into (3.3.2) and dividing through by f (x )g(z ) yields 1 d2 f (x ) f (x ) dx 2

2 + g(1z ) d dzg(2z ) + n2 k20 = 0.

(3.3.3)

Each of the first two terms in (3.3.3) is a function of a single variable. Hence, so that the first two terms sum to the constant term −n2 k02, each term must itself be a constant, independent of x and z. Equating the first term containing f (x ) to a constant −γ 2 and equating the second term containing g(z ) to another constant −β2 , (3.3.3) separates into two ordinary differential equations, d2 f (x ) + γ 2 f (x ) = 0, dx 2 d 2 g ( z) + β2 g(z ) = 0, dz 2

(3.3.4a) (3.3.4b)

where the separation constants γ and β are related by the constraint equation

γ 2 + β 2 = n2i k20 ,

(3.3.5)

3.3 Waveguide Geometries

127

with ni for i = 1, 2 being the index in the core and the cladding, respectively. The separation constant β is the propagation constant of the mode. It is the spatial frequency of the field in the direction of propagation. A solution E y (x , z ) to these equations for positive x is of the form E y ( x , z) = f ( x ) g ( z )

= Ce iγ x e−iβ z,

(3.3.6)

where γ may be real or imaginary, and C is a constant. This form means that the transverse spatial dependence f (x ) does not change as a function of z, which is a requirement for a solution to be a mode. This complete solution must satisfy four constraints required by the boundary conditions. These constraints are as follows. 1. The form of g(z ) must have the functional form e−iβ z with the same value of β both in the core and in the cladding. For a lossless waveguide β is real. This means that the field can experience only a z-dependent phase shift with g(z ) = e−iβ z . 2. ¾The total power per unit width in the y direction, which is proportional to ∞ 2 f (x ) −∞ | f (x )| dx, must be finite. For a guided mode, this constraint implies−qthat ( x − a ) must decay with x in the infinite cladding. Only the functional form e for x > a satisfies (3.3.6) and this constraint. Unguided modes have a different form of solution in the cladding, but are not of interest. 3. The functions f (x ) in the core and the cladding must be equal at the core/cladding boundary. This constraint is a consequence of the boundary condition that the tangential electric field must be continuous at the boundary. 4. The derivatives of the functions f (x ) in the core and the cladding must be equal at the core/cladding boundary. This constraint is a consequence of the boundary condition that the tangential magnetic field component must be continuous at the boundary. Solutions in the core and the cladding must be matched to satisfy these constraints. In particular, the separation constant γ will be real in the core and imaginary in the cladding. Accordingly, let γ = p be the value of the separation constant for the core and let γ = iq be the value of the separation constant for the cladding with q being real. Using these values in the core and the cladding, (3.3.4a) separates into two equations, d2 f (x ) + p 2 f (x ) = 0, dx 2 d2 f (x ) − q 2 f (x ) = 0, dx 2

|x | ≤ a,

(3.3.7a)

|x | ≥ a.

(3.3.7b)

|x | ≤ a,

(3.3.8a)

| x | ≥ a.

(3.3.8b)

The solutions to these equations are f (x ) =

¿

B sin( px ) B cos( px )

f (x ) = Ae−q (| x |−a ) ,

128

3 The Guided Lightwave Channel

The parameter p is the transverse spatial frequency of the cosinusoidally varying mode in the core, with a larger value of p corresponding to a mode that undergoes more spatial oscillations. The parameter q is the cladding decay rate, with a larger value of q corresponding to a faster spatial decay rate of that mode in the cladding. To determine the relationship between p and q, apply the last two constraints. These require that both f (x ) and its derivative d f (x )/dx be continuous at the boundary x = a between the core and the cladding. The derivation of these constraints is posed as a problem at the end of the chapter. Each cosine solution of (3.3.7a) that satisfies the constraints is known as an even TE mode. Each sine solution of (3.3.7a) that satisfies the constraints is known as an odd TE mode. Equating (3.3.8a) and (3.3.8b) for f (x ) and d f (x )/dx at x = a yields A = B cos( pa ),

−q A = − Bp sin( pa), for the even TE modes, which have a cosine dependence in the core. Similarly,

= B sin( pa), −q A = Bp cos( pa) A

for the odd TE modes, which have a sine dependence in the core. The ratio of the two expressions, multiplied by a, yields

= pa tan( pa ) (even TE modes), qa = − pa cot( pa ) (odd TE modes). qa

(3.3.9a) (3.3.9b)

These equations are the characteristic equations for the TE modes in a slab waveguide. A second expression relating p and q can be obtained using (3.3.5). Substituting γ = p and ni = n1 for the core yields while using γ

β 2 = n21k02 − p2 ,

(3.3.10a)

= −iq and ni = n2 for the cladding yields β 2 = n22k02 + q 2.

(3.3.10b)

Equating (3.3.10a) and (3.3.10b) and multiplying both sides by a2 yields

(qa)2 + ( pa)2 = a 2k02

·

n 21 − n22

¸. 2 =V ,

(3.3.11)

which is the equation of a circle of radius V in the ( pa, qa) plane. The parameter V , called the normalized frequency, is defined as

µ a ¶± . 2 2 V = 2π λ0 n1 − n 2,

(3.3.12a)

where λ 0 is the wavelength in vacuum and a is the half-width of the waveguide. The normalized frequency V can be written in several ways as

µa¶ √ V = 2π NA = ak0 n1 2±, λ0

(3.3.12b)

3.3 Waveguide Geometries

129

4

2

ap–

aq

)ap (toc

)ap(nat ap

3

)ap(na t ap

1

1

π

2

2

3

4

pa

Figure 3.11 Graphical solution to determine the values for pa and qa that define a slab waveguide

mode for V = 4. The circular curve is (3.3.11). The other solid curves are the characteristic equations for the TE modes given by (3.3.9a) and (3.3.9b). The dashed curves are plots of the TM modes and used n1 = 1.5 and n 2 = 1 to clearly distinguish these modes from the TE modes.

where the numerical aperture NA is defined in (3.2.4) and the normalized index difference ± is defined in (3.2.5). As an example, Figure 3.11 shows graphs of the characteristic equations leading to the first three modes as well as the constraint equation (qa )2 + ( pa )2 = V 2 . Each intersection of a characteristic equation and a constraint equation defines a permitted value p of the transverse spatial frequency in the core and a corresponding permitted value q of the exponential decay rate in the cladding, and thus defines a mode. The number of modes can be increased or decreased only by changing the normalized frequency V . Because V = 2π NA (a/λ 0 ), the number of modes can be controlled only by changing a/λ 0 or by changing the numerical aperture. This completes the discussion of the TE modes. The TM modes supported by the waveguide can be treated by the same formalism. The characteristic equations for the TM modes are determined by replacing E y with Hy . The tangential field components at the interface are H y and E z . Using the monochromatic form of (2.3.1b), the E z field in each region is proportional to (1/ n2i )(dHy /dx ), where n i is the index of refraction. Matching Hy and its derivative at x = a, as before, yields

µ n ¶2

qa

=

qa

=−

2

n1

pa tan( pa)

µ n ¶2 2

n1

pa cot( pa)

(even TM modes),

(3.3.13a)

(odd TM modes).

(3.3.13b)

Except for the constant (n2 / n1 )2 , the characteristic equations for the TM modes are the same as the characteristic equations for the TE modes given in (3.3.9).

130

3 The Guided Lightwave Channel

To continue the above example, Figure 3.11 also plots the characteristic equation leading to the first three TM modes defined by (3.3.13a) and (3.3.13b). The effect of (n2 / n1)2 given in (3.3.13) produces the offset seen in Figure 3.11 for the characteristic equation for a TM mode, which is plotted using a dashed line, as compared with the characteristic equation for a TE mode, which is plotted using a solid line. Referring to Figure 3.11, for any value of V , there is at least one allowed solution or branch of the characteristic equation that intersects a circle of radius V . This lowestorder even branch has the form pa tan( pa) = 0. However, the intersection of the lowest-order odd branch − pa cot( pa ) = 0 occurs at pa = π/2 and thus, whenever the normalized frequency V is less than π/ 2, only one even TE mode and one even TM mode will propagate in the waveguide. These two polarization modes have different propagation characteristics. For each mode defined by a branch of the characteristic equation, the value of pa at which qa = 0 defines the cutoff condition for that mode. At the cutoff condition for a mode, qa = 0, and the field no longer decays in the cladding. That mode is not guided and its power escapes to the cladding.

Relationship to Ray Optics For qa ² = 0, there is a finite number of solutions that define the guided modes. These modes can be intuitively reconciled with ray theory by using the ray to define the direction of a plane wave propagating in the slab waveguide. In this reconciliation, a mode is formed by the constructive interference of a propagating plane wave with itself as the plane wave “zig-zags” between the core/cladding interfaces. Constructive interference of these two plane waves occurs only for a finite set of angles that define the allowed modes. The cutoff condition using wave optics is equivalent to the ray angle being smaller than the critical angle θc , at which angle the ray begins to escape into the cladding. For qa close to V and so pa close to zero, the mode decays rapidly in the cladding and is well-guided with virtually all of the field transverse to the direction of propagation. The corresponding ray travels at a small grazing angle with respect to the optical axis of the fiber. As the size of the core increases, the normalized frequency V increases, there are more allowed modes, and therefore more allowed angles. For a core size that is large with respect to the wavelength λ, the allowed angles for the rays can be treated as a continuum and the waveguide can be fully analyzed using ray optics.

Mode Patterns and Transmitted Power

The separation-of-variables method yields a complex monochromatic electric field for a TE mode given by (cf. (3.3.3)) E ( x , z) = f ( x ) g ( z ) ½ y

(3.3.14) = E y (x )e−i βz ½y, f (x ) = E y (x ) is given by (3.3.8), and g(z) = e−i β z . The three parameters p , q,

where and β are determined by the intersections of (3.3.9) and (3.3.11) shown in Figure 3.11.

3.3 Waveguide Geometries

First even TE mode

First odd TE mode

131

Second even TE mode

2a

Field

Intensity

Intensity density

Field

Intensity

Intensity density

Field

Intensity

Intensity density

Figure 3.12 The cross-sectional electric field strength, the cross-sectional intensity, and the

intensity density for the first three TE modes for a slab waveguide with V dashed lines indicate the core/cladding interface.

= 4 using (3.3.8). The

Once the values of p, q, and β defining a mode have been determined, the spatial dependence of that mode is given by (3.3.8a) and (3.3.8b) for the core and the cladding, respectively. These modal patterns are shown in Figure 3.12 for the first three TE modes. To determine the transmitted power, (2.3.33) is used. This expression states that only field components that lie in a plane transverse to ½ z will yield a nonzero contribution to the power density propagating in the z direction. Referring to (3.3.1), the transverse magnetic field component Hx (x ) produces a net power transfer in the z direction. The axial magnetic field component Hz (x ) produces a standing wave between the boundaries of the waveguide and an exponentially decaying field in the cladding. The power in this component is reactive power, with the field component called an evanescent field. In general, an evanescent mode is distinguished from a guided mode or an unguided radiation mode because it does not propagate power. Using the transverse electric field component E y (x ) for a TE mode and the corresponding transverse magnetic field component Hx∗(x ) = −(β/ωµ0 ) E ∗y (x ) from (3.3.1), the time-average power density Save (r) per unit length is (cf. (2.3.24))

À Á E y (x )e−iβ z½ y × Hx∗( x )ei β z½ x 2 Â µ β ¶ Ã 1 − iβ z ∗ iβ z = 2 Re E y (x )e ½y × − ωµ Ey (x ) e ½x 0 ÄÄ ÄÄ2 β = 2ωµ Ey (x ) ½z. 0

S ave(r) = 1 Re

(3.3.15)

Using the orthogonality condition for the modes of a lossless waveguide defined in (2.3.33) gives

´∞

−∞

f j (x ) fk∗ (x )dx

= δk j ,

(3.3.16)

where f j (x ) is the normalized transverse electric ¾field in mode j and f k (x ) is the ∞ normalized transverse electric field in mode k with −∞ | f ( x )| 2 dx = 1 being the normalization condition. Expression (3.3.16) states that the TE modes form an orthogonal basis for guided wave propagation for electric fields oriented in the y direction. Given

132

3 The Guided Lightwave Channel

this basis, the transverse dependence of a generic guided monochromatic electric field s (x ) can be expressed as the linear combination of the modes given by s (x ) =

Å j

a j f j (x ),

(3.3.17)

with the complex expansion coefficients a j given by the projection of s (x ) onto the basis function f j (x ) (cf. (2.3.39)), aj

=

´∞

−∞

s (x ) f j∗( x )dx .

(3.3.18)

Applying the same form of analysis, the TM modes form an orthogonal basis for the transverse magnetic field for guided-wave propagation within the same slab waveguide. Combining these observations, the complete set of TE and TM modes forms an orthogonal basis for the transverse dependence of any guided field within the lossless dielectric slab waveguide. The axial field components follow from the monochromatic form of Maxwell’s equations. 3.3.2

Modes in a Step-Index Fiber

An optical fiber is a cylindrical dielectric waveguide. Three kinds of optical fiber waveguides are shown in Figure 3.9. This section presents two methods of solution of Maxwell’s equations for the guided modes in a step-index cylindrical fiber. The first solution is approximate and so is simpler. It is appropriate for a fiber for which the normalized index difference ± is small (cf. Figure 3.9(c)). It is well motivated by the slab waveguide, but now uses a cylindrical geometry for the fiber instead of the rectangular geometry used for the slab waveguide. The second solution is exact. It holds for an arbitrary value of the normalized index difference ±, and requires full solutions for all three vector components. The cross section of a step-index cylindrical fiber with a core of radius a is shown in Figure 3.13. For this geometry, the appropriate coordinate system is a cylindrical coordinate system (r, ψ, z ), with z being the direction of propagation. As for the case of a slab waveguide, a fiber cannot support a mode that does not have at least one axial field component.

Cladding

n2

Core

n1

Cladding

n2

K(qr) d = 2a

J(pr) K(qr)

Figure 3.13 The radial dependence of the solution to the wave equation in the core of a step-index

fiber is a Bessel function of the first kind Jν (r ), whereas the solution in the cladding is a modified Bessel function of the second kind Kν (r ). These forms are matched at the core/cladding interface.

3.3 Waveguide Geometries

133

Modes of a Step-Index Fiber For a Small Value ± of

For most step-index optical fibers used for communication, n1 ≈ n2 and the normalized index difference ± is much smaller than one. 7 This section derives the modes under the approximation that ± is small. In the next section, these results are validated by comparison with the exact solution for an arbitrary value of ± . To motivate the analysis of this section, consider the case for which ± goes to zero so that n1 = n2 = n. In this regime, there is no core/cladding boundary in the fiber. Because the cladding is regarded as infinite, the solution to the Helmholtz equation is a plane-wave with β = nk0 (cf. (2.3.19)). For a plane-wave solution, both the electric field vector E and the magnetic field vector H are transverse to the direction of propagation. This is a transverse electromagnetic (TEM) mode. This observation suggests that for ± much smaller than one, a guided-mode solution to the vector Helmholtz equation given in (2.3.15) is nearly a TEM mode, with the electric field vector E nearly transverse to the direction of propagation. This electric field can be written as E ≈ E x (r, ψ, z )½ x + E y (r, ψ, z )½ y, where cartesian coordinates are the judicious choice to describe the plane-wave-like nature of each component of the vector field, but then with cylindrical coordinates being the judicious choice to describe the functional dependence of each cartesian component to conform with the cylindrical symmetry of the fiber. Given the approximate form of the electric field vector E, the vector Helmholtz equation separates into a set of scalar Helmholtz equations as was the case for the slab waveguide. Each transverse field component ( E x , E y , Hx , and Hy ) satisfies the same form of scalar Helmholtz equation. One pair ( E x , Hy ) of transverse field components defines the direction of one field polarization. The second pair (E y , Hx ) of transverse field components defines the direction of the other field polarization. The same form of solution holds for both polarizations. Working with ( E x , Hy ), the E x component satisfies the scalar Helmholtz equation

∇ 2 E x (r, ψ, z ) + n 2k02 E x (r, ψ, z ) = 0, (3.3.19) where n = n1 in the core and n = n2 in the cladding. To determine E x (r, ψ, z ), the method of separation of variables that was used for

the slab waveguide is used again here. The differential equation (3.3.19) is solved by setting E x (r, ψ, z ) = f (r )g(ψ) e−iβ z . This form of solution reflects the fact that the geometry of an ideal lossless fiber is invariant in z, and thus the mode experiences only a z-dependent phase shift described by the real propagation constant β . Cylindrical optical fibers are also geometrically invariant in ψ . Therefore, the field of a guided mode must be periodic in ψ , with a period that is a multiple of 2 π . This implies that the functional form for g(ψ) must be eiνψ , where ν must be an integer so that g(ψ) is periodic. For ν = 0, the field is azimuthally symmetric. For ν ²= 0, the field has 2ν zeros in ψ over the range [0, 2π). 7 This kind of fiber is sometimes called a weakly guiding fiber.

134

3 The Guided Lightwave Channel

Substituting these forms into (3.3.19), and separating the equations, yields a differential equation for f (r ), d2 f (r ) dr 2

+

1 d f (r ) r dr

µ

+

n2 k02

2¶ ν − β − r 2 f (r ) = 0, 2

(3.3.20)

for some separation constant β 2 and integer ν . The derivation of this expression by the separation of variables is posed as a problem at the end of the chapter. This differential equation, derived for a cylindrical fiber geometry, corresponds to (3.3.4a), derived for a slab rectangular waveguide geometry. The equation is called the Bessel differential equation. The solutions to it are called Bessel functions. Our task now is to study solutions to this differential equation both in the core region, for which n = n1 , and in the cladding region, for which n = n2 . For the core region, defined by r ≤ a, set n = n 1 and set n 21k02 − β 2 = p2 (cf. (3.3.10a)), then substitute this expression into (3.3.20) to give the Bessel differential equation in the core region as d2 f (r ) dr 2

+

1 d f (r ) r dr

+

µ

p

2

2¶ ν − f (r ) = 0,

r2

r

≤ a,

which has a guided-mode solution given by f (r ) = A Jν ( pr ),

r

≤ a,

(3.3.21a)

where Jν (x ) is the Bessel function of the first kind of order ν and A is a constant.8 This function plays the same role for a fiber waveguide that the cosinusoidal functions play for a slab waveguide. For the cladding region defined by r ≥ a, set n = n2 and set n22 k20 − β 2 = −q2 , with q being real (cf. (3.3.10b)). Substituting this expression into (3.3.20), the form of the modified Bessel equation in the cladding is d2 f (r ) dr 2

+

1 d f (r ) r dr



µ

q

2

2¶ ν + f (r ) = 0,

r2

r

≥ a,

which has a decaying solution given by f (r ) = B Kν (qr ),

r

≥ a,

(3.3.21b)

where K ν (x ) is the modified Bessel function of the second kind of order ν , and B is a constant. This function plays the same role for a fiber waveguide that the decaying exponential function plays for a slab waveguide. The two kinds of Bessel functions are plotted in Figure 3.14. The values of p and q in (3.3.21) are related to the propagation constant β by the same expressions, (3.3.10a) and (3.3.10b), that were derived for the slab waveguide.

¾ν

( ) = ( / π) ν

8 The integral representation of a th integer-order Bessel function of the first kind is J x 1 2 ν π e i(x sin θ−νθ) d π cos x sin 1 d The integral representation of a th integer-order −π 0 ∞ − x cosh θ cosh d . modified Bessel function of the second kind is K ν x 0 e

¾

θ = ( /π)

(

θ − νθ) θ. ¾ ( )=

(νθ) θ

3.3 Waveguide Geometries

(a)

1

(b) 10

J0 (x) J1 (x)

0.5

K0 (x) K1 (x)

8

J2 (x)

135

K2 (x)

)x( K

ν

ν

)x( J

6

0.

4 2

–0.5 0

5

10 x

15

20

0

0

0.5

1 x

()

1.5

2

()

Figure 3.14 Bessel functions. (a) The first three Jν x Bessel functions (b) The first three K ν x

Bessel functions.

Therefore, the expression relating p and q for a step-index fiber has the same form as (3.3.11) with

( pa)2 + (qa)2 = V 2 ,

(3.3.22)

where a is the radius of the fiber core. Equations (3.3.21) correspond to equations (3.3.8) for a slab waveguide. The terms p and q in (3.3.21) correspond to the same parameters defined in (3.3.10a) and (3.3.10b), respectively, and behave in the same way. A larger value of p will produce more spatial oscillations of the field within the core. A larger value of q will produce a faster decay rate of the field in the cladding. A solution to the transverse field component E x (r, ψ, z ) = f (r )e−i νψ e−iβ z with f (r ) given by (3.3.21) is called a linearly polarized mode, abbreviated as an LP mode. The requirement that ± is much smaller than one is equivalent to the conditions β ≈ nk0 and nk0 a ³ V . The derivation of these expressions is asked for in an end-of-chapter exercise. The scalar Helmholtz equation for E x (r, ψ, z) has the same form as the TE slab waveguide mode with x replaced by r . This means that the characteristic equation for an LP mode can be determined by matching both the radial part f (r ) of E x (r, ψ, z) and its derivative d f (r )/dr at the core/cladding interface. Using (3.3.21) and matching the solutions for f (r ) at r = a yields A Jν ( pa) = B K ν (qa ),

(3.3.23)

for some constants A and B . Similarly matching the derivative of each solution with respect to r at r = a yields Ap Jν´ ( pa ) = Bq K ν´ (qa ).

(3.3.24)

Multiplying the preceding equation by a and forming the ratio gives J ´ ( pa) pa ν Jν ( pa)

´ qa ) = qa KK ν ((qa ). ν

(3.3.25)

136

3 The Guided Lightwave Channel

The derivatives in this expression can be eliminated by using the standard Bessel function identities for the derivatives, Jν´ ( x ) = ∓ Jνµ1(x ) µ

ν J (x ), ν x

K ν´ ( x ) = −K νµ1(x ) µ

ν K (x ). ν x

(3.3.26)

Substituting these identities into (3.3.25), the characteristic equation for a linearly polarized mode is rewritten as pa

Jνµ1 ( pa ) Jν ( pa)

(qa) , = µqa KKνµ(1qa ) ν

(3.3.27)

where the choice of the sign must agree throughout. This expression plays the same role that (3.3.9) plays for the TE modes of a slab waveguide. The plot of this characteristic equation shown in Figure 3.15 corresponds to the plot in Figure 3.11 for the slab waveguide. Because a linearly polarized mode has both a radial and an azimuthal dependence, the conventional labeling of a linearly polarized mode LPν´ has two indices. The first index denotes the order ν of the azimuthal component of the mode. This value is related to the number of times the field goes to zero in the azimuthal direction. For ν = 0, the mode is azimuthally symmetric. For ν ² = 0, the mode varies in ψ and has 2ν zeros for ψ = [ 0, 2π). Each of these allowed spatial modes can support two polarization modes. Given a fixed value of ν , the second index ´ corresponds to the allowed values of pa. The smallest allowed value of pa is labeled with ´ = 0. The mode with ´ = 1 8 01 11 21 02

6

31 12 aq

V=4

4

41 22

2

51

03 0

0

2

4 pa

6

8

Figure 3.15 Plot of the characteristic equation for a linearly polarized mode along with (3.3.11)

for V

= 4 (cf. (3.3.22)). The four intersections correspond to four allowed modes.

3.3 Waveguide Geometries

137

corresponds to the next largest value of pa and so on. This index is related to the number of spatial oscillations in the radial direction for a fixed azimuthal dependence. Each azimuthally symmetric LP0´ solution corresponds to a pair of transverse components – ( E x , Hy ) or (E y , Hx ) – which defines one possible polarization mode. Therefore, for ν = 0, each solution corresponds to two orthogonal polarization modes. For ν ² = 0, the azimuthal dependence can be either cos(νψ) or sin(νψ) , thereby generating two distinct spatial patterns for each polarization. This produces four modes for each (ν, ´) pair for ν ² = 0. Using the functional forms for the ψ dependence given by g (ψ) = eiνψ and the z dependence given by h(z ) = e−iβ z , the complete solution for a single transverse electric field component is

·

E x (r, ψ, z ) = E x (a )e−i β z Ae iνψ for the core, and

·

+ Be −iνψ

¸ Jν ( pr ) Jν ( pa)

for r

≤a

(3.3.28a)

¸ K ν (qr ) for r ≥ a (3.3.28b) K ν (qa ) . for the cladding, where the arbitrary amplitude E x (a) satisfies E x (a) = E x (a , 0, 0). The azimuthal dependence has the form cos(νψ) for A = B = 1/2, or sin(νψ) for A = −i/ 2 and B = i /2. Plots of a single transverse electric field component of a linearly polarized mode for a step-index fiber with V = 4.5 are shown in Figure 3.16. E x (r, ψ, z ) = E x (a)e−iβ z Aeiνψ

+ Be −iνψ

The modes of a lossless waveguide are orthogonal in the field amplitude. They are, of course, not orthogonal in the intensity.

Cutoff Conditions

Similarly to a slab waveguide, for each branch of the characteristic equation that defines a mode, the value of pa at which qa is equal to 0 is the cutoff condition for that mode. For a linearly polarized mode, the cutoff condition is qa = 0, and pa = V . Then the field no longer decays in the cladding and is not guided. The cutoff condition can be derived from (3.3.27) by multiplying through by Jν ( pa), knowing from the properties of Bessel functions that Jν (V ) and Jν−1 (V ) cannot both be zero at the same value of V . Setting the right side of (3.3.27) to zero yields Jν−1 (V ) = 0,

(3.3.29)

where V ² = 0. The cutoff condition for the LP11 mode is the first zero of J0( pa) and yields V = pa ≈ 2.40. Only the LP01 mode will propagate when the normalized frequency V is smaller than this value. For this case, there is a single spatial mode resulting in a single-mode optical fiber. The lowest-order mode is labeled “01” in Figure 3.15. This spatial mode, which is never cut off, can support two polarization modes. Although the LP 01 mode is not cut off, the field decays more slowly in the cladding as V becomes small, thereby producing a nonnegligible field at the outer radius of any finite cladding. This field interacts with the cladding/casing interface, invalidating the theory and causing the mode close to the cutoff condition to become unguided.

138

3 The Guided Lightwave Channel

Field

Field

Field

Field

Intensity

Intensity

Intensity

Intensity

LP 01

LP 11(cos ψ)

LP 02

LP21(cos 2 ψ)

LP 11(sin ψ)

LP 21(sin 2ψ)

Figure 3.16 The cross-sectional electric field strength, the cross-sectional intensity, and the

intensity density for the four LP modes supported in a fiber with V = 4.5. The field and intensity plots are along the x axis for a cosine ψ dependence. The term in parentheses for each mode is the azimuthal dependence of the mode.

Modes of a Step-Index Fiber for an Arbitrary Value ± of

The treatment of a step-index fiber in the previous subsection with ± much smaller than one is adequate for all existing practical fibers used for communication, but the analysis does involve approximation. Accordingly, to validate the treatment, the exact solution to the problem is now presented. This provides a deeper understanding and confidence in the approximations used in the previous analysis. Aside from these reasons, this subsection may be regarded as superfluous. For an arbitrary value of the normalized index difference ± , the field may have significant axial field components, so the approximation that a fiber mode is nearly a transverse electromagnetic mode is no longer valid. For this case, the exact solution requires matching two sets of tangential field components at the core/cladding boundary. The first set is the axial field components (E z , Hz ). The second set consists of the azimuthal field components ( E ψ , Hψ ). In general, if the mode is not azimuthally symmetric, then matching both sets of field components at the core/cladding boundary requires both an axial electric field E z component and an axial magnetic field Hz component. The resulting mode is a hybrid mode. Azimuthally symmetric modes can be either TE or TM modes. To determine the form of a mode in a fiber, the vector Helmholtz equation given in (2.3.15) is expressed in a cylindrical coordinate system. At each (r, ψ, z) point in the fiber, the electric field vector has three components,

3.3 Waveguide Geometries

139

± + E z(r, ψ, z )½z. E (r, ψ, z ) = E r (r, ψ, z )½ r + E ψ (r, ψ, z )ψ In contrast to the approximate solution that uses cartesian coordinates to express the nearly transverse nature of the vector field for ± much smaller than one, the general solution expresses the vector field using cylindrical coordinates. In this coordinate ± system, the unit vector ½ r for the E r component of the field and the unit vector ψ ±/∂ψ = −½r and ∂±r /∂ψ = for the E ψ component of the field are related, with ∂ ψ ½ ψ . The unit vector ± z for the E z component is not related to any other field component. Because E z (r, ψ, z ) can be studied by itself, we choose to first solve for E z (r, ψ, z ). Once this field component has been determined, the other two field components, E r (r, ψ, z ) and E ψ (r, ψ, z ), can be determined directly from E z (r, ψ, z) using the vector structure of Maxwell’s equations, as was the case for a slab waveguide geometry. The E z component satisfies a scalar Helmholtz equation,

∇2 Ez (r, ψ, z) + n2 k20 Ez (r, ψ, z) = 0.

(3.3.30)

Again using the method of separation of variables, this equation is solved by setting E z (r, ψ, z ) = f (r )eim ψ e−iβ z , where the value of the index m for the azimuthal component is not necessarily equal to the value ν used earlier for the approximate linearly polarized modes. Instead, the exact analysis here will show that the linearly polarized modes are linear combinations of the exact solutions. The solution for the axial field component f (r ) is of the same form as that given in (3.3.21). Using this form of solution, the axial field components E z (r ) and Hz (r ) must be continuous at the core/cladding interface. Matching E z (r ) and Hz (r ) at r = a gives

⎧ ⎪⎪ ⎨ E z (r ) = ⎪⎪ ⎩

⎧ ⎪⎪ ⎨ Hz (r ) = ⎪⎪ ⎩

E z (a)

Jm ( pr ) , Jm ( pa )

r

≤a

K m (qr ) E z (a) , r ≥ a, K m (qa ) Jm ( pr ) , r ≤a Hz (a ) Jm ( pa ) K m (qr ) Hz (a) , r K m (qa )

≥ a,

(3.3.31)

(3.3.32)

where E z (a ) and Hz (a ) are the values at the core/cladding interface. The derivation of E z (r ) and Hz (r ) for one of the two tangential field components is now complete. The second set of tangential field components consists of the azimuthal components E ψ (r ) and Hψ (r ). To derive an expression relating these field components, the azimuthal components are expressed in terms of the axial components E z (r ) and Hz (r ) using two of Maxwell’s equations, given in (2.3.1a) and (2.3.1b), expressed in cylindrical coordinates. Applying the boundary conditions at the interface, the azimuthal components are now expressed in terms of the axial components E z (r ) and Hz (r ) as

140

3 The Guided Lightwave Channel

⎧ mβ ⎪⎪ − Ez (r ) + ik0 η0 ∂ Hz (r ) , r ≤ a ⎨ p2 r p2 ∂r (3.3.33) E ψ (r ) = ⎪⎪⎪ mβ ik0η0 ∂ Hz (r ) ⎩ q 2r E z ( r ) − q 2 ∂ r , r ≥ a , ⎧ mβ ⎪⎪ − Hz (r ) − i (k0 /η0) n21 ∂ E z(r ) , r ≤ a ⎪⎨ p2 r p2 ∂r Hψ ( r ) = (3.3.34) ⎪⎪ mβ 2 n i k /η ) ∂ E ( r ) ( 0 0 2 z ⎪⎩ Hz ( r ) + q 2r q2 ∂ r , r ≥ a, √ where η0 = µ 0/ε0 is the impedance of free space. Now substitute into (3.3.33)

and (3.3.34) the expressions for E z and Hz for both the core and the cladding given in (3.3.31) and (3.3.32). This produces a set of equations for the field in the core and a set of equations for the field in the cladding. These are expressed as matrix equations. The matrix equation for the azimuthal components in the core is

⎡ ⎣

⎡ ⎤ − mp2βr ⎢⎢ E ψ (r ) ⎦ = ⎢⎢ ⎣ i (k0 /η0) n21 Jm´ ( pr ) Hψ (r ) − p J ( pr )

ik0 η0 Jm´ ( pr ) p Jm ( pr )

− pm2βr

m

⎤ ⎥⎥ ⎡ ⎥⎥ ⎣ ⎦

E z (r ) Hz (r )

⎤ ⎦,

where the prime indicates a derivative with respect to the argument of the function. The matrix equation for the azimuthal components in the cladding is

⎡ mβ ⎡ E (r ) ⎤ ⎢⎢ q 2r ψ ⎦ = ⎢⎢⎢ ⎣ ⎢⎣ Hψ (r ) i (k0 /η0 ) n22 K m´ (qr ) q K m (qr )

´ ) ⎤ − ik0qη0 KK m ((qr ⎥ m qr ) ⎥ ⎡ ⎥⎥ ⎣ ⎥⎥ ⎦ mβ

E z (r ) Hz (r )

⎤ ⎦,

q 2r

where, as before, a prime indicates a derivative with respect to the argument of the function. Applying the boundary conditions, the two expressions for the azimuthal components, which are tangential to the core/cladding interface, must be equal at the boundary r = a. Setting r = a in both expressions and subtracting yields

⎡ ⎢⎢ ⎢⎢ ⎣

mβ a

µ1

q2

+

1 p2



i (k0/η0 ) a F2( pa, qa )

⎤ −ik0 η0a F1 ( pa, qa) ⎥ ⎡ ⎥⎥ ⎣ ⎥⎦ µ ¶ 1 mβ 1 + p2 a q2

where F1 ( pa, qa ) =

K m´ (qa ) qa K m (qa )

+

E z (a) Hz ( a )

Jm´ ( pa ) pa Jm ( pa )

⎤ ⎦ = 0,

(3.3.35)

(3.3.36)

3.3 Waveguide Geometries

and F2 ( pa, qa ) =

n22 K m´ (qa )

n21 Jm´ ( pa)

+ qa K m (qa )

pa Jm ( pa )

.

141

(3.3.37)

Nonzero solutions to the matrix equation in (3.3.35) exist only when the determinant of the matrix is zero. This condition leads to a condition on the allowed values for pa and qa given by m 2β 2 k02

µ

1 (qa)2

¶2 + ( pa1 )2 − F1( pa, qa)F2 ( pa, qa) = 0.

(3.3.38)

For m ² = 0, a solution to (3.3.38) does not exist unless both F1( pa, qa ) and F2 ( pa, qa ) are nonzero. When this condition is satisfied, none of the elements of the matrix given in (3.3.35) are zero. Therefore both E z and Hz are nonzero and the mode is a hybrid mode having both an axial E z component and an axial Hz component. For m = 0, the mode is azimuthally symmetric and the diagonal elements of the matrix in (3.3.35) vanish. For this case, it is possible to have a TE mode or a TM mode with a single axial field component. If m = 0 and F1( pa , qa ) = 0, then (3.3.35) has a solution if and only if E z = 0. This is a TE mode. If m = 0 and F2( pa, qa ) = 0, then (3.3.35) has a solution if and only if Hz = 0. This is a TM mode. Because g (ψ) = e−im ψ reduces to a constant for m = 0, both the TE mode and the TM mode are azimuthally symmetric, having no dependence in the azimuthal variable ψ . The characteristic equation given in (3.3.38) can be put into a symmetric form by first eliminating β. To do so, divide (3.3.10a) by (k0 pa)2, divide (3.3.10b) by (k0qa )2 , and add the two resulting equations to produce

β2 µ k02

1 ( pa )2

+

1 (qa)2



2

2

n1 n2 = ( pa + 2 ) (qa)2 .

Substituting this expression into (3.3.38) gives m2

µ

1 ( pa)2

+

1 (qa )2

¶Æ

n 21 ( pa )2

+

n22 (qa)2

Ç

= F1 ( pa , qa) F2 ( pa , qa) ¶ µ J ´ ( pa) K m´ (qa ) m = pa J ( pa) + qa K (qa) m Ç Æ 2m ´ 2 n 1 Jm ( pa ) n 2 K m´ (qa ) × pa J ( pa) + qa K (qa) . m m (3.3.39)

Again, the derivatives can be eliminated by using the standard Bessel-function identities given in (3.3.26). Using those identities, (3.3.39) can be written as

( J + + K + ) ·n2 J − − n 2 K − ¸ + ( J − − K −) ·n 2 J + + n2 K +¸ = 0, 1

where

2



=

Jm µ1( pa) , pa Jm ( pa )

1



2

K m µ1(qa ) = qa . K (qa ) m

(3.3.40)

(3.3.41)

142

3 The Guided Lightwave Channel

m= 1

6 HE 11

5

EH 11

12

3

aq

3

aq

2

2

1

1 0

1

2

3 pa

4

5

6

0

5 TE 01

TM01

HE 21

4 3

aq

4

m= 2

6

5

4

0

m= 0

6

TE 02

TM 02

0

1

2

3 pa

4

Figure 3.17 Plot of the characteristic equation for n1

5

EH21

2

HE22

1 6

0

0

1

2

3 pa

4

5

6

= 2.5, n 2 = 1.5, and V = 4.

The allowed modes can be depicted graphically by plotting the characteristic equations given in (3.3.40) and (3.3.11) as functions of pa and qa. For each value of m, there are several curves or branches for (3.3.40). These branches are shown in Figure 3.17 for m = 1, 0, and 2. Each intersection of the characteristic equation with (3.3.11) defines a mode. For example, for V = 4 and m = 0, which is the center panel of Figure 3.17, there are two modes defined by the two intersections shown in the figure. It is conventional to label the hybrid modes as HEmn and EHmn . For the HE modes, E z is relatively strong and the mode is more “TM-like.” For the EH modes, Hz is relatively strong and the mode is more “TE-like.” For m = 0, there are also TE 0n modes and TM0n modes that have a single axial field component. The first index m denotes the azimuthal dependence. This index is not the same as the index ν used to characterize a linearly polarized mode, as will be discussed below. Given a fixed value of m, the second index n corresponds to the allowed values for pa. These are the intersections as pa increases that are shown in Figure 3.17. To compare hybrid modes with linearly polarized modes, consider the limit of the characteristic equation for a hybrid mode as n1 approaches n2 , which is the regime for which the linearly polarized modes are valid. For this limiting case, (3.3.40) reduces to

( J + + K + ) ( J − − K − ) = 0.

The equation is satisfied when J − conditions can be written as pa

=

K − or J +

Jm ( pa ) Jm µ1( pa)

(3.3.42)

= −K + . Using (3.3.41), these two

) = ∓qa KK m (qa (qa) . m µ1

(3.3.43)

The limiting form of the characteristic equation for a hybrid mode as n1 approaches n2 is reconciled with the characteristic equation for a linearly polarized mode given in (3.3.27) by observing that the two characteristic equations are equal when ν = m µ 1 or equivalently m = ν ∓ 1. This validates the approximations made leading to (3.3.27) as the normalized index difference ± becomes small. This observation leads to the following relationships between the modes for small ±. Using ν = m − 1, the HE1n modes form the LP0n modes. Each LP1n mode is a linear combination of the TE0n , TM0n , and

3.3 Waveguide Geometries

143

HE2n modes. For ν > 1, the HE(ν+1)n modes and the EH (ν−1)n modes form the LPν n modes. As an example, the LP12 mode is a linear combination of the TE02, the TM 02, and the HE 12 modes. The modes for m = 1 are defined by the intersections shown on the left side of Figure 3.17. The lowest-order branch of the characteristic equation intersects the origin (qa = 0, pa = 0). Therefore this mode will propagate regardless of the value of the normalized frequency V . This lowest-order mode is called the HE 11 mode and corresponds to the LP01 mode. The center panel is for m = 0. These are TE and TM modes.

Cutoff Conditions and Single-Mode Operation

Because the relationships between pa, qa, and V for the slab waveguide and the stepindex fiber have the same form, the cutoff condition defined for the slab waveguide geometry can also be applied to the step-index fiber geometry. A mode becomes unguided as qa goes to 0 and pa goes to V . The expressions for the cutoff frequencies can be derived analytically by taking the limit of (3.3.40) as qa goes to zero. Referring to Figure 3.17, the HE11 mode is never cut off, with pa going to zero as qa goes to zero. However, as qa goes to zero, the mode decays more slowly in the cladding, and the effect of the cladding/casing interface may invalidate the theory. The next mode that can propagate occurs when m = 0 and is the TE01 mode. As qa goes to zero, the TE01 mode and the TM01 mode have the same cutoff condition, which is given by the solution to the equation J0( pa) = 0.

(3.3.44)

The cutoff value of the TE01 and TM01 modes is pa ≈ 2.4. When the normalized frequency V is smaller than this value, only the HE11 mode will propagate. When n1 ≈ n2 , the cutoff condition for the HE21 mode approaches the value given by (3.3.44). A proof of the validity of this statement is asked for in an end-of-chapter exercise. Therefore, when n 1 ≈ n2, the TE 01, TM01, and HE21 modes all have the same cutoff condition as the LP11 mode (cf. (3.3.29)).

Lightwave Fields, Lightwave Envelopes, and Lightwave Signals

The modal structure of a lightwave in a fiber was analyzed for a monochromatic lightwave, which has no modulation. The propagation of a narrowband modulated lightwave can be approximated using the modal structure of the carrier for the modulated carrier. Chapter 4 will refine this approximation, writing the modes as a function of frequency. The real passband electric field E²(r, ψ, z , t ) for a fiber with ± much smaller than one has an azimuthal dependence of cos(νψ), a slowly varying amplitude A (z , t ), and a slowly varying phase φ(z , t ). Setting the unit vector ½ x in the same direction as the linearly polarized electric field gives (cf. (3.3.28))

144

3 The Guided Lightwave Channel

· ¸ pr ) ωc t − β z + φ(z , t ) cos (νψ) JJν (( pa )½x

(3.3.45a)

· ¸ qr ) ωc t − β z + φ(z , t ) cos (νψ) KK ν((qa )½x

(3.3.45b)

E²(r, ψ, z , t ) = A(z , t )cos for the core (where r

≤ a) and

ν

E²(r, ψ, z , t ) = A(z , t )cos

ν

for the cladding (where r ≥ a). Given that the field for a linearly polarized mode is nearly transverse to the direction of propagation, the lightwave signal power at a distance z is well approximated by È(r, t ) for (2.3.28). For this case, it is convenient to define the passband field amplitude U one linearly polarized field component in terms of the corresponding component U (r, t ) of the complex field envelope (cf. (2.3.29)). Separating U (r, t ) into an axial and transverse component as given by U (r, t ) = a(z , t )Ut (r, ψ) , the passband field amplitude È(r, ψ, z , t ) can be written as U

À À

È(r, t ) = Re U (r, t )ei(ωc t −β z) U

Á

Á = Re a(z , t )Ut (r, ψ)ei(ω t −βz) . c

(3.3.46)

The term a (z, t ) is the complex envelope of the signal

.

a(z , t ) = A (z, t )ei φ(z ,t ) ,

(3.3.47)

which describes the axial dependence of a linearly polarized mode (cf. (3.3.45)). The term Ut (r, ψ) describes the transverse dependence. The transverse dependence of the field Ut (r, ψ) is normalized so that the integral over the cross-sectional region A of the fiber is equal to one. Using this normalization and (2.3.27) gives P ( z, t ) =

´

|U (r, t )|2 dA A ´ = |a(z , t )|2 |Ut (r, ψ)|2 dA A = |a(z , t )|2 .

(3.3.48)

The propagation of a narrowband modulated lightwave field at frequency ωc given in (3.3.45) is now expressed in terms of the complex signal envelope a (z, t ), the propagation constant β , and the transverse part of the field U t (r, ψ) for each mode. This means that the complex envelope a (z , t ) can be used to describe the z-dependent propagation characteristics of a lightwave field with the transverse part of the field included when needed.

3.3.3

Modes in a Graded-Index Fiber

A graded-index fiber has a mode structure that has the same general form as the mode structure for a step-index fiber. In principle, the mode structure can be found in the same way as for a step-index fiber. However, to find the exact mode structure for a

3.3 Waveguide Geometries

α

=∞

α

=2

α

α

145

= 10

=1

2a Figure 3.18 Power-law profile of a graded-index fiber for several values of

α.

graded-index fiber with an arbitrary index profile requires numerical methods because the spatially varying index profile n(r) is not as simple as the step-index profile. Approximate analytic solutions for the lowest-order modes can be obtained for some specific index profiles. A commonly used graded-index profile is a power-law index profile given by À · r ¸α Á n2 (r ) = n20 1 − 2± for r < a, (3.3.49) a

.

where ± = (n 0 − na )/ n0 is the normalized index difference, and r is the radial distance. This profile is shown in Figure 3.18 for several values of the power-law index profile parameter α. If α = 2, the parabolic index profile given in (3.2.10) is recovered with K 2 = 2±/a 2. As α approaches infinity, the index profile approaches a step-index profile. For a typical graded-index fiber with ± on the order of 1%, a well-guided mode is confined to a spatial region that is small in comparison with the radius a of the core, which is on the order of tens of wavelengths. If the primary propagation characteristics of only these well-guided modes are of interest, then the finite parabolic index profile can be approximated by an infinite parabolic index profile that has no cladding or casing. Approximating a finite parabolic index profile with an infinite parabolic index profile is justified when ± is much smaller than one, with the radial dependence of the mode decaying rapidly so that the spatial extent of the mode is significantly smaller than the finite core. When these conditions are satisfied, the field at the graded-index core/cladding boundary is insignificant. The exact solution for an infinite parabolic index profile can be determined using the method of separation of variables. The guided modes for this geometry are called Hermite-gaussian modes and are given by ·√ x ¸ ·√ y ¸ −V (x 2+ y 2)/2a2 −iβi j z U (x , y, z ) = A i j Hi V V Hj e e , (3.3.50) a a where Hi (x ) is a Hermite polynomial of order i , the normalized frequency V is defined in (3.3.12b), with n0 replacing n1 , and Ai j is a normalization constant. For i not equal to j , the mode is not circularly symmetric. The propagation constant βi j is given by

βi j = n0k0

Æ



2± (i 1−2 n 0 k0a

Ç1/2 + j + 1) ,

(3.3.51)

146

3 The Guided Lightwave Channel

where ± is defined in (3.2.15). The derivation of these solutions along with the definition of a Hermite polynomial is discussed in an end-of-chapter exercise. Expression (3.3.50) is a widely used closed-form approximation for a parabolic index profile for which the sum i + j is small. It is a good approximation when the field is predominantly in the lowest-order modes, which are tightly confined. It becomes less accurate for the higher-order modes for which the power near the core/cladding interface is significant. For the lowest-order mode for which i = j = 0, the Hermite polynomial is a constant given by H0 (x ) = 1. In this case, the mode has a circularly symmetric gaussian profile.

3.4

Mode Coupling Mode coupling refers to the fact that energy can be redistributed among modes during propagation. It can occur within a single fiber that supports multiple modes, or between polarization modes. Mode coupling can also occur between spatially separated waveguiding structures that are in close proximity, such as the separate cores of a multicore fiber shown in Figure 3.1(d). The concept of mode coupling is based on perturbations in the waveguiding structure, which may be intentional modifications or unintentional imperfections. These perturbations cause the energy in one guided mode to couple into another guided mode, or into an unguided mode such as a radiation mode. In principle, the redistribution of the energy can be determined by directly solving Maxwell’s equations for the actual structure, including the perturbations and the appropriate boundary conditions. However, such an approach would usually be uninformative as well as computationally intensive. Instead, coupled-mode theory is used to analyze mode coupling for weak perturbations. Coupled-mode theory decomposes a complex structure into a set of substructures. The modes of each substructure, considered in isolation, are determined first. The effect of the other substructures is then modeled as a coupling process between the modes of the isolated substructures. This approach provides an intuitive way of understanding many energy-redistribution effects, both linear and nonlinear, that occur in practical fibers and in other components used in lightwave communication systems. To determine how the modes couple, start with the monochromatic form of Maxwell’s equations in a nonmagnetic dielectric material normalized as (cf. (2.3.1) and (2.3.7))

∇ × e = −iωµ0h, ∇ × h = iωε e,

(3.4.1a) (3.4.1b)

where e and h are normalized fields that have unit power, and ε = ε0εr is the permittivity of a lossless waveguide that has no perturbations. The fields E and H in the perturbed waveguide, which are not required to have unit power, are given by

∇ × E = −iωµ0H, ∇ × H = iωε ´ E,

(3.4.2a) (3.4.2b)

3.4 Mode Coupling

147

where the permittivity of the perturbed waveguide ε ´ is written as ε ´ (x , y , z) = ε( x , y ) + ±ε( x , y, z ), with ±ε(x , y, z ) being the difference between the permittivity of the unperturbed waveguide and the permittivity of the perturbed waveguide. For a fixed value of z, the perturbation ±ε( x , y , z) is a function of only the transverse coordinates. Modifying (2.3.40) to account for the z dependence of the perturbation, the field in the perturbed waveguide is expanded using a basis generated from the modes of the unperturbed waveguide, E(r) = H(r) =

Å

Å k

k

ak (z )ek (x , y)e−iβ k z ,

(3.4.3a)

ak (z )hk (x , y )e−iβ k z ,

(3.4.3b)

where ak (z ) is the z-dependent coefficient of the expansion that is constrained to vary slowly as compared with the spatial variation of e−i βk z . Section 3.4.1 below shows that the coefficients ak (z) are related by the following set of coupled differential equations:

µ dadzj (z ) = i where

κjk (z ) =. ω

´ A

Å k

κ jk (z )ak (z)ei(β −β )z , j

k

±ε( x , y, z )e∗j (x , y) · ek (x , y )dA

(3.4.4)

(3.4.5)

is defined as the coupling coefficient between mode j and mode k caused by the perturbation ±ε( r). This set of equations is the basis of coupled-mode theory.

3.4.1

Derivation of the Coupled Equations

To derive (3.4.4), form the inner product of the unperturbed field e∗ with each side of (3.4.2b), form the inner product of the perturbed field H with the complex conjugate of each side of (3.4.1a), and subtract the second equation from the first equation to yield

( ) ∇ · (e∗ × H) = i ω ε´ e∗ · E − µ0h∗ · H , (3.4.6) where the vector identity ∇ · (A × B ) = A · (∇ × B) − B · (∇ × A) is used to simplify

the left side. Repeat this process, forming the inner product of the perturbed field E with the complex conjugate of each side of (3.4.1b), and form the inner product of the unperturbed field h∗ with each side of (3.4.2a). Subtracting the second equation from the first equation gives

∇ · (E × h∗ ) = −iω(εe∗ · E − µ0h∗ · H).

(3.4.7)

Adding (3.4.6) to (3.4.7) yields

∇ · (¹E × h∗ º» + e∗ × H¼) = iω ±ε e∗ · E, S

(3.4.8)

148

3 The Guided Lightwave Channel

where S is the complex cross-power density (cf. (2.3.32)) of the unperturbed field and the perturbed field. The j th mode of the unperturbed waveguide can be written in the form of (2.3.31), e∗j (r) = e∗j ( x , y )ei β j z .

(3.4.9)

In general, this mode has both a transverse field component and an axial field component (cf. (2.3.40)). To proceed, substitute the j th unperturbed basis function into (3.4.8) and integrate over the cross-sectional region A. Suppressing the arguments for each j , the components satisfy an equation of the form

´

´ ∇ · (E × h∗j + e∗j × H)dA = i ω ±ε e∗j · E dA. ¹ º» ¼ A A

(3.4.10)

Sj

Apply the following identity to the left side

´ É ∂ (3.4.11) ∇ · S j dA = ∂ z S j · ½z d A + S j · ½n dL , A LA A where A is the transverse region in the (x , y ) plane, L A is a line integral in that plane around this region, and ½ n is the outward normal unit vector to the contour of the line ´

integral. The integral over the region A is evaluated over an infinite area, and thus the line integral is evaluated at infinity. For any guided mode or evanescent mode, this line integral evaluates to zero because these modes decay exponentially in the transverse direction. For radiation modes, the integral is also zero, but the reasoning is more subtle and is based on the rapid oscillatory nature of a radiation mode as the transverse distance from the center of the waveguide approaches infinity.9 Setting the line integral to zero, the left side of (3.4.10) is equal to the first term on the right of (3.4.11). Substitute the form of the unperturbed mode given in (3.4.9) into the expression for S j given on the left side of (3.4.10), and substitute the form of the perturbed field given in (3.4.3) into S j . Making these substitutions, suppressing the arguments of the functions, and interchanging the summation and the integration, the left side of (3.4.10) is

´ · ¸ Å∂ da j (z) ak (z )ei (β −β ) z ek × h∗j + e∗j × hk · ½ z dA = µ2 , ∂ z dz A k º» ¼ ¹ j

k

(3.4.12)

µ2δ jk

where the mode-orthogonality condition given in (2.3.33) has been used. Using this expression for the left side of (3.4.10), and substituting (3.4.9) on the right side of (3.4.10), gives

´ µ dadzj (z ) = 12 i ωeiβ z ±ε(x , y, z )e∗j ( x , y ) · E (z, y, z )d A, j

A

(3.4.13)

where the arguments are written explicitly. The right side of (3.4.13) depends on the form of the term ±ε(r)E(r) = ±P (r), where ±P(r) is the perturbation in the material polarization caused by the coupling. 9 See Marcuse (1972).

3.6 Historical Notes

149

This perturbation may be linear or nonlinear. To analyze a linear perturbation, (3.4.3) is used to expand the field in the perturbed waveguide in terms of the modes of the unperturbed waveguide so that

±ε(r)E(r) = ±ε( r)

Å k

ak (z)ek ( x , y )e−i βk z .

(3.4.14)

Substituting this expression on the right of (3.4.13) and interchanging the summation and integration yields the set of coupled differential equations given in (3.4.4). 3.4.2

Solution of the Coupled Equations

The number of coupled equations in (3.4.4) can be reduced by observing that for the coupling process to be significant, the term ei(β j −βk )z must vary slowly compared with the spatial variation of the envelope a(z ). This requirement is called phase matching. For modes that are not phase-matched, the rapidly varying spatial term ei(β j −βk )z averages to zero over the slowly varying changes in a(z ) and the mode coupling may be insignificant.

3.5

References A thorough discussion of ray optics can be found in the classic work of Born and Wolf (1980). The application of ray theory to optical fibers is provided in Okoshi (1982) and Gowar (1984). Many electromagnetic texts study the wave-propagation characteristics of metal waveguides that affect the form of Maxwell’s equation because the boundaries can conduct current. Because there is no free charge in a dielectric, the theory of dielectric waveguides differs from the theory of metallic waveguides. A practical introduction to modes in dielectric waveguides is given in Diament (1990). Advanced treatments include Marcuse (1972), Marcuse (1974), Okoshi (1982), Snyder and Love (1983), and Okamoto (2006). Mode coupling and the orthogonality of modes are discussed in Snyder (1972) and Kogelnik (1975). The analysis of lightwave propagation in graded-index fiber is discussed in Gloge and Marcatili (1973), and in Marcuse (1972).

3.6

Historical Notes It appears that the first derivation of the modes of a step-index optical fiber was given in Carson, Mead, and Schelkunoff (1936). The analysis of a fiber with an index that varies across the transverse dimension of the fiber appears in Kawakami and Nishizawa (1965). These results were mostly of theoretical interest until Kao and Hockham (1966) predicted that optical fiber transmission would be a viable alternative to electrical transmission if the loss could be reduced to 20 dB per kilometer. This goal was achieved four years later at Corning glass by the team of researchers Kapron, Keck, and Mauer (1970). This breakthrough heralded the beginning of guided lightwave communications. Linearly polarized modes were discussed in Gloge (1971a). An overview of the development of fiber optics is presented in Hecht (2004).

150

3 The Guided Lightwave Channel

3.7

Problems 1 Coupling efficiency into a fiber (a) Suppose that the radiation emitted by a lightwave source is conical, independent of , and has a small numerical aperture. Show that the solid angle subtended by the lightwave source is given by NA2 , where NA is the numerical aperture. (b) A source emits light with a power P and with an angular distribution I P cos , where I is the power per solid angle (with units W/sr) in the direction . Show that the coupling efficiency into the fiber is equal to NA 2. (c) The radiation pattern for many sources can be modeled as I m 1 P cosm 2 , where m is an integer. Find the coupling efficiency for a lightwave source of this form. (This should reduce to part (b) for m 1.)

φ

²

²≈π

θ/π θ

)

(θ) =

(θ)

(θ) = ( +

θ/ π

=

2 Coupling of a point source into a fiber A step-index multimode fiber has core and cladding refractive indices n 1 and n2 with n1 n2 , a core radius a, and a length L much larger than a. An isotropic point emitter (which emits light equally in all directions) is placed against one end of the fiber, centered along the axis of the fiber. The emitter emits a very short pulse of total energy E in , which is approximated by a power waveform at the fiber input Pin t E in t , where t is a Dirac impulse. Assuming that many modes propagate in the fiber and neglecting loss in the fiber and reflection from the end faces, use ray optics to determine the following. (a) The energy E that is launched into the fiber. (b) The power waveform Pout t at the fiber output. Note that because Pin t is proportional to t , this is simply a scaled form of the impulse response of the multimode fiber, with the scaling constant being the coupled energy.

>

( )=

δ( )

δ( )

δ( ) ()

()

3 Graded-index fiber A parabolic graded-index fiber with 2 has 0 1 and a radius a 50 µm. (a) Determine the spatial period 2 K of a ray trajectory. (b) Determine the maximum launch angle max of a ray that can be guided by the fiber. (c) Plot the ray components x z and y z when x0 y0 a 2 and x0 y0 . max (d) Describe the trajectory of the ray.

α= ³ = π/

θ

4 TE and TM modes (a) Starting with

()

±= .

θ

()

∇ × E = − ∂∂Bt , ∇ × H = ∂∂Dt , ∇ · B = 0, ∇·D=0

and the constitutive relationships

=

= = /

θ =θ =

3.7 Problems

151

B = µ0H , D = ε0 E + P , derive Maxwell’s equations restricted to a monochromatic field. (b) Now suppose that E(x , y , z ) = [ E x (x , y )½ x + E y (x , y )½ y + E z (x , y )½ z]e−i β z and − i β z H(x , y, z ) = [ Hx (x , y)½ x + H y (x , y )½ y + Hz ( x , y )½ z]e . Substituting this form into the monochromatic form of (2.3.1a) and into the monochromatic form of (2.3.1b), show that each transverse field component (E x , E y , Hx and Hy ) can be written in terms of the axial components (E z and Hz ) and thus show that a transverse electromagnetic (TEM) mode cannot propagate in a dielectric slab waveguide. 5 Boundary conditions for TE and TM modes (a) Specialize Maxwell’s equations to a monochromatic field propagating in the z direction when all field components have a dependence of the form ei(ωt −β z ) . (b) Now suppose that E E y x z y, as was the case for the slab waveguide. Determine the relationship between E y and Hz , and thus show that Hz is proportional to dE y dx. (c) Repeat part (b) if H H y x z y and show that E z is proportional to n2 n1 2 dHy dx. (d) Explain why there is a difference in the boundary conditions between the TE and TM modes for dielectric materials.

= ( , )½

/

( / )

/

=

( , )½

6 Derivation of the Bessel differential equation Starting with the form of the Helmholtz equation given in (3.3.19) in cylindrical coordinates and trying a solution of the form E z r z f r exp i z , use the separation-of-variables method to derive the Bessel differential equation given in (3.3.20).

( , ψ, ) = ( )

[− (νψ +β )]

7 Normalized frequency A fiber has the following specifications: index of refraction n 1 46, normalized index difference 0 0036, and core diameter d 8 3 µm. (a) Derive the expression for the normalized frequency V in terms of the following. i. The numerical aperture. ii. The index difference. (b) For a fiber that has a core index of 1.5 and 0 1%, what is the largest core that can support single-mode operation at a wavelength of 1 3 µm?

±= .

= .

±= .

= .

.

8 Coupling Using quantitative reasoning, explain why the inherent reversible nature of lightwave propagation does not imply that the coupling between two structures is the same in both directions. 9 Bending loss in a fiber A lightwave source with a numerical aperture NAs is coupled into a fiber with a numerical aperture NA f that has a circular bend radius rb shown in the figure below.

152

3 The Guided Lightwave Channel

Bend radius rb

Given rb and NA f and using ray optics, what is the largest numerical aperture of a lightwave source that can be coupled into the fiber without loss as the signal propagates through the bend? 10 Mode orthogonality (a) Starting with (3.3.45), derive explicit expressions for the LP01 mode and the LP11 mode. (b) Show that the modes are orthogonal with

´

A

L P01(r, ψ) L P11(r, ψ) dA

= 0,

where A is the cross-sectional region of the fiber. 11 Linearly polarized modes (requires numerics) The normalized frequency V of a step-index fiber is 4. (a) Using the mode characteristics of a linearly polarized mode given in Figure 3.15, determine which modes are guided in the fiber, and estimate the normalized propagation constant b (cf. (4.2.1)) for each guided mode. (b) For the two modes with the largest values of b, use a root-finding algorithm to numerically find the values of pa, qa, and b. (c) Using (3.3.45), plot the radial dependence of the intensity of the field for the mode closest to cutoff.

±

12 Mode parameters for small Using (3.3.12b) and (3.3.22), show that the condition ak0 n V , where a is the fiber radius.

³

±±

1 is equivalent to

13 Gaussian approximation to the LP01 mode (requires numerics) The shape of the LP01 mode resembles that of a two-dimensional gaussian function. This exercise investigates this relationship. 2 in the (a) Normalize the LP01 given by f r so that the total power f r mode is unity. (b) Determine the constant A in terms of so that an azimuthally two-dimensional 2 2 symmetric gaussian field g r Ae −r / 2σ has unit power. (c) Define the overlap between these two fields as

( , ψ)

κ

| ( , ψ)|

σ ( , ψ, σ) =

κ(σ) =.

µ´

A

f (r, ψ) g(r, ψ, σ)dA

¶2

,

where A is the transverse region of the fiber and d A is a differential area. Using numerics, plot κ(σ) and determine the value of σ that maximizes κ . (d) Using this result, determine the proportion of the power in the LP01 mode that can be represented by a gaussian function with the optimal value of σ .

3.7 Problems

153

14 Linearly polarized modes For a linearly polarized mode, the transverse field component E x is much larger than the axial field component E z . Similarly, Hy is much larger than Hz . Using this approximation and the monochromatic form of (2.3.1a) and (2.3.1b), with E x and Hy related by the plane-wave relationships given in (2.3.19), show that

∂ Ex ≈ ik n E ≈ 0 0 z ∂x

and

∂ H y ≈ ik n H ≈ 0, 0 z ∂y

where the derivatives of the axial field components E z and Hz are neglected as compared with the derivatives of the transverse field components E x and H y . 15 Mode confinement factor Derive an expression for Pcore P in terms of pa, qa, and V , where P is the total power carried in the fiber and Pcore is the power carried in the core. This value is called the mode confinement factor.

/

16 Cutoff conditions Show that as n1 goes to n2 , the cutoff condition for the HE21 mode can be simplified to J0 pa 0, which is the same cutoff condition as the TE 01 mode and TM01 mode given in (3.3.44), and the LP11 mode given in (3.3.29).

( )=

17 Characteristic equation for a linearly polarized mode with a small normalized frequency V (a) Starting with the linearly polarized mode characteristic equation given in (3.3.27), derive an expression for the characteristic equation for the LP 01 mode in terms of qa and V as V approaches zero. (The small-argument expansions of the Bessel functions will be needed: K 0 x log e eγ x 2 , where 0 5772 is Euler’s constant, x K 1 x 1, and x J1 x J0 x x 2 2.) (b) Using this characteristic equation, derive an approximate expression for the cladding field and show that the cladding field decays with a logarithmic dependence of the form E a 1 C log e r a . Determine the constant C.

γ= .

( )≈

( )[ −

( )≈− ( /) ( )/ ( ) ≈ / ( / )]

18 Modes of an infinite parabolic-index-profile graded-index fiber Let the inhomogeneous index-of-refraction profile for a graded-index fiber be given as x2 y2 n2 x y n 20 1 2 a2

( , )=

Â

µ + ¶Ã − ± ,

where a is the core radius. (a) Assuming a solution of the form U (r) = U (x , y, z ) = AU (x , y )e −iβ z ,

154

3 The Guided Lightwave Channel

where A is a constant, substitute this form of solution and the index-ofrefraction profile given above into the scalar Helmholtz equation

∇2 U ( x , y , z) + n(r )2 k02U (x , y , z) = 0, and show that

(b)

∂ 2U (r) + ∂ 2 U (r) + Âk 2n 2 µ1 − 2± µ x 2 + y 2 ¶¶ − β 2 à U (r) = 0. 0 0 ∂x2 ∂y2 a2 a2 Using the separation-of-variables method with U (x , y ) = f (x )h (y ), show that Æ 22 Ç 2k0 n0 ± 2 1 d2 f (x ) − x = −K1 , 2 f (x ) dx a2 Æ 22 Ç 1 d2 h( y) − 2k0an20 ± y2 = − K 2 , h( y) dy 2

where K 1 and K 2 are separation constants. (c) Show that the two separation constants satisfy

(d)

β 2 = k02n20 − K 1 − K 2. Making √ the change √of variables X = C x and Y = C y, where C = 1 / 2 (n0 k0 2±/a ) = V /a, show that the two equations can be written as d2 f ( X ) − (µ − X 2 ) f ( X ) = 0, dX 2 d2 h(Y ) dY 2

1

− (µ2 − Y 2 )h (Y ) = 0,

where µi = K i / C 2 for i = 1, 2. (e) The equations are now in a standard form with solutions given by 2 f ( X ) = Hi (X )e− X /2 , 2 h(Y ) = H j (Y )e−Y /2 ,

where Hi (x ) and H j (x ) are Hermite polynomials given by dn − x 2 e , dx n with µ1 = 2i + 1 and µ2 = 2 j + 1. Using these solutions, show that the form of the transverse scalar field can be written as ·√ x ¸ ·√ y ¸ −V (x 2+ y 2)/2a2 Hj e , V V U (x , y) = A i j Hi a a with the propagation constant given as Hn (x ) = (−1)n ex

β i j = n 0k0

Æ



2

2± 1−2 (i n 0 k0a

Ç1/2 + j + 1) .

4

The Linear Lightwave Channel

An ideal lightwave channel conveys signals without signal impairments. A practical lightwave channel conveys information at a high rate only by accommodating and mitigating signal impairments. These impairments can be classified as follows. Attenuation is the irretrievable loss of signal energy that affects the magnitude of the lightwave signal. Attenuation mechanisms were discussed in Chapter 3. Dispersion is a frequencydependent phase shift of the lightwave signal that redistributes the signal energy without loss. Linear dispersion, which is discussed in some detail in this chapter, occurs when the frequency-dependent phase shift and delay do not depend on the lightwave signal strength. Nonlinear dispersion, which is discussed in Chapter 5, occurs when the phase shift does depend on the lightwave signal strength. Dispersion need not be a fundamental limit to communication, but it may be difficult to accommodate. Dispersion creates interference by redistributing the energy from a pulse to other pulses within the same datastream or to pulses in other datastreams. Interference is discussed in Chapter 11. Distortion is the combined effect of attenuation and dispersion causing a change in the shape of a signal pulse. Noise is the uncontrollable random energy imposed on the signal due to a variety of causes. Sources of noise are discussed in Chapter 6. This chapter discusses several forms of linear dispersion. The resulting linear distortion is described first in general, then in detail. Mode-dependent group delay or intermodal dispersion redistributes the signal energy of a multimode fiber in time because the signal in each mode propagates at a different velocity. Wavelengthdependent group delay or intramodal dispersion redistributes the signal energy of a single spatial mode in time because each wavelength component of the signal in that mode propagates at a different velocity. Intramodal dispersion has a waveguide dispersion component that depends on the waveguide geometry and a material dispersion component that depends on the wavelength-dependent index of refraction. Waveguide dispersion is due to the modal structure, and is caused by the core and the cladding having different indices. Part of the mode power resides in the core and part resides in the cladding (cf. Figure 3.12). Material dispersion is caused by the frequency-dependence of the index of refraction. This suggests that different wavelength components have different phase shifts, as is indeed the case. Both intermodal dispersion and intramodal dispersion can depend on the polarization, resulting in polarizationmode dispersion. This chapter concludes with a detailed study of polarization-mode dispersion.

156

4 The Linear Lightwave Channel

n2

Fast ray

n1

θc

Two rays Single pulse in time

Fast ray

Slow ray

θ

n2

τ

= τ2 - τ1

L

Slow ray

Figure 4.1 The differential transit time is the difference in the transit time between the fastest ray, shown as a solid arrow and slowest ray, shown as a dashed arrow.

4.1

Ray Dispersion A simplified description of dispersion follows from a ray-optics model using the difference in the arrival time of rays launched at different angles, as shown in Figure 4.1. This time difference is called the differential transit time and depends only on the geometry of the fiber. The fastest ray travels along the central axis of the fiber, and has a delay τ1 = L /c = Ln 1/c0 , where the index n1 is a constant that depends only on the wavelength of the lightwave carrier. The slowest ray is guided along a longer path with multiple reflections at the critical angle θc , and has a delay τ2 = L /(c sin θc ). The maximum delay spread τmax = τ2 − τ1 between the fastest ray and the slowest ray is 1 1 − Ln τmax = c Ln sin θ c 0 ±c 0² Ln 1 n 1 − n2 = c n 0

=

Ln 1 ±, c0

2

(4.1.1)

where sin θc = n2 / n1 from (3.2.2), and (n1 − n2 )/ n2 ≈ (n1 − n 2)/ n1 = ± because n1 ≈ n 2. Other rays at other angles have delay spreads smaller than τmax. Taken together, the many rays spread an impulse at the input by Ln 1±/ c0 . As an example, let ± = 0.001, n 1 = 1.5, and L = 5 km. Then the maximum delay spread is τmax = 250 ns. Estimating the useable bandwidth as B = 1/τ = 4 MHz and a simple intensity modulation format with a spectral rate efficiency of 2 bits/s per hertz, the approximate information rate without any method of mitigation is 8 Mb/s. A graded-index fiber has a much smaller delay difference because rays coupled at larger angles propagate through a region with a lower index and so travel faster. This effect compensates in large part for the longer path length through the fiber. This compensation reduces the differential transit time, which thereby narrows the received pulse width and so increases the bandwidth. The ray-optics analysis of dispersion and the wave-optics analysis of dispersion are reconciled in Section 4.4, where it is shown that the differential transit time derived using ray optics is the lowest-order term of a more comprehensive treatment of dispersion using wave optics.

4.2 Wave Dispersion

4.2

157

Wave Dispersion To determine the effect of dispersion on a lightwave signal in the wave-optics signal model requires an expression for the delay as a function of both the mode and the wavelength. Because a Fourier decomposition of a pulse involves a distribution of frequencies or wavelengths, a wavelength-dependent group delay results in pulse spreading. An accurate model of the mode-dependent and wavelength-dependent group delay is developed using wave optics. The propagation constant β , which is the phase shift per unit length (in radians per meter), for a guided mode in a lossless waveguide is derived in Chapter 3. For a single frequency ω, this leads to a z-dependent phase shift given by e−iβ z . Because the mode structure varies with the frequency ω or, equivalently, with the wavelength λ , the propagation constant β for each mode varies with ω as well. The frequency-dependent propagation constant β(ω) is called the dispersion relationship. Because β(ω) can be different for different modes, it is indexed by m for the mth mode when appropriate. This section determines βm (ω) for two of the geometries that were discussed in Chapter 3. These are a slab waveguide and a step-index fiber with ± much smaller than one.

4.2.1

Dispersion in a Slab Waveguide

For a lossless waveguide, the dispersion in the mth mode is produced by a frequencydependent (and perhaps polarization-dependent) phase shift eiβ m(ω) z . The function βm (ω) contains all of the relevant information needed to compute the dispersion. However, because the function βm (ω) is indirectly described by several equations, it must be computed numerically for each mode. It is conventional to concisely restate the dispersion relationship β(ω) in terms of another function b(V ) to be described in this section. This widely used alternative function is defined so that the term b, called the normalized propagation constant, varies only between zero and one for any geometry. The function b(V ) depends on the normalized frequency V defined in (3.3.12). The form of the function b( V ) depends on the geometry and can be reconverted to β(ω) as needed. To numerically evaluate β(ω) , a concise and convenient way of expressing β(ω) is needed. To this end, define the normalized propagation constant b as (cf. (3.3.11))

µ

b

³ ´2 ³ pa ´2 =. qa = 1− , V V

(4.2.1)

where V = 2π(a/λ 0) n21 − n22 is the normalized frequency (cf. (3.3.12)). Because 2π/λ0 = k0 = µ ω/c0, the value of the normalized frequency V , which is also written V = ω(a/c0 ) n21 − n 22, is directly proportional to the frequency ω. The parameters b and V replace the parameters β and ω. Using (3.3.10b), (3.3.11), and (4.2.1), write the real propagation constant β in terms of the normalized propagation constant b as follows:

β 2 = n 22k02 + b k 20

³

n21 − n22

´

,

(4.2.2)

158

4 The Linear Lightwave Channel

and solve for b to obtain

(β 2 / k02 ) − n22 . n21 − n22 The term β/ k0 is confined to the interval n2 ≤ β/ k0 ≤ b=

(4.2.3)

n1, which means that b is confined between zero and one. The lower limit corresponds to a plane wave propagating in an infinite region with an index n2, which is the index of the cladding. Then b = 0. The upper limit corresponds to a plane wave propagating in an infinite region with an index n1 , which is the index of the core. Then b = 1. Using these two limiting values, the normalized propagation constant is confined to the interval 0 ≤ b < 1. When b equals zero, the mode will not propagate. The mode is then cut off . When b approaches one, the mode is well-guided, has most of its power in the core, and is nearly a transverse electromagnetic mode with the E field and the H field nearly transverse to the direction of propagation. Because β and k 0 depend on ω, the value of b also depends on ω or, equivalently, on the normalized frequency V . This leads to the normalized dispersion relationship b(V ) that is equivalent to β(ω), but is more convenient for computation. To derive the normalized dispersion relationship b(V ) for a slab waveguide, use the normalized spatial frequency pa that describes the oscillatory nature of the mode within the core and the normalized exponent qa that describes the rate of decay of the mode within the cladding. As shown in Section 3.3.1, the allowed values for pa and qa satisfy the equation of a circle ( pa )2 + (qa )2 = V 2 (cf. (3.3.11)), with the frequency-dependent value of V (cf. (3.1.1)) being the radius. The allowed values for pa and qa must also satisfy the geometry-specific characteristic equation for each mode. The characteristic equation for the first even TE mode as a function of pa and qa (cf. (3.3.9a)) is shown as a solid curve in Figure 4.2(a). Circles for several values of the (a) 4 3

(b)

d

First even TE mode

1

First even 0.8 b

c b

1

1

First odd

Second even

0.4 0.2

a 0

d

a

b

aq

0.6

2

c

2

pa

3

4

0

π

2

π

5

V

10

Figure 4.2 (a) Four points of the dispersion relationship for the first even TE slab-waveguide

15

mode. (b) Normalized dispersion relationship b(V ) for the first three TE (solid) and TM modes (dashed) for a slab waveguide with n1 = 1.5 and n 2 = 1. The labeled points in part (a) correspond to the labeled points in part (b).

4.3 Narrowband-Signal Dispersion

159

normalized frequency V are also shown as dashed curves. Figure 4.2(b) presents this relationship in a different way that is sometimes preferred. The intersections determine the normalized dispersion relationship b(V ) for each allowed mode. As an example, to derive b(V ) for an even TE mode of a slab waveguide, (3.3.9a) is used to eliminate qa in (3.3.11) so that

³ ´ ( pa)2 1 + tan2 ( pa) = V 2. (4.2.4) √ Then, using (4.2.1), substitute pa = V 1 − b into (4.2.4) to give ³ √ ´ b . (4.2.5) tan2 V 1 − b = 1−b Now replace tan (x ) by tan(x ± m π), take the square root of (4.2.5), and solve for V , ¶ ¸ · 1 b m π + arctan . (4.2.6) V (b) = √ 1−b 1−b The sign of tan(x ) = tan(x ± m π) is chosen to be consistent with (3.3.9a). The inverse of (4.2.6) is the normalized dispersion relationship b(V ). It is shown in Figure 4.2(b) for several even TE modes. Similar expressions can be derived for the normalized dispersion relationship for an odd TE mode, and for both an odd and an even TM mode. The dispersion relationships for the first three TE modes and the first three TM modes of a slab waveguide are shown in Figure 4.2(b).

4.2.2

Dispersion for a Linearly Polarized Mode

The normalized dispersion relationship b(V ) for each linearly polarized mode of a fiber can be determined by the same method as was used for the slab waveguide. Only the form of the expression changes. The intersection of a circle of radius V with a branch of the characteristic function of an LPν² mode given in Figure 3.15 specifies one point for the normalized dispersion relationship bν²(V ) for each LP mode. The functional forms of bν² (V ) for each of the first two LPν² modes are shown in Figure 4.3. For each LP mode, the solutions for pa and qa depend only on ± and not on the individual indices of the core and the cladding, provided that ± is much smaller than one. Therefore, a graph of the normalized dispersion relationship b(V ) provides a general set of curves that can be used to determine the propagation characteristics of a linearly polarized mode. These curves are valid for any values of n 1 and n2 provided that ± is much smaller than one. This is the reason why b(V ) is often preferred to β(ω) .

4.3

Narrowband-Signal Dispersion The dispersion relationship β(ω) for a single spatial mode in a single polarization mode is sketched in Figure 4.4 along with the magnitude of the spectrum S (ω) of a lightwave pulse s (t ). The lightwave signal is a narrowband waveform. An approximation to the dispersion relationship for any mode can be derived by approximating the dispersion

160

4 The Linear Lightwave Channel

(a) 4

(b)

LP 11

LP01

1 0.8

c

LP01

c

LP11

0.6 b

aq

2

b

0.4 b

d

0.2

d a

0

a

1

0

2 pa

2

3

4

4

5

6

7

8

V

Figure 4.3 (a) Characteristic equations for the LP01 and LP11 modes along with circles for several

values of the normalized frequency V . The intersections define the normalized dispersion relationship b( V ) for each mode. (b) b (V ) for the first two linearly polarized modes of a step-index fiber. The labeled points correspond to the points in part (a).

( )

β ω

β

S( ω)

ωc

ω

Figure 4.4 For a narrowband lightwave pulse, the dispersion relationship

represented by a power-series expansion about the carrier frequency.

β(ω) can be accurately

relationship β(ω) for that mode by the significant terms of a power-series expansion about the carrier frequency ωc . The coefficients of this expansion are related to the mechanisms that produce dispersion. The first few terms of a power-series expansion of β(ω) for a single mode about the carrier frequency ωc are

β(ω + ωc ) = β0 + β1 ω + 21 β2 ω2 + 61 β3 ω3 + · · · , where the coefficients are the derivatives of the dispersion relation evaluated at the carrier frequency ωc as follows:

(4.3.1)

β(ω) for that mode

4.3 Narrowband-Signal Dispersion

161

β0 = β(ωc ) = ωcc , ¹ dβ(ω) ¹¹ β1 = dω ¹ = v1 , g ω=ω ¹¹ 2 ¹ β2 = d dβ(ω) ω2 ¹¹ω=ω .

(4.3.2b)

τ = L/vg = β1 L .

(4.3.3)

(4.3.2a)

c

(4.3.2c)

c

The term c = ωc /β0 is known as the phase velocity. The term β1 is the group delay per unit length, and vg = 1/β1 is known as the group velocity. The modal group delay is defined as The modal group delay contributes the same delay over length L to all frequency components in the narrowband complex envelope of a pulse, and so of the pulse itself, as the pulse propagates through the dispersive medium. The modal group delay does not depend on the frequency (or wavelength) components of the baseband pulse, but may depend on the mode. The parameter β2 is called the group-velocity dispersion coefficient, or the chromatic dispersion. This parameter gives the frequency derivative of the modal group delay per unit length and corresponds to the frequency (or wavelength) dependence of the group delay. This term contributes a group delay per unit length that varies with the frequency and with the mode. The group velocity v g and the group-velocity dispersion coefficient β2 are constants that depend on the carrier frequency ωc , which is the frequency at which the expansion coefficients are evaluated, on the structure of the fiber, which controls the form of the modes, and on the dispersion relation β(ω) for a given mode supported by the fiber. The third-order term β3 in the expansion can usually be neglected unless the second-order term β2 is zero or close to zero. The frequency region in which the group delay increases as the frequency increases (or as the wavelength decreases) is called the normal dispersion regime. This means that shorter-wavelength components have a smaller group velocity. The frequency region for which the group delay decreases as the frequency increases (or as the wavelength decreases) is called the anomalous dispersion regime. In this regime, shorter-wavelength components have a larger group velocity. 4.3.1

Narrowband Dispersion

The conditions for guided wave propagation can be defined using the normalized propagation constant b defined in (4.2.3). As b approaches zero, the mode becomes poorly guided and the value of β(ω) approaches ω n2/ c0 . This is the same value as the propagation constant of a plane wave at frequency ω propagating in an infinite medium with an index n2 equal to the cladding index. The dispersion relationship β(ω) for this plane wave is a straight line (cf. (2.3.21)) with a slope β/ω = n2/c0 , which is shown in Figure 4.5(a) as the dotted line with the smaller slope. As b approaches one, the mode becomes well-guided and the propagation constant β approaches n1k 0. This is the same as the propagation constant of a plane wave propagating in an infinite medium with an

162

4 The Linear Lightwave Channel

(a) Forbidden region

Mode A

slope n1 /c 0 Mode A

β

Modal delay

(b) vgA

Guided modes

v–1 0A

Mode B

Mode B

v0–1B

vgB

cA

cB

slope n2 /c 0 Unguided modes Mode B cutoff

ωc

z= 0

z=L

ω

β(ω) for two modes. The phase velocity ωc /β is determined using the value of β(ω) at ωc . The group velocity is determined using the value of the slope of β(ω) at ωc . The dotted lines are the limiting values defined by the core index n1 and the cladding index n2 . (b) The proportion of the pulse power that propagates in each mode has a different group velocity. At a distance z = L , the two pulses separate in time.

Figure 4.5 (a) The solid lines sketch the dispersion relationship

index n1 equal to the core index. The dispersion relationship for this plane wave is a straight line with a slope β/ω = n 1/c0 , which is also shown in Figure 4.5(a) as the dotted line with the larger slope. For any guided mode, the allowed values of the dispersion relationship β(ω) lie in a region bounded by these two lines. A value of β(ω) that lies in the region above the upper line in Figure 4.5(a) corresponds to a mode propagating in a medium with an index greater than the core index n 1. This region is forbidden by the geometry of the problem. Values of β(ω) that lie in the region below the lower line are unguided modes. The schematic dispersion relationship for two modes is shown in Figure 4.5(a) along with the limiting slopes for a guided mode. For any given mode, as ω increases, the mode becomes better guided, with β approaching n1 k0 and b approaching one. For each value of ω, there may be more than one solution for the propagation constant β for a fiber that supports more than one mode, as is shown in Figure 4.3 using the normalized dispersion relationship b(v). For a given mode, the phase velocity and the group velocity for a passband pulse are determined from the dispersion relationship β(ω). The distinction is that the phase velocity applies to the carrier, whereas the group velocity applies to the narrowband complex envelope of the pulse a(z , t ). The carrier and the envelope travel at different velocities. To show this more directly, express the real passband envelope º a(z , t ) of a narrowband pulse in terms of the complex i arg[a( z ,t )] envelope a(z , t ) = |a (z, t )| e with temporal Fourier transform A(z , ω). Then

ºa(z , t ) =

»

¼

∞ 1 Re eiωc t A (z , ω) eiωt dω 2π −∞

where A(z , ω) = A (0, ω)e −iβ(ω)z β(ω) in the exponent. Then,

½

,

(4.3.4)

= A(0, ω)e−i(β z+β ωz) up to the first-order term for 0

1

4.3 Narrowband-Signal Dispersion

»

¼

163

½

∞ A (0, ω)e i[ω(t −β1 z )] dω ºa (z, t ) = 21π Re ei (ωc t −β0 z) −∞ = |a(z , t − β1 z )| cos (ωc t − β0 z + arg[a(z , t − β1 z )]) ,

(4.3.5)

where the shifting property of the Fourier transform given in (2.1.17) has been used. The first factor of (4.3.5) is the lightwave signal envelope | a(z, t − β1z )|. It has a group velocity given by 1 /β1 (cf. (4.3.2b)) where β1 is the slope of β(ω) evaluated at the carrier frequency ωc . These slopes are shown in Figure 4.5(a) for both modes. The second factor of (4.3.5) is the carrier. Because the phase of the narrowband complex envelope arg[a (z, t − β1 z )], as approximated, is constant in time over an interval 1/ωc and is constant in space over an interval 1/β0, the carrier has a phase velocity given by c = ωc /β0 . This velocity corresponds to a point on the dispersion curve. The phase velocity c and the group velocity vg are shown for each of the two pulses in Figure 4.5(b). The velocity of the carrier is denoted by an arrow that starts on the solid curve depicting the carrier. The velocity of the envelope is denoted by an arrow that starts on the dashed curve depicting the envelope. In general, the group velocity is different for each mode, as is the phase velocity. For a single pulse coupled into more than one mode, the mode-dependent group velocity will cause the energy in different modes to separate as the pulse propagates. This temporal separation at a distance L is shown in Figure 4.5(b). Because each mode experiences a different delay, the superposition of all modes at the output of the fiber produces a pulse that is different than the input pulse. This delay distribution due to the fiber geometry produces the distortion known as linear intermodal dispersion. This is characterized in Section 4.5.

4.3.2

Material Dispersion

The index of refraction is a property of the material. The frequency- or wavelengthdependence of the index of refraction n(ω) is called the material dispersion. For each guided mode in a fiber, the material dispersion can be quantified by considering plane-wave propagation in an unbounded medium with a dispersion relationship β(ω) = ωn(ω)/c0 given by (2.3.23). The group index N for. a narrowband lightwave pulse propagating in an unbounded medium is defined as N = c0/vg , where the group velocity v g is the velocity of the complex envelope of the pulse. The group index is the apparent fiber index as seen by the complex envelope. The group index N (ω) can be written as a function of ω, using (4.3.2b), as dβ(ω) 1 d(ω n(ω)) . N (ω) = = = c . vg dω c0 dω 0 1

Therefore, N (ω) =

d(ω n (ω)) dω

= n(ω) + ω dnd(ω) ω .

The group index N is also expressed in wavelength units by substituting ω and d ω = −(2π c0/λ 2)dλ to write

(4.3.6)

= 2π c0/λ

164

4 The Linear Lightwave Channel

1.49 N(λ)

)λ(N ,)λ(n

)mk mn(/sp( )λ(λD

1.48

100 0 –100 –200 –300 –400 –500 –600

1.47 1.46

n(λ)

1.45 1.44 0.5 0.6 0.7 0.8 0.9 (a)

–700 0.5 0.6 0.7 0.8 0.9 1

1

1.1 1.2 1.3 1.4 1.5 λ ( µm)

(b)

1.1 1.2 1.3 1.4 1.5

λ (µm)

Figure 4.6 (a) Plots of the index and the group index as a function of wavelength for a typical

silica glass. (b) Plot of the material dispersion coefficient.

dn(λ) , (4.3.7) dλ where n(λ) and N (λ) are understood to mean n (2π c0 /λ) and N (2π c0 /λ). The material dispersion is conventionally expressed using a material dispersion coefficient Dλ defined in terms of either the group index N (λ) or the index of refraction n(λ), as defined by N (λ) = n(λ) − λ



dN (λ) 0 dλ ± ² 1 d dn (λ) n(λ) − λ c0 dλ dλ

= c1 =

2 = − cλ d dnλ(λ) . 2 0

(4.3.8a)

(4.3.8b)

The units of Dλ are time-distance−2 and are generally written as ps/(nm · km). Plots of n(λ), N (λ), and Dλ for silica glass in these units are shown in Figure 4.6. Figure 4.6 shows that over the range of wavelengths used for most lightwave communication systems (800 to 1600 nm), the index varies by less than 1%, and the group index reaches a minimum at a wavelength near 1300 nm. The dispersion coefficient Dλ , which is proportional to the derivative of the group index, is zero at this minimum. This wavelength is called the zero material dispersion wavelength. Near this wavelength, the second-order term β2 is close to zero, so the third-order term β3 in (4.3.1) can be important.

Dispersion and Absorption in a Material

In a lossy dispersive material, a fundamental relationship always exists between the real frequency-dependent material absorption coefficient αm (ω) (cf. Section 3.1.2) and the real frequency-dependent index of refraction n(ω). To derive this relationship, consider a plane wave propagating with attenuation in the z direction. The z dependence of the complex electric field in a lossy material can be written as E(z ) = E0e−ik0 n(ω) z e−(αm (ω)/2)z ,

(4.3.9)

4.3 Narrowband-Signal Dispersion

165

where k0 = 2π/λ0 is the free-space wavenumber. The absorption coefficient for the field amplitude αm (ω)/2 includes a factor of two because αm (ω) itself is defined as the absorption coefficient for the lightwave power. The resulting complex dispersion relationship β(ω) is defined as

β(ω) =. k0 n (ω) − iαm (ω)/2,

(4.3.10)

where n(ω) is the real frequency-dependent index of refraction. Then E0e−ik0 n (ω)z e−(αm (ω)/2)z

= E0 e−iβ(ω)z .

(4.3.11)

For a lossless material β(ω) = k0 n(ω) is real, but for a lossy material an imaginary part is introduced into the dispersion relationship to incorporate loss into that expression. When β(ω) is complex due to the absorption coefficient αm (ω), the index of refraction can be regarded as complex as well, and then denoted nc (ω). Using β(ω) = k0 nc (ω) and (4.3.10), the complex index is written as nc (ω) = n(ω) − iαm (ω)/2k0,

(4.3.12)

showing that the real index n(ω) is the real part of the complex index nc (ω) and that the material loss coefficient αm (ω) is proportional to the imaginary part of the complex index nc (ω). The complex index nc (ω) and the spectral susceptibility χ(ω) (cf. (2.3.9)) are related √ by nc (ω) = 1 + χ(ω). For a weakly absorbing material with |χ(ω)| much smaller than one, this is approximated by nc (ω) ≈ 1 +

1 2

χ(ω).

(4.3.13)

Using (4.3.12) and equating the real part and the imaginary part of each side gives

αm (ω) ≈ −k0 χ (ω), n(ω) ≈ 1 + 21 χ (ω). I

R

(4.3.14a) (4.3.14b)

Recall that the Fourier transform of the spectral susceptibility χ(ω) is the temporal susceptibility X (t ). Because the time-domain function X (t ) of the physical system is a causal impulse response, the real part χ R (ω) and the imaginary part χ I (ω) of the spectral susceptibility χ(ω) are related by the Kramers–Kronig transform given in (2.1.23). Therefore, for a weakly absorbing material, the material absorption coefficient αm (ω) and the real index n (ω) given in (4.3.14) are also related. 1 Plots of αm (ω) and ²n(ω) are shown in Figure 4.7 for the functional form of χ(ω) given in (2.3.9). The relationships given in (4.3.14) are valid for any material for which the material response is appropriately modeled as a simple harmonic oscillator described by the complex spectral susceptibility χ(ω) given in (2.3.9). The use of this model implies that any dispersive material must have absorption. However, the model does not require the absorption to be strong in a specific wavelength band. 1 In general, the magnitude and phase are uniquely related by a Kramers–Kronig transform only when the

phase satisfies a minimum phase condition. For details see Bode (1950) and Lenz, Eggleton, Giles, Madsen, and Slusher (1998).

166

4 The Linear Lightwave Channel

stinU yrartibrA

Change in index Δ n

Absorption α m

0

0

ω

Frequency ω

α

Figure 4.7 (a) Absorption coefficient m and the change in index

dielectric material.

±n for a weakly dispersive

For example, while silica glass has significant absorption in the ultraviolet region of the spectrum, it has essentially no absorption in the wavelength bands used for lightwave communications. In these bands, the material absorption αm is negligible compared with the scattering loss αs (cf. Section 3.1.2). Therefore, the total attenuation coefficient κ = αm + αs defined in (3.1.3) is dominated by the loss from Rayleigh scattering (cf. Section 3.1.2) and not by material absorption. For narrowband lightwave signal propagation in a silica glass fiber, the Rayleigh scattering is nearly constant with frequency. Under this approximation, narrowband lightwave signal propagation in a fiber is modeled using a constant attenuation coefficient κ , which does not depend on frequency, and a frequency-dependent index n(ω). This means that while the attenuation reduces the overall energy in a narrowband lightwave signal, it does not change the shape of the pulse and it does not produce distortion. For other materials used in lightwave communication system components, such as semiconducting materials, both the frequency-dependent absorption αm (ω) and the frequency-dependent index n(ω) can produce signal distortion. 4.3.3

Narrowband Signal Propagation

The governing equation for linear narrowband signal propagation in a dispersive medium with the bandwidth of the complex envelope much smaller than the carrier frequency ωc can be derived using a frequency-domain signal A (0, ω) that is the Fourier transform of the complex envelope a (0, t ) (cf. (3.3.47)) in a single mode at z = 0. This signal experiences a frequency-independent attenuation e−κ z /2 and a frequency-dependent phase shift e−iβ(ω+ωc ) z as it propagates in the z direction. The frequency-domain signal A (z , ω) at a distance z is given by

.

A(z , ω) = A (0, ω) e−(κ/2+i β(ω+ωc ))z .

Using the power-series expansion of the dispersion relationship (4.3.1) along with (4.3.2), this equation is written as A (z , ω) = A (0, ω)eD z ,

(4.3.15)

β(ωc + ω) given in (4.3.16)

4.4 Group Delay

167

where the term D is defined as

D =. −κ/2 − i(ω/vg ) − iβ2 ω2/ 2,

(4.3.17)

with the term β0, which amounts to a z-dependent phase e−iβ0 L of the carrier frequency, suppressed in the expansion of β(ωc + ω). Given the form of (4.3.16), the spectrum A(z , ω) of the complex envelope a (z, t ) is the solution to the differential equation d A(z , ω) = A (z, ω) D dz

³ ´ = A(z, ω) −κ/2 − i (ω/vg ) − iβ2ω 2/2 .

(4.3.18) (4.3.19)

Use the differentiation property of the Fourier transform given in (2.1.12) on the inverse temporal Fourier transform of A (z , ω) to produce the partial differential equation

∂ a(z , t ) + 1 ∂ a (z, t ) + κ a(z , t ) − i β ∂ 2a (z , t ) = 0. 2 ∂z vg ∂ t 2 2 ∂ t2

(4.3.20)

This linear partial differential equation governs propagation of a narrowband complex envelope a(z , t ) in a fiber. Another form of this equation is obtained by defining the . traveling timeframe τ = (t − z /vg ) in the reference frame of the traveling envelope. In the traveling timeframe, (4.3.20) is

∂ a(z , τ) + κ a (z, τ) − i β ∂ 2a (z, τ) = 0. 2 ∂z 2 2 ∂τ 2

(4.3.21)

When z is regarded as a parameter, (4.3.21) describes a family of linear time-invariant systems. This viewpoint is adopted in Section 8.1 where the impulse response h (t ) and the transfer function H ( f ) of narrowband, linear time-invariant signal propagation in an optical fiber are discussed. In Section 5.4.1 of the next chapter, a generalized nonlinear form of this partial differential equation that governs the propagation of the complex envelope in a material with a nonlinear response is developed.

4.4

Group Delay Group delay is the delay that the modulating pulse experiences as it propagates within a fiber. The group delay differs from mode to mode and differs from wavelength to wavelength within a given mode. The wavelength dependence of the group delay within a mode is the source of group-velocity dispersion for that mode. The mode-dependent and wavelength-dependent group delay leads to a group-delay spread. The spread of group delays is an important topic because it determines the baseband bandwidth of a modulated lightwave at the output of an optical fiber. This section studies both mode-dependent group delay and the wavelength-dependent group delay within a mode. Because the dispersion relationship βm (ω) depends on both the frequency ω and the mode m (cf. Figure 4.3), each mode experiences a different group delay β1m L over a . distance L, where β1m = dβm (ω)/dω at ω = ωc (cf. (4.3.2b)).

168

4 The Linear Lightwave Channel

The mode-dependent group delay for any mode can be determined from the dispersion relationship βm (ω) for that mode. This topic is the first and longest topic of the section. First, an approximate expression is derived for a multimode step-index fiber with a normalized index difference ± much smaller than one. This expression is then specialized to a graded-index fiber and a step-index fiber. Finally, a different, more accurate expression valid for an arbitrary number of linearly polarized modes in a step-index fiber is derived.

4.4.1

Mode-Groups in a Step-Index Fiber

Because there may be hundreds of modes that can propagate in a multimode fiber and their individual group delays may cluster, it is convenient to organize the modes into mode-groups. A mode-group consists of all of the modes that have the same, or nearly the same, modal group delay. Portions of a common launched pulse propagating in different modes within a mode-group have similar group delays and can be studied together without reference to a specific mode in that mode-group. To develop an expression for the group delay of a mode-group in a fiber that supports a large number of modes, consider the dependence of the propagation constant β on the parameter pair (ν, ²) specifying the linearly polarized modes in a step-index fiber. The approximate dependence of β on the parameter pair (ν, ²) can be determined using the large-argument expansion Jν ( pa ) ≈ (2/πv)1/2 cos ( pa − νπ/2 − π/4) for the Bessel function. Substituting the approximation into the cutoff condition Jν−1(V ) = 0 given in (3.3.29) gives cos( pa − νπ/2 + π/4) ≈ 0.

The cosine function is zero whenever the argument is equal to (²+1/2)π for an integer ². Solving for pa gives

(4.4.1) ≈ π2 (ν + 2² + 1/2). All pairs (ν, ²) with the same value of ν + 2² have nearly the same value of pa and, consequently, nearly the same value of the propagation constant β. The set of modes with the same value of ν + 2² defines a mode-group. The nonnegative integer g = ν + 2² pa

is called the mode-group index. Each mode in the mode-group is regarded as having the same value of β, with pulses in that mode-group traveling at the same group velocity, so the propagation constant β can be indexed by g in place of m. Accordingly, the total pulse energy within a mode-group can be treated as forming a single propagating pulse with regard to the group delay. The multiple pulses remain aligned, so there is stronger mode coupling between modes within the same mode-group than between modes in different mode-groups. The coupling between the modes in a mode-group redistributes the power among the modes in that mode-group. Over a distance long enough that the mode coupling is significant and approaches equilibrium, the redistributed power can be treated as a uniform distribution across all modes within a mode-group. For large values of the mode-group index g, there are approximately g/2 nonnegative paired integer solutions (ν, ²) to the expression g = ν + 2². Each (ν, ²) pair

4.4 Group Delay

169

with nonzero ν corresponds to both a sine dependence and a cosine dependence for the azimuthal component. This statement holds for both orthogonal polarization modes. Therefore, each such (ν, ²) pair corresponds to four modes. Because there are about g/2 nonnegative integer solutions to the equation g = ν + 2², there are about 2g modes in most mode-groups. Substituting g = ν + 2² into (4.4.1) and solving for pa leads to pa ≈ gπ/2 for large g, where the factor of one-half inside the parentheses of (4.4.1) has been neglected. The maximum value of the mode-group index g, denoted G, occurs at the mode cutoff condition when pa equals the normalized frequency V . Substituting g = G and pa = V into pa ≈ g π/2 gives G ≈ 2V /π . Summing over the mode-groups, the total number of linearly polarized modes M in a large-core step-index fiber is approximately M



G ¾ g=0

2g

2

≈ G 2 ≈ 4 πV 2 ,

(4.4.2)

under the condition that G 2 is much larger than G .

4.4.2

Mode-Groups in a Graded-Index Fiber

The index of refraction profile n (r ) for a graded-index fiber is conventionally modeled using a power-law profile of the form given in (3.3.49). This expression, repeated here, is

³

n2 (r ) = n 20 1 − 2±

³ r ´α ´ a

,

where ± is much smaller than one, α is the power-law index profile parameter defined in (3.3.49), n0 is the index at the center of the fiber core, and a is the core radius. These terms are shown schematically in Figure 3.1. The power-law index profile presented in Figure 3.18 shows that, as α goes to infinity, the index profile approaches that of a step-index fiber. For α equal to two, the index profile is parabolic, as given in (3.2.10). For a weakly guided graded-index fiber with a power-law index profile, the total number of linearly polarized modes M in both polarizations in the graded-index fiber can be approximated by (cf. (4.4.2)) (4.4.3a) ≈ G 2 = α +α 2 ±a 2n20 ω2 /c2 2 ≈ V2 α +α 2 , (4.4.3b) where (3.3.12b) has been used to write V 2 ≈ 2± a2n 20ω 2/c2 for a graded-index fiber. An approximate dispersion relationship βg (ω) for a multimode graded-index fiber M

expressed in terms of the mode-group index g is the topic of this section. The dependence of β on g defines the intermodal dispersion. The dependence of β on ω defines the intramodal dispersion.

170

4 The Linear Lightwave Channel

The dispersion relationship βg (ω) for a graded-index fiber with the power-law index profile in (3.3.49) is given by2

( ) βg (ω) ≈ n0 k0 1 − 2± Z (g ³) 1/2 ,

where

.

Z (g³ ) = (g³ )2α/(α+2) .

(4.4.4) (4.4.5a)

In this expression g³ = g/G is defined as the normalized mode-group index. It satisfies 0 ≤ g³ ≤ 1. Solving for G in (4.4.3a) and substituting the resulting expression into (4.4.5a) gives

± (α + 2)g 2c2 ²α/(α+2) (ω2 n20±)−α/(α+2), (4.4.5b) a2 α where n0 and ± are functions of ω when the material dispersion is considered. Expression (4.4.4) provides a closed-form approximate expression for βg(ω) for a Z (g) =

multimode graded-index fiber with a power-law index profile. It can be used to assess both the mode-group-dependent delay, which depends on the mode-group index g, and the wavelength-dependent group delay, which depends on the frequency ω. The propagation constant βg (ω) defined in (4.4.4) varies from βmax = n0 k0 when g equals zero for the lowest-order mode-group to βmin = na k0 when g equals G for the highest-order mode-group, where 1 − ± = na / n 0 has been used and na is the index at the core/cladding interface (cf. Figure 3.1(c)). The plane-wave dispersion relationships for these limiting cases are straight lines as shown in Figure 4.5(a).

Group Delay as a Function of the Mode-Group

When there is a large number of modes, the normalized mode-group index g³ = g/ G can be approximated as a continuous quantity. Using (4.4.4) and the approximation √ 1 − δ ≈ 1 − 21 δ − 81 δ 2 gives the mode-group delay as (cf. (4.3.3))

τ ≈ β1 L = cL ddω 0

³

³

n0 ω 1 − ± Z

´´ − 12 ± 2 Z 2 ,

(4.4.6)

where k0 = ω/c0. The derivative d(n0 ω)/dω , evaluated at the center of the graded-index core, denoted Nc , is the group index of refraction at the center of the core (cf. (4.3.6)). Substituting this expression into (4.4.6) and evaluating d Z /dω leads to an approximation of the mode-group delay,

τ≈

L Nc c0

±

1+±

α − 2 − 2y Z + ±2 3α − 2 − 4y Z 2 ² , α+2 2 α+2

(4.4.7a)

where Nc is the group index of refraction at the center of the core, and y

= Nn 0ω± dd±ω = − Nλn±0 dd±λ . c c

The derivation of (4.4.7) is asked for as an end-of-chapter exercise. 2 See Gloge and Marcatili (1973).

(4.4.7b)

4.4 Group Delay

171

Expression (4.4.7b) accounts for the frequency dependence of the normalized index difference ± between the core and the cladding. This frequency dependence is neglected in the analysis of group delay in a step-index fiber (cf. (4.4.9)). For a graded-index fiber, the frequency dependence is significant for values of α near two because the linear term in (4.4.7a) would be zero were the correction term not included.

4.4.3

Step-Index Multimode Fiber

The dispersion relationship βg (ω) for a power-law index profile given in (4.4.4) is specialized to a step-index fiber in this section. In this regard, note that as the power-law profile parameter α approaches infinity, the index profile approaches that of a step-index fiber. The term α/(α + 2) in (4.4.3b) then approaches the value one. For this case, the total number of modes supported in the two polarizations is approximately V 2/ 2. The total number of modes supported in the two polarizations in a multimode stepindex fiber based on the approximation given in (4.4.2) is approximately 4V 2/π 2 . This value differs by a factor of 8 /π 2 compared with the approximation based on a power-law profile parameter α given in (4.4.3b). Because 8/π 2 is close to one, the two approximate analyses are in reasonable agreement.

Group Delay as a Function of the Mode

The group delay for an LPν² mode of a step-index fiber depends on the two parameters (ν, ²) that characterize the azimuthal and radial dependence of the mode. Designating the pairs of parameters (ν, ²) by the single mode index m and ignoring polarization, this section derives an expression for the modal group delay τm under the narrowband approximation of the dispersion relation β(ω). This expression is derived for an individual mode and remains valid even for a fiber that does not support a large number of modes. Therefore, it avoids using the mode-group index g. This treatment provides a more accurate expression for the mode-dependent group delay τm for a linearly polarized mode of a step-index fiber. To derive the mode-dependent group delay τm , start with the expression for the dispersion relationship given in (4.2.2). For the mth mode, (4.2.2) can be solved for βm (ω) as µ βm (ω) = ω cn2 1 + bm (n21 − n 22)/ n22 , 0

where k0 has been replaced by ω/c0 , and the indices n1 and n2 depend on frequency because of material dispersion. This is approximated for ± ´ 1 (cf. (3.2.5)) by using (n 2 − n2 ) / n 2 = (n − n )(n + n )/n 2 ≈ 2± and approximating the square root in a 1 2 1 2 1 2 2 2 power series up to the first-order term. Then

βm (ω) ≈ ω nc2 (1 + bm ±) ≈ ω nc 1 (1 + bm ±). 0

0

The approximation of the square root holds because 0 than one.

(4.4.8)

≤ bm < 1 and ± is much smaller

172

4 The Linear Lightwave Channel

Most terms on the right are functions of ω, so the left depends on ω through these terms and is written as βm (ω). Evaluating the derivative of βm (ω) with respect to ω at the carrier frequency ωc , the mode-dependent group delay τm defined in (4.3.3) is

τm = L dβdmω(ω) ± ² d(n1 (ω)±(ω)) d(ωn1 (ω)) L d(ωbm (ω)) + dω = c n1 (ω)±(ω) dω + ωbm (ω) . dω 0

(4.4.9)

Examining this expression, the second term is the derivative of (n1 (ω) − n2(ω)) with respect to ω. Because the core index n1 (ω) and the cladding index n2 (ω) are nearly equal and have nearly the same frequency dependence, this term is negligible for a step-index fiber. The last term in (4.4.9) is the frequency-dependent group index N1 (ω) defined in (4.3.6). Now use V = ωa NA/ c0 (cf. (3.3.12b)) and dω = dV c 0/(a NA) to obtain the group delay τm for the mth mode as

τm ≈

L c0

±

d( V bm ) N 1 + n1 ± dV

²

,

(4.4.10)

where the dependence of the terms on ω is suppressed. The first term is the group delay L N1 /c0 caused by material dispersion. The second term is the group delay caused by the waveguide geometry. Comparing the second term with the differential transit time τ = Ln 1±/ c0 derived using ray optics (cf. (4.1.1)), an additional multiplicative factor d(V bm )/dV is seen in (4.4.10) for the dependence of the delay on the waveguide geometry. This additional factor is due to the wave nature of the modes and is plotted in Figure 4.8 for the LP01 and LP 11 modes. For a large normalized frequency V corresponding to a waveguide with a large radius compared with the wavelength, Figure 4.8(a) shows that the factor d(V bm )/dV approaches one. For this limiting case, the second term in (4.4.10), derived using wave optics, approaches the maximum delay spread derived using ray optics, as given in (4.1.1). This conclusion validates the use of ray optics to determine the maximum delay spread for a large-core fiber caused by the fiber geometry. (a)

(b)

1.2

LP01

τ

1 0.6

LP 01

edutilpmA

V d/) bV(d

0.8

LP11

0.4

LP11 Sum

0.2 1

2

3 V

4

5

6

Time

( )/dV for the first two linearly polarized modes along with a schematic representation of the frequency content of a lightwave pulse. (b) The output due to a single pulse on a noncoherent carrier coupled into two modes with different group delays. The output pulse power is the sum of the powers in each mode.

Figure 4.8 (a) Group-delay factor d V bm

4.4 Group Delay

173

For an arbitrary value of V , the group delay for each mode can be determined from Figure 4.8. As an example, for V equal to 3, the group delay for the LP01 mode and the group delay for the LP11 mode are approximately equal. Therefore, at this value of V , the two modes travel at the same group velocity and, accordingly, belong to the same mode-group. A pulse that propagates in only these two modes has no intermodal dispersion. However, referring to Figure 4.8 for V equal to 2.75, the group delay for the LP01 mode is larger than the group delay for the LP11 mode. In contrast, for V larger than 3, the group delay is larger for the LP11 mode. This means that the relative group delay between any two modes is a strong function of the wavelength as expressed by the normalized frequency V (cf. (3.3.12)).

4.4.4

Wavelength-Dependent Group Delay

The intramodal dispersion or wavelength-dependent group delay describes the spreading of a pulse in a single mode due to the frequency or wavelength dependence of the dispersion relationship β(ω). This spreading can be quantified by the wavelength dependence of the group delay, denoted τ(λ) . An approximation to τ(λ) is given by the first two terms of a power series evaluated at the carrier wavelength λc . The approximate wavelength-dependent group delay, denoted τλ , is

¹ dτ(λ) ¹¹ . τλ = τ(λc ) + (λ − λc ) dλ ¹

λ=λc

= τ(λc ) + (λ − λc )sλ, (4.4.11) where the second term is characterized by sλ = dτ/dλ| λ=λ , which is the slope of the wavelength-dependent group delay evaluated at the carrier wavelength λc . This approximation is analogous to the approximation for the dispersion relationship β(ω) given in (4.3.1). That expression relates the group-velocity dispersion coefficient β2 to the slope c

of the modal group-delay term. The expression in (4.4.11) relates the term sλ of the wavelength-dependent group delay to the slope of the group-delay term τλ (cf. (4.4.10)). It is given by sλ

=

dτ(λ) dλ

= =

± »

½²

d L d(V b) ( n 1±) + N1 d λ c0 dV » L d(n 1±) d( V b) + (n1 ±) ddλ c0 dλ dV

± d(V b) ² dV

½ + dNd1λ(λ) ,

where all derivatives are evaluated at the carrier wavelength λ c , and the parameters b, V , n, N , and ± all depend on λ. The first term on the right is nearly equal to zero because the dependence of the core index n1 on λ is nearly the same as the dependence of the cladding index n2 on λ (cf. (4.4.9)). The middle term can be written as

± V b) ² ± d(V b ) ² dV d = (n1 ±) dV . (n1 ±) ddλ d(dV dV dλ

Finally, the last term in the expression for sλ is proportional to the material dispersion term Dλ as expressed in (4.3.8),

174

4 The Linear Lightwave Channel

dN 1(λ) dλ Combining these statements gives sλ

≈L



n1 ± c0



= c 0 Dλ .

d2 (V b) dV 2

¸

dV dλ

¸

+ Dλ .

(4.4.12)

The first term in (4.4.12) is called the waveguide-dispersion coefficient and denoted Dguide . It can be further simplified using V = 2π NA λ−1 a from (3.3.12b) and dV = −(V /a)dλ. This gives Dguide = −

n1 ± d2 (V b ) V . c0λ dV 2

(4.4.13)

Plots of the factor V d2 (V b)/dV 2 used to describe waveguide dispersion and the factor d(V b )/dV used to describe the group delay (cf. (4.4.10)) are shown in Figure 4.9 for the first two linearly polarized modes. The waveguide-dispersion factor V d2 (V b )/dV 2 for each mode, shown as a dashed curve in Figure 4.9, is proportional to the slope of the group-delay factor d(V b )/dV , shown as a solid curve. This derivative relationship is shown separately in Figure 4.8. Figure 4.9 plots the group-delay factor d( V b)/dV for the same two modes as well as the slope of that term for each mode at the value V = 3. For each mode, the slope at that value of V is proportional to the waveguidedispersion factor for that mode. The waveguide dispersion varies from mode to mode, and can be different even for two modes with the same group delay. This is considered in an end-of-chapter exercise. The total group-velocity dispersion coefficient D is defined as D

=. Dguide + Dλ,

LP01

1.5

LP 11

1.5

1.25

1.25

1

1.00 0.75

0.75

0.5

0.5

0.25

0.25

0

0 –0.25

(4.4.14)

–0.25 1

2

3

4

5

6

V

2.5

3

3.5

4

4.5

5

5.5

V

( )/

Figure 4.9 Plots of the group-delay factor d V b dV (solid curve) given in (4.4.10) and the waveguide-dispersion factor V d2 V b dV 2 (dashed curve) given in (4.4.13) for the first two

linearly polarized modes.

( )/

6

4.5 Linear Distortion

175

and has a component related to the waveguide dispersion coefficient Dguide and a component related to material dispersion coefficient Dλ . Using this definition, (4.4.12) becomes sλ

= L D.

(4.4.15)

The total group-velocity dispersion coefficient D, which is expressed in terms of the wavelength, can be related to the group-velocity dispersion coefficient β2(ω) given in (4.3.2c), which is expressed in terms of the frequency using β2 (ω)dω = D d λ . Solving for D and using dω/dλ = −2πλ2/ c0 gives3 D=−

2π c

λ2 β2 (ω).

(4.4.16)

Within the narrowband approximation, the dispersion coefficient D is the principal parameter that characterizes the wavelength-dependent group delay within a single mode.

4.5

Linear Distortion Linear distortion is caused by the combination of linear dispersion and attenuation. Linear dispersion is caused by the composite of the delay in separate wavelengths, modes, and polarizations. A single-mode fiber has linear dispersion because, in general, each wavelength of a pulse propagates at a different velocity. A multimode fiber also has additional dispersion because different modes can have different delays. These delays are quantified by dispersion relationships (cf. Figure 4.3(b)) that may depend on the polarization. For a narrowband signal in a single mode, the attenuation is appropriately modeled as independent of wavelength (cf. Section 4.3.2). Attenuation can be mode-dependent and polarization-dependent. It is due to both the fiber structure and external factors such as bending and coupling. In this section, attenuation is simply modeled as a scaling factor that does not depend on the wavelength, mode, or polarization. Later, in Section 4.6.4, polarization-dependent loss is considered. Here, we provide a comprehensive treatment of the nature of the distortion and its relationship to the characteristics of the optical fiber.

4.5.1

Distortion from Mode-Dependent Group Delay

A span of a multimode fiber of length L has a distribution of group delays τm because, in general, different modes travel at different group velocities. The proportion of the 3 If the group-velocity dispersion coefficient is expressed using frequency units of hertz instead of radians/s,

then the factor of 2π is omitted. Typical units for D are ps/( nm · km ). If D = 17 ps/(nm · km) at a wavelength of 1500 nm, then β2 is approximately −125 ps 2 /km in frequency units of hertz and is −20 ps2 /km in frequency units of radians/s.

176

4 The Linear Lightwave Channel

signal power Fm in each mode depends on the coupling of the lightwave source into the fiber modes at the input to the span as well as mode coupling among modes as the signal propagates within the span. This leads to a distribution of the total power P among the modes at the output of the fiber segment.

Delay Spread

A multimode fiber used with a noncoherent lightwave source results in a channel that is linear in the lightwave power. This is discussed extensively in Section 8.2.2. For a channel that is linear in the lightwave power, define the delay spread σ inter as 2 σinter =. τ 2 − τ 2 ,

(4.5.1)

∑M F τ is the weighted average of the modal group delays τ , where τ = m m =1 m m where the weight Fm is the proportion of the power in the mth mode, and where τ 2 = ∑mM=1 Fm τm2 is the weighted average of the square of the group delays. The delay 2 spread σinter characterizes the intermodal dispersion. The root-mean-squared delay spread for a fiber channel that is linear in the lightwave power is interpreted in Chapter 8 as the width of the impulse response h(t ). The width of the transfer function H ( f ), which is the Fourier transform of h (t ), quantifies the effective bandwidth B and is approximately the reciprocal of the delay spread σinter . For a multimode fiber that supports a large number of modes, the mode-group delay can be regarded as a continuous function τ( g³ ) of the normalized mode-group index g³ 2 as given in (4.4.6). Accordingly, the delay spread σinter is given by 2 σinter = τ 2 − τ2 =

¼1 0

F (g³ )τ 2 (g³ )dg ³ −

¶¼ 1 0

F (g³ )τ(g³ )dg ³

¸2

,

(4.5.2)

where F (g³ ) is the continuous equivalent of Fm . It is the power-density distribution as a function of the normalized mode-group index g³ . A plot of the root-mean-squared delay spread per unit length σinter / L is shown in Figure 4.10 as a function of the power-law index profile parameter α given in (3.3.49) for three different modal power-density distributions F (g ³ ). The figure shows that the root-mean-squared delay spread depends both on the modal power-density distribution F (g³ ) and on the power-law index profile parameter α of the graded-index fiber. Comparing curve (a), which uses a uniform power-density distribution across all normalized mode-groups, with curve (c), which uses a uniform power-density distribution across that 20% of the mode-groups having the smallest normalized mode-group index g ³, the minimum root-mean-squared delay spread decreases by more than an order of magnitude using a different value of α to achieve the minimum delay spread. The modal power-density distribution F (g³ ) at the output of the fiber span depends on the strength of the mode coupling and can change over time due to environmental conditions. Because the reciprocal of the root-mean-squared delay spread is a measure of the effective bandwidth, this means that the effective bandwidth of a multimode gradedindex fiber is a sensitive function of α and of the modal power-density distribution F (g³ ), which can vary with time. When the modes are strongly coupled in a random manner,

4.5 Linear Distortion

177

1.000 )mk/sn( htgneL tinU rep daerpS yaleD SMR

(a)

0.100

(b) (c)

0.010

0.001 1.8

1.9

2.0 Power±law Parameter α

2.1

σ

2.2

/

Figure 4.10 Root-mean-squared delay spread per unit length inter L as a function of the

graded-index profile parameter α for three modal power-density distributions F ( g³ ): (a) uniform distribution over all mode-groups, (b) uniform distribution only over 0 < g³ < 0.5, and (c) uniform distribution only over 0 < g ³ < 0.2. The other parameters are ± = 0.01, y = 0, and Nc = 1.5.

the root-mean-squared delay spread grows as the square root of the length L of the fiber. This behavior is analogous to the random-walk nature of random-polarization mode dispersion discussed in Section 4.6. It can be analyzed using a similar formalism.4

Mode-Group Delay Spread

Figure 4.10 shows that there is an optimal value αequal of the modal power-law index profile parameter that minimizes the delay spread for a given power-density distribution F (g³ ). To estimate the minimum delay spread for a uniform F (g³ ), determine the maximum over α of the difference between the fastest mode-group and the slowest mode-group as a function of α . The slowest mode-group is the lowest-order mode-group with g = 0, and Z (g³ ) = 0 (cf. (4.4.5a)). The group delay given in (4.4.7) for this modegroup is τg0 = L Nc /c0 . This is the group delay of a ray that travels along the axis of the fiber. When g is nonzero, Z (g ³ ) is nonzero as well. In this case, as a function of the normalized index difference ±, (4.4.7a) has a linear term and a quadratic term. When the numerator of the coefficient for the linear term is nonzero, the quadratic term can be neglected because ± is small. Then the appropriate mode-group delay is

±

τ( g³ ) ≈ L Nc 1 + ± α − 2( y + 1) Z (g³ ) c0

α+2

²

,

( ) 2α/(α+2)

(cf. (4.4.5a)), where y is a correction term given by (4.4.7b) and Z (g³ ) = g³ ³ ³ with the dependence on g explicitly shown. The function Z (g ) is always monotonically 4 See Ho and Kahn (2011).

178

4 The Linear Lightwave Channel

increasing because α is positive. Whenever α is larger than 2( y + 1), the mode-group delay τ(g³ ) increases as g³ increases, which means that the higher-order mode-groups travel slower than the lower-order mode-groups. Whenever α is smaller than 2( y + 1), the higher-order mode-groups travel faster than the lower-order mode-groups. Near α ≈ 2(y + 1) an optimal value αequal exists such that the delay for the lowest-order modegroup, at g ³ = 0, is equal to the delay for the highest-order mode-group, at g³ = 1 (or g = G ).

Minimum Delay Spread To find the value of the index profile parameter αequal that minimizes the delay spread, refer to (4.4.7). Setting Z = g /G = 1 corresponds to the highest-order mode-group (cf. (4.4.5)). Setting Z = 0 corresponds to the lowest-order mode-group. Then the delay for the highest-order mode-group, denoted τG , is equal to the delay for the lowest-order mode-group, denoted τg0, whenever 2 ± α −α2+( y 2+ 1) + ±2 3α −α +2 −2 4y = 0. Solve for α in terms of y and a power series in ±. Neglecting all terms except terms that are linear in y or ± , the optimal value αequal of the power-law profile that equalizes the

mode-group delay of the lowest-order mode-group and the highest-order mode-group is

αequal ≈ 2( y + 1 − ±).

(4.5.3)

The derivation of this expression is posed as a problem at the end of the chapter. With this value of α, any other mode-group with a normalized mode-group index g³ between the two limiting values of g ³ = 0 and g³ = 1 used to derive αequal will travel faster and have a smaller group delay. Now substitute the value αequal of the index profile parameter that equalizes the modegroup delay for g³ = 0 and g³ = 1 back into (4.4.7), noting that y ´ 1 and ± ´ 1. This gives the mode-group delay τ(g³ ),

±

2 2 τ(g³ ) ≈ L Nc 1 − ± Z (g ³ ) + ± Z (g ³ )2

c0



L Nc c0

±

2

²

2

² ( ) ± ³ ³ 1− Z (g ) 1 − Z (g ) , 2 2

under the condition that αequal is chosen to satisfy (4.5.3). Now define δτ( g³ ) as the delay spread between the slowest mode-group, which has a delay L Nc /c0, and any other mode-group g³ . This delay spread is given by

δτ(g³ ) ≈

L Nc c0

− τ(g³ ) =

( ) L Nc ±2 Z (g³ ) 1 − Z (g ³) . c0 2

The maximum delay spread τmax occurs when Z (g³ ) = 1/2, so that

τmax = max τ(g³ ) = g³

L Nc ±2 . c0 8

4.5 Linear Distortion

179

The maximum delay spread τmax can be viewed as the spread of the worst-case impulse response of the fiber. It is a factor of ±/8 smaller than the maximum delay spread for a step-index fiber given in (4.1.1) using ray optics with the corresponding bandwidth increased by the same factor. This large increase in the bandwidth is the principal reason why a graded-index fiber with a power-law index profile parameter α close to two is preferred to a step-index fiber for large-core fiber.

4.5.2

Distortion from Wavelength-Dependent Group Delay

The distortion caused by wavelength-dependent group delay within a single mode depends on the wavelength power-density spectrum Sλ (λ) of the lightwave signal. Using wavelength spectra in this section, the root-mean-squared spectral width σλ of S λ (λ) is called the linewidth. The linewidth depends both on the spectral width of the unmodulated lightwave carrier and on the type of modulation. The variance of the wavelength power-density spectrum Sλ (λ) of the complex envelope is

σλ = 2

¼

1 ∞ 2 λ Sλ (λ)dλ, P −∞

(4.5.4)

where P is the total lightwave power and the root-mean-squared linewidth is σλ . A modulated waveform s (t ) using a noncoherent carrier centered at λ c usually can be appropriately modeled as a stationary random process that depends on the power-density spectrum of the lightwave source and not on the statistical properties of the modulating signal. This means that the linewidth σλ of the modulated lightwave signal can be dominated by the unmodulated spectral width – called the intrinsic linewidth of the noncoherent carrier – and not on the narrower spectral linewidth of the modulating signal. In contrast, for a modulated waveform using a coherent carrier, the linewidth σλ of the modulated lightwave signal is dominated by the spectral linewidth of the modulating signal and not by the narrow linewidth of the coherent carrier. For a fully noncoherent carrier, the wavelength autocorrelation function R (λ, λ³ ) which is the dual to the time autocorrelation function given in (2.2.55), can be written as R (λ, λ³ ) = Sλ (λ − λc )δ(λ − λ ³).

(4.5.5)

The proof of this is asked for as an end-of-chapter exercise. This expression states that two different wavelength components λ and λ ³ of the modulated noncoherent source are uncorrelated. Therefore, the analysis used in Section 4.5.1 for the modedependent group delay between the uncorrelated modes can be recast to analyze the wavelength-dependent group delay between the uncorrelated wavelength components within a single mode. To do so, replace the discrete mode-dependent group delay τm with the continuous wavelength-dependent group delay τλ . The continuous wavelength-dependent group delay τλ is deduced from the discrete mode-dependent delay τm by replacing the summation used for the modal delay by an integral over the wavelength components within a single mode. Similarly, the proportion Fm of

180

4 The Linear Lightwave Channel

the power in a mode is replaced by the power density Sλ (λ)/ P for a wavelength component. The linear, wavelength-dependent term of the power-series expansion of the wavelength-dependent group delay τλ is λsλ (cf. (4.4.11)). Using this expression, the 2 variance σintra of the wavelength-dependent group delay is 2 σ intra = µτλ2¶ ¼∞ = P1 sλ2λ 2S λ (λ)dλ −∞ = σλ2sλ2 ,

(4.5.6)

where σλ2 is given in (4.5.4). Now substitute sλ = L D from (4.4.15) into (4.5.6) and take the positive square root to obtain

σintra = L σλ|D | = L σλ|Dλ + Dguide |,

(4.5.7)

.√ where | D | = D 2 is the absolute value of the total group-velocity dispersion coefficient

D defined in (4.4.14). This expression states that the wavelength-dependent delay spread

σintra within a single mode is the product of three terms: the root-mean-squared width σλ of the power-density spectrum of the modulated lightwave signal; the total group-

velocity dispersion coefficient D; and the length of the fiber L . As a numerical example, a lightwave signal at a carrier wavelength of 1550 nm with a modulation bandwidth of 100 GHz corresponds to a modulation linewidth of 0.8 nm (cf. (2.2.58)). Over a distance of 50 km in a conventional fiber with a group-velocity dispersion coefficient D of 17 ps/(nm · km ), the delay spread σintra is 680 ps. This delay spread is an estimate of the width of the impulse response of the fiber segment. Later, in Section 8.1, methods based on linear-systems theory are presented to study the shape of the output pulse.

4.5.3

Dispersion-Controlled Optical Fiber

The wavelength-dependent group delay and the total group-velocity dispersion D = Dλ + D guide for the single mode of a single-mode fiber can be controlled through the design of the fiber because the waveguide dispersion coefficient Dguide depends on the design of the fiber. A fiber that is modified to control the dispersion characteristics is called a dispersion-controlled fiber. The general characteristics of a dispersioncontrolled fiber are a small core and multiple layers with different indices and perhaps different index profiles. As an example, for wavelengths larger than the zero-dispersion wavelength (typically λ ≈ 1300 nm for silica glass), the material dispersion coefficient Dλ is positive and the waveguide dispersion coefficient Dguide (cf. (4.4.13)), which is a function of the fiber geometry, is negative. In this regime, the material dispersion and the waveguide

4.5 Linear Distortion

181

dispersion for a single mode have opposite effects on the total intramodal dispersion. Therefore, by design of the fiber geometry to control Dguide, the wavelength-dependent dispersion can be controlled. A fiber for which the waveguide dispersion and the material dispersion cancel out at a specific wavelength is called a dispersion-shifted fiber. If Dguide = − Dλ over a range of wavelengths, then the waveguide dispersion can equalize the material dispersion over a range of wavelength subchannels. Instead, a fiber can be designed to increase the linear dispersion so as to reduce nonlinear interference between wavelength subchannels. Nonlinear interference is discussed in Section 5.5.

4.5.4

Independent Sources of Distortion

The effect of several independent sources of linear distortion can be combined by summing the variances σi2 of their delay distributions. The total variance σT2 is the sum

σ2 =

¾

T

i

σi2,

(4.5.8)

where the index i ranges over the independent distortion mechanisms that are included in the analysis. As an example, consider the combined effect of mode-dependent delay, which depends on the modal power-density distribution F (g³ ) as a function of the normalized mode-group index g³ , and the wavelength-dependent group delay, which depends on the proportion of the power Sλ (λ)/ P in each wavelength component. The combined variance is determined using (4.5.1) or (4.5.2) along with (4.5.7). This is 2 2 σ 2 = σinter + σintra . (4.5.9) The wavelength-dependent delay spread σintra can vary from mode to mode. AccordT

ingly, (4.5.7) is modified to read

¹ ¹ σintra = L σλ ¹ Dλ + Dguide ¹ , m

m

(4.5.10)

where Dguidem is the waveguide-dispersion coefficient defined in (4.4.13) for mode m. For a fiber that supports many modes, the mode-dependent waveguide-dispersion coefficient Dguidem is usually insignificant. In this case, the total variance σT2 is the sum of 2 the variances due to delay spread σinter given by (4.5.2), and the wavelength-dependent 2 group-delay variance σintra = Dσ λ L . The wavelength-dependent group-delay spread is the square root of this term. Modifying the expression given in (4.5.2) to include the wavelength-dependent groupdelay spread significantly modifies the minimum achievable group-delay spread as shown in Figure 4.11. This figure uses the same parameters as used in Figure 4.10. Comparing the figures, curve (a) in Figure 4.11 is the same as curve (c) in Figure 4.10, not including wavelength-dependent group-delay spread. Curve (b) in Figure 4.11 shows that including the wavelength-dependent group-delay spread limits the minimum achievable root-mean-squared delay spread and significantly reduces the dependence of

182

4 The Linear Lightwave Channel

1.000 )mk/sn( htgneL tinU rep daerpS yaleD SMR

(b)

0.100

(a)

0.010

Wavelength± dependent delay

0.001 1.8

1.9

2.0 Power±law Parameter α

2.1

2.2

Figure 4.11 Root-mean-squared delay spread of the total delay distribution per unit length

σT /L

as a function of the power-law index profile parameter α for a uniform modal power-density distribution F (g ³ ) in the range 0 < g³ < 0.2. (a) No material dispersion (y = Dσ λ = 0). (b) With parameters y = D σλ = 0.02 ns/km.

the delay spread on the number of mode-groups that have nonzero power at the output of the fiber. Curve (b) in Figure 4.11 shows that the minimum achievable delay spread per unit length is now limited by the wavelength-dependent delay spread per unit length, which has a value of σintra / L ≈ D σλ = 0.02 ns/km for the curves shown in Figure 4.11. The optimal value αequal of the power-law profile parameter that produces the minimum root-mean-squared delay spread also shifts in accordance with (4.5.3).

4.6

Polarization-Mode Dispersion The analysis of narrowband dispersion is now extended to include the polarization dependence. Polarization-dependent dispersion is called polarization-mode dispersion. An ideal fiber has no birefringence and so has no polarization-mode dispersion. When polarization-mode dispersion is not significant, the dispersion relationship for any mode can be derived by approximating the scalar dispersion relationship β(ω) by the significant terms of a power-series expansion about the carrier frequency ωc as given in (4.3.2). A practical fiber has birefringence caused by uncontrolled intrinsic and extrinsic factors. Intrinsic factors are an unavoidable consequence of imperfections in the manufacturing process. Extrinsic factors result from installation and the environment. Factors due to installation are mostly fixed. Factors due to random changes in the environment vary with time, so the resulting birefringence and polarization-mode dispersion may vary with time.

4.6 Polarization-Mode Dispersion

183

When polarization-dependent dispersion is significant, the scalar analysis is replaced by a vector analysis. The first-order polarization-dependent term is the derivative of a matrix equation describing the polarization transformation. This term expresses the polarization-dependent group delay between the two polarization modes of a single spatial mode of a birefringent fiber (cf. Section 2.3.4). The second-order polarization-mode dispersion arises from the quadratic term in the power-series expansion. This term produces polarization-dependent and wavelength-dependent group delay within a single spatial mode, which is called polarization-dependent group-velocity dispersion. 4.6.1

Jones Representation

For a lossless medium, the phase shift experienced by a dual-polarization lightwave signal in the Jones representation is the vector–matrix equation (cf. (2.3.58a)) r(ω, L ) = T(ω, L )s(ω, 0),

(4.6.1)

where s(ω, 0) is the Jones vector at the fiber input, and r(ω, L ) is the Jones vector at the fiber output at distance L. The matrix T(ω, L ) is the polarization transformation expressed as a 2 × 2 matrix with complex elements. The derivative of this equation is

dr(ω, L ) = T³ (ω, L )s(ω, 0), (4.6.2) dω where T³ (ω, L ) = dT(ω, L )/dω, and where ds(ω, 0)/dω = 0 because, for a welldefined state of polarization at the input, s(ω, 0) does not change with frequency. For any lossless polarization-dependent medium described by a transformation T(ω, L ), the two orthogonal eigenvectors of T(ω, L ) define two orthogonal polarization modes, which need not be linearly polarized modes. This decomposition describes the effect of the polarization-dependent medium on the carrier. For the special case of a linearly birefringent medium, the two orthogonal eigenvectors of T(ω, L ) define linearly polarized modes and define the slow optic axis ¿ eslow and the fast optic axis ¿ efast . Similarly, for a linearly birefringent medium, the two orthogonal eigenvectors of T ³(ω, L ) define two principal polarization axes of the fiber, which will be shown to be co-aligned with the optic axes, and so are the same. For other polarization-dependent media, such as a circularly birefringent medium (cf. Section 2.3.4), the two orthogonal polarization modes defined by the eigenvectors of T(ω, L ) need not define linearly polarized modes described by optic axes. Similarly, the two eigenvectors of T³ (ω, L ) need not be linearly polarized and need not be coincident with the optic axes. For this reason, the eigenvectors of T³ (ω, L ) are, in general, called principal polarization states. This decomposition into the principal polarization states describes the effect of the polarization-dependent medium on the group delay. It will be shown that, when a lightwave signal is in either one of the two principal polarization states, the lightwave signal, to the first order of approximation, experiences a constant group delay that is not a function of frequency. Restricting the discussion to a linearly birefringent medium, the input and output polarization states can be expressed using the two optic axes. In this case, T(ω, L ) can

184

4 The Linear Lightwave Channel

be factored into the product of a common scalar term e−iβ(ω) L and a diagonal matrix D(ω, L ) so that (4.6.1) can be written as r(ω, L ) = e−iβ(ω) L D(ω, L )s(ω, 0), where

D(ω, L) =

»

e−i ±φ(ω, L )/2 0

(4.6.3)

½

0

ei ±φ(ω, L )/2

.

(4.6.4)

This 2 × 2 unitary matrix describes the differential polarization-dependent and frequency-dependent phase shift ±φ(ω, L ) in a fiber segment of length L. The differential phase shift ±φ(ω, L ) between the two optic axes is given by (cf. (2.3.66))

±φ(ω, L ) = (βslow (ω) − βfast (ω)) L , (4.6.5) where βslow (ω) and βfast(ω) are the frequency-dependent propagation constants along the slow optic axis and the fast optic axis, respectively. The derivation of a narrowband approximation for the polarization-dependent group delay repeats the steps used to derive (4.4.10), but includes the effect of polarization. The polarization-dependent phase velocity is determined using (4.6.5). The polarizationdependent group delay is determined by differentiating (4.6.3) with respect to ω to give

´ d d ³ −i β(ω)L r(ω, L ) = e D(ω, L ) s(ω, 0) dω ±dω ² = −iτ D(ω, L ) + ddω D(ω, L ) e−iβ(ω) L s(ω, 0),

(4.6.6)

where the derivatives are evaluated at the carrier frequency ωc , with τ = L dβ(ω)/dω|ωc being the common group delay defined in (4.3.3). Solving for s(ω, 0) in (4.6.3) gives s(ω, 0) = eiβ(ω) L D−1(ω, L )r(ω, L ). Substituting this expression into (4.6.6) and recalling that the lossless polarization transformation D(ω, L ) is a unitary matrix with D−1 (ω, L ) = D† (ω, L ) allows (4.6.6) to be written as ³ ´ dr = − i τ I + iDω D† r, (4.6.7) dω

¹

.

with the arguments ω and L suppressed, where Dω = dD/dω¹ω=ω and I is the identity c matrix. The first term, τ I, in the parentheses on the right is a 2 × 2 diagonal matrix describing the polarization-independent group delay (cf. (4.3.3)). The second term, i DωD† , is a 2×2 matrix describing the polarization-dependent group delay. For a linearly birefringent material, Dω can be written as (cf. (4.6.4))

Dω =.

dD ¹¹ ¹ dω ω=ωc

= ±φ 2

³»

−ie−i ±φ/2 0

0

iei ±φ/ 2

½

,

(4.6.8a)

4.6 Polarization-Mode Dispersion

185

where

±φ ³ = ddω (β slow(ω) − βfast(ω)) L ± ² = v 1 − v1 L, (4.6.8b) slow fast both φ and φ³ depending on ω and L . The 2 × 2 matrix i Dω D† in (4.6.7) is a hermi-

tian matrix with two real eigenvalues and two orthogonal eigenvectors corresponding to the principal polarization axes. These eigenvectors form a basis that can be used to decompose an arbitrary polarization state. For a linearly birefringent medium, the two orthogonal eigenvectors of D define the slow optic axis ¿ eslow and the fast optic axis ¿ efast, respectively. The orthogonal eigenvectors ¿ e+ and ¿ e− of i DωD† at the output of the fiber segment correspond to the real eigenvalues ±±τ/2. The proof that iDωD† is hermitian with real eigenvalues ±±τ/2 that sum to zero is asked for as a problem at the end of the chapter. The two eigenvectors ¿ e+ and ¿ e− define the principal polarization axes. An input polarization state s described by ¿ e+ experiences a group delay τ+ = τ + ±τ/2, which is independent of frequency. Likewise, an input polarization state s described by ¿ e− experiences a group delay of τ− = τ − ±τ/2, which is also independent of frequency. The differential group delay ±τ is the difference τ+ − τ− between the group delays for polarizations oriented along the two principal polarization axes. The value of the differential group delay ±τ can be calculated from the determinant of the 2 × 2 matrix iDω D†. Because iDω D† is a hermitian matrix, the determinant is the product of its two eigenvalues ±τ/ 2 and −±τ/2, so

−±τ 2/4 = det(iDωD† ) −±τ 2/4 = i 2 det Dω det D† À (4.6.9) ±τ = 2 det Dω, where det D = 1 and, using the identities for 2 × 2 unitary matrices, det AB = det A det B, det A† = det A = 1, and det i A = i2 det A = −det A (cf. (2.1.87b)). The quantity ±τ/ L is the differential delay per unit length caused by polarizationdependent group delay for a single segment of fiber for which the linear birefringence does not change over the length of the segment. This value is typically expressed in units of picoseconds per kilometer. For a linearly birefringent material, substitute (4.6.8a) into (4.6.9) to give ±φ ³ = ±τ . Now use (4.6.8a) and (4.6.4) to form i DωD† ,

» −e−i ±φ/2 ±τ iDω D = i 0 2 » ½ ±τ 1 0 = 2 0 −1 . †

2

0

ei ±φ/2

½»

ei ±φ/2 0

0

½

e−i ±φ/2 (4.6.10)

Expression (4.6.10) shows that for any lossless linearly birefringent material, the two eigenvectors of the diagonal matrix iDω D† , which define the principal polarization

186

4 The Linear Lightwave Channel

states, are aligned with the two eigenvectors of the diagonal matrix D, which define the optic axes. In conclusion, for the case of linear birefringence, the principal polarization state ¿ e+ is aligned with the slow optic axis and the principal polarization state ¿ e− is aligned with the fast optic axis. This is the usual case for optical fibers, but need not be true for other kinds of birefringence. 4.6.2

Stokes Representation

A Stokes representation of the polarization using the Poincaré sphere provides additional insight into the vector nature of polarization-mode dispersion. The real-valued Stokes vector s¸ = (s1, s2, s3) corresponding to the complex-valued Jones vector J = (x 1, x2 ) can be determined using (2.3.54), where the arrow overbar distinguishes the Stokes representation from the Jones representation. A lossless polarization transformation, denoted S, must preserve the magnitude of r¸, and the transformed vector must remain on the surface of a constant-radius Poincaré sphere. The matrix S then represents a transition along an arc on the surface of the Poincaré sphere that can be described by the set of angles (χ, ξ) defined in (2.3.55). The right side of (4.6.7) is rewritten in the Stokes representation in terms of a cross product d ¸r (ω, L ) = ±¸τ × r¸, (4.6.11) dω

where ±¸ τ × denotes the 3 × 3 matrix operation SωST with real elements in the Stokes representation that is the equivalent of the 2 × 2 matrix operation iDω D† with complex elements in the Jones representation. The vector ±¸ τ = (±τ1 , ±τ2 , ±τ3 ) in the Stokes representation is called the polarization-mode dispersion vector. This vector corresponds to the eigenvector ¿ e+ in the Jones representation, with the orthogonal Stokes vector −±¸ τ corresponding to ¿e− . The orientation of this vector with respect to the output polarization state r¸ defines the rate of change d¸r /dω of the output polarization state as a function of frequency as expressed by the cross product given in (4.6.11). The differential group delay ±τ is the projection 21 ±¸ τ · r¸ of the output polarization state r¸ onto the vector ±¸τ /2. This projection varies from ±τ/2 to −±τ/2 as the output state r¸ varies from an alignment with ±¸τ to an alignment with −±¸τ , which is the orthogonal principal state of polarization located on the opposite side of the Poincaré sphere. The two unique components of the complex transformation iDω D† in the Jones representation are related to the three real components of the polarization-mode dispersion vector ±¸ τ in the Stokes representation as given by5 iDω D†

» ±τ2 − i ±τ3 ½ . 1 = 21 ±τ ±τ −±τ1 2 + i ±τ3

(4.6.12)

For a lossless linearly birefringent material, equating (4.6.10) to (4.6.12) gives the three components of the polarization-mode dispersion vector ±¸ τ in a Stokes 5 See Gordon and Kogelnik (2000), Section 5.

4.6 Polarization-Mode Dispersion

187

representation as ±τ1 = ±τ , ±τ2 = 0, and ±τ3 = 0. This shows that for a linearly birefringent fiber, the polarization-mode dispersion vector ±¸ τ = (1, 0, 0) in a Stokes representation corresponds to the vector ¿ eslow for the slow optic axis in a Jones representation. The fast optic axis ¿ efast is orthogonal to the slow optic axis and is described by a point that lies on the opposite side of the Poincaré sphere. Including the polarization-independent group delay τ , the equation

±τtotal = τ + 21 ±¸τ · r¸ expresses the total group delay ±τtotal in the Stokes representation. 4.6.3

(4.6.13)

Distortion from Polarization-Dependent Group Delay

The orientation of the optic axes in a practical fiber varies along the length of the fiber because of varying environmental conditions. For a short segment of fiber, the orientation of the optic axes can be regarded as fixed. When a single lightwave pulse with a power Ptotal (t ) is launched into a fiber, part of the pulse power F+ propagates in principal polarization component ¿ e+ and part of the pulse F− propagates in principal polarization component ¿ e− . The total pulse power Ptotal(t ) divided into the two delayed components with different delays generates the output pulse power Pout (t ) given by Pout (t ) = F+ Ptotal(t

+ ±τ/2) + F− Ptotal (t − ±τ/2).

(4.6.14)

The power in each component depends on the orientation of the output polarization state r¸ defined in (4.6.7) with respect to the polarization-mode dispersion vector ±¸ τ defined in (4.6.11). The orientation of the optic axes of a linearly birefringent fiber randomly changes over a span of length L because of environmental changes. To estimate the number of segments that can be modeled as independent, use a Jones representation. Let the total lightwave power P at the fiber input be oriented along the slow optic axis ¿ eslow . The direction of this axis defines one basis vector (cf. (2.3.51)) and a corresponding polarization eigenmode of a segment of fiber that has a constant linear birefringence. Unless the birefringence is random, all power launched into a polarization mode remains in that mode. However, random linear birefringence causes a change in the eigenmodes of the material along the direction of signal propagation. This change can be expressed as a rotation of the slow optic axis ¿ eslow . This rotation means that part of the launched power is now coupled into the second polarization mode. The random rotation of the optic axes along the length of the span leads to random length-dependent powers P slow (L ) and P fast( L ) in each polarization mode defined at the output of the span. The polarization decorrelation length L pol is defined as the length for which the normalized difference between the mean random power in each of the two polarization modes is 1 /e2 , or

µP slow( Lpol )¶ − µ P fast (L pol)¶ = 1 . P e2

(4.6.15)

Typical values of the polarization decorrelation length L pol range from tens of meters to hundreds of meters. The random orientation of ¿ eslow within one fiber segment of length L pol can be approximated as independent of the orientation of ¿ eslow for other

188

4 The Linear Lightwave Channel

fiber segments. Then the number of quasi-independent segments N is approximately L / L pol , where L is the length of the total span. For L = 100 km and L pol = 100 m, there are approximately 1000 quasi-independent fiber segments. The random orientation of the optic axes expressed in a Jones representation leads to a random polarization-mode-dispersion vector ±¸ τ with three real components in a Stokes representation. A discretized statistical model of ±¸ τ views the fiber span as a concatenation of N fiber segments each of length L / N . Each segment has an independent local polarization-mode-dispersion vector ±¸ τ i expressed using a common polarization basis. The random vector ±¸ τ defined over the complete fiber span of length L is modeled as the sum of N independent random vectors ±¸ τ i,

±¸τ =

N ¾ i =1

±¸τ i .

(4.6.16)



As the number of segments is increased, the random vector ±¸ τ = i ±¸τ i tends towards a zero-mean spherically symmetric gaussian random variable with a probability density function given by (cf. (2.2.21)) f (±¸ τ) =

1

(2πσ 2 )3/2 e

−|±¸τ |2/ 2σ 2 .

(4.6.17)

Because ±¸ τ is spherically symmetric, the random magnitude is described by a maxwellian probability density function (cf. (2.2.48)). The random angle is uniformly distributed on the surface of the Poincaré sphere. Rewriting the variance σ 2 in (4.6.17) in terms of the mean differential group delay µ±τ¶, the maxwellian distribution can be written as f (±τ) =

32 ±τ 2

π µ±τ¶ 2

3

2 2 e−4 ±τ /πµ±τ¶ .

(4.6.18)

This probability density function characterizes the random, polarization-dependent, √ difcan ferential group delay. The mean delay µ±τ¶, typically expressed in units of ps/ km, √ vary widely depending on the type of fiber and the environmental conditions.6 The L dependence of the mean differential group delay is a characteristic of the random-walk nature of the sum of random vectors. Combining the random power in each principal polarization state with the random differential delay ±τ , the random output lightwave power P out (t ) can be written as P out(t ) = F Ptotal(t

+ ±τ/2) + (1 − F ) Ptotal(t − ±τ/2),

(4.6.19)

where Ptotal (t ) is the total power in the two polarization modes. This expression states that the probability density function of the proportion F of the power in each polarization mode is uniform and the probability density function of the differential group delay ±τ is maxwellian, with an expected value µ±τ¶. 6 Modern single-mode fibers typically have

µ±τ ¶ smaller than 0.1 ps/



km.

4.6 Polarization-Mode Dispersion

4.6.4

189

Distortion from Polarization-Dependent Loss

Lightwave components may have polarization-dependent loss in addition to polarization-mode dispersion. This loss is more significant for a lightwave component that has a rectangular guiding structure that supports both TE and TM modes (cf. Section 3.3) than for the cylindrical structure of an optical fiber. When a conventional fiber is coupled to a rectangular lightwave component, a portion of the lightwave field in the fiber will couple into the TE mode and a portion of the lightwave field will couple into the TM mode. Whenever the two polarization modes have different coupling efficiencies, there will be polarization-dependent loss. To analyze polarization-dependent loss, let T(ω, L ), abbreviated T, be a polarization transformation expressed as a 2 × 2 matrix from an input state to an output state that includes polarization-dependent loss. The transformation T can always be factored into the product of two matrices. Thus,

T = DA,

(4.6.20)

where the square matrix A is a nonnegative-definite hermitian matrix expressing the frequency-independent, polarization-dependent loss and D is a unitary matrix expressing the frequency-dependent and polarization-dependent dispersion. The matrices A and D need not commute and need not share a common set of eigenvectors. This decomposition is an instance of the general fact that any full-rank nonhermitian square matrix T can be decomposed as T = DA. For a hermitian matrix A and a unitary matrix D, the following expressions hold:7

A2 = T†T and D = TA−1 . (4.6.21) The conditions under which A has an inverse are considered in a problem at the end of

the chapter. The eigenvectors of T define the polarization modes. If the medium is lossless, then A is simply the identity matrix and T = D. Then the polarization transformation affects only the phase. When there is polarization-dependent loss, T is neither unitary nor hermitian. In this case, the differential polarization transformation given in (4.6.3) is modified to use T(ω, L ). It then becomes r(ω, L ) = e−[(κ/2)+iβ]L T (ω, L )s(ω, 0),

(4.6.22)

where κ is the common frequency-independent and polarization-independent loss. The frequency-independent differential polarization-dependent loss is incorporated within the matrix T. As was done in the analysis used for polarization-mode dispersion, each eigenvalue ±κ p of A is the differential polarization-dependent loss for the corresponding eigenvector of A. The largest and smallest values of the loss, κ ± κ p , occur whenever the input field is oriented along one of the eigenvectors of A. In general, when there is

A

A

A M U MU

A =U M U

7 Given the symmetric matrix 2 , the matrix is defined by writing 2 in diagonal form as 2 † 2 , 2 where is a diagonal matrix and is a unitary matrix. Defining as the matrix generated by taking the square root of each diagonal element of 2 , the matrix is then † .

M

U

M

A

190

4 The Linear Lightwave Channel

polarization-dependent loss, these eigenvectors need not be oriented along the polarization axes, which are the eigenvectors of T, and need not be oriented along the principal polarization axes, which are the eigenvectors of iT ωT† . Now decompose T differently with the ordering reversed with respect to (4.6.20),

T = A³ D,

(4.6.23)

where A³ is again a nonnegative-definite hermitian matrix and D is the same unitary matrix specified by (4.6.20). If A³ = A, then the commutator [A, D] equals the zero matrix, and the loss is polarization-independent. In this case, T is a normal matrix and can be diagonalized with a unitary matrix whose columns are the eigenvectors of T. When the loss is polarization-dependent, then T is neither a hermitian matrix, which describes a lossless transformation, nor a normal matrix, which describes a transformation with polarization-independent loss. Lossy transformations that are mode-dependent or polarization-dependent can be expressed using the singular-value decomposition given in (2.1.91). This decomposition is used in Section 11.4.3 to study polarization-dependent propagation effects. In the presence of polarization-dependent loss, the analogous form of the polarizationindependent group-delay term iDωD† given in (4.6.7) is iTω T−1. In general, the eigenvectors of iT ωT−1 need not be orthogonal. The eigenvalues, denoted χ = ±τ loss + iη , are complex in general. The real part ±τ loss corresponds to the differential group delay in the presence of polarization-dependent loss. The imaginary part η represents the differential polarization-dependent loss. In general, the eigenvalue ±τ/2 of the matrix D that excludes polarization-dependent loss is not equal to the real part, ±τ loss , of the complex eigenvalue χ of the matrix T that includes polarization-dependent loss. Accordingly, the presence of polarizationdependent loss can change the differential group delay because the eigenvectors of iTω T−1 are not orthogonal. This nonorthogonality can lead to interference between the two principal polarization states, thereby potentially producing a larger differential group delay and more distortion than in the case in which the attenuation does not depend on the polarization.

4.7

References Dispersion in optical fibers is discussed in Marcuse (1974), in Okoshi (1982), and in Okamoto (2006). Complex propagation constants in materials with absorption are considered in Saleh and Teich (1991). The conditions for which a Kramers–Kronig transform applies to the magnitude and phase are discussed in Bode (1950) and Lenz, Eggleton, Giles, Madsen, and Slusher (1998). The zero-dispersion wavelength of optical fiber used for communication is discussed in Payne and Gambling (1975). The principal polarization states were introduced in Poole and Wagner (1986), and are summarized in Gordon and Kogelnik (2000). The statistics of polarization-mode dispersion are covered in Foschini and Poole (1991). Polarization-dependent loss

4.9 Problems

191

is treated in Huttner, Geiser, and Gisin (2000). The probability density function of polarization-mode dispersion delay is presented in Karlsson (2001).

4.8

Historical Notes The study of dispersion in glass has its roots in antiquity, as discussed in Darrigol (2012), with material dispersion discussed in the famous treatise Opticks by Newton and Innys (1730). Dispersion in optical fiber has a much shorter history. The combined effect of material and waveguide dispersion for linearly polarized modes was first considered by Gloge (1971b). It appears that polarization-mode dispersion was first discussed by Rashleigh and Ulrich (1978).

4.9

Problems 1 Delay spread using ray optics The maximum delay spread in ray optics is the difference between the delay of the ray that takes the longest time and the delay of the ray that takes the shortest time to travel the same distance in a fiber. A distribution of delays results from a distribution of rays coupled into the fiber at various angles. Suppose that the propagation times associated with this distribution of rays are uniformly distributed between the limiting values of 1 and 2, where 2 is larger than 1. (a) Determine the functional form of the distribution of the ray delays. (b) Determine the root-mean-squared delay spread. (c) Now suppose that the distribution of delay times determined in part (a) is used to model the impulse response h t for the fiber. Determine the frequency response H in terms of the differential transit time 2 1. (d) A lightwave source is characterized by a numerical aperture NAs that is smaller than the numerical aperture NA of the fiber. Two rays coupled by this source into the fiber are to be compared. One ray is incident along the axis and one ray is defined by NAs , where NAs is much smaller than one. Determine the ratio of the differential transit time using this lightwave source relative to the differential transit time using a different lightwave source with a numerical aperture equal to the numerical aperture of the fiber.

τ

τ

τ

()

(ω)

τ

τ −τ

2 Dispersion relationship The dispersion relationship expressed as a power series up to the cubic term is given by (4.3.1) and is repeated below:

β(ω)

β(ω + ωc ) = β0 + β1 ω + 21 β2ω2 + 61 β3ω3 . Regarding the expansion as exact, let ω³c = ωc + ±ω be chosen as a new carrier frequency.

192

4 The Linear Lightwave Channel

(a) Determine a power-series expansion for β(ω + ω³c ). (b) Determine the new group velocity and the new phase velocity. (c) Explain why the new velocities are different. 3 Frequency dependence of the core index and the cladding index Using n1 n2 n1 , show that

± = ( − )/ d(n1±) d dn 1 dn2 = ( n1 − n2 ) = − , dω dω dω dω

and discuss the reasons why this term can be neglected for a practical step-index fiber for which ± is much smaller than one. 4 Modes from the dispersion relationship (requires numerics) The simplified form of the dispersion relationship for an LP mode enables one to use a direct method to determine the mode field distribution without the need to specify n1 or n2 . Suppose that V 4 5. Substituting (4.2.1) into (3.3.27) and choosing the positive sign produces the single equation in qa

= .

³À 2 ´ 4.5 − (qa )2 K 1(qa ) ´ = qa . 4.52 − (qa )2 ³ À K 0(qa ) J0 4.52 − (qa )2 Examining Figure 3.15, there are four modes that propagate for V = 4.5. These are µ

J1

LP01 , LP11 , LP21, and LP02 . (a) Using Figure 3.15 for the initial estimate, determine the exact values of pa, qa, and b for the four allowed LP modes. (b) Substitute these values into (3.3.45) to plot the radial dependence of the field intensity both in the core and in the cladding.

5 Mode-group density The mode-group density in an optical fiber, defined as d dg, represents the closeness of the mode-group spacing with respect to the mode-group index g. (a) Starting with the approximate expression for the dispersion relationship given in (4.4.4), derive as approaches infinity. This corresponds to a step-index fiber. (b) What is the corresponding density with respect to the mode index m? (c) Repeat for 2. Compare the mode-group density of a step-index fiber with the mode-group density of a parabolic power-law graded-index fiber. Comment on the result.

±β =. β/

±β α

α=

6 Index of refraction, group index, and material dispersion coefficient for silica glass An empirical expression called the Sellmeier formula is often used to model the index n of glass as a function of wavelength. One form of the Sellmeier formula for silica glass is

(λ)

n (λ) =

Á

1+

1.0955 × 10 18λ2 1018λ 2 − 1002

18 λ 2 + 10018. 9λ×2 −109000 , 2

4.9 Problems

193

where λ is the wavelength in meters. (a) Plot the material dispersion coefficient Dλ over the range from 500 to 1500 nm. As a check, refer to Figure 4.6, which used the same formula. (b) Determine the minimum material dispersion and the dispersion slope at the minimum. Express the slope in units of ps /(nm 2 · km ). (c) What is the maximum spectral width of a pulse at 1300 nm that will limit the material dispersion to 50 ps for a fiber with length 75 km and a material dispersion coefficient of Dλ = 1.2 ps/(nm · km )? 7 Fiber modes and dispersion A step-index fiber with a numerical aperture equal to 0.15 and a core index n1 N1 1 5 operates at 850 nm and supports two modes with normalized propagation constants b 0 4 and b 0 75. (a) What is the core diameter of the fiber? (b) Using the figure shown below, determine the distance into the fiber at which the modal delay between the two modes is 2.5 ns:

= .



= .

= .

1.4

LP21

1.2 V d/) bV(Vd

1

LP 01

0.8 0.6

LP11

0.4 0.2 1

2

3

4 V

5

6

7

8

Delay terms for first three linearly polarized modes.

8 Narrowband signal scattering loss Define as the fractional bandwidth of a narrowband signal with respect to the carrier frequency f c for which the passband bandwidth B of the signal, centered at the carrier, can be written as B fc. (a) Derive an expression for the ratio of the Rayleigh scattering at the carrier f c to the Rayleigh scattering at the band edge f c B 2 in terms of . (b) Solve for the value of such that the Rayleigh scattering at the band edge is within 1% of the Rayleigh scattering at the band center defined by the carrier. (c) Using the value of in part (b), determine the passband bandwidth B of a lightwave signal at a carrier wavelength of 1500 nm such that the scattering at a band edge is within 1% of the scattering at the carrier. (d) On the basis of these results, is a frequency-independent signal attenuation term appropriate?

δ



δ

δ

+ /

δ

194

4 The Linear Lightwave Channel

9 Single-mode fiber dispersion
A lightwave system of interest operates at 1550 nm using a single-mode step-index fiber. The transmitted lightwave signal has a spectral width of σλ = 0.05 nm and transmits a pulse with a root-mean-squared width of 100 ps. This pulse propagates 50 km in the fiber. The total intramodal dispersion coefficient D in the fiber in units of ps/(nm · km) is modeled as

D = (S0/4)(λ − λ0⁴/λ³),

where λ0 = 1310 nm is the zero-dispersion wavelength and the dispersion slope parameter S0 has units of ps/(nm² · km). (a) Determine the dispersion slope parameter S0 required to limit the root-mean-squared delay spread to 25 ps. (b) Is there dispersion when S0 = 0 and the system operates at the wavelength λ0? Provide quantitative reasoning for your answer.
10 Dispersion
(a) Using Figure 3.15, determine the number of modes at 900 nm that propagate in a fiber with a core diameter of 7 microns, a numerical aperture of 0.15, and an index n1 = 1.45. (b) Determine the root-mean-squared delay spread per unit length σinter/L in units of ns/km when the lowest-order mode contains 80% of the power and the remaining power is distributed uniformly among all other modes that propagate. Use the figure provided for Problem 7 to determine the delay values. (c) Using Figure 4.6 for the group index, determine the material dispersion coefficient Dλ at 900 nm when the power-density spectrum has a spectral width of 1 GHz. (d) Determine the waveguide dispersion coefficient Dguide for the two guided modes with the largest values of b. (e) Determine the intramodal dispersion coefficient D = |Dλ + Dguide| for the two guided modes with the largest values of b.

11 Number of modes for a multimode step-index fiber
Derive the relationship expressing the number of modes M in a multimode step-index fiber as a function of the area A of the core, the wavelength λ, the index of the core n1, and the index of the cladding n2.

12 Dispersion in a fiber that supports two modes
Consider a fiber with a core diameter of 7.5 microns and with Δ = 10⁻³. (a) Using Figure 3.15, estimate the smallest wavelength λ such that the fiber supports only two modes. (b) Using Figures 4.6 and 4.9, estimate the total intramodal dispersion per unit length at the wavelength determined in part (a) as a function of the root-mean-squared spectral width σλ of the modulated lightwave signal.


(c) Let the normalized (complex-baseband) power-density spectrum be given by

Sλ(λ) = (1/Δλ)(1 − |λ/Δλ|).

Using the results from part (b), estimate Δλ so that σintra = |D|σλ L is smaller than 100 ps when L = 50 km.

13 Output pulse for a gaussian power-density spectrum
Let the normalized power-density spectrum of a modulated lightwave signal be

Sλ(λ) = (1/(√(2π) σλ)) e^(−(λ−λc)²/2σλ²),

as a function of the wavelength λ, where the carrier wavelength is λc = 1350 nm. The fiber has a core diameter of 9 microns and a numerical aperture of 0.15. The transmitted pulse is a square pulse of duration T = 200 ps over a fiber span of length 75 km, with an intramodal dispersion coefficient of D = 8 ps/(nm · km). Using Figure 4.6 for the index (or group index) and the figure provided in Problem 7 for the delay terms, determine the following. (a) The normalized frequency V of the fiber. (b) The value of sλ = dτ/dλ|λ=λc. (c) The total root-mean-squared delay spread σintra over the span length L (see (4.5.7)).
14 Group delay as a function of the mode-group
The approximate expression for βg(ω) given in (4.4.4) is repeated below:

βg(ω) ≈ n0 k0 (1 − 2ΔZ)^(1/2),

where k0 = ω/c0 and Z is given in (4.4.5). (a) Evaluate dZ/dω where n0 and Δ are both functions of ω. Note that d(n0ω)/dω evaluated at the center of the graded-index core, and denoted Nc, is the group index of refraction at the center of the core (cf. (4.3.6)). Using this expression, show that

dZ/dω = −(α/(α + 2)) (Nc Z/(n0 ω)) (2 + y),

where y is a correction factor for the material dispersion given in (4.4.7b). It is repeated here as

y ≐ (n0 ω/(Nc Δ)) dΔ/dω = −(λ n0/(Nc Δ)) dΔ/dλ.

(b) Evaluate dβg(ω)/dω and expand the resulting expression in a power series in ΔZ to show that

τ(g) ≈ (L Nc/c0) [1 + Δ ((α − 2 − 2y)/(α + 2)) Z + (Δ²/2) ((3α − 2 − 4y)/(α + 2)) Z²],

which is (4.4.7a).


15 Optimal value for index profile
Starting with

Δ (α − 2 − 2y)/(α + 2) + (Δ²/2) (3α − 2 − 4y)/(α + 2) = 0,

show that when both Δ and y are small and α is larger than one, the optimal power-law index profile parameter αequal is given by

αequal = 2(1 + y − Δ),

which is (4.5.3).
16 Polarization-mode dispersion vector
(a) Using the identity (AB)† = B†A†, and differentiating DD† = I with respect to ω, where D is a unitary polarization-state transformation, show that the transformation iDωD† is hermitian. (b) Using the differential relationship D(ω + dω) = D + dω Dω and |det D| = 1 for a unitary matrix, show that the trace of DωD† is equal to zero, which implies that the eigenvalues of DωD† sum to zero.

17 Evaluation of the differential group delay
Let the form of the unitary matrix that describes a lossless polarization transformation in a Jones representation be given by (cf. (2.3.59))

D = [ a    b
     −b∗   a∗ ].

(a) Show that the differential group delay Δτ can be written in terms of the matrix elements as

Δτ = 2 √(aω aω∗ + bω bω∗),

where the subscript ω indicates a derivative with respect to ω. (b) Using (4.6.12), derive expressions for the components Δτ⃗ = (Δτ1, Δτ2, Δτ3) of the polarization-mode dispersion vector in a Stokes representation in terms of a, aω, b, and bω.
18 Wavelength-dependent group velocities
Prove, or provide a counterexample to, the following statement: “Every wavelength in an optical fiber propagates at a different group velocity.”
19 Dispersion relationship from ray optics
Modes in a slab waveguide can be reconciled with ray theory by letting the ray define the direction of a plane wave propagating in the slab waveguide. In this reconciliation, a mode is formed by the interference with itself of a propagating plane wave that “zig-zags” between the core/cladding interfaces. The direction of the plane wave is shown as the arrow within the core of the fiber in Figure 3.10. This arrow defines a ray associated with the plane wave. The z components of the two interfering plane waves produce a traveling wave, while the transverse components add to produce a standing wave.

(a) Using kx = p and kz = β, rewrite

p² + β² = (n1 k0)²

in terms of the components k x and kz of the wavevector k in the slab waveguide. (b) Derive an expression for the angle θ that the plane wave makes with respect to the normal of the core/cladding interface in terms of kx , n, and k0 . (c) Consider a slab waveguide that supports a plane wave with a polarization that is transverse to the direction of propagation. Upon reflection from the core/cladding interface, a consequence of Maxwell’s equations is that this plane wave experiences a phase shift φT E given by

φTE = −2 arctan( √(sin²θi − (n2/n1)²) / cos θi ),

where θi is the angle from the normal to the core/cladding interface. Using this expression, determine the total phase shift that the plane wave experiences after two reflections consisting of one reflection from each boundary of the slab waveguide. (d) A guided mode in the slab waveguide is generated whenever the total phase shift φTE after two reflections is equal to 2πm, where m is an integer. This condition states that the field adds constructively with itself after a reflection from each interface. Derive this condition and show that

√(1 − cos²θ − (n2/n1)²) / cos θ = tan(n1 k0 a cos θ − mπ/2).

Values of θ that satisfy this expression are the allowed angles for the rays that produce a self-consistent phase after two reflections. Each of these angles defines a mode with a corresponding propagation constant β given by (3.3.10a).
20 Bandwidth-dependent launch conditions
Consider a uniform mode distribution in which the total power is uniformly distributed among all the modes in a fiber. The expressions for the group-delay terms become

⟨τ⟩ = (1/M) Σm=1^M τm   and   ⟨τ²⟩ = (1/M) Σm=1^M τm²,

where M is the number of modes. Suppose that a fiber has the following parameters: V = 5, n1 = 1.46, Δ = 0.0036, and N1 = 1.48. (a) Determine the delay spread σinter for the case of a uniform mode distribution across the LP01, LP11, and LP21 modes. Use the figure provided in Problem 7 to determine τm for each mode. (b) Determine the delay spread σinter when the power in the LP11 mode is half the power in the LP01 mode and the power in the LP21 mode is half the power in the LP11 mode. (c) Which launch condition produces the smallest value of σinter? Why?
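A sketch of the computation in Python, assuming the common approximation τm/L ≈ (N1 + n1 Δ [d(Vb)/dV]m)/c0 for the per-mode group delay; the d(Vb)/dV entries below are placeholders that must be replaced with values read off the figure in Problem 7 at V = 5:

import numpy as np

c0 = 3e8                                   # speed of light (m/s)
n1, Delta, N1 = 1.46, 0.0036, 1.48
dVb_dV = np.array([1.0, 1.1, 1.2])         # LP01, LP11, LP21 (placeholders)
tau = (N1 + n1 * Delta * dVb_dV) / c0      # group delay per unit length (s/m)

def sigma_inter(p):
    """Delay spread sqrt(<tau^2> - <tau>^2) for mode-power fractions p."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    mean = (p * tau).sum()
    return np.sqrt((p * tau**2).sum() - mean**2)

print(sigma_inter([1, 1, 1]))              # (a) uniform mode distribution
print(sigma_inter([1, 0.5, 0.25]))         # (b) halved power per higher mode
# Compare the two printed values to answer part (c).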


21 Measurement of group-velocity dispersion coefficient
A lightwave signal at carrier frequency ωc is intensity-modulated, leading to a lightwave signal s(t) = A(1 + m cos(ω0 t)), where ω0 is a modulation frequency. The real, positive parameter m is called the modulation index. The signal is transmitted through a single-mode fiber of length L with the dispersion relationship β(ω) approximated by (4.3.1), which is repeated here as

β(ω + ωc) = β0 + β1 ω + (1/2)β2 ω² + · · · ,

where terms beyond β2 can be neglected. Assuming a linear fiber channel and

neglecting fiber attenuation, do the following. (a) Find an expression for the output intensity signal P(L, t) = (1/2)|s(t)|² at a distance L. This intensity signal has components at several frequencies, which are integer multiples of ω0. Identify the components and the corresponding frequencies. Hereafter, let Pn(t) denote the component of P(L, t) at frequency nω0. (b) First measurement technique. Suppose that the carrier frequency ωc is fixed and that Pn(t) is measured for ω0 ranging between zero frequency and some suitably high frequency. (The magnitude of Pn(t) is the time-independent factor that multiplies the time-dependent cosine term at frequency ω0.) Sketch the magnitude of Pn(t) as a function of n. Explain how such a technique can be used to measure the magnitude of β2. Can the sign of β2 be measured using this technique? (c) Suppose that you are given a fiber segment of length L = 80 km with an overall group-velocity dispersion coefficient D (cf. (4.4.14)) lying between +15 and +20 ps/(nm · km) as a function of wavelength. The first measurement technique is used to determine the precise value of D at a wavelength in the region of λ = 1550 nm. To minimize cost and measurement time, the range of modulation frequencies f0 = ω0/2π should be chosen so that (a) the frequencies are as close to zero frequency as possible, and (b) the modulation frequency span should be as small as possible. Because high-frequency measurement equipment can be costly, criterion (a) is more important than criterion (b). Over what range of f0 must one measure? (d) Second measurement technique. Suppose that ω0 is fixed, and the phase of the cosinusoidal term Pn(t) is measured for two closely spaced values of the carrier frequency ωc. Explain how such a measurement can be used to measure the magnitude of β2. Can the sign of β2 be measured using this technique? (e) Suppose the second measurement technique is used to characterize the fiber described in part (c) using f0 = ω0/2π = 1 GHz. The phase of Pn(t) is measured at two values of the carrier frequency fc = ωc/2π that differ by 10 GHz. What is the expected phase difference measured in radians between the two measurements?
22 Wavelength autocorrelation function
Let s(t) = y(t)e^(i2π fc t) be a modulated wide-sense-stationary lightwave signal. Let S(f) be the Fourier transform of a finite segment of a realization of s(t).


(a) In the limit as the segment length goes to infinity (cf. (2.2.55)), show that the frequency autocorrelation function can be written as

⟨S(f)S(f′)⟩ = ∫−∞^∞ ∫−∞^∞ e^(i2π(f−f′)t) R(t − t′) e^(i2π(f−fc)(t−t′)) dt dt′,

where R(t − t′) = ⟨y(t)y∗(t′)⟩ is the autocorrelation function of the envelope y(t), and a superfluous term e^(i2πf′t) e^(−i2πf′t) = 1 has been included in the integrand to enable factoring.
(b) By making the change of variable t − t′ = u, show that the double integral can be separated, leading to

⟨S(f)S(f′)⟩ = S(fc − f) δ(f − f′),

where S(f) is the Fourier transform of R(t). This is (4.5.5) expressed using frequency instead of wavelength.

23 Wavelength-dependent group delay
Refer to Figure 4.9, which shows the group-delay factor for two linearly polarized modes. (a) Find the value of the normalized frequency V for which the group-delay term for the LP01 mode is equal to the group-delay term for the LP11 mode. (b) For this value of V, is the group-velocity dispersion coefficient the same for every mode? Explain. (c) Over the range of values shown in Figure 4.9, is there a value of V for which the group-velocity dispersion of the LP01 mode is equal to the group-velocity dispersion of the LP11 mode?
24 Polarization-dependent loss
Any nonhermitian full-rank square matrix T can be factored as the product of two matrices,

T = DA, where A² = T†T and D = TA⁻¹,

and the matrix A must also be full-rank. Derive the transformation matrix T for an ideal polarizer that passes all of the signal along one axis and no signal along the orthogonal axis. Show that the resulting matrix A is not full-rank and thus is not invertible for this extreme case of polarization-dependent loss.
25 Worst-case first-order polarization-mode dispersion
A single realization Pout(t) of the pulse response for first-order polarization-mode dispersion (cf. (4.6.19)) can be written as

Pout(t) = F Ptotal(t − Δτ/2) + (1 − F) Ptotal(t + Δτ/2).


In this expression, F is a realization of a uniform random variable for the proportion of the total lightwave signal power in each polarization mode and Δτ is a realization of a maxwellian probability density function. Show that for any value of Δτ, the root-mean-squared width of the sum of the two pulses is maximized when F = 1/2.
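Before attempting the proof, a direct numerical check is easy, here for an assumed gaussian Ptotal(t) (any shape works; the time units and Δτ value below are arbitrary):

import numpy as np

t = np.linspace(-10, 10, 4001)            # time grid (arbitrary units)
dtau = 2.0                                # differential group delay

def rms_width(F):
    """RMS width of F*P(t - dtau/2) + (1-F)*P(t + dtau/2) for gaussian P."""
    s = F * np.exp(-(t - dtau/2)**2) + (1 - F) * np.exp(-(t + dtau/2)**2)
    s /= np.trapz(s, t)                   # normalize to unit area
    mean = np.trapz(t * s, t)
    return np.sqrt(np.trapz((t - mean)**2 * s, t))

F = np.linspace(0, 1, 101)
widths = [rms_width(f) for f in F]
print(F[np.argmax(widths)])               # prints 0.5

Consistent with this check, the squared width works out to the width of Ptotal plus F(1 − F)Δτ², which is maximized at F = 1/2.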

5 The Nonlinear Lightwave Channel

An ideal lightwave channel is linear. However, nonlinearities can significantly modify the behavior of a practical lightwave channel. Indeed, an unwanted nonlinearity can be the primary limitation on the performance of the lightwave communication system. As some critical system parameters are varied, a nonlinearity that was a minor impairment might quickly become the cause of complete system failure. Thus it is necessary to understand the causes of nonlinearities, and to either accommodate them or avoid them by proper design. Mitigation of a nonlinear distortion on the modulation is possible, in principle, either at the transmitter or at the receiver. For a nonlinearity that is only a small perturbation of a linear model, this mitigation is amenable to tractable algorithms. For a nonlinearity that is a large perturbation, mitigation is still possible, but is subject to practical cost and complexity constraints. The purpose of this chapter is to understand and explain nonlinear effects due to propagation within an optical fiber against the backdrop of the linear framework developed in Chapter 4. Other nonlinear effects due to lightwave amplification are considered in Section 7.7.4. The most important nonlinearity affecting lightwave signal propagation, and the primary topic of this chapter, is an intensity-dependent shift in the carrier phase of a lightwave. Different manifestations of this basic effect are important under different operating conditions and range from the generation of new frequency components to the spectral broadening of lightwave signals caused by mixing. The waveguide modes derived in Chapter 3 are valid for an index of refraction that does not change with the strength of the electric field E. This requires the dependence between the electric field E and the material polarization P to be linear. A modern single-mode fiber core has an extremely small cross-sectional area. For this kind of fiber, a launch power on the order of milliwatts produces a large lightwave intensity¹ within the core that can cause a nonlinear, intensity-dependent change in the index, and consequently an amplitude-dependent change in the carrier phase of the lightwave signal. This phase change is small over a short distance, but can be significant and disruptive over a long propagation distance. Moreover, as will be described, for a wavelength-multiplexed system consisting of multiple datastreams modulated onto separate single-mode wavelength subcarriers, the instantaneous signal intensity in one subcarrier can change the fiber index, thereby
¹ For a fiber with a cross-sectional area of 50 μm², a launch power of 5 milliwatts (5 mW) generates an intensity of 100 MW/m² within the core.


creating anomalous time-varying phase changes for signals in other subcarriers. The same nonlinearity can mix several wavelength subcarriers and redistribute the signal energy among the subcarriers. At the same time, the linear wavelength-dependent group delay, discussed in Chapter 4, disperses the signal energy in time. Individually, either the nonlinear intensity-dependent behavior or the linear wavelength-dependent behavior can be studied analytically. However, the interplay between these two impairments produces a complex form of intensity-dependent and wavelength-dependent distortion. For this realistic case, an accurate analysis requires numerical methods. This chapter discusses the physical mechanisms that produce nonlinearities and thereby change the propagation characteristics of the lightwave channel. After describing the nonlinear interactions that can occur within a fiber, the governing partial differential equation for the most significant nonlinearity is derived. This equation, known as the nonlinear Schrödinger equation, is used to develop simple descriptions that illustrate the kinds of nonlinear distortion and nonlinear interference that can occur in a lightwave channel.

5.1 Anharmonic Material Response

For a low-intensity incident lightwave field, the material response can be modeled as a collection of harmonic oscillators excited by the incident field (cf. (2.3.8)), with the frequency of the scattered lightwave field equal to the frequency of the incident field. This scattered field is the source of the material polarization and hence of the index of refraction. As the field intensity increases, the oscillators describing a dielectric material response can become anharmonic, leading to a nonlinear response of a dielectric material.

5.1.1 Wave-Optics Description

Within a wave-optics description, an anharmonic material response means that the relationship P = f(E) between the electric field and the material polarization field is no longer linear, with the index of refraction now being a function of the field strength. The functional form of a weak nonlinearity f(x) near the value x = 0 usually can be approximated by the first several terms of a power-series expansion,

f(x) = f′(0)x + (1/2)f″(0)x² + (1/6)f‴(0)x³ + · · · .   (5.1.1)

The strength of the nonlinearity is studied with respect to the relevant terms of this series. Consider an isotropic dielectric material, with x representing one component E (t ) of a time-varying incident electric field and f (x ) representing the corresponding response due to the material polarization P (t ). When only the linear term of this expansion is significant, the material response can be modeled as a harmonic oscillator with a linear restoring force as given in (2.3.8). When higher-order nonlinear terms are significant, the material response becomes an anharmonic oscillator, which is an oscillator with a nonlinear restoring force.


Substituting these terms into (5.1.1) gives

P(t) = ε0 χ E(t) + ε0 χ(2) E(t)² + 4ε0 χ(3) E(t)³ + · · · ,   (5.1.2)

in which the first term is the linear polarization PL(t) and the remaining terms constitute the nonlinear polarization PNL(t),

where the right side is separated into a linear polarization term P L(t ) and a nonlinear polarization term P NL (t ). For later convenience, the constants for the nonlinear terms are written as ε0χ (2) and 4ε0χ (3) . The constants χ (2) and χ (3) are called nonlinear susceptibility coefficients. The nonlinear terms that contribute to the overall material polarization field P (t ) depend on the type of material. An ideal bulk silica glass is an isotropic material. The properties of an isotropic material are not affected by replacing the position vector r with −r. A material with this property is called a centrosymmetric material. For such a material, the material response P (t ) does not depend on the orientation of E (t ) within the material, which means that a change in the sign of E (t ) must produce a corresponding change in the sign of P (t ). But a change in the sign of E (t ) would not produce a change in the sign of the quadratic nonlinear term of P (t ). Therefore, a quadratic nonlinearity cannot exist in a centrosymmetric material such as glass. Consequently, the lowest-order nonlinear term present in (5.1.2) is the cubic term.
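The harmonic content generated by the cubic term can be previewed with a short numeric experiment: cube a sampled two-tone field and inspect its spectrum. A minimal Python sketch (the tone frequencies and sample rate are arbitrary choices for illustration):

import numpy as np

fs = 1024.0                               # sample rate (arbitrary units)
t = np.arange(4096) / fs
f1, f2 = 40.0, 50.0                       # two incident tone frequencies
E = np.cos(2*np.pi*f1*t) + np.cos(2*np.pi*f2*t)
P_nl = E**3                               # cubic nonlinear response, as in (5.1.2)

spectrum = np.abs(np.fft.rfft(P_nl))
freqs = np.fft.rfftfreq(len(t), 1/fs)
peaks = freqs[spectrum > 0.05 * spectrum.max()]
print(np.unique(np.round(peaks)))
# Components appear at f1, f2, 3*f1, 3*f2, and 2*f1 +/- f2, 2*f2 +/- f1:
# exactly the mixing products expected of a cubic nonlinearity.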

5.1.2 Photon-Optics Description

Expression (5.1.2) is a wave-optics description of the nonlinear material polarization field PNL(t). Photon optics provides a complementary description of the same nonlinear effect. A photon at a wavelength λ = 2πc/ω has energy E = ħω and a momentum p = h/λ = ħk (cf. (1.2.3)), where k = nω/c0 is the wavenumber in the medium. This section describes nonlinear interactions in the context of preserving the energy and the momentum of the interacting photons. Some nonlinear interactions occur between photons and other quanta of energy within the material (cf. Section 3.1.2). Other nonlinear interactions within a host material occur between photons. All of these nonlinear interactions must preserve the total energy. Because the energy E = ħω of a photon is directly proportional to the frequency of the carrier, the lightwave carrier frequencies ω that are able to interact are constrained. The total momentum p = ħk of the interacting quanta, where k = nω/c0, must also be conserved. For a well-guided mode in an optical fiber, the propagation constant β is approximately equal to the wavenumber k. This means that β ≈ p/ħ, implying that the propagation constant β must also be conserved or nearly so. The propagation constant β and the momentum for other forms of quanta are related by a similar expression. The conservation of the propagation constant β for interacting quanta during a nonlinear interaction is called phase matching. Intuitively, the phase-matching condition allows waves with matched values of β to mix over a long interaction length within the fiber. Were β equal to k, the propagation constant β would be equal to p/ħ and conserving β would be equivalent to conserving the momentum p. This is true for a plane wave in an


unbounded medium, but is not exactly true for a guided mode in an optical fiber because every guided mode has both an axial field component and a radial field component (cf. Chapter 3). This causes β to be slightly different than k. Consequently, a nonlinear interaction may still be possible when the phase-matching condition is not exactly satisfied because that interaction may exchange the momentum in the axial direction described by β with the momentum in the radial direction. This exchange preserves the overall vector momentum but does not necessarily preserve the component of the momentum in the axial direction described by the phase-matching condition on β.

5.2 Kinds of Nonlinearities

This section provides an overview of the kinds of nonlinear interactions that can occur within an optical fiber. For a dielectric material such as glass, the most significant nonlinearity is the optical Kerr nonlinearity, which will be the primary topic of this chapter, but is introduced only briefly in this overview section. Other nonlinear effects, with different causes and different effects, may also be significant. These include Raman scattering and Brillouin scattering. These nonlinear effects are discussed briefly in this section, and not again. These three effects are the only nonlinear effects that will be discussed.

5.2.1 The Kerr Nonlinearity

The most significant nonlinearity for a lightwave communication system is the Kerr nonlinearity. This overview section briefly introduces the Kerr nonlinearity and several other nonlinear interactions. Later, Section 5.3 presents a detailed discussion of how lightwave propagation in an optical fiber is modified by the presence of the Kerr nonlinearity. The Kerr nonlinearity results from a nonresonant, nonlinear interaction between the lightwave field and the electronic energy transitions in the glass used for an optical fiber. Because glass is a centrosymmetric material, the nonlinear material response, expressed as the material polarization P, does not have a second-order restoring term. Therefore, with the first-order linear term implicit, the Kerr nonlinearity is due primarily to the most significant nonlinear restoring term, which is the third-order, cubic term in (5.1.2). Linear propagation in a lossless optical fiber is characterized by an index-dependent propagation constant β and a z-dependent phase shift e^(iβz) as described in Chapter 4. The Kerr nonlinearity causes intensity-dependent changes in the index of refraction of an optical fiber, leading to a corresponding signal-dependent change in the carrier phase of the propagating lightwave signal. These intensity-dependent changes can distort a datastream modulated onto a single carrier or can cause datastreams modulated onto multiple subcarriers to mix, thereby creating new frequency components. For a single-carrier channel, phase modulation caused by the Kerr nonlinearity is called self-phase modulation. For a channel with multiple subcarriers, the intensity of one subcarrier can change the index seen by other subcarriers. This form of nonlinear


interchannel interference is called cross-phase modulation. Section 5.3 analyzes the effect of a Kerr nonlinearity on the propagation constant β. Section 5.4 analyzes the effect of the Kerr nonlinearity on the resulting signal distortion. When several wavelengths interact through the Kerr nonlinearity, mixing is possible as allowed by the conservation of energy and momentum of the interacting photons. This mixing can produce new frequency components that are not present in the original signal. The cubic nature of the Kerr nonlinearity requires three frequencies to mix to create a fourth. This resulting distortion is called four-wave mixing. It produces new frequency components through the nonlinear mixing of different subcarriers. These new frequency components are almost always unwanted. Four-wave mixing is discussed in Section 5.5.

5.2.2 Raman Scattering

The second nonlinearity described in this section is Raman scattering. This scattering is caused by the nonlinear interaction of a lightwave field with quantized high-frequency vibrational/rotational molecular motions within a glass. The nonlinearities arising from these molecular interactions are described using a form of the expansion given in (5.1.2). Because Raman scattering is caused by a different physical mechanism than the Kerr nonlinearity, the consequences are different. For a sufficiently intense lightwave field, the nonlinearity causes the scattered field to contain new frequency components called sideband components given by ωs = ωc + mωr , where ωc is the incident frequency, ωr is a frequency describing the effect of the Raman scattering, and m takes on integer values, both positive and negative. Within photon optics, this expression is the conservation of energy between the incident photon ωc , the scattered photon ωs , and the allowed frequencies m ωr of a set of quantized local lattice vibrations indexed by m. The associated quanta of energy are called phonons. When a narrowband information-bearing signal is modulated onto a lightwave carrier, each of the sidebands generated by the nonlinearity is also modulated by the same information-bearing signal. The fundamental frequency ωr of the quantized lattice vibration depends on the constituent molecular species and can be calculated using quantum theory. It is approximately 14 THz for silica glass. The number of significant new frequency components, as indexed by m, depends on the intensity of the lightwave field. Because components of the scattered field are not at the same frequency as the incident field, the energy at the original lightwave frequency is not conserved. Raman scattering is an inelastic scattering process that is distinguished from an elastic linear scattering process, such as Rayleigh scattering (cf. Section 3.1.2), for which the frequency of the scattered field is equal to the frequency of the incident field. Raman scattering has two forms, called Stokes scattering and anti-Stokes scattering. For m = −1, there is a new field component at the frequency ωs = ωc − ωr due to an intensity-dependent energy transfer from the incident field to the material. This interaction is called Stokes scattering. For m = 1, the increase in the frequency of the new field component ωs = ωc +ωr as compared with the frequency of the incident field component at ωc is caused by an intensity-dependent energy transfer from the material


to the new field component. This interaction is called anti-Stokes scattering. The energy for this transfer must be supplied by thermally excited energy states within the material. Because the material must supply energy for anti-Stokes scattering, this process is much weaker than Stokes scattering. The ratio of the strength of the Stokes component to the strength of the anti-Stokes component depends on the probability that the energy states of the material that interact with the lightwave field are occupied, a probability that is temperature-dependent. When in thermal equilibrium with the environment, this probability is determined by the Boltzmann factor e−h fr / kT0 given in Section 6.1, where kT0 is the mean thermal energy of the material, T0 is the temperature, and fr = ωr /2π . For Stokes scattering, the energy difference between the scattered field at ωs and the incident field at ωc is the energy absorbed by the material in the form of phonons. When the incident field intensity at ωc exceeds a material-dependent threshold, the generation of the light at frequency ωs = ωc − ωr can become self-reinforcing. An incident lightwave signal at ωs can then be amplified using the signal at ωc as the source of the energy for the amplification. This type of gain process is called stimulated Raman scattering and requires a high-power lightwave as the energy source. Therefore, it is not generally observed in a conventional lightwave system. However, by launching a high-constant-power lightwave at ωc into a fiber, this effect can be used to amplify a modulated lightwave signal at ωs , where ωs = ωc −ωr . This type of lightwave amplifier is briefly discussed in Section 7.4.5.
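The Boltzmann factor mentioned above gives a quick order-of-magnitude estimate of the anti-Stokes-to-Stokes strength ratio; treating the factor alone as that ratio is a simplification that ignores the detailed occupation statistics. A minimal Python sketch, with an assumed ambient temperature:

import math

h = 6.626e-34        # Planck constant (J*s)
k = 1.381e-23        # Boltzmann constant (J/K)
f_r = 14e12          # fundamental Raman shift for silica glass (Hz)
T0 = 290.0           # assumed ambient temperature (K)

ratio = math.exp(-h * f_r / (k * T0))    # Boltzmann factor e^(-h f_r / k T0)
print(f"anti-Stokes/Stokes ratio ~ {ratio:.3f}")   # roughly 0.1 at room temperature

The factor of roughly ten illustrates why anti-Stokes scattering is much weaker than Stokes scattering at room temperature.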

5.2.3 Brillouin Scattering

An optical fiber can support a random distribution of thermally generated acoustic modes. Each acoustic mode alternately densifies and rarefies the medium, thereby changing the index of refraction on the spatial scale of the wavelength of a supported acoustic mode. The third nonlinearity, Brillouin scattering, is caused by the scattering of a lightwave field from these traveling acoustic waves. These traveling acoustic waves are distinguished from the localized lattice vibrations that generate Raman scattering. While Brillouin scattering can occur both in the forward and in the backward direction, we focus on backward scattering because that kind of scattering is problematic. Referring to a photon-optics description, Brillouin scattering becomes strongest when the phase-matching condition is satisfied. This condition can be written as

βs = βc + K,   (5.2.1)

where βc is the propagation constant of the incident lightwave, βs is the propagation constant of the scattered lightwave, and K is the propagation constant of the acoustic wave taken to be in the same direction as the incident lightwave. For a well-guided mode, the propagation constant βc of the incident lightwave is approximately 2π/λ (cf. Section 4.3.1), where λ is the lightwave wavelength in the core. Similarly, the propagation constant K of the acoustic wave is approximately 2π/Λa, where Λa is the acoustic wavelength in the core.


The most significant form of Brillouin scattering occurs when the propagation constant βs of the backscattered wave has the same magnitude and the opposite sign as the propagation constant βc of the incident wave. Using (5.2.1) and βc = −βs gives K = 2βs. This means that the acoustic wavelength Λa is half the lightwave wavelength λ. Because there is a distribution of thermally generated acoustic waves, there is a range of propagation constants that can satisfy a phase-matching condition and a corresponding range of wavelength components of the incident lightwave field that can be backscattered. The periodic index perturbation caused by each guided acoustic wave is seen by the lightwave as a Bragg diffraction grating that travels at the phase velocity va of the acoustic mode. Because the index grating is moving, it produces a Doppler shift Ωa = va/Λa in the backscattered lightwave field so that ωs = ωc + Ωa. This expression is simply the conservation of energy in a photon-optics description expressed in terms of the frequency. For typical fibers, the frequency shift Ωa is on the order of one to ten gigahertz. As the lightwave power increases, the lightwave can induce density changes in the material, thereby itself causing the generation of acoustic waves. Beyond a material-dependent threshold, this nonlinear electrostrictive force can cause the interference between the backscattered lightwave field and the forward-propagating lightwave field to become self-reinforcing, resulting in stimulated Brillouin scattering. This kind of backscattering can be a significant signal impairment both because it depletes the forward-propagating lightwave energy and because it creates a backward-propagating lightwave that must be suppressed before entering the lightwave source at the transmitter. For a typical silica glass fiber, the large spatial extent of the index diffraction grating created by the acoustic wave means that the range of frequencies that can be efficiently scattered by stimulated Brillouin scattering is much smaller than the range of frequencies that can be scattered by stimulated Raman scattering, which relies on a local molecular interaction, as was discussed in the previous subsection, or by a Kerr nonlinearity, which relies on electronic energy transitions. The lightwave signal loses its energy due to Brillouin scattering only in a small bandwidth on the order of tens of megahertz. However, the large spatial extent of the index diffraction grating also means that the strength of stimulated Brillouin scattering can be several orders of magnitude larger than the strength of stimulated Raman scattering. The onset of stimulated Brillouin scattering can be inhibited by suppressing the power-density spectrum of the modulating signal over the spectral bandwidth of the Brillouin scattering. Modulation formats such as intensity modulation for which the signal is large at the carrier frequency (cf. (1.3.5)) are susceptible to stimulated Brillouin scattering. The large signal energy at the carrier frequency can be strongly backscattered towards the transmitter, leading to instabilities. For the same total power, a lightwave modulation format that uses a more uniform power-density spectrum over a larger bandwidth has less energy near the resonance frequency and so is less susceptible to stimulated Brillouin scattering.
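A back-of-the-envelope estimate of the frequency shift follows from the condition K = 2βs. The acoustic velocity and core index below are assumed values typical of fused silica; they are not given in the text above:

# Estimate of the Brillouin frequency shift from the phase-matching
# condition K = 2*beta_s (acoustic wavelength = half the lightwave
# wavelength in the core).
v_a = 5960.0                 # assumed acoustic velocity in silica (m/s)
n = 1.45                     # assumed core index
lam0 = 1550e-9               # free-space wavelength (m)

lam_core = lam0 / n          # lightwave wavelength in the core
Lam_a = lam_core / 2         # acoustic wavelength Lambda_a
f_shift = v_a / Lam_a        # Doppler shift Omega_a expressed in hertz
print(f"Brillouin shift ~ {f_shift/1e9:.1f} GHz")   # ~11 GHz

The result, roughly ten gigahertz, is consistent with the range quoted above.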


5.3 Signal Propagation in a Nonlinear Fiber

The Kerr nonlinearity in a glass is a nonresonant interaction of the lightwave field with electronic energy transitions. Because the interaction is nonresonant, the nonlinear response is treated as instantaneous with respect to typical lightwave modulation frequencies. Because glass is a centrosymmetric material, the lowest-order nonlinear term PNL(t) caused by the material polarization in (5.1.2) is the cubic or Kerr term. Although always present, the Kerr nonlinearity is significant only for strong signals because it depends on the cube of the lightwave amplitude. Considering the nonlinear material polarization caused by a single electric field component and considering only the time dependence, we can write

PNL(t) = 4ε0 χ(3) E³(t).   (5.3.1)

In an isotropic material such as glass, an electric field component at a single frequency E(t) = Re[Ae^(iωct)] with the electric field amplitude A = |A|e^(iφ) generates a corresponding nonlinear polarization component PNL(t). The nonlinear material polarization component must satisfy both the conservation of energy and the conservation of momentum. The condition on the energy is determined using cos³(ωct) = (1/4)cos(3ωct) + (3/4)cos(ωct). The resulting nonlinear material polarization component PNL(t) has a term PNL(ωc) at ωc and a term PNL(3ωc) at 3ωc given by

(5.3.2a)

= ε0χ (3)| A|3ei3φ .

(5.3.2b)

PNL (3ωc ) = ε0χ (3) A3

Therefore, for a single electric field component at a single frequency ωc , the Kerr nonlinearity modifies the material response as seen by the incident frequency ωc given by (5.3.2a), and produces a new frequency component at 3 ωc given by (5.3.2b) taking energy away from the original signal at ωc . For a nominal carrier wavelength in the infrared of 1500 nanometers, this triple-frequency term has a wavelength of 500 nanometers, which is green light.

5.3.1 Phase Matching

Expression (5.3.2) is necessary, but not sufficient. The nonlinear material polarization term is also subject to the additional condition of phase matching. The right side of (5.3.2a) is a cubic function A A A ∗ of the complex electric field amplitude A. The term A corresponds to an electric field that has a spatial dependence given by e−iβ(ω c )z . The term A∗ has a spatial dependence given by eiβ(ωc ) z . Therefore, the product term A A A ∗ has a spatial dependence given by e−iβ(ω c )z . For (5.3.2a), the nonlinear polarization term PNL (ωc ) at frequency ωc in (5.3.2a) generates an electric field component with a frequency ωc and a spatial dependence of e−iβ(ω c )z . Therefore this nonlinear polarization term is phase-matched.


The right side of (5.3.2b) is a cubic function A A A and has a spatial dependence of e−iβ(3ωc ) z . The nonlinear polarization term PNL (3ωc ) on the left side of (5.3.2b) has a frequency 3ωc and a propagation constant β(3ωc ). Equating the propagation constants on the left and right sides of (5.3.2b) gives the phase-matching condition as β(3ωc ) = 3β(ωc ). This condition is not satisfied for well-guided modes in a standard fiber. In such a fiber, the propagation constant β(ω) is well approximated by n 1(ω)k0 (cf. Figure 4.5), where n1 (ω) is the frequency-dependent index of the core and k0 = ω/c0 is the freespace wavenumber. Using this expression, the phase-matching condition would require n1 (3ω) = 3n1 (ω) for some wavelength of interest. Examining Figure 4.6, the index varies by less than 2% over the wavelength range of 500 nm to 1500 nm used for standard lightwave communication systems. Therefore, the phase-matching condition cannot be satisfied, meaning that the term at 3ωc is not phase-matched and hence is not efficiently generated. Moreover, the weak signal at 3 ωc is at a wavelength that is strongly scattered in the fiber (cf. Figure 3.2). It is therefore ignored in the further analysis. Only the term PNL (ωc ) given in (5.3.2a) will be studied. To study phase matching when multiple subcarriers are present, consider a lightwave field that consists of the sum of three frequency components,

E(t) = Aj cos(ωj t) + Ak cos(ωk t) + Aℓ cos(ωℓ t).   (5.3.3)

When this expression is substituted into (5.3.1), the resulting nonlinear polarization PNL(t) contains frequencies at ωj ± ωk ± ωℓ generated by the mixing of the three frequencies. A primary contribution to PNL(t) will be at frequency ωi, provided that the four frequencies {ωi, ωj, ωk, ωℓ} satisfy

ωi = ωj + ωk − ωℓ   (5.3.4a)

and also satisfy the phase-matching condition

βi = βj + βk − βℓ,   (5.3.4b)

which is written simply by replacing ω in (5.3.4a) with β.
These expressions are a generalization of (5.3.2), which is based on three carriers at the same frequency. That analysis showed that the term 3ω in (5.3.2b) cannot be phase-matched. Similarly, when the carrier frequencies are distinct, there are only specific combinations of ωj ± ωk ± ωℓ that can be phase-matched, with the term given in (5.3.4a) being the primary contribution because that choice of signs yields the most readily achieved phase-matching condition in an optical fiber. Other combinations of signs such as ωj + ωk + ωℓ cannot be phase-matched and are not considered.

5.3.2 Intensity-Dependent Index Change

To convert PNL(ωc) = 3ε0 χ(3) A|A|² into an intensity-dependent index change, write PNL(ωc) = ε0 Δχ A so that Δχ = 3χ(3)|A|² describes the nonlinear change in the


total susceptibility. This nonlinear change is here taken to be real.² Using (2.3.28), write |A|² = 2η0 I/n, where n is the index of refraction in the absence of the nonlinearity, and η = η0/n is the impedance of the glass material. This gives

Δχ = (6χ(3)/n) η0 I   (5.3.5)

as the change in the susceptibility due to the nonlinearity. The corresponding change Δn in the index is related to Δχ by

Δn = (∂n/∂χ) Δχ = Δχ/2n,   (5.3.6)

where ∂n/∂χ = 1/2n follows from n² = (1 + χ) (cf. (2.3.12)). Substituting Δχ from (5.3.5) into this expression gives the intensity-dependent change in the index Δn as n2 I, with the constant n2 given by

n2 ≐ 3η0 χ(3)/n².   (5.3.7)

The intensity-dependent index n(I) is then

n(I) = n + n2 I.   (5.3.8)

The material constant n2, which is the primary consequence of the Kerr nonlinearity, is called the nonlinear index coefficient and has units of meters-squared per watt because the intensity I has units of watts per meter-squared (W/m²). This parameter is a measure of the strength of the nonlinearity and is approximately 2.5 × 10⁻²⁰ m²/W at a wavelength of 1500 nm for pure or weakly doped silica glass.
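The footnoted launch condition of the chapter introduction gives a feel for the size of n2 I. A minimal sketch using the numbers quoted above (the 100 km span length in the comment is an assumed, illustrative value):

# Scale of the Kerr index change for the numbers quoted in this section.
n2 = 2.5e-20                 # nonlinear index coefficient (m^2/W)
P = 5e-3                     # launch power (W)
A = 50e-12                   # core cross-sectional area, 50 um^2 (m^2)

I = P / A                    # intensity, 1e8 W/m^2 (100 MW/m^2)
dn = n2 * I                  # index change, ~2.5e-12
print(f"I = {I:.1e} W/m^2, delta-n = {dn:.1e}")
# The index change is tiny, but over an assumed 100 km span the accumulated
# phase 2*pi*dn*L/lambda is on the order of a radian, which is significant.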

5.3.3 Nonlinear Propagation Constant

The intensity-dependent perturbation Δn = n2 I in the index of refraction given in (5.3.8) generates an intensity-dependent change Δβ(z, t) in the propagation constant of a fiber mode. This section shows that Δβ(z, t) is proportional to the squared magnitude |a(z, t)|² of the narrowband complex signal envelope (cf. (3.3.47)), written as

Δβ(z, t) = γ|a(z, t)|².   (5.3.9)

The proportionality constant

γ ≐ n2 k0/Aeff   (5.3.10)

is defined as the Kerr nonlinear fiber coefficient and Aeff is the effective area of the fiber, to be defined for this purpose. The constant γ is a measure of the strength of the Kerr nonlinearity within a fiber. For typical single-mode fibers, values of γ range from one to two radians/(watt · kilometer).³
² Although the nonlinearity can produce intensity-dependent absorption through the imaginary part of χ, this effect is typically negligible for silica glass at the near-infrared wavelengths used for lightwave communications.
³ In specially designed fibers that intentionally enhance this nonlinear effect to create devices such as lightwave mixers, γ can be in excess of 20 radians/(W · km).


Effective Area

The effect of the intensity on the nonlinear change Δβ(z, t) in the propagation constant can be described by defining an effective area that decreases with increasing intensity. In this way, the change Δβ(z, t) in the propagation constant is expressed indirectly in terms of an equivalent change in the cross-sectional area of the fiber core. To derive the effective area, first consider a single well-guided mode with a transverse dependence f(x) in a slab waveguide that supports only one mode. Using (3.3.7a) and (3.3.10a), the form of the solution in the core satisfies

+

µ

n21 k20 − β2



f (x ) = 0.

(5.3.11)

This differential equation is unchanged if n1²k0² − β² is held constant. A small perturbation Δn(x) in n1 can be offset by a small change Δβ in the propagation constant. Then, up to first-order terms (with z and t suppressed), it follows that

Δβ = (n1 k0²/β) Δn(x).

Because β must be independent of x, this first-order perturbation analysis replaces the x-dependent Δn(x) by a weighted average based on the intensity |f(x)|², with

Δβ = (n1 k0²/β) (∫−∞^∞ Δn(x)|f(x)|² dx) / (∫−∞^∞ |f(x)|² dx).

This method of analysis extends to a single-mode cylindrical fiber with Δn(x) replaced by Δn(r, ψ). The effect of the spatially varying index Δn(r, ψ) on Δβ is treated using a weighted spatial average over the cross-sectional area A of the fiber. To proceed as for the slab waveguide, multiply each side by the field intensity |U(r, ψ)|² (cf. (2.3.29)) and integrate over A. This gives the change Δβ in the propagation constant caused by the intensity-dependent nonlinear index change as

Δβ = (n1 k0²/β) (∫A Δn(r, ψ)|U(r, ψ)|² dA) / (∫A |U(r, ψ)|² dA).

Because Δn(r, ψ) = n2 I(r, ψ) = n2 |U(r, ψ)|², and n1 k0 ≈ β (or b ≈ 1) for a well-guided mode (cf. Figure 4.3), this becomes

Δβ = n2 k0 (∫A |U(r, ψ)|⁴ dA) / (∫A |U(r, ψ)|² dA).   (5.3.12)

Define the effective area Aeff of a single-mode fiber as

Aeff ≐ (∫0^∞ ∫0^2π |U(r, ψ)|² r dr dψ)² / ∫0^∞ ∫0^2π |U(r, ψ)|⁴ r dr dψ
     = 2π (∫0^∞ |U(r)|² r dr)² / ∫0^∞ |U(r)|⁴ r dr,   (5.3.13)


where the integral over ψ evaluates to 2π because the only mode supported by a single-mode fiber is azimuthally symmetric. Including both the time dependence and the z dependence in Δβ gives

Δβ(z, t) = n2 k0 P(z, t)/Aeff,   (5.3.14)

where P(z, t) = ∫A |U(r, t)|² dA = |a(z, t)|² is the lightwave power in the single mode with complex envelope a(z, t). Substituting P(z, t) = |a(z, t)|² into (5.3.14) yields (5.3.9), as was stated earlier. The effective area is a scaling factor that accounts for the dependence of the transverse mode profile on the strength of the nonlinear coefficient γ for a single-mode fiber. Decreasing the effective area increases the nonlinear coefficient and thus increases the strength of the nonlinear interaction. Increasing the effective area reduces the nonlinear coefficient and thus reduces the strength of the nonlinear interaction.
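Definition (5.3.13) is easy to check numerically. For an assumed gaussian mode profile U(r) = e^(−r²/w²), the integrals evaluate in closed form to Aeff = πw², and a direct quadrature reproduces this:

import numpy as np

# Numerical check of (5.3.13) for an assumed gaussian mode profile.
w = 5e-6                                     # assumed mode-field radius (m)
r = np.linspace(0, 10 * w, 20001)
U2 = np.exp(-2 * r**2 / w**2)                # |U(r)|^2 for U = exp(-r^2/w^2)
num = (np.trapz(U2 * r, r))**2               # (integral of |U|^2 r dr)^2
den = np.trapz(U2**2 * r, r)                 # integral of |U|^4 r dr
A_eff = 2 * np.pi * num / den
print(A_eff, np.pi * w**2)                   # both ~7.85e-11 m^2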

Nonlinear Phase Shift

The total phase shift φNL(L) caused by the change Δβ(z, t) in the propagation constant β over a distance L due to the Kerr nonlinearity is called the nonlinear phase shift. It is given by

φNL(t) = ∫0^L Δβ(z, t) dz.   (5.3.15a)

Using (5.3.9), this becomes

φNL(t) = ∫0^L γ|a(z, t)|² dz,   (5.3.15b)

which is a time-dependent phase shift in the signal at distance L caused by the time-dependent modulation of the complex envelope a(z, t). The nonlinear phase shift is an intensity-dependent form of nonlinear distortion. This means that a large intensity in any wavelength component of a signal can cause unwanted spreading of the spectrum. By adding the nonlinear change Δβ(z, t) in the propagation constant to the unperturbed value of β in the absence of the nonlinearity, the phase φ(t) of the complex signal envelope is modified to read

φ(t) = ωc t − (β + Δβ(z, t))z.

The term Δβ(z, t) is always nonnegative, so the nonlinear phase shift is always nonnegative. The phase shift causes an instantaneous frequency ω = dφ(t)/dt now given by

ω(z, t) = (d/dt)(ωc t − βz − Δβ(z, t)z) = ωc − γ (dP(z, t)/dt) z,   (5.3.16)

where (5.3.9) has been used and P (z, t ) = |a(z , t )|2 is the instantaneous pulse power at time t and distance z. At the leading edge of a pulse, dP /dt is positive, so the


instantaneous frequency is smaller than the carrier frequency ωc and the instantaneous wavelength is larger than the carrier wavelength λc. The time-varying frequency change at the leading edge of the pulse is called a negative chirp or a red-shift because the instantaneous frequency decreases and the wavelength shifts towards longer wavelengths. At the trailing edge of a pulse, dP/dt is negative so that the instantaneous frequency is larger than the carrier frequency ωc, and the instantaneous wavelength is shorter than the carrier wavelength λc. The time-varying frequency change at the trailing edge of the pulse is called a positive chirp or a blue-shift because the instantaneous frequency increases and the wavelength shifts towards shorter wavelengths. The magnitude of the frequency shift is proportional to the time derivative of the pulse power P(t), and hence depends on the shape of the pulse.
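A short sketch of (5.3.16) for an assumed gaussian pulse makes the sign convention concrete: the leading edge is red-shifted and the trailing edge is blue-shifted. All parameter values are illustrative only:

import numpy as np

gamma = 1.5e-3               # assumed nonlinear coefficient (rad/(W*m))
z = 50e3                     # distance (m)
P0, T0 = 5e-3, 100e-12       # assumed peak power (W) and pulse width (s)

t = np.linspace(-3 * T0, 3 * T0, 2001)
P = P0 * np.exp(-t**2 / T0**2)         # assumed gaussian pulse power
dPdt = np.gradient(P, t)
dw = -gamma * z * dPdt                 # instantaneous frequency shift (rad/s)
print(dw[500] < 0, dw[1500] > 0)       # leading edge red, trailing edge blue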

5.3.4 Characteristic Lengths

Signal propagation in a nonlinear fiber depends on the launched lightwave power, on the attenuation in the fiber, and on the linear dispersion and polarization. Four characteristic length scales quantify the effect of these various parameters as discussed in Section 5.3.5. The nonlinear length LNL quantifies the magnitude of the nonlinear phase shift. The effective length Leff quantifies the attenuation in the fiber. The dispersion length LD quantifies the amount of linear dispersion relative to the timewidth of an input pulse. The walk-off length measures the temporal separation of two pulses propagating in two separate subcarriers at two different group velocities. These length parameters depend on the fiber characteristics and system parameters such as the average launched power and the pulse width. To derive the first two length scales, the change in the propagation constant Δβ is written in terms of the attenuating z-dependent mean lightwave power P(z) so that

Δβ(z) = γ Pin e^(−κz),

where Pin is the mean input power and κ is the attenuation coefficient.⁴ Therefore, the accumulated nonlinear phase shift over the length of the fiber, denoted by φNL(L), is

φNL(L) = ∫0^L γ Pin e^(−κz) dz.   (5.3.17)

The nonlinear length is defined as

LNL ≐ 1/(γ Pin).   (5.3.18)

The nonlinear length is not defined until the launch power is specified. The effective length is defined as

Leff ≐ ∫0^L e^(−κz) dz = κ⁻¹(1 − e^(−κL)).   (5.3.19)

⁴ This form for the attenuation is expressed in terms of nepers/km, which is a factor 10 log₁₀ e = 4.34 smaller than κ expressed in dB/km (1 neper/km = 4.34 dB/km). For 0.2 dB/km, κ = 0.046 km⁻¹. Then γ ≈ 1–2 radians/(W · km).


The nonlinear phase shift given in (5.3.17) is expressed in terms of a ratio of these two length scales,

φNL(L) = Leff/LNL = γ Leff Pin.   (5.3.20)

The smaller the nonlinear length, the larger the nonlinear phase shift; the smaller the effective length, the larger the attenuation and thus the smaller the nonlinear phase shift. If the attenuation is sufficiently large that κL is much larger than one, then, using (5.3.19), the effective length Leff is approximately the reciprocal κ⁻¹ of the attenuation per unit length measured in km⁻¹. Substituting (5.3.18) into (5.3.20) and using Leff ≈ κ⁻¹, the maximum nonlinear phase shift is approximately

φNL(L) ≈ Pin/(κ/γ).   (5.3.21)

When φNL is much smaller than one, the nonlinear phase shift is insignificant, and linear methods accurately describe lightwave signal propagation. This condition requires that the mean lightwave input power Pin be much smaller than κ/γ. For typical optical fibers, κ/γ is on the order of tens of milliwatts. As Pin approaches this value, the nonlinear phase approaches one radian, leading to significant distortion from an intensity-dependent phase shift. The third length scale, called the dispersion length, describes the effect of linear dispersion. This length scale is defined as the distance at which the root-mean-squared timewidth σintra of the wavelength-dependent delay spread is equal to the root-mean-squared timewidth Tin of the input pulse s(t), where Tin ≐ Trms(0) and σintra = LD σλ|D| (cf. (4.5.7)). This means that the width of the pulse at LD is √2 larger than the root-mean-squared timewidth of the launched pulse. Solving for LD gives

LD ≐ Tin/(σλ|D|).   (5.3.22)

The dispersion length is not specified until the modulation pulse is chosen. For a coherent carrier, the spectral bandwidth σλ of the modulated lightwave signal is equal to the spectral bandwidth of the modulating data waveform. For this case, σλ/λ = σf/f, σω = 2πσf, and Tin ≈ 1/σω so that the dispersion length can be written as

LD = Tin/((λ²/2πc)σω|D|) = Tin²/β2,   (5.3.23)

where (4.4.16) has been used. The fourth length scale is the walk-off length. This is defined as the distance along the fiber after which two pulses launched at the same time in adjacent subcarriers are separated in time by the root-mean-squared timewidth Tin of the input pulse. In a system with multiple evenly spaced subcarriers, the frequency difference Δω between adjacent subcarriers produces a differential group delay Δτ between pulses in those subcarriers. But Δτ = Δβ1 z by (4.3.3) and Δβ1 ≈ β2 Δω by (4.3.2c), so

Δτ ≈ β2 Δω L = D Δλ L,   (5.3.24)


[Figure 5.1 shows two pulses on two separate subcarriers (subcarrier zero and subcarrier one) at three times, t = 0, t = t1, and t = Lwo β2 Δω, plotted in space from z = 0 to z = Lwo.]
Figure 5.1 Three snapshots in time of two pulses in space on two separate subcarriers. The walk-off time Lwo β2 Δω is the time required for the two pulses to separate in space by the input pulse timewidth Tin.

where D is the total dispersion coefficient in wavelength units (cf. (4.4.14)). The differential time delay per unit distance between subcarriers is the product β2 Δω = D Δλ. The distance L at which Δτ is equal to the root-mean-squared timewidth Tin of the input pulse is defined as the walk-off length Lwo. It is given by

Lwo = Tin/(β2 Δω) = Tin/(D Δλ).   (5.3.25)

The value of Δτ in (5.3.24) at the walk-off length is defined as the walk-off time. The relationship between the walk-off length and the walk-off time is shown schematically in Figure 5.1. The walk-off length can be regarded as the same for adjacent subcarriers.
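The four length scales can be collected into a few lines of Python. All parameter values below are assumed, typical numbers; the walk-off entry reproduces the roughly 7 km example in footnote 5 of the next section:

import math

gamma = 1.3e-3                # assumed nonlinear coefficient (rad/(W*m))
P_in = 5e-3                   # assumed mean launch power (W)
kappa = 0.046e-3              # attenuation, 0.2 dB/km in nepers (1/m)
L = 80e3                      # assumed span length (m)
beta2 = 21e-27                # assumed |GVD| near 1550 nm (s^2/m)
T_in = 50e-12                 # assumed rms input pulse width (s)
D_dlam = 17e-6 * 0.4e-9       # D*dlambda: 17 ps/(nm*km) x 0.4 nm, in s/m

L_NL = 1.0 / (gamma * P_in)                     # (5.3.18), ~154 km
L_eff = (1 - math.exp(-kappa * L)) / kappa      # (5.3.19), ~21 km
L_D = T_in**2 / beta2                           # (5.3.23), ~119 km
L_wo = T_in / D_dlam                            # (5.3.25), ~7.4 km
for name, val in [("L_NL", L_NL), ("L_eff", L_eff),
                  ("L_D", L_D), ("L_wo", L_wo)]:
    print(f"{name} = {val/1e3:6.1f} km")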

5.3.5 Classification of Nonlinear Channels

The four characteristic length parameters described in Section 5.3.4 define several classes of nonlinear lightwave channels.

Linear Channel with Dispersion

When the nonlinear length L NL of the fiber is much larger than the effective length L eff , then, using (5.3.20), φNL is much smaller than one and the nonlinear phase shift is insignificant. These conditions signify a linear lightwave channel wherein dispersion is


the dominant consideration. A systems-level model of a linear lightwave channel with dispersion is presented in Section 8.1.1.

Nonlinear Channel

When the nonlinear length L NL of the fiber is smaller than the effective length L eff , then, from (5.3.20), φNL is larger than one radian and there is a significant nonlinear phase shift. If, in addition, the linear dispersion length L D is much larger than the effective length L eff , then the linear dispersion is negligible over the length L eff for which a significant nonlinear phase shift can occur. These conditions signify a nonlinear, nondispersive lightwave channel. The propagation of a single pulse in a nonlinear, nondispersive lightwave channel is discussed in Section 5.4.2.

Nonlinear Channel with Dispersion

When the nonlinear length L NL of the fiber is smaller than the effective length L eff and the dispersion length is also smaller than the effective length L eff , then, over the effective length of the fiber, the signal experiences both dispersion and a nonlinear phase shift. In this case, the energy is redistributed in frequency by the intensity-dependent nonlinear phase shift and is redistributed in time by the wavelength-dependent group delay caused by linear dispersion. When both impairments are significant, they interact. This interaction causes the overall energy redistribution to be significantly different than the energy redistribution for either impairment considered separately. An analytic approximation that describes the evolution of the pulse width for a weakly dispersive nonlinear lightwave channel is presented in Section 5.4.2.

Nonlinear Channel with Interference

In a channel with multiple subcarriers whose walk-off length L wo is much smaller than the length of a fiber span,5 the nonlinear interference that a single symbol in one subchannel experiences is averaged over many symbols in other subchannels. Invoking the central limit theorem leads to the assertion of a signal-dependent gaussian probability distribution for the nonlinear interference. Interchannel interference is discussed in Chapter 11.

5.4

Single-Carrier Nonlinear Schrödinger Equation The governing equation describing narrowband lightwave signal propagation in a nonlinear lightwave channel with a cubic phase nonlinearity is an equation known as a nonlinear Schrödinger partial differential equation. This section uses this partial differential equation to quantify the nonlinear distortion of a single pulse. First a scalar form of the nonlinear Schrödinger equation is derived. Then the vector form of the nonlinear

= 50 ps in two different subchannels separated by ´ f = 50 GHz (´λ = 0.4 nm at 1550 nm) that propagate in a fiber with D = 17 ps/(nm · km ) have a walk-off length of approximately 7 km (cf. (5.2.25)).

5 Two pulses with the same root-mean-squared timewidth of T in

5.4 Single-Carrier Nonlinear Schrödinger Equation

217

Schrödinger equation is described. This form is used to analyze nonlinear polarization effects for a lightwave with a single carrier. The next section goes on to describe a system of simultaneous coupled scalar nonlinear Schrödinger partial differential equations for a channel with multiple subcarriers. The solution to this system of equations quantifies the crosstalk between subcarriers due to the Kerr nonlinearity. A system consisting of a set of coupled vector Schrödinger equations that describes nonlinear cross-talk between the polarization components is not presented herein. 5.4.1

Nonlinear Narrowband Signal Propagation

The propagation of a narrowband complex signal envelope a(z , t ) in a linear dispersive medium was studied in Section 4.3.3 with the Fourier transform A(z , ω) of the complex envelope a(z , t ) satisfying (4.3.20). Now, in this section, (4.3.20) is amended to include the cubic Kerr nonlinearity term. This new term appears as an additional spatially varying phase shift e−i ´β z , where ´β is the change in the propagation constant β caused by the Kerr nonlinearity. Including this change, the modified differential equation for the Fourier transform A (z, ω) written in a concise form (cf. (4.3.18)) now reads d A(z , ω) = A (z, ω)( D − i ´β), dz

(5.4.1)

where the operator D is −κ/2 − i(ω/vg ) − iβ2ω 2/2 (cf. (4.3.17)). An inverse Fourier transform applied to (5.4.1) replaces A (z , ω) with a(z , t ). To this end, use the differentiation property given in (2.1.12) and (4.3.17) to obtain the governing partial differential equation for the complex envelope a(z , t ) as

∂ a (z, t ) + 1 ∂ a(z , t ) + κ a(z, t ) − i β ∂ 2 a(z , t ) = −i ´β( z, t ) a(z, t ), 2 ∂z vg ∂ t 2 2 ∂t 2 where the attenuation κ and group-velocity dispersion β2 are specified at the carrier frequency, and the term ´β is treated as a constant with respect to the inverse Fourier transform. The nonlinear change ´β( z, t ) in the propagation constant is approximated by (5.3.9), which, when substituted for the right side of the preceding expression, gives

∂ a(z , t ) + 1 ∂ a (z, t ) + κ a(z , t ) − i β ∂ 2a (z, t ) = −iγ|a(z , t )|2a (z, t ). 2 ∂z vg ∂ t 2 2 ∂t 2

(5.4.2)

A partial differential equation of this form is known as a nonlinear Schrödinger equation. The nonlinear Schrödinger equation is the starting point for the analysis of nonlinear narrowband signal propagation in an optical fiber. Given a complex envelope a(0, t ) at the input, a solution to this partial differential equation describes the propagation of the narrowband complex envelope a(z , t ) on any fiber with the group-velocity dispersion β2. All of the boundary conditions imposed by the geometry of the fiber insofar as they affect the form of a(z , t ) have been reduced in Chapter 4 to the single constant β2 appearing in this partial differential equation.

218

5 The Nonlinear Lightwave Channel

An accurate solution to (5.4.2) requires numerical methods. However, perturbation methods do provide accurate conclusions for weak nonlinearities as well as considerable insight. The nonlinear Schrödinger equation is simpler when written in the traveling time. frame τ = t − z /vg of the traveling signal envelope (cf. (4.3.21)). Then, with two terms moved to the right side, this becomes

∂ a(z , τ) = − κ a(z , τ) + i β ∂ 2a(z , τ) − iγ|a(z , τ)|2a (z, τ). 2 ∂z 2 2 ∂τ 2

(5.4.3)

This differential equation may be written compactly in the form

∂ a(z , τ) = (D+N) a (z, τ), ∂z where the nonlinear term N in the traveling timeframe τ is defined as . N = −iγ|a (z, τ)|2 ,

(5.4.4)

(5.4.5)

and the linear dispersive term D is defined as D

=. −κ/2 + i(β2 /2)∂ 2 /∂τ 2.

(5.4.6)

The latter term D is the inverse Fourier transform of the linear term D given in (4.3.17) in the traveling timeframe.

Vector Nonlinear Schrödinger Equation

The vector form of the single-carrier nonlinear Schrödinger equation is used to analyze the effect of the nonlinearity on the polarization. Such an effect is the power-dependent and polarization-dependent cross-phase modulation, which is called cross-polarization modulation. The form of this equation depends on the strength of the nonlinearity as measured by the nonlinear length L NL (cf. (5.3.18)). It also depends on the polarization decorrelation length L pol (cf. (4.6.15)), which is the propagation length in the fiber over which the birefringence can be treated as constant. When the polarization decorrelation length L pol is much less than the nonlinear length L NL , the linear birefringence is averaged over many orientations of the polarization state. For a lossless fiber with κ equal to zero and with no polarization-mode dispersion, the appropriate form of the vector nonlinear Schrödinger equation (cf. (5.4.3)) that describes the propagation of both polarization components is

∂ a(z , τ) = i β ∂ 2a(z , τ) − i 8 γ|a(z , τ)|2a(z , τ), 2 ∂z 2 ∂τ 2 9

(5.4.7)

where a(z , t ) is the vector envelope, which has two components. Each component of a(z, t ) is the complex envelope for one polarization component. The factor of 8/9 results from the averaging over the orientation of the birefringence. This partial differential equation is called the Manakov equation.6 If polarization-mode dispersion is significant, there is an additional polarization-dependent dispersive term. Including this term results in an equation called the Manakov–PMD equation. 6 See Menyuk and Marks (2006).

5.4 Single-Carrier Nonlinear Schrödinger Equation

5.4.2

219

Nonlinear Distortion for a Single Pulse

A description of the propagation of a single pulse in a weakly dispersive nonlinear fiber is given by the solution to (5.4.3), but such a solution is not available as an analytic formulas. Numerical methods are required. However, insight into the solution follows from setting any two of the terms on the right side of (5.4.3) to zero. This gives three simplified problems. The case of a lossy, linear nondispersive fiber with γ = β2 = 0 is trivial: the envelope is given by a(z , τ) = a (0, τ)e(−κ+iβ0 )z . The pulse retains its shape, acquires a linear phase, and is attenuated (cf. (4.3.1)). The case of a linear dispersive fiber with γ = κ = 0 was discussed in Chapter 4. This effect is described by a quadratic phase term in the frequency domain scaled by β2. The case of a lossless and nondispersive nonlinear fiber κ = β2 = 0 is treated next.

Distortion for a Nonlinear Channel

For a single isolated pulse, nonlinear distortion is generated by self-phase modulation (cf. Section 5.2.1), which is an intensity-dependent index change caused by the signal itself. For a lossless nondispersive fiber, D = β2 = 0, and (5.4.4) reduces to

∂ a(z , τ) = Na (z, τ) ∂z = −iγ|a (z, τ)|2 a(z , τ). (5.4.8) Moreover, for a weak nonlinearity, γ|a(z , τ)|2 is much smaller than a(z , τ). Considering the complex envelope a (z, τ) as a point in the complex plane, as z changes, the nonlinear term on the right of (5.4.8) adds the small contribution ´ a(z, τ) = Na(z , τ) that is in phase quadrature with the complex envelope a(z , τ). Adding this term to a(z , τ) gives the complex envelope a(z +´ z, τ) at z +´z. This change, shown schematically in Figure 5.2, changes the phase but does not change the magnitude |a(z , τ)|. Subsequent incremental contributions continue to change the phase of the complex envelope a(z , τ) without change to the magnitude. Therefore a(z , τ) ≈ a(0, τ)e−iφNL (τ,z ) ,

.

(5.4.9)

where a (0, τ) = s (τ) = |s(τ)|e iφ0 is the complex amplitude of the lightwave signal at z = 0, and φNL (z , τ) is the nonlinear phase shift caused by the intensity-dependent change in the index. i γ|a(z,τ)| 2a(z,τ )Δz ]) t,z(a[mI

a(z,τ)

φ

a(z + Δz,τ) Re[a(z,t)]

Figure 5.2 The intensity-dependent phase change adds a small contribution

phase quadrature with the complex envelope a (z , t ).

´a (z , τ) that is in

220

5 The Nonlinear Lightwave Channel

For a lossless medium |a(z , τ)| 2 = |s (τ)| 2. Substituting this expression into (5.4.8) gives a first-order differential equation with solution a(z , τ) = a(0, τ)e−i γ| s (τ)|

2z

.

(5.4.10)

Comparing (5.4.10) and (5.4.9) shows that the intensity-dependent nonlinear phase shift is

φ (z , τ) = γ|s (τ)|2 z = γ|s (t − z /vg )|2 z, NL

(5.4.11)

with the first equation expressed in the traveling timeframe and the second equation expressed in the fixed timeframe. Expression (5.4.10) states that when both the linear dispersion and the attenuation are negligible, the effect of the nonlinear distortion is a time-dependent phase shift φNL (τ, z ) that has a nonlinear quadratic dependence in amplitude given by (5.4.11). The nature of this nonlinear quadratic dependence in (5.4.10) is illustrated in Fig2 ure 5.3 using s (t )e−iγ s (t ) z for an input gaussian pulse s (t ) shown in Figure 5.3(a) with a gaussian spectrum shown in Figure 5.3(b). The magnitude of the Fourier transform 1.0

edutingaM evitaleR

edutilpmA evitaleR

0.8 0.6 0.4 0.2 0.0

0.5

–200

(c)

–100

0 Time (ps)

100

200

z = 5 km

z=0

(b)

0.8 0.6 0.4 0.2 0.0

0.4

–0.06 –0.04 –0.02 0 0.02 Frequency (THz) (d)

0.04

0.06

z = 10 km

edutingaM evitaleR

edutingaM evitaleR

0.4

0.3

0.3

0.2

0.2

0.1

0.1 0.0

1.0

z=0

(a)

–0.06 –0.04 –0.02 0 0.02 0.04 Frequency (THz)

0.06

0.0

–0.06 –0.04 –0.02 0 0.02 Frequency (THz)

0.04 0.06

γ Pin = 1 radian/kilometer. (a) A 50 ps root-mean-squared width gaussian pulse s (t ) in time at z = 0. (b) The corresponding magnitude of the Fourier transform | S( f )|. (c) The magnitude of the spectrum |S ( f )| at z = 5 km. (d) The magnitude of the spectrum | S( f )| at z = 10 km.

Figure 5.3 The effect of a nonlinear phase shift for

5.4 Single-Carrier Nonlinear Schrödinger Equation

221

of s (t )e−iγ s (t ) z is shown in Figures 5.3(c) and (d) for two fiber lengths, showing the nonlinear redistribution in frequency of the pulse energy. As the pulse propagates, the pulse magnitude |s (t )| is unchanged, but the phase modulation broadens the spectrum considerably as a function of the propagation distance as is shown in Figures 5.3(c) and (d). The leading and trailing edges of the pulse shift the instantaneous frequency ω(t ) in time in accordance with (5.3.16). In a lossless fiber, these shifts due to the nonlinear phase redistribute the signal energy to new frequency components. This nonlinear redistribution of the signal energy produces interference between the shifted frequency components, which can be seen in panels (c) and (d). When attenuation is present, the nonlinear change in phase φNL as given by (5.3.20) accumulates only over a limited distance because, as the amplitude attenuates, the pulse leaves the nonlinear regime. This limited distance is approximated by the effective fiber length L eff defined in (5.3.19). When linear dispersion is present, the accumulated phase spreading can be converted into amplitude distortion. 2

Distortion for a Nonlinear Channel with Dispersion

When the linear dispersion is significant, a full and accurate solution of (5.4.4) for the pulse envelope a(z , t ) in a nonlinear, dispersive medium typically requires numerical methods. Nevertheless, the propagation can be understood heuristically as the incremental combination of the three terms on the right of (5.4.3). For a fiber increment of length ´z, regard the three effects to be applied in sequence within that increment. Then move to the next fiber increment. There are L /´ z increments in a fiber of length L . Letting ´ z go to zero completes the intuitive understanding. Thus let z ³ = ³ ´z. Then, with the initial waveform a(0, τ), the iteration is a(z³+1, τ) = T N L (´z )TL (´z )T A(´z )a(z³ , τ),

(5.4.12)

terminating at ³ = L /´z, where T N L (´z ), TL (´z ), and T A(´z ) are the three transformations corresponding to the three effects over a distance ´z. The transformation T N L (´z ) is a phase shift in the time domain with a quadratic dependence on the pulse amplitude. The transformation T L (´z ) is a quadratic phase shift in the frequency domain, which depends on the fiber geometry through the propagation constant β . The transformation T A(´z ) is an exponential attenuation in ´ z. Finally, a( L , τ) = lim a (z L /´z , τ)

´z→0

(5.4.13)

is the output complex envelope at distance L . At each step of the iteration, the discrete model given by (5.4.13) takes the output of the previous step, attenuates it, disperses it, and then applies an intensity-dependent phase shift. The resulting signal is then passed to the next iteration step. This iterative procedure is the basis of a numerical method to solve the nonlinear Schrödinger equation and is discussed in Section 5.6. As the signal propagates, the overall signal is attenuated and the nonlinear phase shift at each step becomes smaller. Therefore, the nonlinear phase shift occurs primarily at the beginning of the fiber or just after each amplification stage.

222

5 The Nonlinear Lightwave Channel

A nonlinear phase can also transfer energy from the lightwave signal to the additive noise within the lightwave amplifier. This effect is called nonlinear phase noise and is discussed in Section 7.7.4.

Pulse Propagation in a Nonlinear Weakly Dispersive Channel

The propagation of a single pulse in a single spatial mode of a lossless, nonlinear, weakly dispersive channel, as described by the nonlinear Schrödinger equation, can be reduced 2 (z ) as the pulse to the study of the change in the mean-squared pulse timewidth Trms evolves under the combined effects of weak dispersion and self-phase modulation. · ∞linear | In a lossless channel, the energy E = −∞ a(z , τ)| 2 dτ does not depend on z. The mean-squared timewidth is (cf. (2.1.30)) 2 Trms (z ) = τ 2 (z) − τ 2 (t ),

where the mth temporal moment of the complex envelope is

τ =. m

¸

1 ∞ m τ |a(z , τ)|2 dτ. E −∞

With κ set to zero, multiply both sides of (5.4.3) by τ m a ∗(z , τ) and integrate over τ . The resulting expression can be reduced to

¹

¸

º

β2 ∞ τ m−1 Im a∗ ∂ a dτ. d m τ ( z) = m dz E −∞ ∂τ

(5.4.14)

The derivation of this evolution equation is provided as an end-of-chapter exercise. For a weakly dispersive channel, the complex envelope a (z, τ) derived for the nonlinear, nondispersive channel given in (5.4.9) approximates the envelope for the nonlinear, weakly dispersive channel. This approximation breaks down when weak linear dispersion is considered in isolation without also considering an intensity-dependent phase shift. For an initial envelope given by a(0, τ) = s (τ), substitute (5.4.9) into (5.4.14), and integrate by parts to give

¸

∞ d m m (m − 1) γβ τ m−2 |s (τ)|4 dτ, τ ( z) = 2z dz 2E −∞

where Im [a∗ ∂ a/∂τ] = −2γ|s (τ)|3 d| s(τ)|/ dτ = −(γ/2)d| s (τ)| 4/dτ has been used. For the first moment, m = 1, d τ(z )/dz = 0 and the mean value τ( z ), expressed in the traveling timeframe, does not change as the pulse propagates. For the second moment, m = 2, and the evolution equation for the mean-squared value reduces to d 2 τ (z ) = dz

γβ2 z ¸ ∞ |s(τ)|4 dτ E

−∞

= γβ2 Peffz ,

where

· ∞ |s (τ)|4 dτ −∞ Peff = · ∞ . |s (τ)|2 dτ −∞

(5.4.15)

(5.4.16)

5.5 Interference in a Nonlinear Fiber

223

The term Peff is an effective power used to determine the strength of the nonlinearity as influenced by the shape of the initial pulse magnitude |s (τ)| . 2 Integration of (5.4.15) gives the perturbation ´ Trms (L ) in the mean-squared timewidth at z = L . This is

¸L 2 ´ Trms (L ) = τ 2 (L ) = γβ2 Peff z dz 0 = 12 (β2 L )(γ Peff L ).

(5.4.17)

The term β2 L represents the increase in the mean-squared timewidth that would occur due to linear dispersion alone (cf. (5.3.23)). The multiplier γ Peff L expresses the powerdependent increase in the dispersion caused by the Kerr effect (cf. (5.3.15)). The change 2 ´ Trms (L ) depends on both self-phase modulation and linear dispersion. When the Kerr effect is negligible, this method of analysis is not valid and cannot be used to study the shape of the output pulse. Using (5.4.17) with z = L, the total mean-squared timewidth of the pulse is 2 Trms ( L) = Tin2

2 + ´Trms (L) = Tin2 + 21 (β 2 L )(γ Peff L)

(5.4.18)

at the fiber output. As an example, consider a gaussian pulse of the form 2 2 s (t ) = s0e−t /2Tin

at the input. The effective power for this pulse is readily evaluated as Peff = √1 s02 2

= √12 P,

(5.4.19)

where P = s02 is the instantaneous lightwave signal power at τ = 0. Combining this expression with (5.4.18), the mean-squared timewidth of the output pulse for a gaussian pulse at the input is 2 Trms (L ) = Tin2 + √1

2 2

γβ2 P L 2.

(5.4.20)

This expression provides a simple estimate of the pulse spreading caused by the combined effect of linear dispersion β2 and weak, nonlinear self-phase modulation.

5.5

Interference in a Nonlinear Fiber Nonlinear interchannel interference occurs whenever information-bearing pulses modulated onto separate subcarriers overlap in time and space as characterized by the walk-off length (cf. (5.3.25)). The presence of a pulse in one subchannel can modulate the phase of an overlapping pulse in another subchannel through the process of cross-phase modulation, thereby producing nonlinear interchannel interference. Moreover, when three or more pulses at distinct frequencies are present, new frequency components can be generated through the process of four-wave mixing, as is discussed later in the section.

224

5 The Nonlinear Lightwave Channel

5.5.1

Cross-Phase Modulation

The relative strength of cross-phase modulation compared to self-phase modulation can be determined by considering two field components at frequencies ω1 and ω2,

E (t ) = A 1 cos(ω1t ) + A 2 cos(ω2 t ),

(5.5.1)

where A is the electric field amplitude. Substituting this expression into (5.3.1) and expanding the cubic term using standard trigonometric identities, the time-harmonic form of the phase-matched nonlinear polarization term PNL (ω1) at ω1 caused by phase modulation is PNL (ω1) = 3ε0χ (3) A (ω1)

µ

¶ | A1 |2 + 2 | A2|2 .

(5.5.2)

The first term within the right-most parenthesis of (5.5.2) is the self-phase modulation. The second term is the cross-phase modulation. For | A 1| = | A 2| , the cross-phase modulation term is twice as large as the self-phase modulation term. When m subcarriers are present and each interacts with the same strength, then the crossphase modulation due to the m subcarriers is 2m times stronger than the self-phas modulation.

5.5.2

Nonlinear Schrödinger Equation for Multiple Subcarriers

For a more complete analysis of cross-phase modulation one posits an extension of the nonlinear Schrödinger equation that is suitable for a channel with multiple subcarriers equally spaced by ´ω in frequency about a central subcarrier frequency ω0. Using the expansion of the dispersion relation β(ω) for a narrowband signal given in (4.3.1), the complete envelope a (z, t ) of a channel with 2M + 1 subcarriers indexed with respect to the central subcarrier is written as a(z , t ) =

M » m =− M

µ¼

( − z /v ) − (m ´ω) 2β z/2½¶ , g 2

am (z, t ) exp i m ´ω t

(5.5.3)

where the group velocity vg and the group-velocity dispersion coefficient β2 are evaluated at the central subcarrier frequency ω0 for which m = 0. This expression states that relative to the central subcarrier, other subcarriers experience different linear phase shifts and quadratic phase shifts. The linear phase term produces a subcarrier group delay τm = (m ´ω)z /vg that scales linearly with respect to the difference m ´ω between the subcarrier frequencies. The quadratic-phase term produces a wavelength-dependent group delay that scales quadratically with respect to the difference m ´ω between the subcarrier frequencies. Using the traveling timeframe of the central subcarrier, substitute (5.5.3) into the nonlinear Schrödinger equation given in (5.4.3). Using (5.4.4), the governing equation for the complex envelope am (z , τ) for the mth subcarrier at frequency ωm = ω0 + m ´ω can be written as

5.5 Interference in a Nonlinear Fiber

∂ am (z , τ) = (D + N)a (z, τ). m m ∂z

225

(5.5.4)

The linear operator Dm defined in (5.4.6) depends on m because β2 depends on the subcarrier frequency ωm . The nonlinear term N = −iγ |a(z , τ)| 2 defined in (5.4.5) does not ∑ depend on m. Rather, it depends on the total complex envelope a(z , τ) = m am (z , τ) of all subcarriers defined in (5.5.3). For a single subcarrier, (5.5.4) reduces to (5.4.4). For multiple subcarriers, the equations are coupled through N, which is dependent on all subcarriers. The nonlinear coefficient γ in the term N has been regarded as a constant, evaluated for the central subcarrier. This is because for an envelope a (z, τ) composed of multiple subcarriers whose bandwidth is small compared with the central carrier frequency, the variation in the mode profile across the subcarriers is small. For this conventional case, the effective area for each subcarrier is approximately equal to the effective area A eff of the central subcarrier at ω0 . For M = 1, the channel has three subcarriers at frequencies ω−1 , ω0, and ω1. The nonlinear polarization source term PNL (t ) given in (5.3.1) is proportional to the product of the signal in each of the three subcarriers. It can be written as

PNL (t ) ∝ |a(z , τ)|2a (z, τ) = a ∗(z , τ)a(z , τ)2

= (a∗−1 + a∗0 + a1∗ )(a−1 + a0 + a1 )2,

(5.5.5)

with the exponential part of each term in (5.5.3) omitted for brevity. Expand the right side of (5.5.5) and apply the conservation of energy by equating the resulting frequencies. This gives four frequency-matched nonlinear terms that affect the complex amplitude a0 of the central subcarrier, a| a|2



2 2 ∗ −iβ 2 ±|a0²³| a´0 + ±(2|a1 | a0 +²³2|a−1 | a0´) + ±2a1a−1a0²³e

2

self-phase modulation

cross-phase modulation

´ω2 z ,

four-wave mixing

´

(5.5.6)

where only the third term on the right has a phase mismatch between the subcarriers 2 given by e−i β2 ´ω z . The first term on the right causes self-phase modulation. With any number of subcarriers, this term exists for each carrier. The parenthesized term in the middle causes cross-phase modulation. This term is nonzero whenever more than one subcarrier is present. The final term is called four-wave mixing.7 This term always involves three distinct subcarriers, with the phase term being the phase mismatch between these subcarriers. Each subset of three subcarriers produces a four-wave-mixing term, with each term having a different phase mismatch. Only terms with a small phase mismatch produce a significant effect. 7 The reason for the name “four-wave mixing” will be evident in the following subsection.

226

5 The Nonlinear Lightwave Channel

5.5.3

Nonlinear Interference

The direct way to avoid nonlinear interference is to simply avoid the nonlinearity by reducing the total power in the fiber. However, reducing the power reduces the distance before the signal needs to be regenerated or amplified. This section discusses uncompensated nonlinear interference generated by phase modulation and four-wave mixing.

Uncompensated Interference from Nonlinear Phase Modulation

A useful approximate expression can be derived for the nonlinear interference generated by cross-phase modulation using two subcarriers in the same polarization mode. The first subcarrier is at frequency ω0 . The second subcarrier is at frequency ω1 = ω0 +´ω . Using (5.3.9) and (5.5.6) with a−1 = 0, the nonlinear change in the propagation constant ´β for the first subcarrier caused by the combination of self-phase modulation and cross-phase modulation is (cf. (5.3.9))

µ ¶ ´β( z, t ) = γ|a(z , t )|2 = γ |a0(z , t )|2 + 2 |a1 (z, t )|2 .

As before, the accumulated nonlinear phase shift over the length L of the fiber, denoted φNL (L , t ), is (cf. (5.3.15)) NL

¸

L

´β(z , t )dz ¸ Lµ ¶ =γ |a0 (z, t )|2 + 2 |a1(z , t )|2 dz .

φ (L, t ) =

0

0

(5.5.7)

The first term |a0(z , t )|2 on the right describes self-phase modulation. The second term 2|a1 (z , t )|2 describes cross-phase modulation generated by a pulse on the second subcarrier overlapping a pulse on the first subcarrier. A simple model of the interaction of the two pulses can be developed whenever the walk-off length L wo is much smaller than the dispersion length L D . For this case, the total length L of a fiber segment can be divided into two regions. The first region, z ≤ L wo, is a nonlinear, low-dispersion regime in which the two pulses overlap and interfere through cross-phase modulation. In the second region, L wo ≤ z ≤ L, those two pulses do not overlap, and there is no longer mutual cross-phase modulation between those two pulses. In this second region, the nonlinear phase generated by cross-phase modulation is transformed into amplitude distortion by the linear dispersion. This is significant when the fiber segment is much longer than the dispersion length L D. If the group velocity v1 for the second subcarrier is greater than the group velocity v0 for the first subcarrier, then, after a distance z, the pulse in the first subcarrier is advanced in time by τ1 = D ´λ z (cf. (5.3.24)), where the time τ1 is measured in the traveling timeframe of the traveling pulse in the first subcarrier and ´λ is the subcarrier spacing in wavelength units. Including an attenuation factor e−κ z , the nonlinear phase

5.5 Interference in a Nonlinear Fiber

227

shift φNL (L wo , τ) experienced by the pulse in the first subcarrier at z = L wo is a modified form of (5.5.7) given by

¸L µ ¶ |a0 (0, τ)|2 + 2 |a1 (0, τ − τ1 )|2 e−κ z dz , φ ( Lwo , τ) = γ (5.5.8) 0 where τ1 = D ´λ z. The first term of the integrand describes the phase shift due to self-phase modulation. It is µ ¶ φSPM ( Lwo , τ) = (γ/κ) 1 − e−κ L |a0 (0, τ)|2 . When κ L wo is much smaller than one, there is little attenuation over the walk-off length L wo. Then 1 − e−κ L ≈ κ L wo and the nonlinear phase shift φSPM( L wo, τ) generated from self-phase modulation reduces to the lossless case given in (5.4.11) with z = L wo and |a0(0, τ)| 2 = |s (τ)|2. wo

NL

wo

wo

The second term of the integrand describes the phase shift due to cross-phase modulation. It is

φXPM( L wo, τ) = 2γ

¸L 0

wo

|a1(0, τ − D ´λ z)|2e−κ z dz .

(5.5.9)

This term is the interchannel interference in the first subcarrier generated by the pulse in the second subcarrier over the walk-off length L wo. Neglecting the linear dispersion for z < L wo , the complex envelope for the pulse in the first subcarrier at z = L wo is a0 (L wo , τ) = a0 (0, τ)eiφXPM( L wo,τ) e−κ L wo .

(5.5.10)

This expression states that after propagating a distance equal to the walk-off length L wo , the complex envelope a0(0, τ) is attenuated by a factor of e−κ L wo and has accumulated a phase φXPM from the nonlinearity given by (5.5.9). In the region L wo ≤ z ≤ L , the pulses do not overlap. When the self-phase modulation is negligible for each isolated pulse within this region, then N = 0 in (5.4.4). Taking the Fourier transform and integrating the resulting linear differential equation in z from L wo to L gives (cf. (4.3.16)) A (L , ω) = A( L wo, ω) eD(ω)( L − L wo ) ,

(5.5.11)

where A(z , ω) is the Fourier transform of a (z, τ) and D(ω) is defined in (4.3.17). The effect of the nonlinear interchannel interference between an unmodulated carrier and a modulated carrier caused by cross-phase modulation in a dispersive fiber channel is simple to interpret. The instantaneous frequency change ω(z , t ) generated in the unmodulated carrier from cross-phase modulation can be determined from (5.3.16) and is proportional to d P1 (t )/dt , where P1(t ) is the signal power in the interfering datastream. Linear dispersion converts these frequency fluctuations caused by the nonlinearity into amplitude fluctuations in the unmodulated carrier at the output of a fiber segment. These amplitude fluctuations in the initially unmodulated carrier are proportional to dP1(t )/dt . This behavior can be seen in Figure 5.4 for a set of experimental conditions that produce cross-phase modulation in the unmodulated carrier. (Note that the amplitude of the cross-phase modulation is about three orders of magnitude smaller than the amplitude of the datastream.)

228

5 The Nonlinear Lightwave Channel

(a)

)stinu .bra( ytisnetnI

1.2 1 0.8 0.6 0.4 0.2 0

0

2000

4000

6000

8000

10000

12000

Time (ps)

(b) 0.002 0.001 0 –0.001 –0.002

Figure 5.4 (a) Data waveform. (b) The resulting measured and rescaled cross-phase modulation

interference in a second unmodulated signal at a different wavelength for a single 130 km segment of fiber. Adapted from Figure 6 of Hui, Demarest, and Allen (1999).

Uncompensated Nonlinear Interference from Four-Wave Mixing

Four-wave mixing is best described using a photon-optics model. Through the process of four-wave mixing, when three or more distinct subcarriers are present, a subset of three frequencies {ω j , ω k , ω ³ } can interact to generate a new frequency component at ωi . This frequency must conserve energy and must satisfy the phase-matching condition as stated in (5.3.4b). Nonlinear interchannel interference generated by four-wave mixing requires at least three subcarriers. For this case, the strength of the interference depends on the term ψ =. exp(−iβ2 ´ω2 z ) given in (5.5.6), where ´ω is the subcarrier spacing in angular frequency. This term is the spatial phase mismatch between adjacent subcarriers spaced by ´ω . Using (5.5.6), this phase-mismatch term can be written as

ψ =. e−iβ

2

´ω2 z

= e −i2π z/±,

(5.5.12)

where ± = 2π/(β 2 ´ω2 ) is the spatial period of the phase-mismatch term. When more than three subcarriers are present, additional phase-mismatch terms are produced. The subcarrier frequencies ωm = ω0 + m ´ω correspond to spatial periods that decrease linearly with β2 and decrease quadratically with the subcarrier frequency offset (m ´ω)2 . The strength of four-wave mixing depends on the ratio of the spatial period ± generated by adjacent subcarriers to the spatial width of a symbol pulse. The spatial width of a pulse is v g Trms , where v g is the group velocity of the central subcarrier and Trms is the root-mean-squared timewidth of the pulse. For the case in which the spatial period ± of the phase-mismatch term is much smaller than the spatial width v g Trms of the pulse, many cycles of the phase-mismatch term are generated by adjacent subcarriers over the spatial duration of the pulse. The strength of four-wave mixing can be reduced by decreasing the spatial period ± of the phase mismatch term compared with the spatial width vg Trms of the symbol pulse. The decrease in the spatial period increases the number of phase oscillations over the width of the pulse and, by averaging, reduces the effect of the four-wave mixing. Additional subcarriers may generate additional phase-mismatch terms from nonadjacent subcarriers spaced by m ´ω. Each of these terms has a spatial period shorter than ±,

5.6 Computational Methods

229

with more cycles over the spatial duration of a pulse, which leads to weaker four-wave mixing than that generated by the adjacent subcarrier term given in (5.5.12). Examining (5.5.12), the spatial period of the phase-mismatch term may be reduced by increasing the group-velocity dispersion coefficient β2 or by increasing the frequency spacing ´ω between adjacent subcarriers. For this reason, a wavelength-multiplexed system may use a dispersion-controlled fiber with a value of β2 designed to minimize nonlinear interchannel interference. Four-wave mixing also occurs between pulses within a single datastream modulated onto a single subcarrier. This effect is called intrachannel four-wave mixing, and produces nonlinear intersymbol interference. Two conditions must be satisfied for strong intrachannel four-wave mixing. The first is that the effective length L eff (cf. (5.3.19)) is much greater than the nonlinear length L NL (cf. (5.3.18)), so that four-wave mixing can occur before the power in the datastream is attenuated. The second condition is that the effective length L eff is much larger than the dispersion length L D (cf. (5.3.23)). This condition leads to sufficient linear dispersion to enable separate pulses within a datastream of a single subcarrier to overlap in time and thereby interfere before the pulse is significantly attenuated.

5.6

Computational Methods Pulses dispersed on a linear channel combine by linear superposition. These pulses can be separated at the receiver by sampling the output of an appropriate detection filter to produce a Nyquist pulse as discussed in Chapters 9 and 10. Pulses on a nonlinear channel interact in a more complicated way and are not easily separated at the receiver. A complete analysis of the nonlinear interactions that occur within a fiber requires a statistical model of the entire channel with multiple subcarrier datastreams. An analysis based on one or two pulses is not sufficient. Moreover, the pulse streams on all subcarriers need to be incorporated into a combined analysis. The standard method of numerical analysis for a channel with multiple subcarriers in a single spatial mode solves the nonlinear Schrödinger equation by decoupling the linear propagation from the Kerr nonlinearity. This method of separating the linear and the nonlinear propagation characteristics, as suggested by (5.4.12), is called the split-step Fourier method. The multichannel nonlinear Schrödinger equation given in (5.5.4) is repeated here:

∂ am(z , τ) = (D + N)a (z , τ). m m ∂z

(5.6.1)

Am (z , ω) = A m (0, ω)eDm (ω) z ,

(5.6.2)

Setting N = 0, the resulting linear equation can be Fourier transformed into the frequency domain. The spectrum A m (z , ω) of the complex envelope am (z , ω) for subcarrier m satisfies (cf. (4.3.16)) where Dm (ω) is the linear propagation term in the frequency domain defined in (4.3.17). Then Am (z , ω) is updated using (4.3.16) to write

230

5 The Nonlinear Lightwave Channel

Am (z + ´ z , ω) = A m (z , ω)e Dm(ω)´z .

(5.6.3)

= 0 in (5.6.1) gives (cf. (5.4.10)) a(z + ´ z , τ) = a (z, τ)e−iγ |a (z ,τ)| ´ z . (5.6.4) ∑ 2 The term | a(z, τ)|2 = m |a m (z, τ)| depends on the m complex envelopes on the m subcarriers and thus the complex amplitudes am (z , τ) of the m terms are coupled

Next, setting D m

2

through this term. The computation of am (z , τ) consists of alternately computing (5.6.3) and (5.6.4) over a small spatial step ´z. Because the step size needs to capture the interaction between the linear and nonlinear effects, ´z should be much smaller than the dispersion length L D , which describes the length over which linear dispersion is important, and ´z should be much smaller than the nonlinear length L NL , which describes the length over which the nonlinear phase shift φNL is important. An initial input am (0, τ) is given for each subcarrier m. Then the total nonlinear phase shift φNL (´z ) = γ|a (0, τ)| 2 ´z determined from all subcarriers is applied to each complex envelope am (0, τ) to produce a modified envelope am± (0, τ). Next, the spectrum of the modified envelope A±m (0, ω) is determined by taking the Fourier transform of a±m (0, τ). That spectrum is used in (5.6.3) to determine the effect of the linear dispersion and attenuation in the frequency domain over a distance of ´ z, which is given by A m (´z, ω) = A±m (0, ω)eDm (ω)´z .

(5.6.5)

The spectrum A m (´z, ω) is then Fourier-transformed back into the time domain to produce the complex envelope am (´z , t ) for each subcarrier m after a step of length ´z. This process is repeated throughout the length L of the fiber segment, requiring L /´ z iterations. The many variations of this procedure include splitting the linear term to produce a symmetric algorithm, adjusting the step size to capture pulse interactions between subcarriers, and using a nonuniform step size to prevent numerical resonance artifacts. This procedure also forms the basis of the nonlinear compensation technique discussed in Section 11.6.2. In principle, the split-step Fourier method can be used for either forward projection or backward projection. By using the method in the forward direction, the output of the fiber in response to the multiple subcarriers expressed by inputs am (0, τ) can be computed. Amplification and noise can be inserted between fiber segments as the computation proceeds. Attenuation within the fiber segment can be included. By using the method in the backward direction, the complex envelopes am ( L , τ) at a distance L can be back-projected to determine an estimate of the transmitted complex envelopes am (0, τ) at the input of a fiber segment. Noise, however, is not removed during back-propagation of the received signal, so the estimate of am (0, τ) is still noisy, but much improved. This procedure is computationally intensive and is currently not practical for real-time applications in long-distance spans.

5.9 Problems

5.7

231

References The general topic of nonlinear optics using both wave optics and photon optics is discussed in Shen (2002) and in Boyd (2013). Nonlinear effects in fibers are considered in Stolen (1979), in Chraplyvy (1990), in Forghieri and Chraplyvy (1997), in Agrawal (2008), and in Ballato and Dragic (2014). Self-phase modulation in optical fibers is discussed in Stolen and Lin (1978) and Potasek and Agrawal (1987). Cross-phase modulation is discussed in Marcuse, Chraplyvy, and Tkach (1994), in Chiang, Kagi, Fong, Marhic, and Kazovsky (1994), in Shtaif (1998), and in Hui, Demarest, and Allen (1999). Four-wave mixing is discussed in Shibata, Braun, and Waarts (1987), in Eiselt (1999), and in Yu, Reimer, Grigoryan, and Menyuk (2000). Raman amplification is discussed in Bromage (2004), with the frequency response considered in Blow and Wood (1989). Stimulated Brillouin scattering in optical fibers is discussed in Ippen and Stolen (1972). The evolution of a pulse under the combined effect of a Kerr nonlinearity and weak linear dispersion is discussed in Marcuse (1992). The derivation of the nonlinear Schrödinger equation is discussed in Menyuk (1999), and in Menyuk and Marks (2006).

5.8

Historical Notes Fundamental work in the physics of nonlinearities in fibers includes studies on selfphase modulation by Stolen (1979), cross-phase modulation by Chraplyvy, Marcuse, and Henry (1984), and four-wave mixing by Hill, Johnson, Kawasaki, and MacDonald (1978). The system-level consequences of the nonlinear effects due to propagation within an optical fiber were considered in Waarts and Braun (1986), in Inoue and Shibata (1989), and in Inoue (1993). The system-level implications of nonlinearities for wavelength-multiplexed systems led to changes in the design of dispersion-controlled fiber by Chraplyvy, Tkach, and Walker (1995) that could mitigate the effect of fiber nonlinearities.

5.9

Problems 1 Nonlinear terms (a) Expand the cube

(

A j cos(ω j t ) + Ak cos (ωk t ) + A ³ cos(ω³ t )

)3

into a summation of ten product terms, one of which is 6A j Ak A ³ cos(ω j t )cos (ωk t )cos(ω³ t ). (b) Using sum and difference cosine formulas, expand the product term cos (ω j t )cos(ωk t )cos (ω³ t ) and show that it can be written as

232

5 The Nonlinear Lightwave Channel

cos(ω j t )cos(ωk t )cos(ω³ t ) =

( (ω + ω + ω )t ) j k ³ ( ) 1 + 4 cos (ω j − ωk + ω³)t ( ) + 14 cos (ω j + ωk − ω³)t ( ) + 1 cos (ω − ω − ω )t . 1 4

cos

j

4

³

k

(c) What ( proportion of the ) total power on the left side is contained in the term cos (ω j − ωk − ω³ )t on the right side? 2 Effective area (a) The commercial single-mode fiber known as Corning SMF-28 has a core diameter of d 8 3 µm. At an operating wavelength of 1500 nm, the fiber specifications are 0 0036, n 1 47, and V 2 09. Using these values, determine the linearly polarized mode parameters p and q (cf. (3.3.27)). (b) Calculate eff for the Corning SMF-28 fiber at 1500 nm. Compare the 2 calculated value with the measured value of 80 µ m . (This requires numerical integration of (5.3.13).)

= .

´= .

λ= ≈ .

≈ .

A

λ=

3 Nonlinear effects in a multimode fiber Using simple scaling arguments, show that, for lightwave power levels less than 100 mW, nonlinear effects are not evident for a graded-index multimode fiber with a core diameter of 50 microns. 4 Evolution equation for the mean-squared pulse width The evolution equation for the mean-squared pulse width is to be derived in this problem starting from the nonlinear Schrödinger equation given in (5.4.3). (a) Starting with (5.4.3) and setting 0, derive the expression that is produced when both sides of (5.4.3) are multiplied by m a∗ z and integrated over . (b) Show that when the expression derived in part (a) is added to its complex conjugate, the equation

κ=

¸

τ

¾

d m β2 ∞ τ m a ∗ ∂ 2 a ³τ ´ = i dz 2E b −∞ ∂τ 2

( , τ)

τ

¿ 2 a∗ ∂ − a ∂τ 2 dτ

is obtained using

¾ ∂a ∂a ∗ ¿ ¸∞ d + a dτ = τ m dz |a(z , τ)|2 dτ = Eb dzd ³τ m ´. a∗ ∂ z ∂ z −∞ −∞ (The argument for the envelope a(z , τ) is suppressed.) ¸∞

τm

(c) Integrate by parts using the constraint that the signal has finite energy. (This constraint implies that a(z , τ) and all of its derivatives must vanish at ²∞ .) Show that the resulting expression is

¸

¾

d m β2 ∞ τ m−1 a ∂ a∗ ³τ ´ = im dz 2E b −∞ ∂τ

¿

− a ∗ ∂ a dτ. ∂τ

5.9 Problems

233

(d) Add and subtract a∗ ∂ a/∂τ inside the integral and separate the integral into two contributions that can be written as

β2 d m ³τ ´ = im dz 2Eb

¹¸ ∞

−∞

º ¸∞ τ m −1 ∂τ∂ |a(z , τ)|2dτ − 2 τ m−1a∗ ∂∂τa dτ , −∞

where a ∂ a ∗/∂τ + a∗ ∂ a/∂τ = ∂| a(z, τ)|2 /∂τ has been used. (e) Equate the real parts of both sides of the equation, and show that the resulting expression is (5.4.14). 5 Phase matching

(a) Starting with a(z , t ) =

µ¼

M » m =− M

am (z, t )exp i m ´ω(t

½¶ − z /vg ) − (m ´ω)2 β2 z /2 ,

= 1. For example, the term a1 is µ¼ ½¶ . a1 = a1 (z, t )exp i ´ω(t − z /vg ) − ´ω2 β2 z /2 .

write out the terms for M

(b) Form the product (a∗−1 + a0∗ + a1∗ )(a−1 + a0 + a1)2, then determine the phasematched terms for which the frequencies sum to zero. (c) Collect the phase-matched terms to show that a∗ a2

µ ¶ ≈ |a0 |2 + 2 |a1 |2 + 2 |a−1|2 a0 + 2a1a−1 a0∗ e−iβ

2

´ω2 z ,

as appears in (5.5.6). 6 Mean-squared output width for a weakly dispersive nonlinear fiber A generalized gaussian pulse is defined as

µ

2 2 m G (t ) = e−( t / 2σ )

¶n

,

where m and n are parameters. The area under this pulse can be expressed as

¸∞

µ ¶ = 23/2σ n−1/2m µ 1 + (2m )−1 , −∞ . ·∞ where µ( x ) is the gamma function defined as µ(k ) = 0 x k −1 e−x dx (cf. (2.2.45)). (a) Let n = 1. For pulse G (t ), determine the effective power · ∞ |G (τ)|4 dτ , Peff (m ) = ·−∞ ∞ 2 −∞ | G (τ)| dτ which was defined in (5.4.16). For m = 1, this term reduces to (5.4.19). (b) Plot six generalized gaussian pulses for 1 ≤ m ≤ 3 using n = 1 and σ = 1 on the same figure. Calculate the scaling factor Peff (m ) for each pulse. On the G (t )dt

basis of these four pulses, what kind of pulse experiences more pulse spreading in a weakly dispersive fiber? Why?

234

5 The Nonlinear Lightwave Channel

(c) Determine an expression for the instantaneous frequency shift ω(t ) = ωc − γ dP(z , t )/dt given in (5.3.16) as a function of time in terms of the parameters m and σ for a pulse whose power is a generalized gaussian pulse with n = 2. (d) Plot ω(t ) with σ = 1 for 1 ≤ m ≤ 3. Compare these four curves with the results derived in part (b). 7 Four-wave mixing A wavelength-multiplexed system operates at = 1550 nm with D = 17 ps nm km . at this wavelength and express the value in (a) Using (4.4.16), determine 2 units of ps 2/km. (b) Derive the spatial period of the phase-mismatch term e−i2π z /± given in (5.5.12) when f 100 GHz ( 0 75 nm at 1550 nm). (c) Determine the symbol rate R for which the spatial period of the phasemismatch term is one-tenth the spatial interval of a symbol when T 1 R is the temporal interval of the symbol. Comment on the result.

λ

( · )

/

β (ω)

´ =

±

ψ=

´λ = .

±

= /

8 Nonlinear fiber parameters A standard single-mode fiber is given with D 17 ps nm km , 1 3 radians W km , and 0 2 dB/km. A second fiber has D 2 3 ps nm km , 2 radians W km , and 0 2 dB/km. An input gaussian pulse has a peak power of 50 mW and a root-mean-squared temporal width of 50 ps. (a) For each fiber, determine the nonlinear length L NL , the dispersion length L D , the effective length L eff , and the walk-off length L wo with subchannels separated by 100 GHz. (b) For a single unamplified segment of fiber, derive the peak pulse power for each fiber such that the total accumulated phase shift is smaller than 0.1 radians. (c) For the same coupled power and the same pulse, which fiber produces a smaller nonlinear phase shift? (d) For the same coupled power and the same pulse, which fiber produces a larger dispersion-limited distance? (e) On the basis of the results of the previous parts, discuss the circumstances under which each fiber should be preferred.

. γ=

/( · ) κ = . /( · ) κ = .

=

/( · ) γ = = . /( · )

9 Four-wave mixing products Consider a set 1 N of N unmodulated carrier frequencies within the same single-mode fiber. Four-wave mixing generates mixing products at frequencies jk ³ j k ³ , where j k ³ 1 N . (a) Show that the frequency of the four-wave mixing product will coincide with one of the frequencies in the original set when j k ³ . Show that this condition corresponds to self-phase modulation and, for N 3, has three permutations. (b) Show that the frequency of the four-wave mixing product will coincide with one of the frequencies in the original set when j ³ or when k ³ . Show

{ω , . . . , ω }

ω = ω +ω −ω

ω , ω , ω ∈ {ω , . . . , ω }

ω =ω =ω =

ω =ω

ω =ω

235

5.9 Problems

(c)

(d)

(e) (f)

that these conditions correspond to cross-phase modulation, and that for N = 3, each condition has six permutations. Show that when ω j µ = ωk µ = ω³ , permuting the first two indices does not change the frequency of the four-wave mixing product. Show that for N = 3, there are three such degenerate combinations. Using the results of parts (a)–(c) list the nine distinct four-wave-mixing products generated for N = 3 and plot a histogram of all 27 possible four-wave-mixing products. Repeat for N = 4 and show that there are 24 distinct four-wave-mixing products. Generalize the results of parts (d) and (e) to show that the number of distinct four-wave-mixing products for N frequencies is (N 3 − N 2 )/2.

10 Intrachannel four-wave mixing A long-distance wavelength-multiplexed system operates at 1550 nm with D 17 ps nm km , = 0.2 dB/km, and a fiber nonlinear coefficient 2 rad/(W-km). (a) Estimate the effective length L eff . (b) Determine the input power P to produce a nonlinear phase shift NL of 1 radian. (c) Determine the root-mean-squared input pulse timewidth Tin such that the dispersion length L D is half the effective length. Estimate the minimum symbol rate and the launch power for which intrachannel four-wave mixing is significant.

=

/(

·

λ=



γ=

φ

11 Split-step Fourier method (requires numerics) A gaussian envelope a z centered at a carrier wavelength of 1550 nm has a peak power equal to 100 mW. The root-mean-squared timewidth rms of a 0 is equal to 50 ps. A single pulse is launched into a 200 km segment of optical fiber with the following parameters: 2 = 20 ps2 km, = 0.2 dB/km, and 3 rad/(W-km). (a) Determine the dispersion length L D and the nonlinear length L NL . (b) Estimate the bandwidth in radians per second that contains 99% of the total power in a 0 and use this bandwidth to determine an appropriate sampling rate R s 2 Ts in angular frequency that satisfies the sampling theorem, where Ts is the sampling interval in time. (c) Using Ts , construct a sequence of samples of a 0 kTs , where k is an integer, and apply a numerical Fourier transform to this sequence to give A 0 m Rs , where m is an integer. Plot both a 0 kTs and A 0 m R s . (As a check, A 0 m R s should be a sequence of real values with rms and the root-meansquared bandwidth rms (in radians per second) of A 0 m R s being exact reciprocals so that rms 1 rms (cf. Table 2.1). (d) Using the values determined in part (a), determine an appropriate step size z for the split-step Fourier method.

( , τ)

λ= σ

β −

( , τ) = π/

/

κ

)

γ=

ω

(,

(,

( , τ)

(,

²

σ = /²

)

)

(, ) σ (, )

(,

)

´

236

5 The Nonlinear Lightwave Channel

(e) Using the results of the previous parts of the problem and following the description of the split-step Fourier method given in Section 5.6, use the split-step Fourier method to determine the evolution of a(z , τ). Plot the pulse power |a(z, τ)|2 in 20 km increments over the 200 km segment. Using these plots and the results from part (a) determine the following ranges. i. The range of distances for which the nonlinear phase dominates. ii. The range of distances for which linear dispersion dominates. iii. The range of distance for which both physical effects must be considered.

6

Random Signals

Random signals are present in any communication system because the information that is conveyed is random and because additional randomness in the form of noise is always present. The performance of a well-designed communication system is always determined by a tension between the signal and the noise. The randomness in a signal that conveys information is described by a prior probability distribution. The randomness in the noise is described at a fundamental level using quantum theory, which is not considered in this chapter. The present chapter develops a more elementary description of randomness starting from the discrete photon-optics signal model. Using this model, a treatment of random signals is provided that is valid for existing guided-lightwave communication systems, both for weak lightwave signals and for strong lightwave signals. For strong signals, the probability density functions become consistent with the continuous gaussian distribution and its variants that are presented in Chapter 2. These probability distributions describe both random signals that convey information and sources of noise. It is shown in Chapter 15 that this treatment is a special case of a more general quantum-optics signal model, as is depicted in Figure 6.1. To motivate the discussion, consider a pulse of light that is described using photon optics as a puff of photons. The pulse is characterized by a discrete random photon arrival process ±(t ). This form of uncertainty is regarded as photon noise within the photon-optics signal model. Later, in Chapter 15, we will show that this uncertainty is a fundamental consequence of the way that measured outcomes are described using quantum optics. There is also randomness introduced into the electrical signal after the lightwave signal has been photodetected. This randomness is primarily caused by the interactions between the lower-frequency photodetected electrical signal and the external environment. This interaction produces a form of noise called thermal noise. To construct a reliable communication system, the relevant probability distribution function that quantifies each form of noise must be known or approximated. Each such probability distribution function could be either a continuous probability density function or a discrete probability mass function. A random signal that conveys information is characterized by a prior probability distribution. Chapter 14 will discuss a fundamental connection between the probability distribution for the noise and the optimal prior probability distribution that conveys the maximum rate of reliable information, which rate is called the channel capacity.

238

6 Random Signals

Discrete-energy (photon) description of randomness Quantum description of randomness Continuous-energy (wave) description of randomness Figure 6.1 Models for randomness and noise in photon optics and wave optics are limiting forms of quantum-optics models. The model for photon optics is also related to the model for wave optics.

In this chapter, we begin with a discussion of the physical origin of noise, and of how noise sources are described in the relevant signal model, being cognizant that this development can also be applied to random signals that convey information. The probability distribution functions are then derived for several fundamental noise sources for a single independent degree of signaling freedom as defined by an isolated temporal or spatial mode of the physical channel. Later, these probability distribution functions are used to study a bandlimited channel that consists of multiple modes, possibly with the noise in each mode correlated with the noise in the other modes.

6.1

The Physics of Randomness and Noise The random sources that are of interest in communication systems are signal sources and noise sources. The distribution of the input symbols of the signal describes an intentional form of data-dependent randomness that is controlled by the coded-modulation process. The distribution of the noise sources describes unintentional forms of randomness that are not controlled by the user. In this chapter, these random sources are characterized using a combination of photon optics and wave optics. The treatment of lightwave communication systems is more complicated than the treatment of lower-frequency systems because of the discrete photon noise. To include this discreteness, a convenient artifice, and our usual practice, is to ignore discrete sources of randomness, such as photon noise, when describing lightwave signal propagation. This source of noise is incorporated later into a photon-optics description of lightwave amplification and photodetection. This approach allows us to treat lightwave signals before photodetection, even low-energy signals, as continuous random waveforms. These continuous random waveforms, which may be noise signals or information-bearing signals, produce magnitude and phase modulation in the lightwave signal incident on the photodetector, but do not explicitly include the discrete nature of the lightwave signal. The discrete nature is manifested only during the interaction of the lightwave with a material system. Otherwise, the lightwave is considered to be a continuous waveform. The inherent discrete nature of a lightwave signal is included in the photodetection and amplification process by overlaying the photon-optics signal model onto

6.1 The Physics of Randomness and Noise

239

the wave-optics signal model. This overlay consists of treating the continuous-energy quantities within the wave-optics signal model as arrival rates within the discrete photon-optics signal model. In particular, shot noise is electronic noise in the continuous photodetected waveform that is caused by photon noise. At a functional level, the effect of photon noise can be incorporated by defining a transformation called the Poisson transform. This transformation converts a continuous probability density function, which does not include the effect of photon noise, into a probability mass function, which does include the effect of photon noise. The partitioning of random lightwave signals in this manner is useful because the use of the wave-optics signal model facilitates the comparison of a lightwave communication system with other communication systems operating at lower frequencies for which the discrete nature of a lightwave signal is not evident. While this separation into two distinct signal models is useful, it is incomplete. A photon-optics description of randomness and noise is incomplete because it does not include the effect of the carrier phase. A wave-optics description of randomness and noise is incomplete because it does not consider the discrete nature of lightwave energy. Formal reconciliation of the discrete and continuous aspects of lightwave signals is discussed in Chapter 15.

6.1.1 Randomness and Entropy

A random signal depends on the number of independent degrees of signaling freedom that the physical channel supports in time, space, and polarization. With regard to random sources of information, an independent degree of signaling freedom can convey one symbol. For a bandlimited gaussian random process, the number of independent degrees of signaling freedom can be precisely related to the number of basis functions or modes required to accurately describe the random process. This relationship is discussed in Section 6.5. This section begins the study by restricting attention to a single degree of freedom, such as one temporal sample of a single spatial mode studied in Chapter 3. The randomness for this single spatio-temporal mode is quantified by a probability distribution function.
Let m be a discrete random variable denoting the number of photons within a single isolated mode, and let the integer m be a realization of the random variable m. The probability mass function¹ p(m) for m is now determined by separating the problem into two parts. In the first part, the probability mass function that maximizes an appropriate measure of the randomness is determined without considering interactions with the random source of energy that produces this probability mass function. The second part introduces an external random source of energy and determines the probability mass function that is generated by the interaction of a single mode with that external source of energy.

¹ As is common practice, in this and later chapters, the subscript denoting the functional form of the probability distribution is often suppressed, with the form implicitly defined by the argument of the function. This means that p(s) and p(m) may be different functions, possibly with some confusion.


Table 6.1 The entropy or differential entropy of several probability distributions

Distribution    p(m) or p(x)                        Entropy H(m) or H(x)
Geometric       (1/(1+N)) (N/(1+N))^m               k[(1+N) log_e(1+N) − N log_e N]
Exponential     (1/N) e^{−x/N}                      k(1 + log_e N)
Uniform         1/N                                 k log_e N

The first task requires a measure of the randomness. For this purpose, a probability distribution function, whether it describes a physical system or an information source, is quantified using a scalar quantity known as the entropy. For a discrete probability mass function p(m), the entropy is defined as

    H(p) \doteq -k \sum_{m=0}^{\infty} p(m) \log_e p(m).   (6.1.1)

The units used for the constant k depend on the application. For physical systems, k = 1.38 × 10⁻²³ J/K and is called the Boltzmann constant. Then the entropy H is expressed in joules per kelvin. For information systems, k = 1 nat,² and it is common to omit the symbol k when writing the entropy.
The entropy of a continuous probability density function p(x) is infinite. Informally, this can be seen as a consequence of the fact that it takes an infinite number of bits to specify an arbitrary real x. Instead, the randomness of p(x) is quantified using the differential entropy, also called H or H(p), and given by

    H(p) \doteq -\int_{-\infty}^{\infty} p(x) \log p(x)\, dx.   (6.1.2)

² In information theory, another common unit for entropy is bits, where 1 nat = log₂ e bits, which can be realized by using a base-two logarithm in the definition.

The differential entropy can be negative, so it is not the same as the entropy. The entropy or differential entropy of several probability distributions is given in Table 6.1.
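The entries of Table 6.1 are easy to confirm numerically. The following fragment is a minimal sketch, assuming k = 1 so that entropies are in nats and taking N = 2.5 as an illustrative mean; it evaluates the entropy of the geometric distribution by direct summation and the differential entropies of the exponential and uniform densities by direct integration.

```python
import numpy as np

N = 2.5  # illustrative mean; k = 1, so entropies are in nats

# Geometric: p(m) = (1/(1+N)) (N/(1+N))^m
m = np.arange(0, 2000)
p = (1.0 / (1.0 + N)) * (N / (1.0 + N))**m
H_geom = -np.sum(p * np.log(p))
print(H_geom, (1 + N) * np.log(1 + N) - N * np.log(N))   # the two values agree

# Exponential: f(x) = (1/N) e^{-x/N}
x = np.linspace(1e-9, 60.0 * N, 600_000)
f = np.exp(-x / N) / N
H_exp = -np.sum(f * np.log(f)) * (x[1] - x[0])           # Riemann-sum integration
print(H_exp, 1 + np.log(N))                              # the two values agree

# Uniform on [0, N]: f(x) = 1/N, so the differential entropy is log N directly
print(np.log(N))
```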

The Geometric Probability Mass Function

The maximum-entropy principle is a widely accepted principle of inference. It is motivated by statistical systems that may be composed of a large number of constituent subsystems. The maximum-entropy principle states that a statistical ensemble will take on a probability distribution function consistent with the physical constraints that maximizes the entropy. This principle is accepted here without discussion, but is revisited later when we study systems described by quantum optics that may be comprised of a small number of constituent subsystems, or subsystems that do not strongly interact.
The statistical properties of the energy for an ideal, maximally random discrete source, either a noise source or an information source, expressed using the random

number of photons m, are governed by a maximum-entropy probability mass function p(m).
In addition to the standard probability constraint that \sum_{m=0}^{\infty} p(m) = 1, the maximization of the entropy is constrained by requiring the expected number of photons S, given by

    S \doteq \langle m \rangle = \sum_{m=0}^{\infty} m\, p(m),   (6.1.3)

to have a specified value. If the photons are from an information source such as the output of an encoder, then the expected number S equals E, where E is the expected number of signal photons. If the photons are from a noise source, then the expected number S equals N0 , where N0 is the expected number of noise photons. In general, a mode will have a combination of signal photons and noise photons with an expected value S = E + N 0. To determine the probability mass function p(m) that maximizes the entropy under a constraint on the expected value, form the augmented objective function F

    F = -\sum_{m=0}^{\infty} p(m) \log_e p(m) - \lambda_1 \sum_{m=0}^{\infty} m\, p(m) - \lambda_2 \sum_{m=0}^{\infty} p(m),

where λ1 and λ2, called Lagrange multipliers, are constants that can be determined later by using the two constraints \sum_{m=0}^{\infty} m\, p(m) = S and \sum_{m=0}^{\infty} p(m) = 1. The constraint that p(m) is nonnegative will be temporarily ignored. The maximization of this augmented objective function over all probability mass functions p(m) determines the maximum-entropy probability mass function.
This maximization can be determined using the methods of elementary calculus. Instead, we use the more general method of variational calculus that will be used later in other contexts. Observe that, to first order, a small variation in the functional form of the maximizing probability mass function p(m) will not affect the maximum value of the objective function F. The method is to write this variation as p(m) + εy(m), where ε is a small scalar parameter and y(m) is an arbitrary function. To first order in ε, the maximum of the objective function F is not affected by the addition of the term εy(m). Now vary the sum F about the maximum value by varying ε. Expand the terms and gather the terms in powers of ε. The zeroth-order term equals F. If p(m) achieves the maximum value of F, then the first-order term in ε must be zero. Therefore, by gathering the first-order terms in ε and setting the sum of these terms to zero, we have

    \sum_{m=0}^{\infty} \left( \log_e p(m) + 1 + \lambda_1 m + \lambda_2 \right) y(m) = 0,

thereby ensuring that to first order in ε, F does not change as y(m) is varied. Because the left side must be zero for every choice of the arbitrary function y(m), this expression can be satisfied only if the parenthesized term is zero for every m. Setting the parenthesized term to zero and solving for p(m) gives

    p(m) = K u^m,   (6.1.4)


where K = e^{-(1+\lambda_2)} and u = e^{-\lambda_1}. For each m, this term is nonnegative, so no harm was done by ignoring the nonnegativity constraint. Finally, choosing the constants λ1 and λ2 to satisfy the two stated constraints completes the solution for p(m), which can be written as

    p(m) = \frac{1}{1+S} \left( \frac{S}{1+S} \right)^m,   m = 0, 1, 2, ...,   (6.1.5)

where S is the expected number of photons in the mode. This probability mass function is a special case of the geometric probability mass function, which is any probability distribution of the form p(m) = q(1 − q)^m, where 0 ≤ q ≤ 1. The geometric distribution when written as in (6.1.5) as a function of the number of photons is called the Gordon probability mass function or the Gordon distribution. It is the maximum-entropy distribution for the number of photons in a single mode subject to the constraint that the mean ⟨m⟩ is equal to S.
The entropy H of the Gordon distribution is determined using (6.1.1) together with (6.1.5) and is

    H = -\sum_{m=0}^{\infty} p(m) \log_e p(m)
      = \log_e(1+S) + S \log_e(1 + 1/S)
      = (1+S)\log_e(1+S) - S \log_e S = g(S),   (6.1.6a)

where

    g(x) \doteq (1+x)\log(1+x) - x \log x   (6.1.6b)

is called the Gordon function, and with a base-e logarithm H is given in units of nats. The maximum entropy for a single mode H_mode with a finite mean S is H_mode = k g(S) = k[(1+S)log(1+S) − S log S], where the constant k and the base of the logarithm define the units for the entropy.
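The maximum-entropy property of the Gordon distribution can be illustrated numerically. The following minimal sketch, assuming k = 1 (entropy in nats) and an illustrative mean S = 3, computes the entropy of the Gordon distribution by direct summation and checks it against the Gordon function g(S); a Poisson distribution with the same mean serves as a comparison distribution with strictly smaller entropy.

```python
import numpy as np
from scipy.stats import poisson

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))   # entropy in nats (k = 1)

S = 3.0
m = np.arange(0, 600)

gordon = (1.0 / (1.0 + S)) * (S / (1.0 + S))**m   # Gordon pmf, (6.1.5)
print(entropy(gordon))                            # numerical entropy
print((1 + S) * np.log(1 + S) - S * np.log(S))    # g(S), the same value

print(entropy(poisson.pmf(m, S)))                 # smaller than g(S), as expected
```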

External Energy Sources

The next task is to specify the external energy source that provides the photons needed to populate the mode. For a set of modes used for communications, the total energy in a mode is the combined effect of the encoder, which is a controlled external energy source used to convey information, and an uncontrolled energy source, which is regarded as noise. The photon distribution p(m) resulting from an external thermal energy source is derived first, where it is shown that this source leads to a maximum-entropy distribution for p (m). Next, a combination of an external signal source and an independent external noise source is considered.

External Thermal Energy Source

When the external source of energy is uncontrolled and random, the expected number of photons in a mode is the expected number of noise photons \mathsf{N}_0. When this uncontrolled


source of energy is a large external environment at a finite temperature T0, this external environment acts as an unlimited reservoir of thermal energy that populates or depopulates the mode with photons so as to achieve equilibrium between the mode and the external environment.
We posit that initially no photons are in the mode. The external environment can provide energy to a mode or can accept energy from a mode in the form of photons so as to achieve equilibrium. The second law of thermodynamics states that the external environment will continue to provide energy to the mode, and the mode will continue to accept energy until the entropy H_total of the total system is maximized. The system is then said to be in thermal equilibrium with the surrounding environment. The time required for a mode in an initial state with no photons to reach thermal equilibrium depends on the geometry that defines the mode and its coupling to the environment. Because only equilibrium conditions are considered at this time, the rate of energy transfer does not now concern us. Once established, and in the absence of any other influence, thermal equilibrium will be maintained between the external environment and every mode with which it interacts.
The second law of thermodynamics also states that the energy transferred from the external environment to the mode results in an entropy reduction in the environment of ΔH = −N_0/T_0 = −\mathsf{N}_0 h f/T_0 with units of joules per kelvin, where

    \mathsf{N}_0 \doteq N_0 / hf   (6.1.7)

is the expected number of noise photons transferred between the external source of energy with a power density spectrum of N_0 and the mode. This is the physical origin of thermal noise. Using this expression and (6.1.6), the total entropy H_total of the system in thermal equilibrium is

    H_{total} = H_{mode} + \Delta H = k \left[ (1+\mathsf{N}_0)\log_e(1+\mathsf{N}_0) - \mathsf{N}_0 \log_e \mathsf{N}_0 - \mathsf{N}_0 hf / kT_0 \right],

where \mathsf{N}_0 is the expected number of thermally generated noise photons in the mode at equilibrium. Our task is to maximize H_total by the choice of \mathsf{N}_0. To this end, let x = \mathsf{N}_0/(1 + \mathsf{N}_0) so that \mathsf{N}_0 = x/(1 − x). Making this substitution yields

    H_{total}(x) = -k \left[ \frac{x}{1-x} \left( \log_e x + \frac{hf}{kT_0} \right) + \log_e(1-x) \right].

The total entropy H_total(x) is a maximum when the derivative with respect to x is zero,

    \frac{d H_{total}(x)}{dx} = -\frac{k}{(1-x)^2} \left( \log_e x + \frac{hf}{kT_0} \right) = 0.

Because (1 − x) = 1/(1 + \mathsf{N}_0) is nonzero for all nonnegative \mathsf{N}_0, the second term must be zero. Therefore, solving for x yields

    x = e^{-hf/kT_0}.


Then

    \mathsf{N}_0 \doteq \frac{x}{1-x} = \frac{e^{-hf/kT_0}}{1 - e^{-hf/kT_0}} = \frac{1}{e^{hf/kT_0} - 1}   for f ≥ 0.   (6.1.8)

Substituting \mathsf{N}_0 into (6.1.5) gives

    p(m) = \frac{1}{1+\mathsf{N}_0} \left( \frac{\mathsf{N}_0}{1+\mathsf{N}_0} \right)^m = \left(1 - e^{-hf/kT_0}\right) e^{-m(hf/kT_0)}.   (6.1.9)

When written in this way as a function of h f / kT0 , the geometric probability mass function is called a Bose–Einstein probability mass function. It is the maximum-entropy probability mass function of the number of photons in a single spatio-temporal mode in thermal equilibrium with an environment at temperature T0.
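To give a sense of scale, the following fragment evaluates (6.1.8) at a lightwave frequency and at a microwave frequency, using physical constants from scipy. At room temperature a lightwave mode in thermal equilibrium is essentially empty, while a microwave mode holds hundreds of thermal photons.

```python
import numpy as np
from scipy.constants import h, k, c   # Planck, Boltzmann, speed of light

T0 = 290.0   # kelvin

def N0(f):
    """Expected number of thermal noise photons in one mode, (6.1.8)."""
    return 1.0 / np.expm1(h * f / (k * T0))

print(N0(c / 1550e-9))   # ~193 THz lightwave carrier: about 1e-14 photons
print(N0(10e9))          # 10 GHz microwave carrier: about 600 photons
```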

External Nonthermal Energy Source

The external source of the energy that populates a mode need not be the thermal energy from the external environment. Instead, the mode could be populated by a controlled source of signal energy with mean E. In a lightwave communication system, the source of energy that populates a mode is controlled by an information source such as the output of an encoder. In this context, the coded modulation can be viewed as controlling the probability distribution of the photons in a mode. Chapter 14 shows that when only the signal is controlled, the maximum amount of information that can be conveyed using one mode, called the single-letter channel capacity, is achieved when the external source of signal energy is chosen so that the distribution p(s) for the number of signal photons is the maximum-entropy probability distribution given in (6.1.9) with kT0 replaced by E. When the mode is populated by a combination of independent signal and noise, the signal source should be chosen so that the probability distribution of the sum p(s + n) is again a maximum-entropy distribution. This statement is developed in Section 14.2.

6.1.2 Photon–Matter Interactions

Several types of processes can occur when a set of populated electromagnetic modes interacts with a material system that has a discrete set { E k } of allowed energy states. This interaction is formally treated using quantum theory. However, a description using a judicious combination of wave optics and photon optics, as provided herein, captures the main points and is adequate for our immediate purpose. A material system has a set of allowed discrete energy states, or energy levels, determined by quantum theory. When appropriate, a set of closely spaced energy levels within an interval can be approximated as a continuum of levels. The probability of filling these


energy states from the external environment is determined by a maximum-entropy argument similar to the argument that resulted in the Bose–Einstein distribution derived in the previous section. The probability that an allowed energy state with energy E is occupied is proportional to e^{−E/kT_0} (cf. (6.1.9)). The proportionality constant is obtained when needed by summing, or integrating, e^{−E/kT_0} over all allowed E, but it is rarely needed.
Depending on the pairwise energy separation between the allowed energy states compared with the energy of a photon, a lightwave signal may or may not interact with the material system because energy must be conserved in such an interaction. Consider a photon with an energy hf that might interact with a material system with a set of discrete energy levels. Of the allowed discrete energy levels in the material system, consider two energy levels, E_1 and E_2, that have an energy difference equal to the energy of the incident photon, with ΔE = E_2 − E_1 = hf. Let p(E_2) be the probability that the material system is in a state whose energy is E_2 and let p(E_1) be the probability that the material system is initially in a state whose energy is E_1.
Now constrain the material system to be in thermal equilibrium. The assumptions used in the previous section to derive the allowed energy states of an electromagnetic mode in thermal equilibrium also apply to the allowed energy states of a material system in thermal equilibrium. For a material system coupled to an external environment, the allowed energy states E_k will fill until the material system is in thermal equilibrium. The probability mass function p(E) of the energy in these discrete material energy states has the same form as (6.1.9), but discretized to fit the allowed energy levels of the material system. Once thermal equilibrium has been reached, the probability that an allowed energy state is occupied is an exponentially decreasing function of the energy E_k of that state. Therefore, when E_2 is larger than E_1, the probability p(E_2) is smaller than the probability p(E_1). The ratio of these probabilities is called the Boltzmann factor,

    \frac{p(E_2)}{p(E_1)} = e^{-\Delta E / kT_0},   (6.1.10)

where ΔE = E_2 − E_1 is the energy difference between the two states, which, to absorb the photon, is equal to hf.
There are three common photon–matter interactions, diagrammed in Figure 6.2, that can occur between the two energy states of the material system and an incident photon, which result in the emission or the absorption of a photon. In this chapter, these interactions are accepted as fundamental. These interactions are as follows.

Figure 6.2 Photon–matter interactions. (a) A photon can be absorbed. (b) A photon can stimulate the creation of another photon with the same frequency and polarization. (c) A photon can be spontaneously emitted.

1. The probability that a single photon for which hf = E_2 − E_1 is absorbed is proportional to the number of such photons incident to the system, and is

proportional to the probability p(E_1) that the material system is in a state with an energy E_1.
2. An incident photon with energy hf = E_2 − E_1 stimulates the creation of an identical second photon, taking that energy from the higher-energy state. This process is called stimulated emission and induces the decay of the material state with a higher energy E_2 into a state with a lower energy E_1. The stimulated photon is emitted into the same mode as the incident photon and has the same wavelength and polarization. This stimulated photon becomes part of the amplified wave propagating in the same mode and direction as the incident wave. Stimulated emission is the fundamental quantum-level amplification process using the allowed energy states of a material system. This process can be described using either photon optics or wave optics as is appropriate. The probability of a stimulated emission event is proportional to the number of incident photons and is proportional to the probability p(E_2) that the material system is in a state with an energy E_2. Given that an incident photon either can be absorbed with a probability proportional to p(E_1) or can stimulate the emission of a second photon with a probability proportional to p(E_2), a necessary condition for a net gain is that p(E_2) must be larger than p(E_1). Examining (6.1.10), this condition cannot be satisfied for a material system in thermal equilibrium. Generating nonequilibrium conditions with p(E_2) larger than p(E_1) requires an external energy source other than, and different from, the ambient external environment.
3. A photon is emitted spontaneously from a material state with an energy E_2 into a material state with a lower energy E_1. The mean time that the state remains in a higher-energy state before spontaneously relaxing to a lower-energy state is called the upper-state lifetime of the transition. The emitted photon has an arbitrary polarization and a direction of propagation governed by the geometry of the system and not by an incident lightwave. This spontaneous emission has no classical counterpart. The spontaneously emitted photons are a fundamental source of noise in a communication system. Once generated, the noise photons become part of the propagating wave, which can be subsequently amplified by the process of stimulated emission.

Photons that are generated by stimulated emission or by spontaneous emission must be emitted into an electromagnetic mode determined by the geometry of the system. When there are m photons in a mode, each with an energy hf, it might be concluded that the total energy in that mode is mhf. Instead, quantum optics shows that the energy in the mode, irrespective of its source, is

    E(m) = \left( m + \tfrac{1}{2} \right) hf,   (6.1.11)

differing from the energy mh f of m photons by the energy of half a photon. The factor h f / 2 is the energy in the mode when no photons are present. This energy is called the “vacuum-state” or “zero-point” energy and is indirectly observed when a vacuum state interacts with another mode that is populated. The vacuum-state energy is of fundamental importance, but has no classical interpretation. It does not follow from the wave-optics signal model or from the photon-optics signal model. It does follow from the quantum-optics signal model and is discussed in Section 15.3. Upon incorporating


the vacuum-state energy term, as we will do, the photon-optics model becomes a much better fit to the quantum-optics model.
Because the vacuum-state energy, which is equivalent to half a photon, is present even when there are no photons in a mode, the effect of the vacuum-state energy cannot be directly observed by measuring the number of photons. The effect of the vacuum-state energy can be observed when a mode with no signal is coupled with another mode that contains a signal. This is discussed using a quantum-optics signal model in Section 15.5.3. For now, this half photon of energy is accepted and used whenever appropriate.

6.1.3 Expected Energy

Given (6.1.11), which is the energy in a single mode expressed in terms of the number of photons, and (6.1.9), which is the probability p(m) that m photons are in that mode, the expected energy \mathsf{E}(f) \doteq \langle E(f) \rangle for a distribution of modes is (cf. (2.2.3))

    \mathsf{E}(f) = \sum_{m=0}^{\infty} E(m)\, p(m)
           = \sum_{m=0}^{\infty} \left( m + \tfrac{1}{2} \right) hf \left( 1 - e^{-hf/kT_0} \right) e^{-m(hf/kT_0)}
           = hf \left( \frac{1}{e^{hf/kT_0} - 1} + \frac{1}{2} \right).   (6.1.12)

Rewriting (6.1.12) using the hyperbolic cotangent function gives

    \mathsf{E}(f) = \frac{hf}{2} \coth\left( \frac{hf}{2kT_0} \right),   (6.1.13)

where coth x = (e^x + e^{−x})/(e^x − e^{−x}). The mean energy can be written as

    \mathsf{E}(f) = hf \left( \mathsf{N}_0 + \tfrac{1}{2} \right),   (6.1.14)

where \mathsf{N}_0 is the expected number of noise photons per mode in thermal equilibrium as a function of kT0 given by (6.1.8).
For a controlled source of signal energy that populates the mode with the maximum-entropy distribution, the mean thermal energy kT0 is replaced by the mean signal energy E used to populate the mode. This is the physical description of the encoding process. Note that the vacuum-state energy is always present and cannot be controlled by the encoding process.
For a given value of the mean thermal energy kT0, the mean energy per mode \mathsf{E}(f) is a function only of the frequency f. The frequency f characterizes the quantum of energy hf that can be exchanged between any given mode and the external environment. For a distribution of modes characterized by frequency f, the mean energy can be expressed as a (single-sided) power density spectrum N(f) with units of watts per hertz. A plot of (6.1.13) that is scaled by kT0 and denoted N′(f) is shown in Figure 6.3 as a function of the energy E′ expressed in units of kT0 so that E = E′kT0. This figure also


Figure 6.3 Scaled noise power density spectrum N′(f) in units of kT0 and the two terms that comprise the power density spectrum given in (6.1.12). The lower x axis is the mean photon energy E′ in units of kT0. The upper x axes are the frequency in hertz and the wavelength in microns for T0 = 290 K.

separately plots the two terms of (6.1.12) along with the vertical line hf = kT0 or E′ = 1. Although the two noise terms combine into the single hyperbolic cotangent function given in (6.1.13), each term has a different physical origin and affects a lightwave system in a different way. The first term is the mean energy in the mode caused by the system being in thermal equilibrium and is called thermal noise. Examining Figure 6.3, it is evident that the thermal-noise term decreases rapidly for frequencies greater than kT0/h or for energies greater than kT0. This means that it is unlikely that a photon with an energy significantly larger than kT0 will be coupled into a mode in thermal equilibrium. The second term in (6.1.12) is the mean energy in the mode caused by quantum fluctuations as given in (6.1.11) and is called quantum noise. The fluctuations described by the second term are nonclassical and can be attributed, in part, to the interaction of an electromagnetic mode with the vacuum state.
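The two regimes shown in Figure 6.3 can be reproduced directly from (6.1.12) and (6.1.13). The sketch below tabulates the scaled thermal-noise term E′/(e^{E′} − 1), the scaled quantum-noise term E′/2, and their sum (E′/2)coth(E′/2) over several decades of E′ = hf/kT0.

```python
import numpy as np

Ep = np.logspace(-3, 2, 6)                 # E' = hf/kT0

thermal = Ep / np.expm1(Ep)                # first term of (6.1.12), scaled by kT0
quantum = Ep / 2.0                         # second term: the vacuum-state half photon
total = (Ep / 2.0) / np.tanh(Ep / 2.0)     # (6.1.13), scaled by kT0

for row in zip(Ep, thermal, quantum, total):
    print("E'={:9.3f}  thermal={:.3e}  quantum={:.3e}  total={:.3e}".format(*row))
# For E' << 1 the total approaches 1 (the white-noise floor kT0);
# for E' >> 1 it approaches E'/2 (the quantum-noise term).
```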

Thermal-Noise Regime

Referring to Figure 6.3, for typical lightwave wavelengths – on the order of one micron – the power density spectrum of the noise is dominated by the quantum-noise term, with the thermal-noise term being negligible. For this reason, it may appear that thermal noise is not an issue for lightwave communication systems. However, all existing lightwave communications systems use photodetectors in the receiver that convert the lightwave signal to an electrical signal. These devices cannot respond on the time scale of the lightwave frequency. Therefore, the photodetected signal consists of a lower-frequency copy of the complex envelope or the intensity of the lightwave signal. In this low-frequency


regime and at a normal environmental temperature, the thermal noise generated by the interaction of a low-frequency mode with the external environment is much larger than the quantum noise. This is evident in Figure 6.3. Cooling the device will change the relative scaling of the three horizontal axes shown in Figure 6.3 so that the quantum-noise regime will occur at a lower frequency because of the reduced thermal noise.
To determine the power density spectrum of the thermal noise in this low-frequency regime, neglect the quantum-noise term hf/2 compared with kT0 in (6.1.12), treat the photon energy hf as a continuous variable, and apply L'Hôpital's rule at f = 0 to give

    N_0(f) = \left. \frac{d(hf)/df}{d\left( e^{hf/kT_0} - 1 \right)/df} \right|_{f=0} = kT_0.   (6.1.15)

The power density spectrum in a low-frequency regime is the constant kT0, as is evident in Figure 6.3, and contains equal amounts of power per unit bandwidth. Therefore, the low-frequency limit of thermal noise is called white noise because the power density spectrum is independent of the frequency. The constant thermal-noise power density spectrum kT0 defines a thermal-noise floor. The thermal-noise power density spectrum kT0 is typically expressed in units of dBm/Hz and is −174 dBm/Hz at a temperature of 290 K.³

³ In comparing this noise source with lightwave noise sources, we will often use units of A²/Hz, which is the power density spectrum per unit resistance. These units are discussed in the section titled Notation.
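The −174 dBm/Hz figure follows directly from kT0, as this minimal computation shows:

```python
import math
from scipy.constants import k   # Boltzmann constant, J/K

T0 = 290.0
kT0 = k * T0                           # thermal-noise floor, W/Hz
print(10 * math.log10(kT0 / 1e-3))     # about -174 dBm/Hz
```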

Quantum-Noise Regime

The fundamental vacuum-state fluctuations in a lightwave field are a source of quantum noise. These quantum fluctuations are independent of the temperature of the external environment and can both directly and indirectly affect the signal in a mode. When photodetected using a square-law photodetector, these vacuum-state fluctuations mix with an incident conventional lightwave signal producing fluctuations in the photodetected electrical signal. These unavoidable fluctuations produce the shot noise. The second source of quantum noise is from the fundamental fluctuations that lead to spontaneous emission within a lightwave amplifier. These fluctuations must be included in the analysis of noise in lightwave amplification (cf. Section 7.7).
The fundamental quantum fluctuations in a lightwave field are seen in Figure 6.3 in the regime for which the energy of the photon hf is much larger than the expected thermal energy kT0. Then the thermal-noise term is insignificant. Examining (6.1.12), the ratio of the thermal-noise term to the quantum-noise term is on the order of e^{−hf/kT_0}. At a wavelength of λ = 1550 nm and at a temperature of T0 = 290 K, this ratio e^{−hf/kT_0} is approximately 10⁻¹⁴. Therefore, at normal operating temperatures, the thermal fluctuations in a lightwave signal are negligible. The fluctuations in this high-frequency regime are dominated by the quantum-noise term hf/2, as is evident in Figure 6.3.

Total Noise

In summary, there are two intrinsic kinds of noise generated in the two frequency regimes that contribute to the noise in a photodetected electrical signal.


Quantum noise equivalent to half a photon is dominant in the high-frequency regime in which the energy of the photon hf is much larger than the mean thermal energy kT0. In the presence of a signal, this form of noise mixes with the signal, resulting in shot noise in the photodetected signal.
Thermal noise is dominant in the frequency regime in which the energy of the photon hf is much smaller than the mean thermal energy kT0. This condition is satisfied at the frequencies of the electrical signal generated after photodetection, at which point thermal noise is added to the shot noise.
Both sources of noise are described by (6.1.13). All fundamental sources of noise for lightwave communication systems can be traced back to this expression.

6.2 Probability Distribution Functions

The maximum-entropy distribution for the energy derived in the previous section is fundamental and applies to any communication system. In this section, the maximum-entropy distribution is used to derive the probability density function of thermal noise using wave optics, the probability density function of spontaneous emission noise using wave optics, and the probability mass function of photon noise using photon optics. In Section 6.3, the Poisson transform is derived, which relates the probability distribution function for the continuous energy in the wave-optics signal model to the probability mass function for the number of photon counts in the photon-optics signal model.

6.2.1 Thermal Noise

In the thermal-noise regime (see Figure 6.3), the energy hf of a photon is much smaller than the mean thermal energy kT0. This means that the discrete energy difference hf in (6.1.9) between the allowed energy levels is small compared with kT0. Therefore, the discrete probability mass function of the energy p(E) given in (6.1.9) can be regarded as continuous and described by a probability density function f(E). The change in notation from p(m) to f(E) (not to be confused with the frequency f) indicates that the energy is now treated as a continuous random variable characterized by a probability density function f(E) instead of a probability mass function p(m). Replacing the discrete energy E_m = mhf of m photons in (6.1.9) with a continuous energy E gives

    f(E) = \frac{1}{kT_0}\, e^{-E/kT_0}   for E ≥ 0,   (6.2.1)

where the normalizing constant 1/ kT0 ensures that f (E ) is a valid probability density function. The term kT0 is the mean energy in a single spatiotemporal mode of the system, such as a coherence time interval in a single spatial mode. The exponential probability density function has a mean kT0 and a variance (kT0)2, depending only on the temperature of the external environment. In this context, the exponential probability density function is called a Boltzmann probability density function.


Energy, Power, and Complex Amplitude

A practical communication system transmits a finite number of symbols per unit time; each symbol is random with a finite energy. This means that a communication waveform is a bandlimited random process with a finite power density spectrum. The probability density function of the lightwave energy in a finite time interval can be related to the probability density function of the lightwave power.
Consider a time interval T for a single spatial mode that is shorter than the duration τc of a coherence interval of a stationary, bandlimited random process (cf. (2.2.59)). Over this time interval, the values of the random process are highly correlated. Within some approximation, the random power P during a coherence interval τc is simply a scaled form of the random energy E in that coherence interval. Therefore, for T less than τc, the power P = E/T has a probability density function f(P) with the same functional form as the probability density function of the energy given in (6.2.1). Then

    f(P) = \frac{1}{P_n}\, e^{-P/P_n}   for P ≥ 0,   (6.2.2)

where Pn is the mean power. Later, in Section 6.5, the probability density function of the energy is derived for a time interval longer than the duration of a coherence timewidth τc . For this case, the probability density function of the energy is not proportional to the probability density function of the power. Instead, it can be approximated as a sum of independent random variables with each random variable defined over an interval of duration τc .

Lightwave Amplitude

The lightwave power is the square of the lightwave amplitude, so one might conclude that the probability density function for the lightwave amplitude can be derived directly from (6.2.2) using a square-root relationship involving a single real variable. The amplitude would then be given by A = √P. However, such an approach would fail to generate a maximum-entropy distribution for the amplitude A, which is to be expected and perhaps required. With reference to the next subsection, this failure can be avoided by the assertion that the power must be expressed as the complex factorization

    P = (x + iy)(x - iy) = zz^* = |z|^2,   (6.2.3a)

which can be written as

    P = x^2 + y^2,   (6.2.3b)

where z = x + iy = Ae^{iφ} and z^* = x − iy = Ae^{−iφ} are complex amplitudes. Thus the square root satisfies √P = x ± iy = z or z^*, and so P = |z|². This conclusion is an

unavoidable consequence of respecting the maximum-entropy principle. This principle requires us to lift the problem from the nonnegative real line, where the power is defined, to the complex plane, where the complex amplitude z = x + i y is defined. For this purpose, we show next that the bivariate random variable (x , y ) underlying the power is a circularly symmetric gaussian random variable. The probability density function of this random variable is a maximum-entropy distribution. It is consistent with the


Rayleigh probability density function (cf. (2.2.34)) for the magnitude of the complex amplitude. In summary, working backwards from the maximum-entropy distribution of the photon energy, we conclude that the lightwave power is the magnitude of a complex random variable with a bivariate gaussian probability density function.

Maximum Entropy Bivariate Density

We now seek a relationship between the maximum-entropy distribution for the signal power given in (6.2.2) and the joint probability density function of the in-phase and quadrature components of a passband random process (cf. Section 2.2.2), which may describe either a noise source or an information source. This joint probability density function can be written as f(A, φ), where A is the magnitude and φ is the phase. Alternatively, the joint probability density function can be written as f(x, y), where x is the in-phase component and y is the quadrature component. This relationship can be inferred from the maximum-entropy distribution for a bivariate probability density function.
We now show that the maximum-differential-entropy bivariate probability density function constrained to have a real covariance matrix C is a zero-mean multivariate gaussian distribution given in (2.2.21) with N = 2 and is

    f(\mathbf{r}) = \frac{1}{2\pi \sqrt{\det \mathbf{C}}}\, e^{-\frac{1}{2} \mathbf{r}^T \mathbf{C}^{-1} \mathbf{r}},   (6.2.4)

where r = (x, y) and C is the real covariance matrix defined in (2.2.22). This statement could be proved by again using variational calculus. Instead, an alternative method of proof is given.
Let f(r) = f(x, y) be the zero-mean bivariate gaussian distribution with a covariance matrix C and let f′(x, y) be any zero-mean bivariate probability density function with the same covariance matrix. We must show that f′(x, y) has a smaller differential entropy. Given that log_e f(x, y) is a quadratic function of x and y because f(x, y) is gaussian, the following expression holds:

    \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} f'(x, y) \log_e f(x, y)\, dx\, dy = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} f(x, y) \log_e f(x, y)\, dx\, dy,   (6.2.5)

because f′(x, y) and f(x, y) have the same covariance matrix C. Therefore the (negative) differential entropies satisfy

    \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} f(x, y) \log_e f(x, y)\, dx\, dy - \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} f'(x, y) \log_e f'(x, y)\, dx\, dy = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} f'(x, y) \log_e\!\left( \frac{f(x, y)}{f'(x, y)} \right) dx\, dy.   (6.2.6)

Now use the elementary inequality that log_e x ≤ x − 1, with equality if and only if x = 1. This gives

    -H(p) + H(p') \leq \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} f'(x, y) \left( \frac{f(x, y)}{f'(x, y)} - 1 \right) dx\, dy.   (6.2.7)


Because both f′(x, y) and f(x, y) are probability densities, the right side of (6.2.7) is equal to zero, leading to the inequality H(p) ≥ H(p′), which is satisfied with equality if and only if f′(x, y) = f(x, y). This conclusion states that the maximum-differential-entropy bivariate density function with covariance matrix C is the bivariate gaussian density function.
When all the elements of C are specified, the maximum-differential-entropy density function is a bivariate gaussian density function with that covariance matrix C. However, if only the trace of C is specified, which is a constraint only on the power, and otherwise C is arbitrary, then of all such bivariate gaussian density functions, the bivariate gaussian density function with a diagonal covariance matrix and equal diagonal elements has the maximum entropy and is a product distribution. This is the conclusion asserted earlier. The maximum-entropy distribution is the circularly symmetric bivariate gaussian distribution with a real covariance matrix C given by σ²I, where σ² is the variance of one real gaussian signal component, I is the two-by-two identity matrix, and √(det C) = σ².
The gaussian probability density function has another desirable property. The sum of two gaussian random variables is another gaussian random variable. This property is called the additivity property. The additivity property does not hold for the Gordon probability mass function, which is the maximum-entropy probability mass function subject to a finite mean (cf. (6.1.5)). The additivity property does hold for a Poisson random variable (cf. (6.2.30)). The combination of both maximum entropy and additivity does not hold for either a discrete Gordon distribution or a discrete Poisson distribution, whereas the combination of both properties does hold for a continuous gaussian distribution. This statement has important consequences, which are discussed in Chapter 14 in the context of the channel capacity.
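A quick numerical illustration of the inequality H(p) ≥ H(p′): the differential entropy of a zero-mean gaussian is ½log_e(2πeσ²), which exceeds that of any other density with the same variance, such as a uniform density. The comparison below is a minimal sketch using these closed forms.

```python
import numpy as np

sigma = 1.7   # common standard deviation for both densities

H_gauss = 0.5 * np.log(2 * np.pi * np.e * sigma**2)   # differential entropy of N(0, sigma^2)

a = sigma * np.sqrt(3.0)        # uniform on [-a, a] has variance a^2/3 = sigma^2
H_unif = np.log(2 * a)

print(H_gauss, H_unif)          # gaussian is larger, as H(p) >= H(p') requires
```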

Phase-Insensitive Noise

The same conclusion as given in the previous subsection can be reached by starting from a different formulation. Common practice for low-frequency communication systems is to simply postulate bivariate noise with a joint probability density function f(n_I, n_Q) that is required both to be a product distribution and to have circular symmetry. With a constraint on the power, these two conditions can be simultaneously satisfied only by a bivariate circularly symmetric gaussian random variable. This turns out to be a maximum-entropy distribution, even though this property was not imposed as a requirement. The corresponding channel is phase-insensitive, meaning that it cannot distinguish between the noise on a sine carrier and the noise on a cosine carrier. For this case, the in-phase and quadrature noise components are independent, with the corresponding bivariate gaussian density function given by

    f(n_I, n_Q) = f_{n_I}(n_I)\, f_{n_Q}(n_Q) = \left( \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-n_I^2/2\sigma^2} \right) \left( \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-n_Q^2/2\sigma^2} \right).   (6.2.8)

The two marginal probability density functions are both gaussian, with the same variance. The samples of the two noise components described by this joint distribution are independent.


For a magnitude/phase representation of a circularly symmetric gaussian distribution, the probability density function of the magnitude f(A) can be obtained by a change of variables using P = A²/2 and dP/dA = A, where the factor of 1/2 accounts for the passband signal (cf. (2.1.74)). This variable transformation yields

    f(A) = \frac{A}{\sigma^2}\, e^{-A^2/2\sigma^2}   for A ≥ 0.   (6.2.9)

This is the distribution f(A) for the magnitude of a maximum-entropy passband random process. However, viewed in isolation as a density on the nonnegative real numbers, f(A) is not a maximum-entropy distribution. It is the Rayleigh probability density function defined in (2.2.34). The probability density function f(φ) for the phase is a uniform probability density function f(φ) = (1/2π)rect(φ/2π), which is the maximum-entropy distribution in the interval [−π, π). The proof of this statement is to be found as a problem at the end of the chapter. The joint probability density function of these two postulated independent components is

    f(A, \phi) = f_A(A)\, f_\phi(\phi) = \frac{A}{2\pi\sigma^2}\, e^{-A^2/2\sigma^2}   for A ≥ 0.   (6.2.10)

Transforming from polar coordinates to cartesian coordinates recovers (6.2.8). The postulated independence of the two signal components leads to a separable product distribution for a single complex symbol described by an in-phase component and a quadrature component. This statement is not true for a quantum-optics signal. A fundamental feature of quantum optics is an inherent dependence between the quantum equivalents of the in-phase and quadrature components of a single symbol. The fundamental dependence between the two signal components is evident only for small signal levels. In this regime, the description of a single complex symbol using a product distribution must be revisited when using quantum optics and is discussed in Chapter 15. However, the correlation structure of a complex symbol within quantum optics does not exclude the use of the maximum-entropy distribution for the discrete particle energy given by (6.1.5). Accordingly, that distribution, the Gordon distribution, is regarded as fundamental. In a large-signal regime for which no inherent correlation structure between the two signal components is evident, starting with (6.1.5) and applying conditions appropriate to wave optics leads to the separable bivariate gaussian probability density function given in (6.2.8).
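These relationships are easy to verify by sampling. The sketch below draws a large number of circularly symmetric gaussian samples and checks that the magnitude, phase, and power follow (6.2.9), the uniform phase density, and (6.2.2), respectively.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 1.0
M = 1_000_000

# Circularly symmetric gaussian: independent I and Q, each N(0, sigma^2)
n = rng.normal(0, sigma, M) + 1j * rng.normal(0, sigma, M)

A = np.abs(n)                 # Rayleigh-distributed magnitude, (6.2.9)
phi = np.angle(n)             # uniform phase on [-pi, pi)
P = A**2 / 2                  # power, exponential with mean sigma^2 (cf. (6.2.2))

print(np.mean(P), sigma**2)                     # mean power ~ sigma^2
print(np.var(P), sigma**4)                      # exponential: variance = mean^2
print(np.mean(phi), np.var(phi), np.pi**2 / 3)  # uniform phase: var = pi^2/3
```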

Circularly Symmetric Gaussian Densities

The two-dimensional gaussian distribution given in (6.2.8) is circularly symmetric (cf. (2.2.28)). The phase is uniform and is independent of the amplitude. The corresponding complex random variable given as n = n_I + in_Q is a zero-mean circularly symmetric gaussian random variable.
These statements can be extended to a block n of length M of random complex components in which the components n_k are independent, zero-mean circularly


symmetric gaussian random variables with variance σ² per real component. The complex covariance matrix for this block, given by (2.2.30b), can be written as

    \mathbf{V}_{noise} = \langle \mathbf{n}\mathbf{n}^\dagger \rangle = 2\sigma^2 \mathbf{I},   (6.2.11)

where I is the M × M identity matrix. The corresponding probability density function, determined using (2.2.29), is

    f(\mathbf{n}) = \frac{1}{(2\pi\sigma^2)^M}\, e^{-|\mathbf{n}|^2/2\sigma^2},   (6.2.12)

where |n|² = \sum_{k=1}^{M} |n_k|^2 = \sum_{k=1}^{M} (n_{Ik}^2 + n_{Qk}^2) is the squared euclidean length of the random complex vector n. This probability density function with independent complex circularly symmetric gaussian signal components n_k is the maximum-entropy distribution for a block signal subject to a constraint on the mean block energy.
When the components of every block of complex samples of a complex random process are jointly circularly symmetric gaussian random variables, the random process is a circularly symmetric gaussian random process. This type of random process is used to describe both noise processes and priors on random datastreams.

Offset Circularly Symmetric Gaussian Density

The probability density function f(P) for the power of a circularly symmetric bivariate gaussian random variable changes when a simple bias that offsets the distribution is present. To determine f(P) with a bias, define

    x \doteq |s + n|^2,   (6.2.13)

where s is a constant offset with a power P_s, and n is a zero-mean, circularly symmetric gaussian random variable with variance σ² for each signal component. The probability density function of x is a noncentral chi-square probability density function with two degrees of freedom (cf. (2.2.38)). These degrees of freedom are the amplitudes of the in-phase and quadrature components. Assigning z = 2P, A² = 2P_s, and σ² = P_n, and using f(z)dz = f(P)dP, the probability density function of the lightwave power P is also a noncentral chi-square probability density function with two degrees of freedom given by

    f(P) = \frac{1}{P_n}\, e^{-(P + P_s)/P_n}\, I_0\!\left( \frac{2\sqrt{P P_s}}{P_n} \right)   for P ≥ 0.   (6.2.14)

The mean and the variance of f(P) are determined using (2.2.39) and are given as

    \langle P \rangle = P_s + P_n,   (6.2.15a)
    \sigma_P^2 = 2 P_n P_s + P_n^2.   (6.2.15b)

If Ps = 0, then (6.2.14) reduces to an exponential probability density function given in (6.2.2).
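A Monte Carlo check of (6.2.15), with illustrative values P_s = 4 and P_n = 1.5, follows the construction in (6.2.13); the factor of 1/2 in the power again accounts for the passband convention.

```python
import numpy as np

rng = np.random.default_rng(7)
Ps, Pn = 4.0, 1.5                      # signal power and mean noise power
M = 1_000_000

s = np.sqrt(2 * Ps)                    # constant offset with |s|^2/2 = Ps
n = rng.normal(0, np.sqrt(Pn), M) + 1j * rng.normal(0, np.sqrt(Pn), M)
P = np.abs(s + n)**2 / 2               # lightwave power with a constant bias

print(np.mean(P), Ps + Pn)             # matches (6.2.15a)
print(np.var(P), 2 * Pn * Ps + Pn**2)  # matches (6.2.15b)
```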


6.2.2 Spontaneous Emission Noise

The origin of spontaneous emission noise is fully described only by using quantum optics. Our purpose in this chapter requires a simplified wave-optics description of this noise process consistent with the lightwave noise modeled using wave optics, but augmenting this model with photon optics as needed to include photon noise. Within wave optics, spontaneous emission in the lightwave signal often can be modeled as an additive circularly symmetric gaussian noise process.
The conditions required for the validity of this model can be understood by considering thermally generated radiation at lightwave frequencies. Only when the temperature T0 is sufficiently high is the mean thermal energy kT0 on the order of the energy hf of a photon. In this case, the external thermal source of energy can create excited energy states in a material system that can subsequently produce optical thermal noise. Examples of optical thermal noise include both the sun and an incandescent light bulb.
Spontaneous emission from a lightwave source can be generated at temperatures near room temperature. This requires a suitable external energy source other than the external thermal environment. When such a lightwave energy source, called an optical pump, is present, it couples lightwave energy into the material system, thereby creating excited energy states that can generate spontaneous emission. For a sufficiently large number of independent spontaneous emission events, reference to the central limit theorem suggests that the cumulative spontaneous emission can be treated as a circularly symmetric gaussian noise process. This kind of noise process is sometimes called pseudothermal noise, because the statistics are the same as optical thermal noise even though the source of the energy is different.
In contrast to low-frequency thermal noise, spontaneous emission events need not be independent. The presence of feedback, gain, or other coupling mechanisms can introduce a correlation structure. Moreover, small structures support a limited number of spatial modes that may produce only a small number of spontaneous emission events, violating the conditions needed for the assertion of the central limit theorem. For either of these cases, modeling the spontaneous emission as a continuous, circularly symmetric gaussian noise process may be inaccurate.
For conditions permitting the spontaneous emission noise process to be modeled as a circularly symmetric gaussian noise process n_sp(t), we can write the complex-baseband noise process as

    n_{sp}(t) = n_I(t) + i n_Q(t) = \sqrt{2 P_n(t)}\, e^{i\phi(t)},   (6.2.16)

where P_n(t) is a stationary random process for the spontaneous emission noise power and φ(t) is a stationary random process for the phase. The probability density function f(E) for the noise energy of a sample of this stationary random process is an exponential distribution given by (cf. (6.2.1))

    f(E) = \frac{1}{N_{sp}}\, e^{-E/N_{sp}}   for E ≥ 0,   (6.2.17)

where Nsp is the mean noise energy expressed as a constant power density spectrum.


The mean noise power ⟨P_n(t)⟩ = P_n can be written in several equivalent forms as (cf. (2.2.80))

    P_n \doteq \sigma_{sp}^2 = \tfrac{1}{2} \left\langle |n_{sp}(t)|^2 \right\rangle = \int_0^{\infty} N_{sp}(f)\, df,   (6.2.18)

6.2.3

Photon Noise

The discrete probability mass function p (m) is now derived for the random number m of photon counts within an interval of duration T . This probability mass function is generated from a random photon-arrival process, ±( t ), expressed as a sequence of impulses

±(t ) =

³ ´

δ(t − t ´),

(6.2.19)

where the impulse δ(t − t ´ ) models the arrival of the ´th photon at the random time t ´ . This photon-arrival process is shown in Figure 6.4(a). The corresponding photoncounting process m(t ) defined over a time interval of duration T is an integer-valued random counting process which can be written as

( ) =.

mt

´ t +T /2 t −T / 2

±(τ)dτ.

(6.2.20)

The counting process is a Poisson counting process whenever m(t ) is described by a Poisson probability mass function for each t and T . This random counting process is shown in Figure 6.4(b). The Poisson counting process m(t ) is characterized by a photon-arrival rate R. The constant R is the expected number of arrivals per unit time. It can be generalized to a time-varying arrival rate R(t ). The number of counts of a Poisson counting process in two nonoverlapping intervals are statistically independent for any size or location of the two intervals. This property is called the independent-increment property.

Photoelectron Noise

Photodetection converts an incident photon into a photodetection event called a primary photoelectron. Each primary photoelectron is generated with an electric potential V and an energy E = eV that is equal to the energy h f of the incident photon. These primary photoelectrons may be subsequently amplified, leading to multiple secondary photoelectrons for each primary photodetection event.

258

6 Random Signals

(a) ) t(Φ t (b)

)t(m t Figure 6.4 (a) A realization of a random photon-arrival process

generates the Poisson counting process m(t ).

±(t ). (b) The integral of ±(t )

Photodetection converts photons to primary photoelectrons. Accordingly, the photonarrival process ±( t ) has a corresponding photoelectron-arrival process g(t ). Similarly, the photoelectron-arrival rate µ(t ) corresponds to the photon-arrival rate R(t ), and the photoelectron-counting process ξ(t ) corresponds to the photon-counting process m(t ). Because the conversion of photons to photoelectrons is not perfect, the photon-arrival rate R(t ) is not generally equal to the photoelectron-arrival rate µ(t ), with

µ(t ) = ηR(t ),

(6.2.21)

where η is a positive constant, not larger than one, called the quantum efficiency. This constant characterizes the mean rate of converting photons into photoelectrons. When η equals one, g(t ) = ±(t ), ξ(t ) = m(t ), and the photoelectron stream after photodetection is equal to the photon stream before photodetection. When η is less than one, the random arrival times for the photoelectrons are generated by randomly and independently deleting photon arrivals, with the probability of a deletion given by 1 − η. This results in another Poisson counting process, but with a reduced arrival rate. The relationships between the photon-counting process and the photoelectron-counting process are summarized in Table 6.2. Because both the photon-counting process and the primary photoelectron-counting process are described by a Poisson counting process, our convention is to use a photon Poisson counting process and distinguish between a photon-counting process and a photoelectron-counting process only when needed.

Relationship to Wave Optics

The photon-arrival rate R(t ), the photoelectron-arrival rate µ(t ), and their integrals can be expressed in terms of wave-optics quantities. The integral of the photon-arrival rate (cf. (1.2.1) and (1.2.4))

( )=

E t

´ t +T /2 t −T /2

(τ)dτ

R

= E(t )/ h f

(6.2.22)

6.2 Probability Distribution Functions

259

Table 6.2 Signals for photon optics and wave optics in the optical domain and in the electrical domain using direct photodetection

Lightwave Photon-Optics Signal Model Point process Counting process Rate Average count

Electrical

±(¹t ) ( ) = ±(τ)dτ R(t ) = P ( t )/ h f ¹ E( t )= R(τ)dτ = E (t )/ h f

m t

(photons)

Wave-Optics Signal Model Lightwave power

P (t ) = h f R(t )

¹g(t ) ξ(t ) = g (τ)dτ µ(t ) = ¹ηR(t ) W(t )= µ(τ)dτ = ηE( t )

(photoelectrons)

R P (t ) = e µ(t )

r (t ) =

(photocurrent)

Lightwave energy

E (t ) = h f E(t )

W (t ) = e W(t ) = e ηE(t ) (photocharge)

gives the mean number of photon arrivals E(t ) over an interval of duration T in terms of the wave-optics energy E (t ) in that interval. Expression (6.2.22) has the same form as (6.2.20), but now using the photon-arrival rate R(t ) instead of the arrival process ±(t ) and using the mean number of photon arrivals E(t ) instead of the time-varying number of actual photon arrivals m(t ) given by (6.2.20). This mean number E(t ) is determined from wave optics without considering the statistics of the photon arrivals. Now recall that R(t ) = P (t )/ h f (cf. (1.2.5)), where P (t ) is the lightwave power (cf. (1.2.4)). Using (6.2.21) along with r (t ) = P (t ), where r (t ) is the photodetected electrical signal, the quantum efficiency η and the responsivity are related by4 (6.2.23) = η hef .

R

R

R

Using this expression, several other relationships between photon-optics quantities and wave-optics quantities are provided in Table 6.2. In that table, W(t ) is the mean number of primary photoelectrons generated over an interval of duration T and W (t ) is the mean photocharge over the same time interval.

Poisson Probability Mass Function

The probability mass function of the ideal photon-counting process m(t ) due to a known arrival rate R(t ) results in only photon noise when the incident lightwave power, modeled using wave optics, has no noise. For a constant arrival rate R, the probability pd of generating a count in a subinterval of duration ³t is proportional to the product of R and ³t . The subinterval ³t can be chosen small enough that the probability of generating one count within ³ t is, to within order ³t , given by

R

η and the responsivity describe the energy lost in the photodetection process. Mention of them will often be suppressed in later chapters.

4 Both the quantum efficiency

260

6 Random Signals

= R ³ t = RMT = ME ,

pd

(6.2.24)

where E = W = R T is the mean number of counts over an interval T of duration M ³ t.5 The probability that no counts are generated within an interval of duration ³ t is, to order ³t , 1 − R ³t . The probability of generating m independent counts within a time interval T of duration M ³t is given by a binomial probability mass function,

! ( p )m(1 − p ) − for m = 0, 1, 2, . . . , M. d !( − )! d Substituting pd = E/M from (6.2.24) yields ± E² − Em M! 1− p ( m) = m M M (M − m)! m! ²− m ± = M(M − 1)· ·M·(mM − m + 1) Em! 1 − ME . p ( m) =

M m M m

M m

M m

M m

(6.2.25)

Referring to (6.2.24), if R and T are both held fixed, then \mathsf{E} is constant. Therefore, in the limit as \mathsf{E}/M = RΔt goes to zero, M goes to infinity. The first term in (6.2.25) approaches one because the numerator approaches M^m. In the last term, the finite value of m relative to the limiting value of M can be neglected. This term becomes (1 − \mathsf{E}/M)^M, which goes to e^{−\mathsf{E}} as M goes to infinity. Therefore, the probability mass function of the number of counts generated over an interval of duration T is

    p(m) = \frac{\mathsf{E}^m}{m!}\, e^{-\mathsf{E}}   for m = 0, 1, 2, ...,   (6.2.26)

which is the Poisson probability distribution (or the Poisson probability mass function) with the mean number of counts ⟨m⟩ given by RT = E. A random variable described by the Poisson probability distribution is called a Poisson random variable. Thus a Poisson counting process is described by a Poisson random variable on any observation interval. The variance of the Poisson probability distribution can be determined using the characteristic function defined in (2.2.15),

C_m(ω) = Σ_{m=0}^∞ e^{iωm} p(m) = Σ_{m=0}^∞ e^{iωm} (E^m/m!) e^{−E} = e^{−E} Σ_{m=0}^∞ (1/m!)(E e^{iω})^m.

The summation has the form Σ_{m=0}^∞ (1/m!)x^m = e^x with x = E e^{iω}. Then C_m(ω) reduces to

C_m(ω) = e^{E(e^{iω}−1)}.    (6.2.27)

5 For ideal photodetection with η equal to one, the mean number of photon counts E is equal to the mean number of photoelectron counts W (cf. Table 6.2).


Using (2.2.17), the mean-squared value is

⟨m²⟩ = (1/i²)(d²/dω²) C_m(ω)|_{ω=0}
     = e^{E(e^{iω}−1)} (E e^{iω} + (E e^{iω})²)|_{ω=0}
     = E + E².    (6.2.28)

Accordingly,

σ_m² = ⟨m²⟩ − ⟨m⟩² = E    (6.2.29)

is the variance of the Poisson distribution. Expression (6.2.29) shows that the variance σ_m² in the number of counts due to photon noise is equal to the mean number of counts E. This kind of signal-dependent noise is a unique aspect of lightwave signals. In contrast, thermal noise is not caused by the signal, and the variance (kT_0)² of the thermal energy given in (6.2.1) is independent of the signal.

The sum of two independent Poisson random variables m1 and m2 is a Poisson random variable m3. Let p1(m) and p2(m) be two Poisson probability distributions with mean values E1 and E2, respectively. Then the probability distribution p3(m) for m3 is the convolution p3(m) = p1(m) ⊛ p2(m). The convolution property of a Fourier transform (cf. (2.1.14)) states that the two characteristic functions satisfy C3(ω) = C1(ω)C2(ω), where Ci(ω) is the characteristic function of pi(m). Using (6.2.27), substitute C1(ω) = e^{E1(e^{iω}−1)} and C2(ω) = e^{E2(e^{iω}−1)} on the right and take the inverse Fourier transform to give

p3(m) = ((E1 + E2)^m/m!) e^{−(E1+E2)}   for m = 0, 1, 2, ....    (6.2.30)

Accordingly, the sum of two independent Poisson random variables with means E1 and E2 , respectively, is a Poisson random variable with a mean of E1 + E2 . Moreover, it is readily shown that if the sum of two random variables m3 = m2 + m1 is a Poisson random variable, and either of the two summands m1 or m2 is Poisson, then the other summand is Poisson as well. The proof of this statement is asked for as an end-of-chapter exercise. The difference of two Poisson random variables m1 − m2 can be defined only in a weak sense because this subtraction is not a meaningful notion as stated. To make the notion of a difference precise, one simply introduces a Poisson process with mean E1 − E2 , provided the mean E1 for m1 is greater than the mean E2 for m2. Then, when m1 is Poisson, m2 is Poisson as well.
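These two properties are simple to confirm by simulation. The following Python sketch is an illustration with assumed means E1 and E2; it checks that the count variance equals the mean, as in (6.2.29), and that the sum of two independent Poisson random variables is Poisson with mean E1 + E2, as in (6.2.30).

```python
# Monte Carlo check of (6.2.29) and (6.2.30); E1 and E2 are assumed values.
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(1)
E1, E2, n = 4.0, 9.0, 200_000

m1 = rng.poisson(E1, n)
m3 = m1 + rng.poisson(E2, n)          # sum of two independent Poisson variables

print(f"mean {m1.mean():.3f}, variance {m1.var():.3f} (both near E1 = {E1})")

k = np.arange(40)
empirical = np.bincount(m3, minlength=40)[:40] / n
print("max pmf deviation from Poisson(E1 + E2):",
      np.abs(empirical - poisson.pmf(k, E1 + E2)).max())
```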


6.3 The Poisson Transform

The energy of a lightwave signal can have a combination of statistical uncertainty modeled using wave optics and additional uncertainty during the photodetection process caused by photon noise. Photon noise is a semiclassical description of a form of quantum uncertainty that can arise during a measurement. This form of uncertainty is fully described using quantum optics in Chapter 15. In this chapter, photon noise is described using a Poisson probability mass function. Within photon optics, a lightwave signal has both statistical uncertainty, described using a probability density function for the lightwave energy, and photon noise, described using a Poisson probability mass function. A wave-optics signal has only statistical uncertainty. The composite uncertainty is a mixed Poisson process expressed using a relationship known as the Poisson transform. This relationship is the topic of this section.

6.3.1 The Direct Poisson Transform

The Poisson probability distribution with mean E is

p(m|E) = (E^m/m!) e^{−E}   for m = 0, 1, 2, ....    (6.3.1)

This expression is written as a conditional probability distribution to emphasize that it depends on the parameter E. We are interested in random E as described by a probability density f(E). Using (2.2.10), the composite probability mass function p(m) for the number of counts over an interval of duration T is

p(m) = ∫_0^∞ p(m|E) f(E) dE = ∫_0^∞ (E^m/m!) e^{−E} f(E) dE,    (6.3.2)

where f(E) is the probability density function for the mean number of counts in an interval T. Expression (6.3.2) is called the Poisson transform.⁶

6.3.2 The Inverse Poisson Transform

When given the discrete photon-optics probability mass function p(m), it is possible to recover the continuous probability density function f(E). Let μ be the Fourier-transform variable associated with the continuous variable E. The inverse Fourier transform of e^{−E} f(E) is

F(μ) = (1/2π) ∫_{−∞}^∞ e^{−E} f(E) e^{iμE} dE.    (6.3.3)

6 In statistics, the Poisson transform is called a mixed Poisson distribution (see Johnson, Kemp, and Kotz (2005)). This terminology puts the emphasis on p(m|E) in (6.3.2) rather than on f(E). The term Poisson transform emphasizes the conversion from a probability density function to a probability mass function, as is appropriate for this book.




Insert the series expansion e^{iμE} = Σ_{m=0}^∞ (iμE)^m/m!. Then, with the integration and summation interchanged, use (6.3.2) to write this as

F(μ) = (1/2π) Σ_{m=0}^∞ (iμ)^m ∫_{−∞}^∞ (E^m/m!) e^{−E} f(E) dE.    (6.3.4)

But f(E) is zero for E less than zero. Therefore

F(μ) = (1/2π) Σ_{m=0}^∞ (iμ)^m p(m).    (6.3.5)

Moreover, returning to (6.3.3), the forward Fourier transform of both sides gives

e^{−E} f(E) = ∫_{−∞}^∞ F(μ) e^{−iμE} dμ.

Substituting F(μ) from (6.3.5), the inverse Poisson transform is

f(E) = (1/2π) e^{E} ∫_{−∞}^∞ Σ_{m=0}^∞ (iμ)^m p(m) e^{−iμE} dμ.    (6.3.6)

The forward and inverse Poisson-transform pair shows that there is a one-to-one correspondence between the probability density function f(E) defined over an interval of duration T and the probability mass function p(m) of the counts over the same time interval. Because ℰ = hf E, the probability density function of the conditional mean number of counts f(E) is the same as the probability density function f(ℰ) of the lightwave signal energy ℰ. This means that there is a one-to-one correspondence between the continuous lightwave signal energy ℰ defined using wave optics and the discrete number of counts m defined using photon optics. The Poisson transform reveals the parallel roles of the wave model and the particle model of light.

Characteristic Function

The relationship between the two random signal models can also be described by a relationship between their characteristic functions. The characteristic function C_m(ω) of the probability mass function p(m) given in (6.3.2) is determined using (2.2.15) and is

C_m(ω) = Σ_{m=0}^∞ (∫_0^∞ (E^m/m!) e^{−E} f(E) dE) e^{imω}.

Interchanging the summation and integration, then using Σ_{m=0}^∞ (E e^{iω})^m/m! = exp(E e^{iω}), gives

C_m(ω) = ∫_0^∞ e^{E(e^{iω}−1)} f(E) dE = ⟨e^{E(e^{iω}−1)}⟩ = C_E(−i(e^{iω} − 1)).    (6.3.7)


This is the characteristic function of a Poisson probability distribution given in (6.2.27) averaged over the probability density function f(E) for the random mean number of counts E. The expression given in (6.3.7) shows that the characteristic function C_m(ω) for the number of counts p(m) using photon optics can be determined directly from the characteristic function C_E(μ) derived from the wave-optics probability density function f(E) for the mean number of counts, where μ is the Fourier transform variable associated with E. Examining the arguments of the characteristic function on each side of (6.3.7), the transformation μ → −i(e^{iω} − 1) converts C_E(μ) into C_m(ω). The resulting characteristic function C_m(ω) is periodic in ω for any characteristic function C_E(μ) because of the periodic term e^{iω}. The inverse Fourier transform of this periodic characteristic function C_m(ω) results in the discrete probability mass function p(m). The mean and variance of p(m) are

⟨m⟩ = ⟨E⟩,    (6.3.8a)
σ_m² = ⟨E⟩ + σ_E²,    (6.3.8b)

which can be obtained by applying (2.2.17) to (6.3.7).

6.3.3 Forms of Uncertainty

The Poisson transform provides a relationship between the wave-optics model of light and the photon-optics model of light. When viewed as a composite of the statistical uncertainty expressed using wave optics and the additional quantum uncertainty expressed as photon noise within photon optics, the Poisson transform provides a useful proxy for many, but not all, forms of uncertainty permitted within the quantum-optics model of a lightwave. Chapter 15 will show that within quantum optics, the Poisson transform is a specific representation of a general form of composite uncertainty that is fully described using quantum optics. Within quantum optics, this composite form of statistical uncertainty and quantum uncertainty is described using a mathematical object called a density matrix, which may be informally viewed as a generalized probability distribution (cf. Section 15.4). Within photon optics, the composite uncertainty can be described by the Poisson transform. This transform produces a probability distribution instead of a density matrix by regarding the quantum uncertainty as photon noise described by a Poisson probability mass function. Because quantum uncertainty is different from statistical uncertainty, each form of uncertainty is treated differently in quantum optics. Accordingly, each form of uncertainty is treated differently when using the Poisson transform. The effect of quantum uncertainty, expressed as photon noise, is determined first because that form of uncertainty is always present when a conventional lightwave is photodetected. The effect of statistical uncertainty, expressed using wave optics, is then overlaid using the Poisson transform.


The relative strength of these two forms of uncertainty can be determined from the expression for the variance given in (6.3.8b). The first term of that expression is the expected number of counts. This value is equal to the variance of a Poisson probability mass function and describes the fundamental quantum uncertainty. This form of uncertainty is always present for conventional lightwaves. It need not be significant. The second term describes the statistical uncertainty and is the variance σ_E² of the random mean number of counts E excluding the effect of photon noise. This form of uncertainty is described using wave optics. It need not be present.

When both forms of uncertainty are present, the ratio ⟨E⟩/σ_E² of the two terms that sum to give the variance σ_m² in (6.3.8b) determines the relative importance of the two forms of uncertainty. When the variance σ_E² of f(E) is much larger than the mean ⟨E⟩, the statistical fluctuation in the mean number of counts E is much larger than the quantum fluctuation generated by the photon noise. In this regime, signals are accurately described using statistical wave optics based on continuous probability density functions. The Poisson transform is not needed. Conversely, when σ_E² is much smaller than ⟨E⟩, the quantum fluctuations are much larger than the statistical fluctuations. In this regime, signals are accurately described using photon optics with a discrete probability mass function based on the Poisson transform.

As an example, consider an incident lightwave signal with a constant photon arrival rate given by R = η(P/hf) (cf. Table 6.2) that generates a mean count ⟨E⟩ = RT over an interval of duration T. For this constant arrival rate, the continuous probability density function f(E) used in the Poisson transform is a Dirac impulse given by f(E) = δ(E − S), where S = ⟨m⟩ is the mean number of counts. Substituting this expression into (6.3.2) gives

p(m) = ∫_0^∞ (E^m/m!) e^{−E} δ(E − S) dE = (S^m/m!) e^{−S}   for m = 0, 1, 2, ...,    (6.3.9)

which is a Poisson probability distribution with mean S. This is the limiting case for a noise-free wave-optics lightwave signal with σ_E² = 0. A communication channel that is limited by only photon noise is called a photon-noise-limited channel or a Poisson channel.

6.3.4 The Gordon Distribution

Suppose that a wave-optics noise source or information source is modeled by a maximum-entropy exponential distribution (cf. (6.2.1)). Because the random mean number of counts E is proportional to the mean lightwave energy ℰ, and both means are random (cf. Table 6.2), the probability density function f(E) is also an exponential distribution with mean S given by

f(E) = (1/S) e^{−E/S}   for E ≥ 0.    (6.3.10)


Substituting (6.3.10) into (6.3.2), the Poisson transform of the exponential distribution f(E) is

p(m) = ∫_0^∞ (E^m/m!) e^{−E} (1/S) e^{−E/S} dE
     = (1/(S m!)) ∫_0^∞ E^m e^{−E(1 + 1/S)} dE
     = (1/(1 + S)) (S/(1 + S))^m   for m = 0, 1, 2, ....    (6.3.11)

This is the Gordon probability mass function, which was derived directly as the maximum-entropy probability mass function in (6.1.5). Expression (6.3.11) states that the Poisson transform of the maximum-entropy exponential distribution is the maximum-entropy Gordon distribution.⁷ The Gordon distribution, or Gordon probability mass function, may be viewed as the composite of the quantum uncertainty, expressed by photon noise in a photon-optics signal model, and the maximum statistical uncertainty, expressed by an exponential distribution in a wave-optics signal model. The composite effect when both forms of uncertainty are present is determined by first considering the effect of the photon noise and then averaging that result over the maximum-entropy exponential distribution.

Figure 6.5 combines the Poisson-transform relationship between the continuous maximum-entropy exponential distribution and the discrete maximum-entropy Gordon distribution with the relationship between the energy and the complex amplitude discussed in Section 6.2.1. This chain of mathematical relationships connects the maximum-entropy distributions for wave optics and photon optics. These relationships imply that the maximum-entropy distributions are mathematically interchangeable. This property arises naturally in quantum optics and is discussed in detail in Section 15.6.

The variance of the Gordon distribution can be determined from (6.3.8b) using an exponential function for f(E) with ⟨E⟩ = S and σ_E² = ⟨E⟩² = S². Substituting these expressions into (6.3.8b) gives

σ_m² = ⟨E⟩ + σ_E² = S + S²,    (6.3.12)

combining the variance of an exponential distribution with that of a Poisson distribution.
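The statement that the Poisson transform of the exponential distribution is the Gordon distribution can be checked numerically. The Python sketch below is illustrative: S is an assumed mean count, and the transform integral (6.3.2) is evaluated by quadrature with an assumed finite upper limit.

```python
# Quadrature check that the Poisson transform (6.3.2) of the exponential
# density (6.3.10) matches the Gordon distribution (6.3.11).
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

S = 5.0                                      # assumed mean number of counts
f_exp = lambda E: np.exp(-E / S) / S         # exponential density (6.3.10)
gordon = lambda m: (1 / (1 + S)) * (S / (1 + S)) ** m   # closed form (6.3.11)

def poisson_transform(m, f, upper=400.0):
    # p(m) = integral of (E^m / m!) e^{-E} f(E) dE; logs used for stability
    integrand = lambda E: np.exp(m * np.log(E) - E - gammaln(m + 1)) * f(E)
    return quad(integrand, 1e-12, upper)[0]

for m in (0, 1, 5, 20):
    print(m, poisson_transform(m, f_exp), gordon(m))
```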

Figure 6.5 Mathematical relationships between the continuous and the discrete maximum-entropy probability distributions. The squared magnitude of a circularly symmetric gaussian complex amplitude gives the exponential distribution (continuous energy); the forward and inverse Poisson transforms relate the exponential distribution to the Gordon distribution (photon counts); the return path from energy to complex amplitude is a "lifting" based on maximum entropy.

7 The fundamental role of this distribution in photon optics was first discussed by Gordon (1962).


The relative magnitude of S and S2 determines which form of probability distribution function is an appropriate replacement for the Gordon distribution. These two terms are equal when S equals one. When S is much smaller than one, the discrete Poisson probability distribution, which has only quantum uncertainty, is appropriate. When S is much larger than one, the continuous exponential probability density function, which has only statistical uncertainty, is appropriate. When S is on the order of one, both forms of uncertainty are evident. Then the Gordon distribution itself is appropriate.

6.4 Power Density Spectra

This section derives the power density spectra for several kinds of lightwave signals in additive noise, and the corresponding electrical signals generated using direct photodetection. For some lightwave sources, such as laser diode sources, the power density spectrum must be treated in a different way because the noise in such a source is not necessarily additive. This is discussed in Section 7.8.

6.4.1 Power Density Spectrum of the Lightwave Noise Power

Consider a lightwave noise source n(t) modeled as a stationary zero-mean circularly symmetric gaussian random process. The power density spectrum S_Pn(f) of the lightwave noise power is the Fourier transform of the autocorrelation function R_Pn(τ) (cf. (2.3.70)) of the random lightwave noise power P_n(t), which is given by

R_Pn(τ) = ⟨P_n(t) P_n(t + τ)⟩.    (6.4.1)

The second-order autocorrelation function R_Pn(τ) for the lightwave noise power is a fourth-order correlation function with respect to the complex lightwave noise amplitude n(t). For a circularly symmetric gaussian random-noise process n(t), the fourth-order statistic R_Pn(τ) can be expressed in terms of the autocorrelation function R_n(τ) (cf. (2.2.55)) and the mean noise power P_n = (1/2)⟨|n(t)|²⟩. This is given by

R_Pn(τ) = P_n² + |R_n(τ)|²,    (6.4.2)

which is a consequence of the Isserlis theorem. The proof of this statement is the task of an end-of-chapter exercise. Expression (6.4.2) states that the autocorrelation function for the lightwave noise power R_Pn(τ) is directly related to the squared magnitude |R_n(τ)|² of the autocorrelation function of the circularly symmetric gaussian noise process n(t). Expression (6.4.2) can be used to relate noise statistics for gaussian lightwave noise before and after direct photodetection. Taking the Fourier transform of (6.4.2), the power density spectrum of the lightwave noise power can be written as

S_Pn(f) = P_n² δ(f) + S_n(f) ⊛ S_n*(f),    (6.4.3)

where Sn ( f ) is the power density spectrum of the lightwave noise amplitude, which is the Fourier transform of Rn (τ) (cf. (2.2.55)).
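Expression (6.4.2) can also be confirmed by a direct Monte Carlo experiment. The sketch below is illustrative only: the bandlimiting filter is an arbitrary eight-tap average, and the convention assumed here places the factor of one-half in both P_n and R_n(τ), consistent with the definitions used above.

```python
# Monte Carlo check of R_P(tau) = Pn^2 + |R_n(tau)|^2 for a circularly
# symmetric gaussian process (a consequence of the Isserlis theorem).
import numpy as np

rng = np.random.default_rng(0)
N, taps = 1_000_000, 8
w = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
n = np.convolve(w, np.ones(taps) / np.sqrt(taps), mode="same")  # bandlimited noise

P = 0.5 * np.abs(n) ** 2                 # lightwave power P(t) = |n(t)|^2 / 2
Pn = P.mean()
for lag in (0, 2, 5, 10):
    Rn = 0.5 * np.mean(n[: N - lag] * np.conj(n[lag:]))   # assumed 1/2 convention
    RP = np.mean(P[: N - lag] * P[lag:])
    print(lag, RP, Pn ** 2 + np.abs(Rn) ** 2)
```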


These expressions also apply to a modulated information waveform that is modeled as a circularly symmetric gaussian random process, as well as to the sum of such a signal and noise. These applications are described in the next section.

6.4.2 Power Density Spectrum of the Photodetected Signal with Additive Noise

The power density spectrum of the intensity of the sum of a lightwave signal and additive lightwave noise is developed in this section. The signal and the noise are first treated as zero-mean circularly symmetric gaussian random processes. The signal is then treated as a bias with this same noise. The power density spectrum for this case is compared with the corresponding probability density function given in (6.2.14).

Power Density Spectrum for Noise with a Random Signal

When both the signal and the noise are maximally random, they can be treated as independent circularly symmetric gaussian random processes. As in Section 6.4.1, the expected lightwave noise power is P_n = (1/2)⟨|n(t)|²⟩. The expected lightwave signal power is P_s = ⟨P_s(t)⟩ = (1/2)⟨|s(t)|²⟩. The autocorrelation function of the total lightwave power P(t) = (1/2)|s(t) + n(t)|² is

R_P(τ) = ⟨P(t)P(t + τ)⟩ = (1/4)⟨|s(t) + n(t)|² |s(t + τ) + n(t + τ)|²⟩.    (6.4.4)

The right side of (6.4.4), when expanded, has 16 terms. Because the means are zero and the signal and noise terms are independent, the eight terms containing only a single unsquared noise or signal variable are each equal to zero under the expectation and can be dropped. Of the remaining eight terms, two involve pseudocovariance functions of circularly symmetric gaussian random variables and so are zero (cf. Section 2.2.1), and can be dropped. Collecting the six remaining terms gives

R_P(τ) = R_Ps(τ) + R_Pn(τ) + 2P_s P_n + R_s*(τ)R_n(τ) + R_s(τ)R_n*(τ),    (6.4.5)

where R Ps (τ) is the autocorrelation function of the lightwave signal power, R Pn (τ) is the autocorrelation function of the lightwave noise power, and Rs (τ) and R n (τ) are the autocorrelation functions of the lightwave signal and the lightwave noise, respectively. Because this expression contains a cross term 2 Ps Pn , there is no simple scaling factor between the optical signal-to-noise ratio (OSNR) defined in (2.2.71b) and the electrical signal-to-noise ratio (SNR) defined in (2.2.71a). The Fourier transform of (6.4.5) is the power density spectrum S P ( f ) of the lightwave power.

Power Density Spectrum for Noise with a Bias

A different power density spectrum is generated when the signal is treated as a constant with the noise still treated as a zero-mean circularly symmetric gaussian random process. The corresponding autocorrelation function can be obtained from (6.4.5) by replacing R_Ps(τ) with P_s² and replacing both R_s(τ) and R_s*(τ) by P_s. Using these replacements in (6.4.5) along with R_Pn(τ) = P_n² + |R_n(τ)|² (cf. (6.4.2)), the autocorrelation function of the lightwave power is

R_P(τ) = P² + P_s(R_n(τ) + R_n*(τ)) + |R_n(τ)|²,    (6.4.6)

where P² = (P_s + P_n)² is the square of the total mean lightwave power. The Fourier transform of (6.4.6) gives the power density spectrum S_P(f) of the lightwave power. Using the dual of the convolution property of the Fourier transform (cf. (2.1.15)) to write |R_n(τ)|² ←→ S_n(f) ⊛ S_n(f), the power density spectrum S_P(f) of the lightwave power can be written as

S_P(f) = P² δ(f) + 2P_s S_n(f) + S_n(f) ⊛ S_n(f),    (6.4.7)

where Sn ( f ) = Sn∗ ( f ) for a conjugate-symmetric complex-baseband noise process. The first term is the total mean electrical power. The last two terms correspond to the two terms in the variance of the probability density function for circularly symmetric gaussian noise with a bias given in (6.2.15b). These expressions are used to determine the power density spectrum of a directly photodetected electrical signal in Section 8.2.5.
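The two continuous terms of (6.4.7) can be evaluated on a discrete frequency grid. The following sketch uses assumed illustrative values and an assumed ideal rectangular noise spectrum; the impulse P²δ(f) at f = 0 is noted in a comment but not represented on the grid.

```python
# Discrete evaluation of the continuous part of S_P(f) in (6.4.7).
import numpy as np

B, N0, Ps = 10e9, 1e-17, 1e-3     # assumed bandwidth, noise density, signal power
f = np.linspace(-3 * B, 3 * B, 6001)
df = f[1] - f[0]

Sn = np.where(np.abs(f) <= B / 2, 2 * N0, 0.0)   # assumed rectangular Sn(f)

signal_noise = 2 * Ps * Sn                           # 2 Ps Sn(f) term
noise_noise = np.convolve(Sn, Sn, mode="same") * df  # Sn convolved with Sn

SP = signal_noise + noise_noise   # plus P^2 delta(f) at f = 0, omitted here
print("peak of 2 Ps Sn(f):", signal_noise.max())
print("peak of (Sn conv Sn)(f):", noise_noise.max())
```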

6.4.3 Power Density Spectrum of the Photodetected Signal with Shot Noise

The power density spectrum of a photodetected lightwave signal that includes shot noise is the topic of this section. The photoelectron arrival process is g(t) = Σ_ℓ δ(t − t_ℓ), possibly with a time-varying random photoelectron arrival rate μ(t) set to R(t). Thus g(t) has two forms of randomness. When the lightwave power P(t) is a time-varying and random process, the photoelectron arrival rate R(t) is also time-varying and random. Were the lightwave power P(t) described by a single time-varying realization of a stationary random process, the arrival process g(t), conditional on that realization, would be nonstationary. However, the lightwave power is a stationary random process with an infinite ensemble of possible realizations. This means that the ensemble of the random photoelectron arrival process g(t) is independent of time, so the arrival process g(t) is stationary.

The power density spectrum S_g(f) of the arrival process g(t) can be determined by writing S_g(f) as the limit of a random process G_T(f) (cf. (2.2.54)),

S_g(f) = lim_{T→∞} (1/T) ⟨|G_T(f)|²⟩,    (6.4.8)

where G_T(f) is the Fourier transform of a sample function of g(t) truncated to a finite time interval of duration T. The transform pair δ(t − t_ℓ) ←→ e^{−i2πf t_ℓ} and the definition g(t) = Σ_{ℓ=−∞}^∞ δ(t − t_ℓ) (cf. Table 6.2) lead to

G_T(f) = Σ_{ℓ=1}^m e^{−i2πf t_ℓ},    (6.4.9)


where m is the random number of counts⁸ generated within the photodetector during an interval of duration T, and t_ℓ is the random arrival time for the ℓth count. Substituting (6.4.9) into (6.4.8) gives

S_g(f) = lim_{T→∞} (1/T) ⟨Σ_{ℓ=1}^m Σ_{j=1}^m e^{−i2πf t_ℓ} e^{i2πf t_j}⟩.    (6.4.10)

The expectation will be evaluated by averaging over the independent random arrival times t_ℓ of the m photoelectrons, then averaging over the random number of photoelectrons m within a time interval of duration T. To evaluate the expectation defined in (6.4.10), first average over the arrival times for each realization m of the random variable m. The double summation in (6.4.10) has m² terms. Each of the m terms for which the indices j and ℓ are the same is equal to one, so these m terms sum to m. For the remaining m² − m terms, the values of t_ℓ and t_j are independent. For these terms, the expectation of the product is the product of the two individual expectations. Then the expectations can be moved inside the summations to give m² − m identical terms

Σ_{ℓ=1}^m Σ_{j=1, j≠ℓ}^m ⟨e^{−i2πf t_ℓ}⟩⟨e^{i2πf t_j}⟩ = (m² − m)⟨e^{−i2πf t}⟩⟨e^{i2πf t}⟩.    (6.4.11)

The probability density function of the arrival time used to evaluate each expectation in (6.4.11) is determined by first defining p_tℓ(t_ℓ) = lim_{Δt→0}(p_d/Δt) as the continuous probability density function of the time t_ℓ at which the ℓth photodetection event occurs. Using (6.2.24) gives

p_tℓ(t_ℓ) = R(t_ℓ)/E,    (6.4.12)

which is the same for every ℓ. The normalizing denominator E is the mean number of counts over an interval of duration T. Using (6.4.12) for each expectation in (6.4.11) and adding the m diagonal terms, the expectation over the random arrival times, denoted ⟨|G_T(f)|²⟩_t, can be written as

⟨|G_T(f)|²⟩_t = m + (m² − m) ∫_{−∞}^∞ (R(t)/E) e^{−i2πft} dt ∫_{−∞}^∞ (R(t)/E) e^{i2πft} dt
             = m + (m² − m)|U(f)|²/E²,    (6.4.13)

where U(f) is the Fourier transform of a sample function of the random arrival rate R(t). Now recall that m is a random variable with realization m. To complete the expectation of (6.4.10), note that because m is a Poisson random variable, ⟨m⟩ = E and ⟨m² − m⟩ = E² + E − E = E². Therefore, with the expectation over m now included, the expectation over both the random arrival times and the random number of counts is

8 For ideal photodetection with the quantum efficiency equal to one, the number of photoelectron counts is equal to the number of photon counts m.

⟨|G_T(f)|²⟩ = ⟨m⟩ + ⟨m² − m⟩|U(f)|²/E² = E + |U(f)|².    (6.4.14)

Combining (6.4.14) with (6.4.8) gives

S_g(f) = lim_{T→∞} (E/T + |U(f)|²/T).

Next, substitute E = ∫_{−T/2}^{T/2} R(t) dt (cf. Table 6.2) into the preceding expression. The limit of the first term is

lim_{T→∞} E/T = lim_{T→∞} (1/T) ∫_{−T/2}^{T/2} R(t) dt = R̄,

which is the time average of the arrival rate R(t). The limit of the second term is the power density spectrum S_R(f) of the random arrival rate R(t). Together these terms give

S_g(f) = R̄ + S_R(f),    (6.4.15)

as the power density spectrum of the arrival process g(t). The first term R̄ of this expression is a constant white-noise term due to the impulsive nature of the photon noise. The second term S_R(f) is the power density spectrum of the random arrival rate R(t), which is related to the power density spectrum S_P(f) of the incident lightwave power by (cf. Table 6.2)

S_R(f) = (ℛ/e)² S_P(f).    (6.4.16)

Substituting this expression into (6.4.15) and multiplying both sides by e², the (two-sided) power density spectrum S_i(f) for the directly photodetected electrical signal before electrical filtering is given by

S_i(f) = eℛP̄ + ℛ² S_P(f),    (6.4.17)

where e² S_g(f) = S_i(f) and R̄ = (ℛ/e)P̄, where P̄ = ⟨P⟩ is the mean lightwave power. The constant term appearing first is the white electrical power density spectrum due to the shot noise. The second term, with S_P(f) given by (6.4.7), is the electrical power density spectrum from the incident lightwave signal, including additive lightwave noise, but excluding shot noise.
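The white shot-noise level in (6.4.17) can be reproduced by simulating a photocurrent built from Poisson-distributed counts in small time bins. The sketch below assumes a constant arrival rate and ideal detection, so the two-sided power density spectrum of the current fluctuations should sit near e²R, which equals eℛP̄ in this case.

```python
# Periodogram check of the white shot-noise term in (6.4.17).
import numpy as np

rng = np.random.default_rng(2)
e = 1.602e-19
R = 1e12                       # assumed constant photoelectron arrival rate (1/s)
dt, N = 1e-12, 1_000_000       # bin width (s) and number of bins

counts = rng.poisson(R * dt, N)        # photoelectrons in each bin
i = e * counts / dt                    # photocurrent samples

x = i - i.mean()                       # remove the mean (the impulse term)
S = np.abs(np.fft.fft(x)) ** 2 * dt / N   # two-sided periodogram
print("measured level  :", S[10:1000].mean())
print("predicted e^2 R :", e ** 2 * R)
```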

6.5 Direct Photodetection with Gaussian Noise

A bandlimited lightwave channel used with direct photodetection and an integrating sampler is shown in Figure 6.6. Only additive gaussian noise in the lightwave channel is considered. Referring to that figure, a lightwave pulse s(t) plus an additive white gaussian lightwave noise process n_o(t) is bandlimited by a noise-suppressing filter with a passband noise-equivalent bandwidth B defined in (2.2.77). Because of the bandlimiting filter, the gaussian noise is not white. This section derives probability distribution


Table 6.3 Expressions for the constant signal terms and the noise density terms for wave optics and photon optics in the optical domain and the electrical domain

                                  Wave signal model | Photon signal model | Relationship
Optical domain, signal term:      ℰ                 | E                   | ℰ = hf E
Optical domain, noise density:    𝒩sp               | Nsp                 | 𝒩sp = hf Nsp
Electrical domain, signal term:   𝒲s = ℛℰ           | Ws = ηE             | 𝒲s = eWs
Electrical domain, noise density: 𝒲n = ℛ𝒩sp         | Wn = ηNsp           | 𝒲n = eWn

Figure 6.6 Schematic of a direct-photodetection receiver for a bandlimited channel: the input s(t) + n_o(t) passes through an optical filter of bandwidth B, square-law photodetection, and integration over an interval of duration T, followed by a sampler that produces the sample E.

functions on the output sample for both the wave-optics and the photon-optics models. Only the elementary pulse s(t) = rect(t/T) without dispersion is considered in this section. In Chapter 10, the integration and the rectangular pulse shape will be replaced by a detection filter and an arbitrary pulse shape.

To produce the sample E of the photodetected lightwave energy over an interval of duration T, possibly larger than 1/B, the squared magnitude |r(t)|² of the bandlimited complex gaussian random process r(t) = s(t) + n_o(t) is integrated over an interval of duration T. The relevant probability density functions on the sample E will be derived in Section 6.5.1.

Figure 6.6 can also be given a particle interpretation. Now the integration is replaced by an event counter, where an event can refer to a photon or a photoelectron. The relevant probability mass function for that case will be derived in Section 6.5.2. The associated probability mass function p(m) for the number of counts observed in a photon-optics detection process is determined by using the Poisson transform of the wave-optics distribution f(E). In this way, photon noise is included.

The noise terms and the signal terms for wave optics and photon optics, both in the optical domain and in the electrical domain, are summarized and related in Table 6.3. Because all the terms are related by scaling factors, E denotes the random sample value, E_s = ⟨E⟩ denotes the mean signal, and N_sp denotes the mean noise, with the specific units depending on the signal model as given in Table 6.3.

6.5.1 Continuous Probability Density Functions

The random sample E = ∫_T |r(t)|² dt is the integral of the squared magnitude of a complex bandlimited gaussian random process over a time interval of duration T. Because of the bandlimiting to bandwidth B, the gaussian random process is not white. In this section, the probability density function of the sample E will be approximated on the basis of the timewidth–bandwidth product TB. This product is shown to be equivalent to T/τ_c, where τ_c = 1/B is the coherence interval (cf. (2.2.59)) of the random process. Later, in Section 6.6.3, this approximation is validated using an orthogonal expansion in signal space.

Single Coherence Interval

For a timewidth–bandwidth product TB that is at most one, the sample E is the integration over an interval that is at most the coherence interval τ_c. The integrand is a gaussian random process, but that process does not change over a coherence interval and is described by a gaussian random variable. Therefore, the value of the integrand is the squared magnitude (1/2)|n_o|² of a circularly symmetric gaussian random variable, perhaps with an added offset s. The integration simply multiplies the random value (1/2)|s + n_o|² by T.

Random Signal without an Offset

When there is no offset, the sample E is the squared magnitude of a circularly symmetric gaussian random variable and is described by a central chi-square probability density function with two degrees of freedom. This is the same as an exponential probability density function (cf. (2.2.49)). The random magnitude of the sample is described by a Rayleigh probability density function (cf. (2.2.34)). These distributions are valid for the squared magnitude and the magnitude of any sample described by a circularly symmetric gaussian random variable, which may be a noise sample or a sample generated from a random signal that conveys information.

Random Signal with an Offset

The discussion of the previous paragraph is now expanded to include an offset by a constant signal. The squared magnitude of the sample E generated from a constant signal s added to a circularly symmetric gaussian random variable is described by a noncentral chi-square probability density function with two degrees of freedom (cf. (2.2.38)). The random magnitude of the sample is described by a ricean probability density function (cf. (2.2.33)). When a filter and a sampler are used in place of an integrator, the magnitude of the sample is still described by a ricean probability density function, but may have a different mean and variance.

Multiple Coherence Intervals

For a timewidth–bandwidth product TB that is larger than one, the sample E is the integration over multiple coherence intervals of duration τ_c. The integrand changes over this integration time. An approximation for the random sample divides the interval of length T into K subintervals, with K defined by ⌈T/τ_c⌉. The random sample E = ∫_0^T |s + n_o(t)|² dt at the output of the integrator is then approximated by a random process consisting of a sequence of random piecewise independent intervals,

E = ∫_0^T |s + n_o(t)|² dt ≈ Σ_{k=1}^K ∫_{(k−1)τ_c}^{kτ_c} |s + n_o(t)|² dt ≈ Σ_{k=1}^K |s + n_o,k|² τ_c.    (6.5.1)


In this expression, the integration is approximated by the summation in (6.5.1), which consists of K independent subsamples of the squared magnitude |s + n_o,k|² of the integrated lightwave signal, where {n_o,k} is a set of independent and identically distributed zero-mean circularly symmetric gaussian random variables characterizing the coherence intervals of duration τ_c. The value of each subsample is multiplied by τ_c and then the results are added. A more elaborate justification for this approximation is discussed in Section 6.6.3.

Sample of a Random Signal without an Offset

When the integration time is longer than a single coherence interval of duration τ_c and there is no offset, s = 0 in (6.5.1) and r(t) = n_o(t). The squared magnitude |n_o,k|² of each subsample is described by an independent exponentially distributed random variable. Within this approximation, the integration of the directly photodetected signal over an interval of duration T is replaced by the sum of K independent exponentially distributed subsamples so that

E = ∫_{−T/2}^{T/2} |r(t)|² dt ≈ Σ_{k=1}^K E_k,    (6.5.2)

where E_k is an exponential random variable. The mean N_opt is given in various ways as

N_opt = ⟨E_k⟩ = ℛP_n τ_c = ℛ𝒩_sp,    (6.5.3)

where P_n is the lightwave noise power, and 𝒩_sp is the power density spectrum of the white spontaneous emission noise. The last expression, N_opt = ℛ𝒩_sp, in (6.5.3) is valid for an ideal bandpass filter with τ_c = 1/B so that P_n τ_c = P_n/B = 𝒩_sp (cf. (2.2.78)). Because N_opt = N_sp when ℛ = 1, N_sp will be used as a summary notation for the mean noise.

The probability density function f(E) for E is a central chi-square probability density function with N = 2K degrees of freedom (cf. (2.2.44)). Using N = 2K, 2σ² = N_sp, and z = E, this is a gamma distribution given by (cf. (2.2.50))

f(E) = (1/(N_sp Γ(K))) (E/N_sp)^{K−1} e^{−E/N_sp},   E ≥ 0,    (6.5.4)

where Γ(K) = ∫_0^∞ x^{K−1} e^{−x} dx is the gamma function.
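A quick simulation supports the gamma form (6.5.4). The sketch below uses assumed values of K and N_sp, draws the sample E as the sum of K independent exponential subsamples as in (6.5.2), and compares empirical quantiles with the gamma model.

```python
# Monte Carlo check that the integrated noise sample follows (6.5.4).
import numpy as np
from scipy.stats import gamma

K, Nsp, n = 5, 1.0, 100_000            # assumed parameters
rng = np.random.default_rng(3)
E = rng.exponential(Nsp, size=(n, K)).sum(axis=1)   # sum of K exponentials

for q in (0.1, 0.5, 0.9):
    print(q, np.quantile(E, q), gamma.ppf(q, K, scale=Nsp))
```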

Sample of a Random Signal with an Offset

The discussion is now expanded to include a constant offset. A constant lightwave signal added to a bandlimited circularly symmetric gaussian noise process with K degrees of freedom results in a biased gaussian noise process. When the intensity of this complex process is integrated over an interval of duration T, the integrand is a random variable described by a noncentral chi-square probability density function with N = 2K degrees of freedom (cf. (2.2.41)). This distribution is characterized by the mean lightwave signal


E_s, the lightwave noise N_sp in one coherence interval, and the number of degrees of freedom N = 2K. Using that expression, with N = 2K, A² = E_s, 2σ² = N_sp, and z = E, gives

f(E) = (1/N_sp) (E/E_s)^{(K−1)/2} e^{−(E+E_s)/N_sp} I_{K−1}(2√(E_s E)/N_sp),   E ≥ 0,    (6.5.5)

where Nsp is the power density spectrum of the spontaneous emission noise generated by direct photodetection (cf. Table 6.3). The mean and variance, determined using (2.2.42), are

⟨E⟩ = E_s + K N_sp,    (6.5.6a)
σ_E² = 2E_s N_sp + K N_sp².    (6.5.6b)

Characteristic Function for Wave Optics

The characteristic function C_E(μ) of a noncentral chi-square probability density function is given in (2.2.43). Using that expression, and assigning N = 2K, A² = E_s, and 2σ² = N_sp, gives

C_E(μ) = (1/(1 − iμN_sp)^K) exp(iμE_s/(1 − iμN_sp)),    (6.5.7)

where µ is the Fourier-transform variable associated with the sample E. When E s equals zero, (6.5.7) reduces to C E (µ) = (1 − iµ Nsp )− K , which is the characteristic function of a sum of K independent exponential random variables each with a mean Nsp . The sum of independent exponential random variables is a gamma random variable (cf. (2.2.50)) or, equivalently, a central chi-square probability density function with N = 2K degrees of freedom. Therefore, when E s = 0, the noncentral chi-square probability density function given in (6.5.5) reduces to the central chi-square probability density function given in (6.5.4).

6.5.2 Discrete Probability Mass Functions

Figure 6.6 can be interpreted to suit the photon-optics model. The square-law device becomes a photon detector and the integration becomes a counter. The probability density functions of Section 6.5.1 are converted to probability mass functions by the Poisson transform.

Characteristic Function for Photon Optics

The Poisson transform of (6.5.5) is the probability mass function p(m) for the random number of counts and incorporates both photon noise and additive gaussian noise. Using signal quantities appropriate for photon optics (cf. Table 6.2), the Poisson transform is readily evaluated starting with the characteristic function C_E(μ) given in (6.5.7) and replacing the variable μ with the variable −i(e^{iω} − 1) (cf. (6.3.7)). This replacement converts the characteristic function of the probability density function f(E) for the mean signal defined using wave optics to the characteristic function C_m(ω) of the probability


mass function p(m) for the random number of counts defined using photon optics. Using this replacement and expressing the mean signal and noise in terms of counts gives

C_m(ω) = (1/(1 − N_sp(e^{iω} − 1))^K) exp(E_s(e^{iω} − 1)/(1 − N_sp(e^{iω} − 1))),    (6.5.8)

where Es is the mean number of signal counts and N sp is the mean number of noise counts. This characteristic function is periodic in ω . The inverse Fourier transform of this periodic characteristic function produces a discrete probability mass function for the number of counts.

Probability Mass Function for Photon Optics

The probability mass function p(m) for the random number of photon counts is the inverse Fourier transform of the characteristic function C_m(ω) given in (6.5.8). This function can be expressed in terms of the generic Laguerre random variable m. Upon assigning x = E_s/(N_sp(1 + N_sp)) and a = N_sp in (2.2.51), the probability mass function is given by⁹

p(m) = (N_sp^m/(1 + N_sp)^{m+K}) e^{−E_s/(1+N_sp)} L_m^{K−1}(−E_s/(N_sp(1 + N_sp)))   for m = 0, 1, 2, ...,    (6.5.9)

where L_m^{K−1}(x) is the generalized Laguerre polynomial (cf. (2.2.52)). A plot of (6.5.9) is shown in Figure 6.7(a). For comparison, plots of a noncentral chi-square probability density function and a gaussian distribution are shown in Figure 6.7(b). The mean and variance are determined using (6.3.8b) and are (cf. (6.5.6))

⟨m⟩ = ⟨E⟩,    (6.5.10a)
σ_m² = ⟨E⟩ + σ_E² = E_s + K N_sp + 2E_s N_sp + K N_sp².    (6.5.10b)
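Because (6.5.9) is the Poisson transform of (6.5.5), the two can be cross-checked numerically. The sketch below uses assumed parameter values and an assumed quadrature upper limit; it evaluates the transform integral (6.3.2) directly and compares the result with the closed-form Laguerre probability mass function.

```python
# Quadrature check that (6.5.9) is the Poisson transform of (6.5.5).
import numpy as np
from scipy.integrate import quad
from scipy.special import iv, gammaln, eval_genlaguerre

Es, Nsp, K = 10.0, 2.0, 3              # assumed signal counts, noise counts, K

def f_ncx2(E):                         # noncentral chi-square density (6.5.5)
    return ((E / Es) ** ((K - 1) / 2) / Nsp * np.exp(-(E + Es) / Nsp)
            * iv(K - 1, 2 * np.sqrt(Es * E) / Nsp))

def laguerre_pmf(m):                   # closed form (6.5.9)
    return (Nsp ** m / (1 + Nsp) ** (m + K) * np.exp(-Es / (1 + Nsp))
            * eval_genlaguerre(m, K - 1, -Es / (Nsp * (1 + Nsp))))

for m in (0, 1, 5, 15):
    integrand = lambda E: np.exp(m * np.log(E) - E - gammaln(m + 1)) * f_ncx2(E)
    print(m, quad(integrand, 1e-12, 300.0)[0], laguerre_pmf(m))
```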

The Laguerre probability mass function specializes to several other discrete probability mass functions. When there is no signal, the probability mass function p(m) is a sum of K independent Gordon distributions. This distribution is called a negative binomial probability mass function. Setting E_s equal to zero in (6.5.9) and noting that L_m^{K−1}(0) = (K−1+m choose m), the probability mass function in (6.5.9) simplifies to

p(m) = ((K−1+m) choose m) (1/(1 + N_sp))^K (N_sp/(1 + N_sp))^m   for m = 0, 1, 2, ....    (6.5.11)

When there is no lightwave noise (N_sp = 0), the characteristic function C_m(ω) given in (6.5.8) reduces to (6.2.27), which is the characteristic function of a Poisson distribution with mean E_s (cf. (6.3.8a)). When E_s equals zero and N_sp is much smaller than one, C_m(ω) can be approximated as 1 + K N_sp(e^{iω} − 1) ≈ e^{K N_sp(e^{iω}−1)}, using 1 + x ≈ e^x for x much smaller than one. The resulting expression is the characteristic function of a Poisson distribution with mean K N_sp. This expression states that the probability mass function of any photon-optics signal will approach a Poisson probability mass function when the signal is sufficiently attenuated.

9 See Lachs (1965), and Section 2.4 and Appendix C of Gagliardi and Karp (1976).

Figure 6.7 (a) Plot of the Laguerre probability mass function p(m) for E_s = 100, K = 5, and N_sp = 5. (b) Plot of the noncentral chi-square probability density function f(E) (solid line) for the same parameters, as well as a gaussian probability density function with the same mean and variance (dashed line). Both panels show the base-ten logarithm of the distribution.

6.6 Balanced Photodetection with Gaussian Noise

The probability density function of a stationary bandlimited lightwave signal modeled as a circularly symmetric gaussian random process is the topic of this section. This bandlimited gaussian random process may describe either an information source or a noise source. An electrical signal that is generated by balanced photodetection (cf. Section 7.3.2) is proportional to the received lightwave amplitude, and hence is also a circularly symmetric gaussian random process. The process will be expressed as an infinite sum of circularly symmetric gaussian random processes with random expansion coefficients b_j as given in (6.6.1), which may have different variances in general.

6.6.1 Orthogonal Expansion

A bandlimited circularly symmetric gaussian random process n (t ) over an interval of duration T can be expressed in terms of a suitable expansion using a set of basis functions {ψ j (t ), j = 1, 2, . . .} defined over the interval −T /2 < t < T /2. The expansion of a signal over a finite time interval in terms of an orthonormal basis is discussed in Section 2.1.3. The expansion coefficients b j for a waveform are determined by projecting the waveform onto a basis for that signal space as defined in (2.1.64). For a random process n(t ), the expansion coefficients b j are random variables given by (cf. (2.1.64))


b_j = ∫_{−T/2}^{T/2} n(t) ψ_j*(t) dt,    (6.6.1)

where ψ_j(t) is an orthonormal basis function of that signal space. Each random expansion coefficient b_j is generated by a projection of n(t) onto a basis function. The random process can then be expressed as

n(t) = Σ_{j=1}^∞ b_j ψ_j(t)   for −T/2 < t < T/2.    (6.6.2)

Applying the theory of Section 2.2.2, the coefficients of this expansion are circularly symmetric gaussian random variables because n(t ), which need not be a noise process, is a circularly symmetric gaussian random process. For a basis for which the set of coefficients {b j } are pairwise uncorrelated, the b j are independent because each random variable is gaussian (cf. (2.2.26)). The resulting joint probability density function is then a product distribution of the marginal probability density functions for the random variables {b j } in the expansion given in (6.6.2). To determine the condition that produces uncorrelated gaussian random variables, use (6.6.1), and form the correlation between b j and bk ,

⟨b_j b_k*⟩ = ⟨∫_{−T/2}^{T/2} n(t1)ψ_j*(t1)dt1 ∫_{−T/2}^{T/2} n*(t2)ψ_k(t2)dt2⟩
          = ∫_{−T/2}^{T/2} ∫_{−T/2}^{T/2} ⟨n(t1)n*(t2)⟩ ψ_j*(t1)ψ_k(t2) dt1 dt2
          = ∫_{−T/2}^{T/2} (∫_{−T/2}^{T/2} R_n(t1 − t2)ψ_k(t2)dt2) ψ_j*(t1)dt1,    (6.6.3)

where R_n(t1 − t2) = ⟨n(t1)n*(t2)⟩ is the autocorrelation function of the stationary, circularly symmetric gaussian random process. For uncorrelated gaussian random variables b_j and b_k, the term on the left side of (6.6.3) must satisfy the condition ⟨b_j b_k*⟩ = λ_k δ_jk, where δ_jk is the Kronecker impulse and λ_k is a constant. For an orthonormal basis, this statement implies that the term in the parentheses on the right side must be of the form

∫_{−T/2}^{T/2} R_n(t1 − t2) ψ_k(t2) dt2 = λ_k ψ_k(t1).    (6.6.4)

This is an integral equation with eigenvalues λ_k and eigenfunctions ψ_k(t). Because the autocorrelation function for a stationary gaussian random process is conjugate-symmetric (or hermitian), meaning that R_n(t1 − t2) = R_n*(t2 − t1), the eigenfunctions ψ_k(t) are orthogonal and the eigenvalues λ_k are real.

Additivity Property of the Entropy

Using the basis {ψk (t )} defined by the solution to (6.6.4), the expansion coefficients b j are uncorrelated jointly gaussian random variables, and hence are independent. This means that the joint probability density function f (b) of the vector b = {b1, b2, . . .} of the expansion coefficients is a multivariate gaussian product distribution. It is circularly


symmetric if and only if the random process n (t ) is circularly symmetric. Because this is a product distribution, the additivity property of the entropy implies that the entropy of this gaussian product distribution is the sum of the entropies of each component (cf. Section 6.1.1). This property of bandlimited gaussian random processes holds for any set of independent random variables described by a product distribution. However, in general, random variables can be uncorrelated, yet not be independent. In Chapter 14, this additivity property of the entropy is used to determine the channel capacity of a bandlimited channel modeled as a bandlimited gaussian random process.

Nongaussian Random Processes

A quantum of lightwave energy is large compared with the average thermal energy, which is written as hf ≫ kT_0. For a small signal level, the central limit theorem is not applicable because of the limited number of interaction events. This situation may be governed by a nongaussian random process that exhibits higher-order correlation properties that cannot be simply expressed in terms of the first-order and second-order correlation properties, as would be the case for a gaussian random process.

In the nongaussian case, decorrelating the random variables by reducing the second-order correlation function to zero is not sufficient to guarantee that the resulting uncorrelated random variables are independent. This is because the higher-order correlation properties may preclude the factoring of the joint probability density function into the product of the marginal probability density functions of each signal component for any set of basis functions derived from the second-order statistics. For this case, the concept of statistically independent signal components and the corresponding additivity property of the entropy of the components breaks down. This behavior can be observed for a lightwave signal at small signal levels. This has fundamental consequences for conveying information at low light levels, as will be discussed in detail in Chapters 15 and 16.

6.6.2 Special Cases

Several solutions to (6.6.4) can be expressed in closed form, starting with a white gaussian random process. When the gaussian random process is a white-noise process, the real two-sided power density spectrum S ( f ) is equal to N0 /2. Substituting the corresponding autocorrelation function R (τ) = ( N0 /2)δ(τ) into (6.6.4) gives

(N_0/2) ∫_{−T/2}^{T/2} δ(t1 − t2) ψ_k(t2) dt2 = λ_k ψ_k(t1).

The sifting property of the Dirac impulse (cf. (2.1.2)) states that the integral on the left is equal to ψk (t1). This means that the eigenvalues for the Dirac impulse as a kernel are λk = N0 /2. These eigenvalues are real and independent of both k and the specific form of ψk (t ). Therefore, an additive white gaussian noise process can be expanded in any orthonormal set of real basis functions, with each independent signal component having a noise variance of N 0/2. The second special case is an ideal passband filter with a passband bandwidth B, and with a white gaussian random process at the input. For a noise process, the real two-sided passband power density spectrum is given by


S̃_n(f) = (N_0/2)(rect((f − f_c)/B) + rect((−f − f_c)/B)).

The corresponding complex power density spectrum for the noise process n(t ) determined using (2.2.75) is

S n ( f ) = 2N0 rect( f / B ).

The autocorrelation function of n(t) is the Fourier transform of S_n(f). Using (2.1.39), we have

R_n(τ) = 2N_0 B sinc(Bτ).    (6.6.5)

The eigenfunctions of (6.6.4) for this autocorrelation function are called radial prolate spheroidal functions. The eigenvalues λ_k/N_0, normalized with respect to the power density spectrum, are plotted on a base-ten log scale in Figure 6.8 for several values of TB. Referring to Figure 6.8, all eigenvalues λ_k/N_0 for k less than TB are approximately equal to one, and rapidly transition to extremely small values for k greater than TB. For example, at TB = 10, the eigenvalues go from λ_10 ≈ 1 to λ_15 ≈ 10^{−5}. This means that the expansion given in (6.6.2) has about TB significant terms, with the coefficient of each term being an independent, identically distributed circularly symmetric gaussian random variable with variance N_0. The expansion must have at least one term, so the number of significant terms K is ⌈TB⌉. As TB becomes large, only a small amount of the total energy fails to be represented by the TB significant terms.

Figure 6.8 Plot of the base-ten logarithm of the eigenvalues λ_k/N_0 of (6.6.4), with R_n(τ) given by (6.6.5), for several values of TB (TB = 1, 5, 10, 15), plotted against the eigenvalue index k.

Now consider a bandlimited gaussian information source s(t) such as the random waveform generated by modulating the output of an encoder. The preceding analysis shows that this random waveform can be approximately described by an expansion

of the form given in (6.6.2), with the expansion coefficients forming a sequence {s_k} of independent, identically distributed circularly symmetric gaussian random variables. Accordingly, a modified form of the sampling theorem for a random waveform s(t) can be defined by equating the passband bandwidth B to twice the maximum baseband bandwidth 𝒲 and equating the sampling time T_s to 1/B, which is the coherence timewidth τ_c of the bandlimited gaussian random process (cf. (2.2.59)). As T goes to infinity, the random information-bearing waveform s(t) can be written as (cf. (2.1.80))

s(t) = Σ_{j=0}^∞ s_j sinc(2𝒲t − j),    (6.6.6)

where the sinc function is the Fourier transform of the ideal rectangular filter used for bandlimiting, where 𝒲 = 1/(2T_s) is the Nyquist sampling rate, and where the encoder generates an independent circularly symmetric gaussian random variable s_k from the user data at a rate R = B = 2𝒲 = 1/T_s. Each sample s_k in the sequence {s_k} can transmit one symbol.
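The eigenvalue behavior plotted in Figure 6.8 can be reproduced by discretizing the integral equation (6.6.4). The sketch below is a numerical illustration: the kernel (6.6.5) is sampled on a grid, the eigenvalues of the resulting matrix approximate those of the operator and, because the absolute level depends on the spectral conventions, they are normalized by the largest one before counting the significant terms.

```python
# Discretized eigenproblem for (6.6.4) with the sinc kernel (6.6.5).
import numpy as np

N0, B, TB = 1.0, 1.0, 10.0             # assumed values; T = TB / B
T = TB / B
t = np.linspace(-T / 2, T / 2, 801)
dt = t[1] - t[0]

kernel = 2 * N0 * B * np.sinc(B * (t[:, None] - t[None, :]))
lam = np.sort(np.linalg.eigvalsh(kernel * dt))[::-1]   # operator eigenvalues

lam_norm = lam / lam[0]
print("leading normalized eigenvalues:", np.round(lam_norm[:14], 4))
print("significant terms:", int((lam_norm > 0.5).sum()), "(about TB =", TB, ")")
```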

6.6.3 Direct Photodetection

An exact expression for the directly photodetected electrical signal generated from a bandlimited gaussian random process will be obtained directly from the expression for the lightwave signal energy E. This derivation validates the approximations of Section 6.5.1. Using (6.6.2) and the eigenfunctions {ψ_k(t)} of (6.6.4), the random lightwave energy E for an additive white gaussian noise process over an interval of duration T can be written as

E = ∫_{−T/2}^{T/2} n(t)n*(t) dt
  = ∫_{−T/2}^{T/2} Σ_{j=1}^∞ b_j ψ_j(t) Σ_{k=1}^∞ b_k* ψ_k*(t) dt
  = Σ_{j=1}^∞ Σ_{k=1}^∞ b_j b_k* ∫_{−T/2}^{T/2} ψ_j(t)ψ_k*(t) dt
  = Σ_{j=1}^∞ Σ_{k=1}^∞ b_j b_k* δ_jk
  = Σ_{k=1}^∞ |b_k|²,    (6.6.7)

where the orthonormality of the basis functions has been used. For each k, the coefficient b_k is a circularly symmetric gaussian random variable with a componentwise variance σ_k² equal to N_k/2. Therefore, the random variable for each lightwave energy component E_k = |b_k|² is an exponentially distributed random variable with mean ⟨E_k⟩ = ⟨|b_k|²⟩ = λ_k = 2σ_k² = N_k. The values N_k depend on the form of bandlimiting. They need not be equal.


For the special case of an ideal bandlimited filter, Figure 6.8 shows that there are about K = ⌈TB⌉ significant terms in the expansion given in (6.6.7), with all of the eigenvalues λ_k/N_0 for k less than TB approximately equal to one. Then expression (6.6.7) can be approximated as

E = Σ_{k=1}^K |b_k|²,    (6.6.8)

where K = ⌈TB⌉ and the members of the set {|b_k|²} of the squared magnitudes of the expansion coefficients are independent, identically distributed exponential random variables. This expression validates the approximate expression given in (6.5.2) with |b_k|² = E_k.

Characteristic Function

Because the electrical signal generated from direct photodetection over a finite time interval of duration T is proportional to the lightwave energy, (6.6.7) can be directly compared with the expressions derived in Section 6.5.1. Each exponential random variable |b_k|² has a characteristic function of the form C_k(ω) = (1 − iωλ_k)^{−1}, which is the Fourier transform of the exponential probability density function. Because the random variables {|b_k|²} expressed in a suitable basis are independent, the joint probability density function of the total energy f(E) is a k-fold convolution of the k marginal exponential probability density functions f(E_k) = λ_k^{−1} e^{−E/λ_k}, one for each component. Therefore, the characteristic function C(ω) of the total lightwave energy f(E) is the product of the individual characteristic functions C_k(ω) as given by

C(ω) = Π_{k=1}^∞ C_k(ω) = Π_{k=1}^∞ (1 − iωλ_k)^{−1}.

This characteristic function can be inverted using a partial-fraction expansion to find f ( E ). This method of inversion is discussed as an end-of-chapter exercise.

Probability Mass Function

When the eigenvalues λ_k of (6.6.4) are distinct, the probability density function f(E) for the lightwave energy, as determined using a partial-fraction expansion, can be written as the mixture

f(E) = Σ_{k=1}^∞ R_k λ_k^{−1} e^{−E/λ_k},   E ≥ 0.    (6.6.9)

The relative weighting factor

R_k = Π_{j=1, j≠k}^∞ (1 − λ_j/λ_k)^{−1}    (6.6.10)

is determined for each exponential term by a partial-fraction expansion and need not be positive. Because f(E) is a probability density function, Σ_{k=1}^∞ R_k = 1.


Table 6.4 Probability distribution functions for a bandlimited gaussian random process expressed using wave optics and photon optics. The probability mass function, if defined, is the Poisson transform of the corresponding wave-optics probability density function for the lightwave energy.

                                                            Wave-optics signal model              | Photon-optics signal model
Amplitude of one signal component:                          Zero-mean gaussian                    | Undefined
Magnitude of one complex signal component:                  Rayleigh (6.2.9)                      | Undefined
Energy or photon number of one complex signal component:    Exponential (6.2.1)                   | Gordon (6.3.11)
Energy or photon number of K uncorrelated signal components: Sum of K exponentials (gamma) (2.2.50) | Sum of K Gordon distributions (negative binomial) (6.5.11)
Energy or photon number of correlated signal components:    Infinite weighted sum of exponentials (6.6.9) | Infinite weighted sum of Gordon distributions

When the eigenvalues are not distinct, a modified form of partial-fraction expansion must be used to obtain f(E). Applying the Poisson transform term-by-term to (6.6.9), the discrete probability mass function p(m) for the number of counts is a sum of weighted Gordon distributions, each of the form of (6.3.11). A summary of the probability density functions for a bandlimited circularly symmetric gaussian random process is provided in Table 6.4.
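For a small set of distinct eigenvalues, the weights (6.6.10) and the mixture (6.6.9) can be computed directly. The sketch below uses assumed eigenvalues; it confirms that the weights sum to one, so the mixture integrates to one even though individual weights may be negative.

```python
# Partial-fraction weights (6.6.10) and the mixture density (6.6.9).
import numpy as np
from scipy.integrate import quad

lam = np.array([3.0, 2.0, 1.0, 0.5])   # assumed distinct eigenvalues

def weight(k):
    others = np.delete(lam, k)
    return 1.0 / np.prod(1.0 - others / lam[k])

R = np.array([weight(k) for k in range(len(lam))])
print("weights R_k:", R, " sum:", R.sum())

f = lambda E: float(np.sum(R / lam * np.exp(-E / lam)))
print("integral of f(E) over [0, inf):", quad(f, 0.0, np.inf)[0])
```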

Deterministic Signal-Plus-Bandlimited-Gaussian-Noise Process

The probability density function of a random signal that is the sum of a deterministic signal s (t ) and a bandlimited gaussian noise process n(t ) defined over the interval −T /2 < t < T /2 can be determined by expanding the deterministic signal s (t ) in the same basis {ψk (t )} that was used to decorrelate the bandlimited gaussian noise process so that s (t ) =

∞ ³

k =1

sk ψk (t ).

Therefore, the energy E of the random signal s(t )+ n(t ) is a random variable written as E

=

∞ ³ k =1

Ek

=

∞ ³

k =1

|sk + λk u|2.

(6.6.11)

This expression has the same form as (6.6.7), with the exponentially distributed random variable | bk | 2 replaced by a noncentral chi-square random variable with two degrees of freedom, where λk u = bk , with u being a zero-mean, unit-variance, circularly symmetric gaussian random variable. Because the probability density function f (E ) of E is a product distribution, the characteristic function C E (ω) is the product of the individual terms C k (ω) (cf. (2.2.16)). Using (2.2.43), we have

284

6 Random Signals

Table 6.5 Probability distribution functions for the sum of a deterministic signal and a bandlimited gaussian random process

Signal only

One independent component

K independent components

Correlated components

Wave optics

Dirac impulse

Noncentral chi-square with two degrees of freedom (6.5.5)

Noncentral chi-square with 2K degrees of freedom (6.5.5)

Inverse Fourier transform of (6.6.12)

Photon optics

Poisson (6.2.26)

Laguerre with K = 1 (6.5.9)

Laguerre (6.5.9)

Poisson transform of wave-optics distribution

C E (ω) =

∞ Å k =1

C k (ω) =

∞ Å k =1

±

1 iω|sk | 2 exp (1 − iωλk ) 1 − iωλk

²

,

(6.6.12)

where | sk | 2 is the lightwave signal energy in the kth signal component. The probability density function f ( E ) is the inverse Fourier transform of C E (ω). For the special case of K components that are independent and identically distributed, C E (ω) reduces to the characteristic function of a noncentral chi-square probability density function with 2K degrees of freedom (cf. (2.2.43)), where N = 2K , A 2 = K |sk |2 , and λk = 2σ 2. A summary of the probability distribution functions for the sum of a deterministic signal and bandlimited gaussian noise for wave optics and photon optics is provided in Table 6.5. 6.6.4

Spatially Correlated Modes

A random lightwave signal consisting of a zero-mean circularly symmetric gaussian random process in each of M spatial modes can be treated as a single spatiotemporal random process. Define a multivariate gaussian random variable r with components r m by simultaneously sampling the random process in all m modes. Each sampled component r m is a zero-mean, circularly symmetric gaussian random variable. The correlation among the spatial components is described by the complex covariance matrix V defined in (2.2.30b). This covariance matrix V is the spatial equivalent of the temporal autocorrelation function R (τ). To decorrelate the spatial components, choose a basis {ϕ m } that diagonalizes the covariance matrix V. Such a basis always exists. Then the multivariate gaussian random variable r can be written as (cf. (6.6.2)) r=

M ³

m =1

a m ϕm.

(6.6.13)

The components {am } are uncorrelated, and so the gaussian random variables are independent. The basis {ϕ m } that produces uncorrelated spatial components satisfies the eigenvalue equation

Vϕ m = μm ϕ m ,

(6.6.14)

6.7 Bandlimited Shot Noise

285

where μm = ±|am | 2² = 2σm2 = Nm is the eigenvalue for each spatial subchannel defined by a uncorrelated spatial component. This eigenvalue equation is the discrete equivalent of the integral eigenvalue equation for a bandlimited gaussian random process in a single spatial mode given in (6.6.4). For distinct eigenvalues, by following the same steps that led to (6.6.9), the joint probability density function f ( E ) for the lightwave energy is found to be f (E ) =

M ³

m =1

1 − E /μm Rm μ− m e

for E

≥ 0,

(6.6.15)

where Rm is the relative weighting factor for the mth spatial subchannel with the form given by (6.6.10). This decomposition yields M parallel, independent spatial subchannels. The mth sub1 E /μm , channel has an expected value μ m , a probability density function f ( E m ) = μ− m e and an entropy 1 + μ m (cf. Table 6.1). This entropy describes the ability of each spatial subchannel to convey information. This result, along with the temporal probability density function given in (6.6.9), will be used to determine the channel capacity in Chapter 14.

6.7

Bandlimited Shot Noise The continuous electrical waveform r (t ) generated by direct photodetection consists of the sum of a sequence of pulses that arrive randomly in time, r (t ) = g(t ) ± h (t )

=



³ ´

h(t

− t ´),

(6.7.1)

where g(t ) = ´ δ(t − t ´ ) is the photoelectron arrival process (cf. Table 6.2), and h (t ) = hs (t ) ± y (t ) is the impulse response of the bandlimited receiver composed of the photodetector impulse response h s (t ) and the detection-filter impulse response y (t ). The relationship between these signals is shown in Figure 6.9. This section discusses how the bandlimiting affects the probability density function of the shot noise. Additive noise is not considered in this section. First, in Section 6.7.1, a widely used approximate expression is derived, in which a sample of the bandlimited shot noise is modeled as a gaussian probability density

g(t) Photoelectron arrival process

Photodetector

Detection Filter

hs (t)

y(t)

r(t)

r Sample at T

Filtered output

()

Figure 6.9 The transition between the discrete photoelectron-arrival process g t , the bandlimited

continuous electrical signal after direct photodetection, the filtered signal r (t ) after the detection filter, and the random sample r .

286

6 Random Signals

function. Then, in Section 6.7.2, a more accurate expression is derived for the probability density function of a sample of bandlimited shot noise. That expression is stated in terms of the characteristic function.

6.7.1

Approximate Analysis

Approximate expressions are now derived for the power density spectrum of the bandlimited electrical signal generated using direct photodetection, and for the probability density function of a sample. A system in which shot noise is considered dominant is called a shot-noise-limited system. These approximate expressions are derived as a special case of a set of general relationships for the mean ±r ² and variance σr2 of the shot-noise-limited probability density function given by

±r ² = R(t ) ± h (t )|t = T , µ σr2 = R(t ) ± h 2(t )µµt = T ,

(6.7.2a) (6.7.2b)

where R(t ) is the arrival rate (cf. Table 6.2). This pair of statements, which taken together comprise a statement called Campbell’s theorem, will be developed in the next section. In this section, the incident lightwave signal power P is constant, as is the corresponding arrival rate R. For a photodetector with no ¹ internal gain, the integral of the impulse response is equal to the electron charge so that 0∞ h (t )dt = e. Using this expression and a constant arrival rate R, the application of (6.7.2a) yields r

R

=R

´∞ 0

h (t )dt

= e R,

(6.7.3)

where r = Re = P is the constant photocurrent generated from the constant waveoptics power P, and h(t ) is the impulse response of the receiver shown in Figure 6.9. Similarly, using (6.7.2b), the variance is

σr2 = R

´∞ 0

h2 (t )dt .

(6.7.4)

The integral on the right will be expressed in terms of the noise bandwidth B N given in ¹∞ (2.2.68b). Using 0 h (t )dt¹ = e to determine the normalization constant G, the right ∞ side of (6.7.4) is written as 0 h2 (t )dt = 2e2 BN so that

σr2 = 2e2 R B = 2er B . N

N

(6.7.5)

For a practical lightwave receiver and a moderate lightwave power, the timewidth of the receiver impulse response is large compared with the average time R−1 between arrivals. Under this condition, many independent arrival events are within the timewidth of the impulse response. Asserting the central limit theorem, the probability density function f (r ) for the sample approaches a gaussian probability density function with the

6.7 Bandlimited Shot Noise

287

mean given by (6.7.3) and the variance given by (6.7.5). The corresponding (one-sided) power density spectrum is determined using (6.4.17) and (2.2.63), and is

S( f ) =

ÆR ÇÈ

e P | H ( f )|2

É

Bandlimited shot noise

R

+ ÆR2 P2 |HÇÈ(0)|2 δ( f É) ,

(6.7.6)

Zero-frequency term

S

where is the responsivity, P ( f ) = P 2 δ( f ) is the power density spectrum of the constant wave-optics power, and H ( f ) is the transfer function of the receiver with a noise-equivalent bandwidth B N . Suppose that the receiver noise-equivalent bandwidth B N is large compared with the signal bandwidth Bsig so that the frequency response H ( f ) of the receiver over the signal bandwidth can be approximated as a constant. The power density spectrum of the bandlimited shot noise, which is the first term in (6.7.6), is then ( f ) ≈ e P = er, where the constant frequency response H ( f ) is set to one. When the inequalities

S

Bsig

R

ºB ºR

(6.7.7)

N

are satisfied, the bandlimited shot noise can be accurately modeled as a white gaussian noise process with a signal-dependent (one-sided) power density spectrum given by Nshot

= 2er = 2e R P,

(6.7.8)

as asserted by the central limit theorem. The condition given in (6.7.7) is often expressed in terms of the lightwave power P . The shot noise is then treated as white gaussian noise when P is much larger than eB N / (cf. Table 6.2 and (6.2.23)). As an example, when B N = 20 GHz and = 1 A/W, P is equal to eB N / when P ≈ −55 dBm. For a significantly larger lightwave power, the shot noise can be accurately modeled as a white gaussian noise process with a signaldependent power density spectrum.

R

R

6.7.2

R

General Analysis

An expression for the bandlimited shot noise with no constraint on the form of the bandlimiting and no constraint on the received power level is derived in this section. When there is no constraint on the received power level, there may be too few independent arrival events within the timewidth of the impulse response to assert the central limit theorem. In this case, the shot noise is expressed in the form of the characteristic function Cr (ω) for the probability density function f (r ) of the sample. To proceed, let f (r |m) be the conditional probability density function of the sample . value r given that m photoelectron arrivals have occurred. The continuous signal q = q (t ) due to a single arrival event is an electrical pulse given by

= h(t − τ), (6.7.9) where the observation time t is a parameter, τ is the random arrival time, and h (t ) is the q

impulse response of the receiver.

288

6 Random Signals

The conditional random variable r | m for the sample r at time t, given m photoelectron arrivals, is m ³

r |m =

´=1



m ³

=

− τ ´),

h(t

´=1

(6.7.10)

where τ ´ is the random arrival time for the ´th photodetection event. Because the photodetection events are independent, the random variables τ ´ are independent. Therefore, the conditional probability density function f (r | m) is the m-fold convolution of m independent and identical probability density functions f q (r ), each corresponding to the output pulse generated from a single photoelectron. This conditional probability density function can be written as f (r |m) = f q (r ) ± f q (r ) ± · · · .

Æ

ÇÈ

m times

É

(6.7.11)

When one is given f (r |m), the unconditioned probability density function of r is determined using (2.2.10), and is f (r ) =

∞ ³ =

m 0

p(m) f (r | m),

(6.7.12)

where p(m) is the Poisson probability mass function describing the distribution of the primary photoelectron events. When there is internal gain within the device, these primary events may lead to secondary photoelectron events. The probability density function f (r ) for the sample given in (6.7.12) cannot be summed to obtain a closed form, but the characteristic function Cr (ω), which is the Fourier transform of f (r ), can be determined. Taking the Fourier transform of each side of (6.7.12), and using (2.2.15), gives Cr (ω) = where C r | m (ω) =

∞ ³ =

m 0

p(m)Cr |m (ω),

´∞ −∞

(6.7.13)

f (r |m)eiωr dr

is the conditional characteristic function. This is the Fourier transform of (6.7.11). The m-fold convolution in (6.7.11) transforms into the m-fold product of the Fourier transform of m identical terms. Using (2.2.14), Cr |m (ω) can be written as C r | m (ω) =

=

´∞

−∞

f (r | m)ei ωr dr

m ´ ∞ Å

´=1 −∞

eiωr f (r )dr

6.7 Bandlimited Shot Noise

289

· ¸ = eiωq m =. x m, · ¸ · ( )¸ where the last line defines x = eiωq = exp iωh (t − τ) (cf. (6.7.9)) as an auxiliary random variable, with τ being the random arrival time and h(t ) being the impulse response. Now substitute C r | m (ω) into (6.7.13) and use a Poisson probability distribution with mean E for p(m). This yields ∞ Em ³ Cr (ω) = e− x m m! m=0 ∞ (E x )m ³ − E =e m! m =0 =e(−) ½ »» º º¼ (6.7.14) = exp E e iωh(t −τ) − 1 , E

E x 1

where series expansion of e x has been used to collapse the sum. Using the probability density function f (τ) = R(τ)/E of the random arrival time τ (cf. (6.4.12)), the expectation of eiωh (t −τ ) is (cf. (2.2.3))

¼

e

iω h(t −τ)

½ ´∞ = f (τ)ei ωh (t −τ) dτ −∞ ´ ∞ R(τ) = eiωh (t −τ) dτ. E −∞

(6.7.15)

Substituting (6.7.15) into (6.7.14) and using the normalization condition for f (τ) to ¹∞ write (1/E) −∞ R(τ)dτ = 1 gives the result Cr (ω) = exp

±´ ∞

−∞

(τ)

R

º iωh (t −τ) e

» ² − 1 dτ .

(6.7.16)

This is the characteristic function of the sample r of the filtered shot noise given both a known arrival rate R(t ) and a known impulse response h (t ).

Campbell’s Theorem

For a known arrival rate R, the mean and variance of r can be determined from (6.7.16) using (2.2.17),

. r = ±r ² = =

µ

µ 1 d Cr (ω)µµ i dω ω=0

´∞

−∞

R

(τ)h(T − τ)dτ,

(6.7.17)

where T is the duration of a signaling interval. With similar reasoning for the variance, the mean and the variance are now given by

290

6 Random Signals

µ ±r ² = R(t ) ± h (t )µt =T , µ σr2 = R(t ) ± h 2(t )µt = T .

(6.7.18a) (6.7.18b)

The pair of statements (6.7.18) is known as Campbell’s theorem. This theorem was used earlier in (6.7.2).

Internal Photodetector Gain

An avalanche photodiode (cf. Section 7.3.1) has a random internal photoelectron gain G. For this device, let k be the number of primary photoelectrons generated by the photodetection of the lightwave signal and let m be the random number of secondary photoelectrons after the internal photoelectron gain process, where ±m² = ±k²±G² because k and G are independent. The continuous signal q (t ) at the output of the internal gain process due to a single primary photodetection event given in (6.7.9) is modified to read q (t ) = Gh (t

− τ),

(6.7.19)

where the impulse response for each primary photoelectron need not be equal to the impulse response in the absence of gain. Setting t = T and replacing h(t −τ) in (6.7.16) with Gh (t − τ) gives Cr (ω) = exp

±´ ∞

−∞

R

(τ)

º¼ iωGh (T −τ)½ e

» ² − 1 dτ

where the expectation refers to the random internal gain G. Then Cr (ω) = exp

±´ ∞

−∞

R

²

(τ)(C G (ω h(T − τ)) − 1)dτ ,

(6.7.20)

where (2.2.14) has been used to write ±exp(iωG h(T − τ))² = C G (ωh(T − τ)), with C G (ω) being the characteristic function of the probability density function f (G ) of the random internal photoelectron gain G for a single primary arrival. When the arrival rate R(t ) is random, the characteristic function is averaged over the random arrival rate. When the arrival rate R(t ) is known, the mean and the variance for the sample value r are determined using a modified form of Campbell’s theorem (cf. (6.7.18)) given by 10 (6.7.21a) ±r ² = ±G²R(t ) ± h(t )|t =T , 2 2 2 (6.7.21b) σr = ±G ²R(t ) ± h (t )|t =T . A simplified form for the probability mass function p (m) for the random number of

secondary photoelectrons m generated after the internal amplification process can be determined for a constant arrival rate with R(t ) = R and a rectangular impulse response h (t ) = rect(t / T ). For this case, the integral inside the argument of the exponential function in (6.7.20) evaluates to E(CG (ω) − 1), where E = RT is the mean number of primary photodetection events. Setting r = m, the characteristic function C m (ω) for the number of secondary photoelectrons m generated by amplification is 10 See Section 6.6 E of Helstrom (1991).

6.9 Historical Notes

291

Cm (ω) = eE (CG (ω)−1)

= e−E

∞ Ek ³ C (ω)k . k! G k=0

(6.7.22)

The term C G (ω)k is the conditional characteristic function of the probability density function of the random gain G, given k primary arrivals. The inverse Fourier transform of CG (ω)k is the conditional probability p(m|k) that m secondary photoelectrons are generated by the internal amplification process, given k primary arrivals. Taking the inverse Fourier transform of both sides of (6.7.22) and using the two Fourier transform pairs Cm (ω) ←→ p(m) and CG (ω)k ←→ p(m|k) gives

∞ Ek ³ p(m|k). (6.7.23) k! k=0 ∑ This expression, summarized as p(m) = k p(k) p(m|k), is the unconditioned probabilp(m) = e−E

ity mass function of the number of secondary photoelectrons m generated by the internal amplification process. This unconditioned probability mass function is expressed in terms of the conditional probability mass function p(m|k) and the probability mass function p(k) for the number of primary arrivals. The probability mass function describing the primary arrival process is Poisson distributed only when the arrival rate R is a constant. This means that the incident lightwave signal has only photon noise. When there is additional noise in the lightwave signal, the primary photoelectron-arrival rate R is random. For this case, the Poisson probability mass function in (6.7.23) must be replaced by the Poisson transform of the continuous probability density function f (E) describing the additional randomness in the incident lightwave signal.

6.8

References Probability distributions of noise processes involving gaussian random variables are discussed in Simon (2007). Probability distributions of point processes are presented in Snyder and Miller (2012). The relationship between thermal noise and quantum noise is discussed in Oliver (1965) and in Marcuse (1980). Treatments of quantum noise are provided in Loudon (2000) and in Henry and Kazarinov (1996). A discussion of the relationship between photon noise and quantum noise is given in Shapiro (2012). Comprehensive treatments of photon optics are provided in Saleh (1978) and in Mandel and Wolf (1995). Mathematical properties of the Poisson transform, which is also called a mixed Poisson distribution, are presented in Johnson, Kemp, and Kotz (2005). Classical random lightwave fields are covered in Goodman (2015). The Laguerre probability mass function is covered in Lachs (1965), and in Gagliardi and Karp (1976).

6.9

Historical Notes The concept of entropy as a measure of randomness was firmly established within statistical mechanics by Gibbs (1914). Later, Shannon used a scaled form of the entropy to

292

6 Random Signals

quantitatively measure the information content of an information source. The origin of Shannon’s use of the word “entropy” is discussed in Price (1982). The deep connections between statistical mechanics and information theory are discussed in Jaynes (1957a, b) and influenced the development of information theory as applied to quantum optics. The mean and variance of shot noise were first reported by Campbell (1909a, b). The foundation of the modern analytical framework for both shot noise and gaussian noise traces back to a series of papers by Rice (1944, 1945). The methods introduced by Rice were adapted to photon optics by Mandel (1958, 1959). In particular, Mandel (1959) developed a formula to describe the photon statistics of directly photodetected light, which was later named the Poisson transform by Wolf and Mehta (1964). That work also developed the inverse Poisson transform. In this book, the Poisson transform is viewed as a composite of quantum uncertainty from photon noise and statistical uncertainty from classical sources of randomness. These fundamental statistical methods were further developed in Slepian and Pollak (1961) to analyze bandlimited gaussian random processes using prolate spheroidal wave functions. Later, in an enlightening paper Slepian (1976) clarified profound conceptual issues that arise when mathematical models of bandlimited and timelimited functions are applied to physical processes such as bandlimited noise.

6.10

Problems 1 Derivation of the Gordon distribution and entropy (a) Starting with p m K um , derive

( )=

p ( m) =

±

1

1+S

S

1+S

²m

,

∑ m f (m) = S and ∑∞ f (m) = 1. satisfying the constraints ∞ m =0 m =0 (b) Using the form of p(m) given in part (a), and the definition of the entropy H given by H

= −k

∞ ³ =

m 0

p(m)loge p(m),

derive the entropy of a Gordon distribution, which is stated in Table 6.1. 2 Maximum-entropy distribution without a mean constraint Following the procedure used to derive the Gordon probability mass function given in (6.1.5), but removing the finite mean constraint on the probability distribution function, show that the maximum-entropy distribution on a finite number of states is the uniform probability density function given by

f (m) = where M is the number of states.

1 M

,

6.10 Problems

293

3 The Bose–Einstein probability mass function and the Boltzmann probability density function (a) Starting with (6.1.9), derive an expression for the form of the probability density function of the energy f E , with E h f m. (b) Is the resulting probability density function a valid continuous probability density function? Explain your answer. (c) Take the limit of the expression in part (a) as h f goes to zero and show that the resulting expression is the Boltzmann probability density function.

( )

=

4 Noise figure The noise figure of an amplifier is defined in (2.2.84). Given an amplifier with a noise-equivalent bandwidth of 10 GHz and a noise figure of 10 dB over this bandwidth, determine the equivalent noise power at the input to the amplifier for noise that is thermally generated at a temperature of 290 K. 5 Coherence timewidth and bandwidth The root-mean-squared coherence timewidth rms of the autocorrelation function R (cf. (2.1.30)) has a different value than that of the coherence timewidth c defined by (2.2.59) and repeated here as

τ

(τ)

τc =.

τ

´∞

1

| R(τ)|2 dτ, | R(0)|2 −∞ where the autocorrelation function R (τ) is the Fourier transform of the power density spectrum S ( f ) (cf. (2.2.55)). Determine the relationship between the rootmean-square coherence timewidth τrms , and the coherence timewidth τc for the following power√density spectra in (a) and (b). (a) S ( f ) = (1/ 2πσ) e− f / 2σ . (b) S ( f ) = e−| f | . 2

2

(c) Citing a specific example, discuss why one definition of the coherence timewidth might be preferred over the other definition.

6 Probability density function of spontaneous emission An unpolarized spontaneous emission noise source n t has independent polarization noise components, each described by a circularly symmetric gaussian random variable with the same variance. The total noise in both polarization components has a constant two-sided power density spectrum N0 2. The noise is bandlimited using an ideal rectangular passband optical filter ho t with a complex-baseband transfer function Ho f given by

()

()

( )

Ho ( f ) =

Ê

1 0

/

for | f | < B /2 otherwise.

The filter does not affect the polarization. (a) Determine the probability density function of a sample of the spontaneous emission noise process n(t ) before filtering and the probability density function of a sample of the spontaneous emission noise power Pn (t ) = 21 | n(t )|2 before filtering.

294

6 Random Signals

(b) Determine the autocorrelation function R n (τ) of the lightwave amplitude for the filtered spontaneous emission noise. (c) Determine the autocorrelation function R Pn (τ) of the lightwave power for the filtered spontaneous emission noise. (d) Determine the coherence timewidth τc defined in (2.2.59) for the filtered spontaneous emission noise. 7 Filtered spontaneous emission The spontaneous emission noise in a single polarization of a lightwave is bandlimited using an ideal rectangular passband optical filter h o t with a complex-baseband transfer function Ho f given by

()

( )

Ho ( f ) =

Ê

1 0

for | f | < B /2 otherwise.

The resulting filtered lightwave noise power has an expected value ± P ². It is detected by an ideal photodetector with impulse response h(t ) equal to δ(t ). (a) Determine the power density spectrum S g ( f ) of the arrival process g(t ) within the photodetector. (b) Determine the power density spectrum Sr ( f ) of the filtered electrical signal r (t ) when the photodetected electrical signal is filtered by a detection filter with an impulse response y (t ) = e−t /τ u (t ). (c) Under what conditions are the statistics of the sample value r after the detection filter given by the following. i. An exponential probability density function. ii. A gamma probability density function. iii. A gaussian probability density function. 8 Bandlimited noise The electrical noise power generated by direct photodetection given in (6.5.3) was derived for B c 1, where c is the coherence timewidth defined in (2.2.59) and B is the passband noise-equivalent bandwidth (cf. (2.2.78)). The relationship between B and c is valid when the lightwave noise-suppressing filter is an ideal bandpass filter in the form of the rect function. (a) Derive a corresponding expression for B c for a lightwave noise-suppressing filter defined by a gaussian function with a root-mean-square width equal to B. (b) Quantitatively explain how the value of B c affects the statistics of the sample determined over an interval of duration T .

τ =

τ

τ

τ

σ

τ

9 Degrees of freedom of lorentzian-filtered noise Let the autocorrelation function of the noise process n t be given by

Rn (τ) = N e−α|τ| .

()

(a) Show that this autocorrelation function is generated by filtering white noise with a filter that has the transfer function 2α H (ω) = 2 . 2

α +ω

6.10 Problems

295

A filter of this form is called a lorentzian filter. (b) Write the integral in (6.6.4) using symmetric limits. Separate that integral into two regions and differentiate twice to produce a second-order differential equation of the form d2ψk (t 1) dt12

+ b2k ψk (t1) = 0.

Determine the expression for bk in terms of λ k , α, and N . (c) Now assume a solution of the form of

ψk (t1 ) = c1 eibt + c2e−ibt . (d) (e) (f) (g)

Substitute this form into the original integral equation and perform the integration for each of the two regions. Show that the resulting expression can be satisfied for all time only if c1 = c2 or if c1 = −c2 . Setting t1 = T , derive the expression that must be satisfied if c1 = c2 . Setting t1 = T , derive the expression that must be satisfied if c1 = −c2 . A solution to either of the two previous equations will produce an eigenvalue. By combining these two equations, show that

±

tan(bk T ) +

bk T αT

²±

tan(bk T ) −

α T ² = 0.

bk T

(h) Using the relationship between bk and λ k derived in part (b), plot the eigenvalues λk on a log plot. Compare the distribution of the eigenvalues for lorentzianfiltered noise for α T = 5 with the distribution of eigenvalues for an ideal bandpass filter for T B = 5 given in Figure 6.8. (i) Comment on the distribution of the eigenvalues for both kinds of filters with respect to the distribution of the entropy, which defines the ability of each degree of freedom to convey information. 10 Sum of Poisson random variables Prove that if the sum of two random variables, m3 m1 m2, is Poisson and either of the two summands, m1 or m2, is Poisson, then the other summand is Poisson as well.

= +

11 Circular symmetry A product bivariate random variable with bivariate probability density function f x y f x f y is known to be circularly symmetric in the x y coordinate system. Does this mean that it must be a bivariate gaussian random variable?

( , )= ( ) ( )

(, )

12 Derivation of the negative binomial probability mass function Using the integral

´



0

º »−(K +m) −1 1 e−E(1+Nsp ) E( K −1+m) dE = 1 + N− ¶( K sp

+ m),

show that the Poisson transform of a gamma probability density function is equal to the negative binomial probability mass function.

296

6 Random Signals

13 Derivation of the mean and the variance of the number of counts Starting with

C m (ω) =

´∞

eE

(e iω −1)

f (E)dE

¼ ( )½ = eE e ω −1 º º »» = C −i e i ω − 1 , 0

i

E

as given in (6.3.7), derive the mean and variance of p(m) in terms of ±E² and σE2 . The probability mass function p(m) is the Poisson transform of the probability density function f (E) for the mean number of counts. The result should agree with the terms in (6.3.8). 14 Partial-fraction expansion (a) Solve for the coefficients a and b of the following partial-fraction expansion of a characteristic function C

(ω)

1

a b = + (1 − i ωλ1 )(1 − iωλ2 ) 1 − iωλ1 1 − i ωλ2 . Determine the corresponding probability density function f ( x ). C (ω) =

(b) (c) Generalizing the results from parts (a) and (b), show that for a characteristic function given by C (ω) = with distinct written as

∞ Å

k=1

(1 − iωλk )−1

λk , the corresponding probability density function f (x ) can be f (x ) =

∞ ³ k =1

where Rk

= λk

1 − x /λk e , Rk

∞( Å j =1 j µ=k

x

1 − λ j /λ k

≥ 0,

).

15 Filtered shot noise and thermal noise An electrical waveform r t is generated by the direct photodetection of a random lightwave signal with a nonstationary arrival rate given by R t e−t / T u t , where u t is the unit-step function. The photodetector has an impulse response given by h t e−t /4T u t . Using Campbell’s theorem, derive the mean and the variance of the output electrical waveform r t .

()

() ( )=

()

()=

()

()

6.10 Problems

297

16 Noise power autocorrelation function The Isserlis theorem states that the expectations of four jointly gaussian random variables, X 1 , X 2 , X 3, and X 4, satisfy

± X1 X 2 X 3 X 4 ² = ±X 1 X 2²± X3 X 4 ² + ± X1 X 3 ²± X 2 X 4 ² + ± X 1 X 4 ²± X 2 X 3².

Using the asserted Isserlis theorem, prove (6.4.2) for circularly symmetric gaussian random variables.

7

Lightwave Components

The remarkable performance of lightwave communication systems must also be credited to the equally remarkable components that couple, generate, amplify, switch, and photodetect lightwave signals. These lightwave components have new and interesting physical attributes that are not always seen in the corresponding components used in lower-frequency communication systems. As an example, the use of an antenna to convert a radio-frequency electromagnetic field into a passband electrical signal preserves both the amplitude and the phase of the incident electromagnetic field within the electrical signal. This process can be fully described using the continuous waveoptics signal model. In contrast, an appropriate description of the photodetection of a lightwave signal at small signal levels requires the photon-optics signal model, which incorporates the discrete nature of the lightwave signal. The photon-optics model also must be used to analyze the generation, amplification, and photodetection of lightwave signals. These processes are distinctly different than the corresponding operations used for lower-frequency signals. We are interested in understanding the characteristics of conventional lightwave components, emphasizing how they differ from their lower-frequency counterparts.

7.1

Passive Lightwave Components Passive lightwave components are those that do not require an energy source. They are used to filter, route, combine, and separate the power, wavelength, and polarization of lightwave signals. The operating principles of these devices can be derived using wave optics. The desired functions are achieved using a combination of absorption, diffraction, interference, waveguiding, and mode coupling.

7.1.1

Lightwave Couplers

A lightwave coupler is a passive linear device that is designed to manipulate one or more input lightwave signals to produce one or more output lightwave signals. A simple example of a coupler with one input and one output is a coupler designed to couple light from a lightwave source into an optical fiber. A generic lightwave coupler with two inputs and two outputs is shown in Figure 7.1. In this coupler, two waveguides are brought into close proximity so that the mode

7.1 Passive Lightwave Components

(a)

s1(t)

299

z1 (t)

n2 n1 n2 n1 n2

s2(t)

z2 (t)

(b) n2

n1

n1

n2

n2

(c)

Figure 7.1 (a) A schematic representation of a symmetric directional coupler, showing the two

input signals and the two output signals. (b) The index profile for two step-index waveguides fabricated on the same substrate. (c) The mode for each isolated waveguide, showing the overlap between the modes, which defines the properties of the coupler.

in the first waveguide can overlap the mode in the second waveguide as is shown in Figure 7.1(c). The properties of the coupling between the two modes induced by this overlap depend on the structure and size of each waveguide, the distance between the waveguides, the length of the coupling region in which the fields overlap, and the wavelength. The coupler may be constructed by fusing two sections of optical fiber together or may be fabricated on a substrate, as is shown in Figure 7.1(b). The basic structure of Figure 7.1 can be adapted to a variety of functions. To model the function of a linear coupler mathematically, let s=

± s (t ) ² 1 s2 (t )

be a column vector representing the complex amplitudes of the lightwave signal in each of the two input modes. Let z=

± z (t ) ² 1 z2 (t )

be a column vector representing the complex amplitude of the lightwave signal in each of the two output modes. The two output signals can be expressed as z = Ts, where

T =.

±

t11 t21

t12 t22

²

,

(7.1.1)

which is known as the coupling matrix, describes the transformation from the complex amplitudes in the two input modes to the complex amplitudes in the two output modes. The elements of the coupling matrix T depend on the coupler geometry and can be determined using coupled-mode theory (cf. Section 3.4).

300

7 Lightwave Components

A coupler that is designed to preferentially direct the two inputs into the two outputs is called a directional coupler. When the two input and two output waveguides all have the same cross-sectional geometry and length, as is shown in Figure 7.1, the resulting structure is called a symmetric directional coupler. The parameters of the directional coupler can be chosen so that the output signals are related to the input signals by the complex coupling matrix 1

±

1

T= √

2

1 i

i 1

²

.

(7.1.2)

The output column vector z = T s is now given by

± z (t ) ² 1 ± s (t ) + is (t ) ² 1 = √ is1 (t ) + s 2(t )) . z2 ( t ) 1 2 2

(7.1.3)

This type of directional coupler is depicted symbolically as shown in Figure 7.2(a). An alternative representation of the coupling matrix given in (7.1.3) has no phase shift between s1 (t ) and s2 (t ) at the output port z 1(t ), and has a phase shift by π between s1 (t ) and s2(t ) at the output port z2 (t ). The corresponding coupling matrix is 1

T=√

2

±1 1

with the output column vector z = Ts given by

1 −1

²

,

(7.1.4)

± z (t ) ² 1 ± s (t ) + s (t ) ² 1 (7.1.5) = √ s1 (t ) − s2(t ) . z2 (t ) 1 2 2 At the output port z2 (t ), the input s2 (t ) is shifted by π relative to s1(t ) as compared with the output z 1(t ). Accordingly, this kind of coupler is called a 180-degree hybrid coupler.

s1 (t)

z1(t)

s1 (t)/ √2

s1 (t)

s2 (t)/ √2 s1 (t)/ √2

s2(t) s2 (t)

z2(t)

(a) 180-degree hybrid coupler

π/2 phase shift

is1 (t)/ √2

(b) 90-degree hybrid coupler

Figure 7.2 (a) A 180-degree hybrid coupler. (b) A 90-degree hybrid coupler. The two dashed

boxes are 180-degree hybrid couplers.

1 Review material on complex functions is provided in Section 2.1.

z11(t) z12 (t)

z21 (t) z22(t)

7.1 Passive Lightwave Components

301

A symmetric directional coupler used with a single input acts as a power splitter, and is called a 3-dB coupler. Asymmetric couplers with different power-splitting ratios can be constructed by changing the design of the coupling region. A different kind of coupler, called a 90-degree hybrid coupler, has two inputs and four outputs as shown in Figure 7.2(b). The two inputs to the coupler are s1 (t ) and s2 (t ). There are four outputs z i j (t ) for i = 1, 2 and j = 1, 2, with one form of the coupling matrix T given by

⎡ 1⎢ T= ⎢ ⎣ 2

1 1 1 1

1 −1 i −i

⎤ ⎥⎥ ⎦.

(7.1.6)

The operation of a 90-degree hybrid coupler can be regarded as two 180-degree hybrid couplers √ used in phase quadrature. The amplitudes of the output signals differ by a factor of 1/ 2 from the amplitudes given by (7.1.4). This is due to the 3-dB couplers being used for the splitting of the two input signals. The terms in the second column of the lower part of T are phase-shifted by π/2 as compared with the terms in the upper part of T . Therefore, the lower part of T describes a 180-degree hybrid coupler in phase quadrature with the upper 180-degree hybrid coupler. Other kinds of conventional couplers include a multimode interference coupler, which uses multiple inputs and multiple outputs, and a polarization combiner, which combines two orthogonally polarized inputs into a single output. The inverse function, which separates an incident polarization state into two orthogonal polarization states, is called a polarization beamsplitter. Another component, called a linear polarizer, is designed to pass one linear polarization component. 7.1.2

Delay-Line Interferometers

A schematic representation of a passive component known as a delay-line interferometer is shown in Figure 7.3. The incident lightwave signal s(t ) launched into the lower path of the first coupler at the left produces the coupler output signal (cf. (7.1.2))

± is (t ) ² , s= √ 2 s (t ) 1

which enters the central paths. The signal in the upper path of the interferometer is delayed by a time T , where T may be zero, depending on the desired functionality. The signal in the lower path of the interferometer is multiplied by eiψ to produce a phase shift of ψ . The signal in each path is then the input into a 180-degree hybrid coupler at Delay T s(t) Coupler

Phase shift ψ

z1 (t) z2 (t) Coupler

Figure 7.3 Schematic representation of a delay interferometer.

302

7 Lightwave Components

the right. Using the form of the coupling matrix given in (7.1.2), the output z after the 180-degree hybrid coupler is z = Ts

² ± is(t − T ) ² = s (t )eiψ ± ² iψ = 21 iss ((tt −− TT )) +− iss ((tt))eeiψ . 1 2

±

1 i i 1

(7.1.7)

The use of this configuration in a direct-photodetection demodulator to determine the relative phase shift between consecutive lightwave symbols is described later.

7.1.3

Multipath Interference

As a lightwave propagates across an interface between two lightwave components, a small difference in the index of refraction between the components at the interface can cause a reflection. Both the incident lightwave and the reflected lightwaves can subsequently re-reflect from other interfaces, causing multipath interference. Because the total field amplitude depends on the phase of the field components, this interference is a sensitive function of the location of the interfaces, and can have a substantial random component as the length between the interfaces varies because of environmental effects. To prevent the back-reflected lightwave caused by multipath interference from entering a lightwave transmitter, an optical isolator can be inserted in the lightwave path. This single-input single-output device has a low attenuation in one direction of propagation and a high attenuation in the other direction of propagation.

7.1.4

Optical Filters

Lightwave filters are used to control the spectral characteristics of the lightwave signal. Examples include filters that select wavelength subchannels and filters that reject lightwave noise before demodulation. Optical filter characteristics such as the desired passband and the out-of-band rejection are described in the same way as electrical filters with many electrical filter design techniques directly applicable to lightwave filters. Depending on the filter specifications, a lightwave filter can be fabricated using a combination of absorption, diffraction, interference, and coupled resonators. Complex optical filtering for both the in-phase and quadrature components is possible, but this type of filter usually requires some form of active phase control because of unknown phase offsets from fabrication errors and thermal effects.

7.2

Semiconductors Many lightwave components are semiconductor devices. The electrical and optical properties of a semiconductor material such as silicon can be controlled by incorporating

7.2 Semiconductors

303

(or doping) trace amounts of other elements into the pure semiconductor or into an alloy composed of two or more semiconductors, perhaps with other materials such as aluminum. Sophisticated processing methods use a combination of many different doped materials to construct a variety of lightwave devices, including photodetectors, lightwave amplifiers, and semiconductor lasers. Several lightwave components are based on a semiconductor p–n diode junction. A p–n diode junction consists of two adjacent regions of the semiconductor with different electrical properties. One region, called the n-region, has an excess of negatively charged electrons. An adjacent region, called the p-region, has an excess of vacant but available positions in the crystalline lattice that could hold an electron. Common practice is to view each vacancy in the lattice as if it were a positively charged particle called a hole. Together, electrons and holes are called charge carriers. As suggested by its name, a semiconductor is a material characterized by a range of energies, called the bandgap, that are forbidden to the charge carriers. The energy difference between the upper and lower extremes of the bandgap is called the bandgap energy E g as shown in Figure 7.4(a). The band of allowed energies for a free electron, called the conduction band, is above the bandgap. The band of allowed energies for a hole, called the valence band, is below the bandgap. An electron can transition2 from the conduction band to the valence band only by releasing energy in some form. For semiconductor materials used for lightwave sources, the release of energy is likely to be in the form of a photon. For other semiconductor materials such as silicon, the release of energy is unlikely to be in the form of a photon. An excitation event, such as the absorption of a photon, will move a bound electron in the valence band into the conduction band and leave behind a hole in the valence band, thereby creating an electron–hole pair. A de-excitation event will recombine an electron and a hole, thereby releasing energy as the electron returns to the valence band, (a)

(c)

(b) Conduction band

P-region

Valence band

laitnetoP nortcelE

Eg Incident photon

Charges unlikely to separate

Undoped intrinsic region

N-region

C i(t) =

(t)

Charges likely to separate Space

Figure 7.4 (a) A photodetection event in a photodetector. (b) Electron potential as a function of

space for a PIN photodiode, showing an absorption event in the intrinsic region and an absorption event in the p-region. (c) Circuit model of a photodiode.

2 Common practice is to say that the electron is in the conduction band, meaning that the energy of the

electron is in the conduction band.

304

7 Lightwave Components

annihilating the corresponding hole. For semiconductor materials used for lightwave sources, most of this released energy is in the form of photons. When a voltage is applied across a p–n diode junction such that the p-region is at a higher voltage, or potential, than the n-region, electrons in the conduction band and holes in the valence band will move towards one another and recombine, releasing a portion of the energy in the form of photons and producing a current which flows from the p-region to the n-region. This voltage is called a forward bias. Devices designed to operate with a forward bias are used for lightwave sources and lightwave amplifiers. When a voltage is applied so that the n-region is at a higher (hole) potential than the p-region, the device is reverse-biased with the electron potential higher in the p-region than in the n-region, as shown in Figure 7.4. Devices designed to operate with a reverse bias are used for photodetectors. A photon absorption event within a photodetector is described using photon optics. When the energy h f of the incident photon is greater than the bandgap energy E g , it is likely that the incident photon is absorbed, thereby creating an electron–hole pair, with the electron in the conduction band and the hole in the valence band. The material is opaque at that frequency. This electron–hole-pair-generation event, shown in Figure 7.4(a), is typically not sensitive to the polarization of the incident lightwave signal. The reverse bias then separates the electron–hole pair, so they cannot recombine, thereby creating an external electrical current that flows within the device from the n-region to the p-region.

7.3

Lightwave Receivers A lightwave receiver combines the functions of demodulation, detection, and decoding. Demodulation converts the lightwave waveform into an electrical waveform, usually at real-baseband or complex-baseband. Possibly two stages of carrier downconversion are used, first to an electrical waveform at an intermediate frequency, then to baseband. Detection transforms the baseband electrical waveform into a sequence of detected values. Decoding maps the sequence of detected values into an estimate of the transmitted codeword symbols from which the user data is recovered.

7.3.1

Photodetectors

There are two conventional photodetectors used in lightwave receivers. Both are constructed from semiconductor materials. The first is a photodiode, which is a photodetector that has no internal gain. The second is an avalanche photodiode, which is a photodetector that does have internal gain.

Photodiodes

The basic structure of a photodiode is a reverse-biased p–n diode junction. When an electric field is applied to this p–n junction, an internal electric potential and a corresponding

7.3 Lightwave Receivers

305

electric field proportional to the spatial gradient of the potential are generated. The region near the junction is called the depletion region because the internal electric field sweeps out all free electrons and free holes. The size of the depletion region can be increased by sandwiching a piece of intrinsic (undoped) semiconductor material between the p-region and the n-region. This intrinsic region, along with the spatially varying potential, is shown in Figure 7.4(b). This kind of structure is called a PIN photodiode (Positive–Intrinsic–Negative). When there is no lightwave signal, the generation of electron–hole pairs is unlikely for conventional photodetectors used in lightwave communication systems because the bandgap energy E g of the semiconductor material used for a lightwave photodetector is significantly larger than the average thermal energy kT0 .3 When there is a lightwave signal with the energy of the photon greater than the bandgap energy, photons are absorbed, thereby creating electron–hole pairs. To generate an external current, the electron and the hole must be spatially separated by the field after their creation so that they do not recombine. Separation is likely and recombination unlikely for an electron–hole pair created in the intrinsic region because of the presence of the electric field. Separation is less likely for an electron–hole pair created in either the p-region or the n-region because a significant potential gradient does not exist in these regions. The probability that the electron–hole pair created by an absorbed photon then separates and contributes to the external current is called the quantum efficiency η . The responsivity4 =. η eλ/ hc, which depends on the wavelength of the incident lightwave, is the external current generated per unit lightwave power. The quantum efficiency depends on several factors, including the device structure and the external bias. The use of an external bias increases the electric field within the intrinsic region, thereby increasing the speed at which the separated electron–hole pairs are swept out of the intrinsic region, and so increasing the probability that an electron–hole pair will not recombine. The speed of the photodiode may also be increased by increasing the thickness d of the intrinsic region because the junction capacitance C of the photodetector is approximately ± A /d, where A is the area of the device and ± is the permittivity. Increasing the thickness of the intrinsic region reduces the junction capacitance C and so reduces the RC time constant of the photodiode when it is connected to an external resistance R. The equivalent circuit of a photodiode with responsivity is a current source with i (t ) = P (t ), (cf. (6.2.23)), where P (t ) is the lightwave power. Including the junction capacitance, a simple circuit model of a photodiode is shown in Figure 7.4(c). The sequence of photodetection events is random because of the inherent random photon arrival times within the incident photon stream. This creates a form of noise called shot noise. Shot noise is distinguished from the noise generated by the random thermal excitation that occurs within the photodetector itself. The thermal excitation is

R

R

R

3 A typical bandgap energy for a semiconductor material used for a photodetector is a few electron volts (eV), where 1 eV 1 6 10−19 J. The average thermal energy at T 290 K is about 25 10 −3 eV.

= . ×

= η

4 A photon of energy h f produces an electron charge e with probability .

×

306

7 Lightwave Components

referred to as the dark current. The statistics of these forms of noise are discussed in Section 7.6.

Avalanche Photodiodes

A photodetector with a different physical structure, called an avalanche photodiode, is designed to produce internal gain by an impact ionization process. A typical structure of an avalanche photodiode consists of an absorption region, where the photons are absorbed, and a separate multiplication region designed to have an electric field that is much larger than that of a basic photodiode. In this region, energy is transferred from the large electric field, increasing the velocity of both electrons and holes, leading to further creation of electron–hole pairs. Figure 7.5(a) shows a simplified schematic representation of a device for which the absorption and multiplication occur within the same region. The absorption of a signal incident photon is shown as event A. The event is followed immediately by ionization event B, which generates one electron–hole pair. The negatively charged electron accelerates in a strong electric field until it has sufficient energy to create an additional electron–hole pair by impacting another lattice site of the semiconductor material. This process is called impact ionization and produces internal amplification within the device. The positively charged hole generated during event A also leaves the absorption region and enters a region where the electric field is small so that impact ionization is unlikely. The probability that an electron or hole will produce an ionization event is described by an ionization coefficient denoted αe for an electron and αh for a hole. An electron generated by the ionization event is, in turn, accelerated and, by impacting another lattice site, can then create another electron–hole pair shown as event C. Similarly, a hole generated by the ionization event B can create an additional electron–hole pair shown as event D. This process continues to generate a random number of secondary, tertiary, and subsequent electron–hole pairs. The expected number of such pairs caused by a single detected photon is a random variable whose expected value is called . the photoelectron internal gain G = ±G². Figure 7.5(b) shows each photoelectric conversion as an exponentially decaying current pulse with a random height corresponding to

(a)

A

(b)

laitnetoP nortcelE

P-region

B C

Incident Photon

D

N-region

Photon stream

Random arrival times

Isolated output pulses

Random gain per pulse

Total signal Space

Time

Figure 7.5 (a) Schematic representation of the gain process in an avalanche photodiode, showing

several ionization events. (b) The random arrival times of photons produce a random sequence of current pulses. The random internal gain within an avalanche photodiode produces a random amplitude for each pulse.

7.3 Lightwave Receivers

307

a random gain. The fluctuation of the gain about the mean is called the gain noise. It is discussed in Section 7.6. The gain process, shown schematically in Figure 7.5(b), shows two independent contributions to the overall randomness. The first contribution is the random time τ ² for each primary photodetection event consisting of the generation of an electron–hole pair. The second contribution is the random gain G for each primary electron or hole. Because electrons and holes move in opposite directions and each can generate additional electron–hole pairs, a sequence of events in an avalanche photodiode can form a so-called current loop within a region of high electric field. One such current loop B → C → B is shown in Figure 7.5(a). This current loop does increase the mean value of the gain, but, because the gain process is random, it also increases the variance of the gain, which is a form of noise. These current loops also produce a long transient response, thereby reducing the bandwidth of the device.

7.3.2

Lightwave Demodulators

This section discusses techniques that implement phase-synchronous heterodyne demodulation of a lightwave signal to an electrical signal at an intermediate frequency, techniques that implement phase-synchronous homodyne demodulation of a lightwave signal to a complex-baseband signal, and techniques that implement direct-photodetection demodulation of the lightwave intensity to an electrical signal.

Phase-Synchronous Heterodyne Demodulation

The phase-synchronous conversion of the lightwave modulation waveform into an electrical modulation waveform can be described mathematically simply as the translation of the lightwave signal to an electrical signal at passband or baseband. Frequency translation to an electrical signal at passband may be followed by a subsequent frequency translation of the electrical signal to a real-baseband or complex-baseband signal. This operation comprises the function of the lightwave demodulator. A device that implements frequency translation is called a mixer. An ideal mixer simply changes the carrier frequency without affecting the modulation. For homodyne demodulation, the carrier is removed entirely and the frequency-translated signal is a real-baseband signal or a complex-baseband signal. For heterodyne demodulation, the frequency-translated passband signal is another passband signal at a lower carrier frequency. Heterodyne demodulation is often implemented in lower-frequency systems by using a mixer to multiply an input passband signal by a reference signal at a different frequency. The immediate output of the mixer consists of the sum and difference frequency components, with the sum frequency component then rejected by filtering. The bandwidth B of the passband signal at the intermediate frequency f IF must be within the photodetector frequency response to generate an electrical signal. Because the direct multiplication of two lightwave signals would require a nonlinear material response (cf. Chapter 5), most lightwave systems implement mixing based on a mathematical identity of complex numbers,

308

7 Lightwave Components

Re[ab∗ ] =

1 4

³ ´ |a + b|2 − |a − b|2 ,

(7.3.1)

which is called a quarter-square multiplier. The right side replaces the operation of multiplication with the easier operation of squaring. For lightwave signals, the inner sum and difference operations are implemented using 180-degree hybrid couplers with a = s1(t) and b = s2(t) (cf. (7.1.5)). The squared-magnitude operation is implemented using direct photodetection. The final subtraction of the squared terms is implemented as the subtraction of electrical signals. The combination of these three operations has an overall scaling factor of 1/4, with one factor of 1/2 generated from photodetection of a passband signal (cf. (1.3.2)) and the second factor of 1/2 generated from the squaring of the output of the coupler (cf. (7.1.5)). The method of replacing a mixer with a lightwave coupler and two square-law photodetectors, then forming the difference of the electrical outputs, is called balanced photodetection and is shown in Figure 7.6.

To derive the output of a balanced photodetector, let the first input to the coupler be the incident complex lightwave signal s1(t) = As(t)e^{i(2π fc t + φs(t))}, where As(t) is the signal amplitude and φs(t) is the signal phase. Let the second input be the local oscillator signal s2(t) = ALO e^{i2π fLO t}, where the phase of the local oscillator is aligned with the carrier. Then, using (7.3.1) with a = s1(t) and b = s2(t), the electrical output i(t) of the balanced photodetector is

i(t) = Re[ab*]
     = As(t)ALO Re[e^{i(2π(fc − fLO)t + φs(t))}]
     = As(t)ALO cos(2π fIF t + φs(t)),    (7.3.2)

where fIF = fc − fLO is called the intermediate frequency.
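Equations (7.3.1) and (7.3.2) can be checked numerically. The following minimal sketch, with illustrative frequencies and modulation chosen only for the demonstration (none of the values are from the text), forms the balanced-photodetector output from the quarter-square identity and compares it with the ideal mixer output at the intermediate frequency.

import numpy as np

# Illustrative parameters (assumed for this sketch).
f_c, f_LO = 200.0, 190.0          # carrier and local-oscillator frequencies (arbitrary units)
A_LO = 2.0
t = np.linspace(0.0, 1.0, 20000)

A_s = 1.0 + 0.5 * np.cos(2 * np.pi * 3.0 * t)     # slow amplitude modulation
phi_s = 0.8 * np.sin(2 * np.pi * 2.0 * t)         # slow phase modulation

a = A_s * np.exp(1j * (2 * np.pi * f_c * t + phi_s))   # complex lightwave signal s1(t)
b = A_LO * np.exp(1j * 2 * np.pi * f_LO * t)           # local oscillator s2(t)

# Balanced photodetection: couple, square-law detect, subtract (7.3.1).
i_balanced = 0.25 * (np.abs(a + b) ** 2 - np.abs(a - b) ** 2)

# Direct mixing down to the intermediate frequency f_IF = f_c - f_LO (7.3.2).
i_direct = A_s * A_LO * np.cos(2 * np.pi * (f_c - f_LO) * t + phi_s)

print(np.max(np.abs(i_balanced - i_direct)))   # ~1e-12: the identity holds exactly

The agreement is exact to floating-point precision because the quarter-square relation is an algebraic identity; in a physical receiver, the two photodetectors also contribute shot noise, as discussed later in this section.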

Mixer Image Modes

The elementary trigonometric identity underlying the multiplication operation in a mixer is

cos a cos b = (1/2)cos(a − b) + (1/2)cos(a + b).

When a = 2π fc t and b = 2π fLO t, this becomes cos(2π fc t)cos(2π fLO t) = (1/2)cos(2π(fc − fLO)t) + (1/2)cos(2π(fc + fLO)t). The mixing creates an unwanted spectral (output) image at frequency fc + fLO. This image is far outside the bandwidth of the photodetector response and is not seen.

Figure 7.6 A schematic representation of a balanced photodetector based on a 180-degree hybrid coupler. The signal s1(t) and the local oscillator s2(t) enter the coupler; the two outputs z1(t) and z2(t) are directly photodetected to produce i1(t) = |z1(t)|^2/2 and i2(t) = |z2(t)|^2/2, and the output is i(t) = i1(t) − i2(t).


However, any input signal or noise within a frequency band centered at 2fLO − fc, also called an (input) image, will mix into the output signal at frequency fIF because cos(2π(2fLO − fc)t)cos(2π fLO t) = (1/2)cos(2π(fLO − fc)t) + (1/2)cos(2π(3fLO − fc)t). This means that any unwanted signal within the image frequency band centered at 2fLO − fc will impair the desired signal. This image noise is discussed further in Section 10.2.5.
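The image response can be illustrated numerically. In this sketch the frequencies are arbitrary illustrative values; a tone at the carrier frequency and a tone at the image frequency 2fLO − fc both produce a mixer output component at the same intermediate frequency.

import numpy as np

f_LO, f_c = 100.0, 110.0
f_image = 2 * f_LO - f_c           # 90.0: the input image frequency
f_IF = f_c - f_LO                  # 10.0: the intermediate frequency

t = np.linspace(0.0, 4.0, 40000)
lo = np.cos(2 * np.pi * f_LO * t)

for f_in in (f_c, f_image):
    mixed = np.cos(2 * np.pi * f_in * t) * lo
    spectrum = np.abs(np.fft.rfft(mixed))
    freqs = np.fft.rfftfreq(t.size, d=t[1] - t[0])
    # Strongest component below the local-oscillator frequency.
    f_peak = freqs[np.argmax(spectrum * (freqs < f_LO))]
    print(f"input {f_in}: low-side output at {f_peak:.1f}")   # both print ~10.0 = f_IF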

Shot Noise in Balanced Photodetection

Balanced photodetection uses two physical photodetectors in a balanced arrangement. Each contributes shot noise. In addition to the desired terms generated within the square-law photodetector, each photodetector also generates an identical noisy baseband term rbase(t). For wave optics, this term is rbase(t) = (|s(t) + no(t)|^2 + ALO^2)/2, with an additive lightwave noise source no(t) included with the signal s(t). This baseband term is the same for each photodetector, and cancels out during the final subtraction. Accordingly, it may seem that there is no additional baseband noise, only the noise at the intermediate frequency. However, the complete photodetection process can only be understood using the photon-optics signal model. Within this model, the photon-arrival rate R(t) at each photodetector is proportional to the lightwave signal power P(t), which is defined prior to photodetection using wave optics. Even though the same wave-optics signal is incident to both photodetectors, the random photon-arrival process (cf. (6.2.19)) causes independent shot noise within each photodetector. The shot noise does not cancel out in the final subtraction, and the sum of the two shot-noise terms appears at the output of the balanced photodetector. The effect of this noise on the probability of a detection error is discussed in Section 10.2.5.

Notice that the analysis of balanced photodetection requires wave optics to analyze the couplers and requires photon optics to analyze the shot noise. The two lightwave models come together at the photodetector. This careful balance between the two models emphasizes that although properties of wave optics and photon optics appear incompatible, neither can be omitted. The incompatibility between the two signal models, as well as the artificial boundary between them, can be removed only by using the quantum-optics signal model, which is discussed in Chapter 15.
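A photon-optics sketch of this conclusion follows: the two photodetectors of the balanced pair see the same mean photon rate, but the photon-arrival processes are independent, so the shot-noise variances add in the subtraction even though the mean baseband terms cancel. The count values are illustrative.

import numpy as np

rng = np.random.default_rng(1)
mean_count = 1.0e4                 # mean photons per detector per integration interval (assumed)
trials = 100000

# Independent Poisson counts at the two detectors of the balanced pair.
m1 = rng.poisson(mean_count, trials)
m2 = rng.poisson(mean_count, trials)

difference = m1 - m2
print(np.mean(difference))         # ~0: the common (baseband) term cancels
print(np.var(difference))          # ~2e4: the two shot-noise variances add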

Phase-Synchronous Homodyne Demodulation of a Complex-Baseband Signal

A fully modulated passband signal s̃(t) corresponds to the complex-baseband signal s(t). A lightwave signal can be converted to a complex-baseband signal by first using a balanced photodetector to form an electrical signal at an intermediate frequency fIF, followed by a homodyne demodulation of the passband electrical signal. Alternatively, the lightwave signal can be directly homodyned to complex baseband. To this end, when the frequency of the input lightwave signal is equal to the frequency of the local oscillator, the balanced photodetector shown in Figure 7.6 implements the homodyne demodulation of only one component of the baseband signal. Therefore, a second balanced photodetector following a 90-degree hybrid coupler must be used to homodyne demodulate a lightwave waveform directly into a complex-baseband electrical waveform. Recall


Figure 7.7 A balanced photodetector for the demodulation of the in-phase and quadrature components of a lightwave signal based on a 90-degree hybrid coupler.

that the real part of the complex-baseband signal sI(t) is the cosine-modulated component of the waveform s̃(t) (cf. (2.1.54)). The imaginary part sQ(t) is the sine-modulated component of the waveform s̃(t). The balanced photodetector shown in Figure 7.7 uses four photodetectors – two for each signal component – to separately demodulate each component to a real-baseband waveform corresponding to sI(t) or sQ(t). Referring to Figure 7.2(b) with sLO(t) as the local oscillator signal, the upper 180-degree hybrid coupler shown in that figure generates the lightwave signals required to produce the electrical in-phase signal component sI(t). The lower 180-degree hybrid coupler generates the lightwave signals required to produce the electrical quadrature signal component sQ(t). The four lightwave signals at the output of the 90-degree hybrid coupler are each directly photodetected and pairwise subtracted to produce two real-baseband electrical signals. Taken together, these two real-baseband waveforms form the complex-baseband waveform s(t) = sI(t) + isQ(t).
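The action of Figure 7.7 can be emulated with two quarter-square balanced pairs, one mixing the complex-baseband signal against the local oscillator and one against the local oscillator shifted by 90 degrees. This minimal sketch assumes unit responsivity and a phase-aligned real local oscillator; the waveforms are illustrative.

import numpy as np

t = np.linspace(0.0, 1.0, 5000)
s_I = np.cos(2 * np.pi * 3 * t)           # in-phase component (illustrative)
s_Q = 0.5 * np.sin(2 * np.pi * 5 * t)     # quadrature component (illustrative)
s = s_I + 1j * s_Q                        # complex-baseband lightwave signal
s_LO = 2.0                                # phase-aligned local-oscillator amplitude

# Balanced pair 1: LO in phase -> in-phase output (quarter-square identity).
i_I = 0.25 * (np.abs(s + s_LO) ** 2 - np.abs(s - s_LO) ** 2)            # = s_LO * Re[s]
# Balanced pair 2: LO shifted by 90 degrees -> quadrature output.
i_Q = 0.25 * (np.abs(s + 1j * s_LO) ** 2 - np.abs(s - 1j * s_LO) ** 2)  # = s_LO * Im[s]

print(np.allclose(i_I / s_LO, s_I), np.allclose(i_Q / s_LO, s_Q))       # True True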

Asynchronous Demodulation Based on Direct Photodetection

Asynchronous demodulation generates a real-baseband electrical waveform that is proportional to the received lightwave power and suppresses the carrier frequency. Asynchronous demodulation does not use an external phase reference such as would be provided by a local oscillator. The direct use of a photodetector to implement the optical/electrical conversion process is known as direct photodetection.

Differential Direct Detection

Another instance of asynchronous demodulation measures the differential phase of a constant-amplitude phase-modulated lightwave. It uses a delay-line interferometer to process the received lightwave waveform before direct photodetection, as is shown in Figure 7.8. When used as part of a phase-asynchronous demodulator, the first two sections of the delay-line interferometer, excluding the second coupler, can be considered as the front end of a balanced photodetector. The single input to the delay-line interferometer is s(t) = As e^{iφs(t)}. Using (7.1.5) with s2(t) = 0, the two inputs to the balanced photodetector are a = As e^{i(φs(t)+ψ)}/√2 and b = As e^{iφs(t−T)}/√2. These inputs are shown in Figure 7.8. Then, using (7.3.1) and setting the responsivity R equal to one, the output of the phase-sensitive demodulator is


Figure 7.8 Direct photodetection using a delay-line interferometer. The functional block of a balanced detector is also shown.

r(t) = Re[ab*]
     = (As^2/2) Re[e^{i(φs(t) − φs(t−T) + ψ)}]
     = is cos(Δφ(t) + ψ),    (7.3.3)

where Δφ(t) = φs(t) − φs(t − T) is the phase shift between the two paths, and is = As^2/2 is the electrical current generated by direct photodetection. In this way, phase-sensitive demodulation is obtained with a square-law photodetector.
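Expression (7.3.3) is the basis for demodulating differentially encoded phase modulation. The following discrete-time sketch, assuming one sample per symbol, unit amplitude, and ψ = 0, encodes bits as 0/π phase changes and recovers them from the sign of r.

import numpy as np

rng = np.random.default_rng(7)
bits = rng.integers(0, 2, 20)                  # data bits (illustrative)
phase = np.cumsum(np.pi * bits) % (2 * np.pi)  # differential encoding: bit 1 flips the phase

A_s, psi = 1.0, 0.0
s = A_s * np.exp(1j * phase)                   # one sample per symbol

# Delay-line interferometer outputs sampled once per symbol, a and b as in the text.
a = s[1:] * np.exp(1j * psi) / np.sqrt(2)
b = s[:-1] / np.sqrt(2)
r = np.real(a * np.conj(b))                    # (7.3.3): (A_s**2 / 2) cos(dphi + psi)

decoded = (r < 0).astype(int)                  # negative r -> phase flip -> bit 1
print(np.array_equal(decoded, bits[1:]))       # True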

Photon Counting

Photon counting describes an asynchronous demodulator that counts individual photons using a photodetector with a large internal gain, such as an avalanche photodetector. A sufficiently large internal gain is required to discriminate each individual direct-photodetection event from the electrical thermal noise added after photodetection. This photodetection process uses an internal nonlinear thresholding circuit called a discriminator, which is shown schematically in Figure 7.9. Each pulse exceeding the physical threshold of the discriminator is counted as a detected photon. The amplitude of each pulse is not recorded as such. The width of an output pulse created by a single photon must be sufficiently smaller than the mean time interval between photons that the overlap of output pulses is infrequent and inconsequential. This requires the photon-arrival rate R(t) (cf. (6.2.21)), which is the reciprocal of the mean time interval between arrivals, to be small compared with the reciprocal response time. When this condition is not satisfied, the photon-counting receiver will often fail to resolve two closely spaced photon arrivals as separate events.

With perfect photodetection, the probability mass function for the number of primary photoelectrons is equal to the probability mass function p(m) for the received number of photons (6.2.21) (cf. Table 6.2). This probability mass function is a Poisson distribution. The corresponding communication channel is called a Poisson channel. Referring to Section 6.2.3 and Table 6.2, practical photon counting is described by a nonideal transformation of the photon-arrival process (cf. (6.2.20)) into an estimate m̂(t) of the photon-counting process over a sampling duration Ts, which need not be equal to the symbol duration T. Ideally, the estimate m̂(t) should equal the true photon-counting process m(t). The interplay between the integration time and the design of the detection filter is discussed in Section 9.2.5. An estimate of the photon-arrival rate R̂(t)


Figure 7.9 Photon counting. When the photodetector has a large internal gain and the photoelectron arrival rate R is small, a nonlinear thresholding operation followed by summing provides an estimate m̂(t) of the photon-counting process m(t).

can be obtained by dividing the estimated photon-counting process m̂(t) by the sampling interval of duration Ts. Scaling R̂(t) by hf (cf. Table 6.2) gives an estimate P̂(t) of the continuous lightwave power using photon counting.

Several effects cause the estimated photon-counting process m̂(t) to differ from the true counting process m(t). A practical photodetector has a quantum efficiency η (cf. (6.2.21)) that is less than one. This is the probability that a photon arrival produces a photoelectron. This means that, over the same interval, the photoelectron-counting process ξ(t) is less than the photon-counting process m(t). Photons that are closely spaced in time cannot be resolved and may be detected as a single event. Also, the random gain of an avalanche photodetector required to discriminate a single photodetection event introduces additional randomness (cf. Section 6.7) compared with the internal photoelectron-arrival process g(t) before the internal amplification process. Finally, random thermally generated dark current is generated within the photodetector, producing an extraneous constant photoelectron count rate μdark. This dark current can be reduced by cooling the device, but cannot be completely eliminated.
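A simple numerical sketch of the photon-counting estimate follows; the rate, quantum efficiency, dark-count rate, and sampling interval are illustrative assumed values, and the discriminator is taken to be ideal so that the counting statistics remain Poisson.

import numpy as np

rng = np.random.default_rng(3)
h = 6.62607e-34                    # Planck constant (J s)
f = 1.934e14                       # lightwave frequency (Hz), near 1550 nm
R = 1.0e6                          # true photon-arrival rate (1/s), illustrative
eta = 0.8                          # quantum efficiency (assumed)
mu_dark = 1.0e3                    # dark-count rate (1/s), illustrative
T_s = 1.0e-3                       # sampling (integration) interval (s)

# Counted events in T_s: detected photons (thinned by eta) plus dark counts.
m_hat = rng.poisson(eta * R * T_s) + rng.poisson(mu_dark * T_s)

# Rate estimate with a simple correction for eta and the mean dark count.
R_hat = m_hat / (eta * T_s) - mu_dark / eta
P_hat = h * f * R_hat              # estimated lightwave power (W)
print(R_hat, P_hat)                # close to R = 1e6 and h*f*R, about 0.13 pW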

7.4 Lightwave Amplifiers

Lightwave amplification differs from amplification used in lower-frequency systems. Lightwave amplification is studied using a combination of photon optics and wave optics. Indeed, a deft combination of photon optics and wave optics is necessary to fully describe lightwave amplification because the photons generated by the gain process must emit into an electromagnetic mode described by wave optics and thereafter become an indistinguishable part of the lightwave. Moreover, the photon energy E is related to the wave concept of frequency by the fundamental quantum-optics expression E = hf.

Lightwave amplifiers currently in use for lightwave communication systems produce gain by the process of stimulated emission. This requires a material with gain sites that have one or more pairs of energy levels separated by the energy E = hf of the photon. A semiconductor could be such a material. In the process of amplification, a photon incident on a gain site within the lightwave amplifier induces an upper energy state to relax into a lower energy state, thereby emitting a photon. The emitted photon has


the same properties as the incident photon (cf. Section 6.1.2). This stimulated emission thereby produces gain. Lightwave amplifiers based on stimulated emission respond to the in-phase and quadrature components of a lightwave signal s (t ) = sI (t ) + is Q (t ) in the same way. Accordingly, a lightwave amplifier that uses stimulated emission is a phase-insensitive amplifier. Both signal components are amplified, without distinguishing the phases. Two phase-insensitive lightwave amplifiers based on stimulated emission are discussed in this section. The first uses two energy states of atoms in ionized form as the upper and lower energy levels. These ions are embedded into the core of a fiber. The second uses electron–hole recombination in a waveguide geometry fabricated using semiconductor materials. In contrast, a lightwave amplifier called a phase-sensitive amplifier responds to the in-phase and quadrature components differently. This kind of amplifier is currently not in common use. In a phase-sensitive amplifier, the gain is not generated by stimulated emission. Instead, the gain is generated by a nonlinear mixing process between the incident lightwave signal and another signal, usually called the pump signal. This interaction transfers energy from the pump signal to the incident signal, thereby providing gain. The energy transfer from the pump signal to the incident signal depends on the relative phase between the pump signal and the incident signal, so the gain can be different for the in-phase and quadrature components. Other types of lightwave amplifiers that use different physical mechanisms such as stimulated Raman scattering or stimulated Brillouin scattering are not discussed in this book.

7.4.1 Doped-Fiber Lightwave Amplifiers

The basic structure of a doped-fiber lightwave amplifier is shown in Figure 7.10. The doped fiber is produced by introducing trace amounts of a dopant that can provide optical gain into the core of a segment of optical fiber. The segment of doped fiber, along with the optical coupler that couples the pump source into the doped segment of a mode-matched fiber, is then fused with the undoped segments of fiber to produce an amplifying segment of fiber, as is shown in Figure 7.10. The coupling of a doped-fiber amplifier to unamplified fiber segments is simplified by using the same geometry for both.

The most common type of lightwave amplifier operates near 1500 nm and uses the rare-earth element erbium5 as the gain medium. Although other elements with different energy levels are used for lightwave amplification in other wavelength regions, only the characteristics of an erbium-doped fiber amplifier will be discussed as the exemplar case. For this kind of lightwave amplifier, the source of energy is another lightwave at a shorter wavelength. This lightwave, the optical pump, is coupled into the fiber along with the signal. Referring to Figure 7.11, an erbium gain site in the unexcited ground state E0 absorbs a photon from the optical pump. This absorption event produces an

5 Erbium is element 68 in the periodic table.


Figure 7.10 A schematic loop-representation of an erbium-doped fiber amplifier.

Figure 7.11 A schematic representation of a stimulated-emission event for the discrete energy levels of an isolated erbium atom (optical pump at approximately 980 nm; energy levels E0 through E3).

excited energy state E3 that is higher than the upper energy state E2 used by the gain process. Almost all of the energy from the pump is absorbed by the multiple erbium gain sites within the doped-fiber structure. An excited energy state then relaxes by releasing some of its energy by a process not discussed herein, leaving the gain site in the energy state E2, which is the upper of the two energy states used for amplification.

Lightwave gain is generated when an excited erbium atom6 in an upper energy state at energy E2 relaxes to the lower energy state at energy E1 through the process of stimulated emission, as shown in Figure 7.11. The lower energy state, which is close to the ground state in energy, then relaxes to the ground state E0 by another process not discussed herein. Lightwave noise is generated when the upper energy state at energy E2 relaxes to the lower energy state at energy E1 through the process of spontaneous emission. This spontaneous relaxation has a characteristic time scale called the upper-state lifetime. This is the mean time that an erbium atom will remain in the upper energy state before spontaneously relaxing to the lower energy state.

Because erbium atoms are randomly positioned within the host noncrystalline glass medium, each atom experiences a slightly different local field, causing slightly different values of the upper energy state and the lower energy state. Therefore, each stimulated-emission event responds to a photon at a slightly different wavelength. This results in a

6 The ionized state Er3+ is used for lightwave amplifiers.


Figure 7.12 The spontaneous-emission spectrum of an erbium-doped lightwave amplifier without an input signal (relative power on a logarithmic scale versus wavelength from 1500 nm to 1580 nm).

broad spontaneous-emission spectrum, which is shown as a function of the wavelength for one kind of amplifier in Figure 7.12. This type of spectral broadening is discussed in Section 7.4.4.

7.4.2 Gain in a Doped-Fiber Amplifier

Lightwave amplifier gain is analyzed in this section for the case of a single transition between two discrete energy levels E 2 and E 1 as is appropriate for a doped-fiber amplifier. For the alternative semiconductor lightwave amplifier discussed in Section 7.4.3, the analysis is similar, but accounts for the energy transition being between a continuum of levels in the valence band of a semiconductor and a continuum of levels in the conduction band of the semiconductor. Because the transition in a semiconductor amplifier can occur between any occupied energy state in the conduction band and any occupied state in the valence band, this results in a process that differs significantly from the process discussed in this section for a doped-fiber amplifier. The upper energy state E 2 in a doped-fiber amplifier has a density of N2 gain sites per unit volume. The lower energy state E 1 has a density of N 1 gain sites per unit volume. The gain G of a lightwave amplifier depends on the density of gain sites in each energy level and on the efficiency σe of a single gain site, such as an erbium atom, to generate a stimulated-emission event. This efficiency, which is usually frequency-dependent, is called the stimulated-emission cross section. For a system characterized by two energy levels with volume densities N 1 and N 2, the internal gain per unit length γint is defined as

γint ≐ σe N2,    (7.4.1)

where N2 is the density of the gain sites in the upper energy state (cf. Figure 7.11). The value of N 2 depends on the pump power, on the lightwave signal power within the amplifier, and on the upper-state lifetime. The lightwave signal induces a gain site in an upper energy state to relax to a lower energy state through the process of stimulated emission.


Erbium atoms are randomly positioned within the host glass medium. The maximum density N of erbium atoms in silica glass is on the order of one part in one thousand. This avoids a change in the gain characteristics that would be caused by closely spaced erbium atoms interacting with each other. The internal gain per unit length γ int , as given by (7.4.1), is limited by this low density. Therefore, to achieve a high overall gain G, a typical erbium-doped fiber amplifier consists of a doped-fiber segment several meters in length, which is conventionally depicted by the loop shown in Figure 7.10. Whereas an excited erbium atom can amplify a lightwave signal through the process of stimulated emission, an unexcited erbium atom can attenuate a lightwave signal by absorbing a photon and transitioning to an excited state. The product of the absorption cross section σa and the density N1 of the gain sites in a lower energy state is the internal absorption loss given in (3.1.1) as

αm = σa N1 .

The net gain per unit length γ is defined as the difference between the internal gain γint and the internal absorption loss αm,

γ ≐ γint − αm = σe N2 − σa N1.    (7.4.2)

A necessary condition to have a net gain is that σe N2 is larger than σa N1.

When the density N2 of the gain sites in the upper energy state is constant, the gain per unit length is constant. This is called the small-signal gain per unit length and is denoted by γ0. The incremental increase in the lightwave signal power ΔP(z) over a distance Δz, which is proportional to the power, is then ΔP(z) = γ0 P(z)Δz. The corresponding differential equation for the spatial evolution of the lightwave signal power is

dP(z)/dz = γ0 P(z).    (7.4.3)

Applying the initial condition at the input to the amplifier, P(0) = Pin, the solution to (7.4.3) is P(z) = Pin e^{γ0 z}, which is an exponentially increasing function of the distance that the signal propagates in the amplifier. The small-signal gain G0 after a distance z = L is G0 ≐ e^{γ0 L}, so the total gain is an exponential function of the length L of the amplifier.

Gain Saturation

The derivation of the small-signal gain in (7.4.3) is valid provided that the gain per unit length γ is constant. However, as the signal amplitude increases, the rate of stimulated emission increases. This reduces the density N 2 of the gain sites in the upper energy state, and increases the density N 1 of the gain sites in the lower energy state so that the difference between the gain and the loss decreases. Because γ depends on a weighted difference in the densities as given in (7.4.2), the net gain per unit length is reduced. This effect is called gain saturation. Gain saturation can be analyzed by modifying (7.4.3) and writing the gain per unit length γ as a function of the lightwave signal power P (z ),

γ = γ0/(1 + P(z)/Psat),    (7.4.4)


where γ0 is the constant small-signal gain per unit length, and Psat is a constant called the saturation power.7 The second term in the denominator is caused by gain saturation, which decreases the upper energy state density N2 and thus decreases the gain coefficient according to (7.4.2). The saturation power is defined such that when P(z) = Psat, the gain per unit length γ is reduced to half of the unsaturated value γ0. For a typical erbium-doped fiber amplifier, the saturation power Psat is a few milliwatts.

Replacing γ0 P(z) in (7.4.3) with γ P(z) defined in (7.4.4) gives the differential equation

dP(z)/dz = γ0 P(z)/(1 + P(z)/Psat).

As P(z) becomes large compared with Psat, the right side approaches the constant value γ0 Psat. Thus, as the amplifier saturates, the exponential growth in z transitions to linear growth in z. This transition is shown in Figure 7.13(b). To solve this differential equation, separate the variables P and z as

(1/P + 1/Psat) dP = γ0 dz.

Then integrate the right side from z = 0 to L, and the left side from P(0) = Pin to P(L) = G Pin, where G ≐ P(L)/Pin is the saturated gain. This yields

(loge P + P/Psat) |_{Pin}^{G Pin} = γ0 z |_{0}^{L},

leading to an implicit equation for the saturated gain G given by

G = e^{γ0 L} e^{(1−G)Pin/Psat} = G0 e^{(1−G)Pin/Psat}.    (7.4.5)


Figure 7.13 (a) Gain of a lightwave amplifier as a function of the ratio of the input power Pin to the saturation power Psat. (b) The output power Pout scaled by Psat. The output power grows exponentially for a small input signal level and then transitions to linear growth for a large input signal level as the amplifier saturates.

7 Expression (7.4.4) and the form of Psat can be derived from a set of coupled differential equations called rate equations that govern the dynamics of the interaction of the lightwave signal and the amplifying medium. The rate equations for lightwave amplifiers are covered in Becker, Olsson, and Simpson (1999).


Accordingly, the gain is a nonlinear function of the power. To express G as a function of Pin / Psat requires numerical methods. A plot of G versus Pin / Psat showing the effect of gain saturation is shown in Figure 7.13(a).
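The implicit equation (7.4.5) is easily solved numerically; the root is bracketed between G = 1 and G = G0 because the left side minus the right side is negative at G = 1 and positive at G = G0 whenever G0 > 1. A minimal bisection sketch reproducing the trend of Figure 7.13(a):

import math

def saturated_gain(G0, Pin_over_Psat, tol=1e-12):
    """Solve G = G0 * exp((1 - G) * Pin/Psat) for the saturated gain G (7.4.5)."""
    f = lambda G: G - G0 * math.exp((1.0 - G) * Pin_over_Psat)
    lo, hi = 1.0, G0              # f(lo) < 0 and f(hi) > 0 for G0 > 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

G0 = 100.0                        # 20 dB small-signal gain, as in Figure 7.13
for ratio in (1e-3, 1e-2, 1e-1, 1.0, 10.0):
    G = saturated_gain(G0, ratio)
    print(f"Pin/Psat = {ratio:7.3f}  G = {10 * math.log10(G):5.2f} dB")

The printed gain falls from nearly 20 dB at small input power toward a few decibels as Pin approaches and exceeds Psat, showing the saturation behavior.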

Time Dynamics of Gain Saturation

Gain saturation is also affected by the time dynamics of the energy-transfer process within the material that provides the gain. The analysis of the time dynamics leads to the gain having a recovery time constant that depends on the upper-state lifetime of the gain site. The mean upper-state lifetime limits the rate of change of the density N2 of the gain sites in the upper energy state, with the maximum rate of change of the gain being approximately the reciprocal of the upper-state lifetime. As a result, the saturated gain of an erbium-doped fiber amplifier has a time response that depends not on the instantaneous lightwave power, but on the time average of the lightwave power, where the time average is over the upper-state lifetime.

A lightwave signal that is amplitude-modulated at a single frequency much larger than the maximum rate of change of the gain will see a constant saturated gain, with the constant gain depending on the incident lightwave power and the pump power. A lightwave signal that is amplitude-modulated at a single frequency much smaller than the maximum rate of change of the gain sees a nonlinear saturated gain that varies with the time-varying lightwave power of that low-frequency signal. This means that a low-frequency amplitude-modulated signal can modulate the amplifier gain seen by higher frequencies of the waveform.

As an example, a typical erbium-doped fiber amplifier has an upper-state lifetime of approximately 10 ms, so the gain will vary in the presence of frequencies below 100 Hz. For a typical lightwave communication system that uses a lightwave amplifier, the modulation rate is greater than 1 GHz, but low-frequency components below 100 Hz will be seen in the modulated waveform when there are long runs of the same data value. This can modulate the gain seen by other wavelengths. To avoid this, the power density spectrum of the modulated lightwave signal may be controlled by using an appropriate code, as discussed in Chapter 13, so that the modulated waveform has a negligible amount of power at frequencies near or below 100 Hz. Therefore, the other frequency components of a high-data-rate lightwave signal experience a nearly constant gain. Although the gain of the amplifier can saturate, this effect does not produce nonlinear signal distortion as long as all the modulated frequency components are well above the maximum rate of change of the gain.

7.4.3 Semiconductor Lightwave Amplifiers

Another common lightwave amplifier is a semiconductor lightwave amplifier. A semiconductor amplifier is constructed using a waveguiding structure based on semiconductor materials, as shown in Figure 7.14(a). The lightwave signal must be coupled into and out of the semiconductor waveguide. Because the geometry and the numerical aperture of a semiconductor waveguide are different from those of an optical fiber, the

Figure 7.14 (a) A schematic representation of a semiconductor lightwave amplifier. (b) A schematic representation of a stimulated-emission event in a semiconductor caused by electron–hole recombination.

coupling of such an amplifier to a segment of fiber tends to be more difficult than the coupling of a doped-fiber lightwave amplifier to the same fiber segment. A gain site in a semiconductor amplifier is defined as a potential interaction between an electron in one energy state in the conduction band and a hole in one energy state in the valence band. Because the energy band is composed of a broad continuum of energy states, this means that a broad range of frequencies can be amplified. The gain spectrum is determined by the energy distributions of the electrons in the conduction band and the holes in the valence band. An incident current, called an injection current, excites electrons into the conduction band, thereby creating holes in the valence band. In a doped-fiber amplifier, the density of gain sites is small because erbium atoms must be noninteracting. By contrast, the density of gain sites for a semiconductor lightwave amplifier, which is related to the doping concentration, can be many orders of magnitude larger than the maximum density of erbium atoms in silica glass. Because the density of gain sites in a semiconductor amplifier is much larger than the density of gain sites in a doped-fiber amplifier, the same gain can be achieved in a much shorter length than for a doped-fiber amplifier. When the injection current is absent, an incident lightwave signal with a photon energy larger than the bandgap energy of the semiconductor will be absorbed, and the semiconductor material will be opaque at this wavelength, with the incident lightwave being absorbed within the material. When an injection current is applied, a surplus of electrons is excited into the conduction band and a surplus of holes is excited into the valence band. When a lightwave is present, an electron in the conduction band and a hole in the valence band can recombine by stimulated emission, thereby emitting a photon. The energy-level diagram of this recombination process is shown in Figure 7.14(b). For a sufficiently large injection current, the stimulated emission induced by the incident lightwave signal balances the absorption. At this point, the material is no longer opaque and instead becomes transparent. A further increase in the current produces a net optical gain for the incident lightwave. Electrons and holes may also recombine spontaneously, thereby emitting a spontaneous photon. The rate of spontaneous emission depends on the radiative recombination time, which is analogous to the upper-state lifetime of an erbium atom in ionized form.


7.4.4 Wavelength Dependence of the Gain

The probability of a stimulated-emission event depends on the wavelength of the incident lightwave signal and on the stimulated-emission cross section (cf. (7.4.1)) that describes a gain site. The properties of a gain site depend on the local electric field generated by the host material. When every gain site experiences the same local field and has the same properties, the gain is homogeneous. When different subsets of gain sites experience different local fields, the gain is inhomogeneous. For the inhomogeneous case, gain sites that experience the same local field intensity are regarded as forming a subset of gain sites, even though these gain sites are dispersed throughout the medium.

When erbium is incorporated into glass, each erbium atom may experience a slightly different local field as a consequence of the noncrystalline nature of glass. This causes the spacing between the energy levels for a subset experiencing the same local field to be slightly different from the spacing between the energy levels for another subset experiencing a slightly different local field. The net effect produces an overall gain γ(λ) that is broadened in wavelength as compared with a hypothetical medium for which the local field is identical for every erbium atom. Therefore, the gain in an erbium-doped fiber amplifier is inhomogeneous because each subset of erbium atoms experiences a different local field that has different gain characteristics. A photon can stimulate a gain site only within a subset corresponding to its wavelength.

The situation in a semiconductor lightwave amplifier is different because the energy levels are not discrete; they form a continuum. This means that the gain sites can interact, causing the density of gain sites at a given energy to be a function of other mechanisms that are not discussed herein. Depending on the operating conditions and the specific device structure, the gain in a semiconductor lightwave amplifier may be homogeneously broadened or inhomogeneously broadened. When the gain is homogeneously broadened, every electron–hole recombination event interacts with the incident light in the same way. When the gain is inhomogeneously broadened, different subsets of electron–hole recombination events will interact in a different way with the incident light.

For a semiconductor amplifier with homogeneous gain, the signals in multiple passband lightwave subcarriers over a broad range of carrier wavelengths interact with the same set of gain sites. Every upper energy state that interacts with the first subcarrier is one fewer gain site that can interact with the second subcarrier. Therefore, when the gain is saturated, the presence of the first subcarrier can modulate the gain seen by the second subcarrier. This gain-modulation process can produce interchannel interference. Gain modulation can be avoided by operating the amplifier in an unsaturated linear regime, thereby reducing the gain, or by using (nearly) constant-amplitude signals in all subcarriers. The maximum rate of change of the gain for a semiconductor is approximately the reciprocal of the radiative-recombination lifetime within the semiconductor, which is on the order of one nanosecond. This means that the saturated gain of a semiconductor amplifier can change at a rate comparable to an amplitude-modulated signal in the gigahertz range.

For an inhomogeneous-gain medium, such as an erbium-doped fiber amplifier, each subset of erbium atoms has slightly different energy spacings. This causes the gain for


each subset to be shifted in wavelength, which means that different modulated signals at different wavelengths will interact with different subsets of erbium atoms. Therefore, in an inhomogeneous-gain medium, multiple wavelength signals can be amplified in the presence of gain saturation without interchannel interference. The unique combination of broadband inhomogeneous gain and the long upper-state lifetime has led to the widespread deployment of erbium-doped fiber amplifiers.

7.4.5 Noise from Multiple Amplifiers

A long-haul fiber span consists of multiple segments of optical fiber and multiple lightwave amplifiers. After each fiber segment, to compensate for attenuation, the signal is amplified. Each amplifier, however, also introduces noise due to spontaneous emission within the amplifier. This section examines the relationship between the total length of a span and the number of amplified segments based on the tension between the gain of an amplifier and the gain-dependent noise introduced by the amplifier. The origin of this noise is discussed in Section 7.7.

Let Gj be the gain per segment for the jth fiber amplifier in a fiber span of J segments, each of length L. The jth amplifier has a noise power density spectrum Nj(f). Let Tj = e^{−κL} be the transmittance of one fiber segment (cf. (3.1.3)) with an attenuation of κ km^{−1}. A schematic diagram of a cascade of J amplified fiber segments is shown in Figure 7.15. The lightwave signal sJ(t) at the output after J fiber segments is the product of the combined gain–transmittance terms Gj Tj for all J segments. Then the output signal is

sJ(t) = ( ∏_{j=1}^{J} Gj Tj ) sin(t).    (7.4.6)

The noise power density spectrum accumulates segment by segment. The noise N1(f) generated in the first segment is multiplied by the gain–transmittance product of the second segment G2 T2. Adding the noise from the second segment then gives N2(f) + G2 T2 N1(f). Continuing this process, the noise power NJ(f) after J fiber segments is

NJ(f) = ∑_{j=1}^{J} ( ∏_{k=j+1}^{J} Gk Tk ) Nj(f).    (7.4.7)

Choosing the gain per segment to balance the attenuation per segment gives Gk Tk = 1 for all k. When the noise power density spectrum added by each amplifier is equal, then Nj(f) = N(f).

Figure 7.15 A schematic representation of J amplified segments of optical fiber.


Substituting these expressions into (7.4.6), the signal at the output of the fiber span is equal to the signal at the input of the fiber span. Substituting the same expressions into (7.4.7) and expressing the noise as a function of the total span length Lspan = J L yields

Nspan(f) = J N(f).    (7.4.8)

Now write N(f) = hf nsp(G − 1) by a forward reference to expression (7.7.8) of Section 7.7. Substituting this expression into (7.4.8) gives

Nspan(f) = J hf nsp(G − 1).    (7.4.9)

This shows that for a fixed gain per segment G that balances the loss, the total noise grows linearly in the number of segments. Because the signal has the same value at the end of each segment, the signal-to-noise ratio SNR j at the end of each segment decreases as the number of segments is increased when the gain per segment is held fixed.
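The linear growth in (7.4.9) can be tabulated directly. In this sketch, the frequency corresponds to a wavelength near 1550 nm, the spontaneous-emission factor nsp is an assumed illustrative value, and the gain per segment is fixed at 20 dB to balance the segment loss. Because the signal power is restored at each segment boundary, the signal-to-noise ratio within a fixed bandwidth falls as 1/J.

import math

h = 6.62607e-34        # Planck constant (J s)
f = 1.934e14           # lightwave frequency (Hz), near 1550 nm
n_sp = 1.5             # spontaneous-emission factor (assumed)
G_dB = 20.0            # gain per segment, balancing the loss of each segment
G = 10 ** (G_dB / 10)

# Accumulated noise power density spectrum after J segments (7.4.9).
for J in (1, 2, 5, 10, 20):
    N_span = J * h * f * n_sp * (G - 1)
    print(f"J = {J:2d}  N_span = {N_span:.3e} W/Hz")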

Noise from Distributed Gain

The signal-to-noise ratio at the output of the fiber span can be optimized by simultaneously adjusting the gain per segment G and the number of segments J. When the gain balances the loss in each segment, these are not independent parameters. To proceed, use the constraint that G e^{−κL} = 1 to express the length L of a single segment in terms of G so that

L = (1/κ) loge G.    (7.4.10)

The total span has length Lspan equal to J L. Writing J as Lspan/L, substitute (7.4.10) into (7.4.8) to write the noise spectrum at the output of the fiber span as

Nspan(f) = K (G − 1)/loge G,    (7.4.11)

where K = κ hf nsp Lspan. This expression states that the noise added over the full span depends on the ratio (G − 1)/loge G. As the gain G per segment increases, the term G − 1 in the numerator grows faster than the term loge G in the denominator, so Nspan increases. There is more added noise as G increases. Therefore, a lower gain per segment, and more segments, yields less total noise. The minimum total noise is achieved as G approaches one, at which point (G − 1)/loge G = 1. The length of each segment approaches zero and the number of segments J approaches infinity with the product J loge G = κ Lspan remaining constant. This condition is equivalent to setting the gain per unit length, g = loge G/L, equal to the attenuation per unit length κ.

A distributed lightwave amplifier has a constant gain per unit length. The gain is distributed throughout the entire length Lspan of the span to balance the distributed loss. A uniform distribution of gain results in the least total noise at the fiber-span output. For lumped amplification, the ratio (G − 1)/loge G appearing in Nspan(f) is larger than the ideal limiting value of one obtained by distributed amplification. To maintain the same signal-to-noise ratio, additional signal power must be added. Therefore, this ratio is the power penalty of using lumped amplification compared with using distributed


amplification. For a lumped amplifier gain of 20 dB, this noise gain is 13.3 dB, which is a significant penalty. For a lumped amplifier gain of 10 dB, this noise gain is 6 dB, which is a more acceptable penalty, but requires shorter fiber segments and more amplifiers. Distributed amplification also reduces the effect of intensity-dependent nonlinearities (cf. Section 5.3) because the peak power does not become as large with distributed amplification.

The improved performance of distributed amplification does motivate the use of a distributed lightwave amplifier, although current practical disadvantages limit the use of such amplifiers. One kind of distributed amplifier is based on the nonlinear Raman effect.8 The amplification is generated by launching a high-constant-power lightwave pump source at a frequency that is shifted from the signal frequency, with the shift depending on the specific fiber composition. The lightwave pump is an energy source used to create gain for a modulated lightwave signal.

8 See Bromage (2004).
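The penalties quoted above follow directly from the ratio (G − 1)/loge G in (7.4.11), expressed in decibels. A one-line check, assuming nothing beyond that expression:

import math

for G_dB in (10.0, 20.0):
    G = 10 ** (G_dB / 10)
    penalty_dB = 10 * math.log10((G - 1) / math.log(G))
    print(f"gain {G_dB:.0f} dB -> lumped-amplification penalty {penalty_dB:.1f} dB")
# gain 10 dB -> 5.9 dB (about 6 dB); gain 20 dB -> 13.3 dB, matching the text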

7.5 Lightwave Transmitters

A lightwave transmitter consists of a lightwave source that generates the lightwave carrier and a modulator that is used to imprint information onto the lightwave carrier. The source and the modulator can be implemented within a single device by directly modulating the power of the lightwave source. Alternatively, these functions can be implemented separately, first generating an unmodulated lightwave carrier, then using an external modulator to control the amplitude and the phase of the lightwave.

The lightwave sources used in guided-wave lightwave transmitters are light-emitting-diode structures based on specific semiconductor materials that can efficiently emit light. A light-emitting diode can be used directly as a noncoherent lightwave transmitter, or it can be embedded into a resonator, or optical cavity, thereby creating a laser diode to provide a coherent lightwave carrier.

7.5.1 Light-Emitting Diodes

A light-emitting diode is a semiconductor light source without a resonator. It produces spontaneous emission when an electron–hole pair recombines to generate a photon within the diode junction. The electrons and holes are first created by passing an injection current through the semiconductor diode. For an appropriate semiconductor material, it is likely that the electrons and holes recombine spontaneously, thereby generating spontaneous emission of photons. The spectrum of the spontaneous emission from a typical light-emitting diode, shown in Figure 7.16(a), has a broad spectral width in wavelength, measured in tens of nanometers, and a wide spatial emission pattern. The broad spectral width limits the transmission length or information rate because the intramodal dispersion in a fiber (cf. (4.5.7)) is proportional to the spectral width σλ of the light source. The wide spatial emission pattern means that a light-emitting diode is commonly used with a multimode fiber that has a large numerical aperture.


Figure 7.16 (a) Typical spectrum of a light-emitting diode and (b) a (nonlasing) laser diode that is biased below the lasing threshold. Both spectra are generated from spontaneous emission, with the effect of the allowed resonator modes evident for the laser-diode spectrum.

A lightwave modulator can be constructed from a light-emitting diode using internal direct-current modulation. This modulation format generates a zero-mean modulating current imod(t) that is added to a constant bias current ibias, with the total injection current i(t) into the diode given by

i(t) = ibias + imod(t).    (7.5.1)

This produces a charge-carrier density N(t) given by

N(t) = Nbias + Nmod(t),    (7.5.2)

consisting of electrons and holes. Most of these electrons and holes recombine in pairs radiatively in the diode junction, thereby producing noncoherent spontaneous emission with the emitted lightwave power P(t) proportional to the carrier density N(t). Direct-current modulation controls only the emitted lightwave power and not the phase of the lightwave signal. The root-mean-squared amplitude of the emitted passband lightwave signal can be written as (cf. (1.3.5))

s̃(t) = √P(t) cos(2π fc t + φ(t)),    (7.5.3)

where P(t) = Pbias + Pmod(t) is the total lightwave power as controlled by the modulating current i(t). The rapidly time-varying phase φ(t) is not controlled and leads to

the noncoherent nature of the spontaneous emission. The statistics of the phase of a noncoherent source are discussed in Section 8.2.2.

The noncoherent spontaneous emission implies that the frequency response of a light-emitting diode can be expressed in terms of the frequency response of the modulated lightwave power Pmod(t) instead of the frequency response of the passband lightwave signal s̃(t). This frequency response is governed by a first-order differential equation that relates the rate of change of the injected charge-carrier density dN(t)/dt both to a source term and to a loss term. Electron–hole pairs are injected into the junction at a rate i(t)/eV, where i(t) is the injection current, V is the volume of the junction, and e is the electron charge. This is the source term.

The electrons and holes injected into the junction then recombine at random times to produce photons by spontaneous emission. The mean time τ before spontaneous


recombination is called the radiative-recombination lifetime.9 The finite lifetime leads to the loss term N(t)/τ due to the spontaneous emission. The total rate of change of the carrier density is

dN(t)/dt = i(t)/eV − N(t)/τ.    (7.5.4)

Taking the Fourier transform of each side of this equation using the frequency variable f and noting that the emitted power P(f) is proportional to the carrier density N(f), the frequency response of a light-emitting diode used as a modulator can be written as

P(f)/I(f) = Y/(1 + i2π f τ),    (7.5.5)

where I ( f ) is the input current at a frequency f , P ( f ) is the emitted power, and Y is a scaling constant that relates the zero-frequency input current to the zero-frequency output lightwave power.
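The first-order response (7.5.5) implies a 3-dB modulation bandwidth of 1/(2πτ). The sketch below assumes a radiative-recombination lifetime of one nanosecond, a typical value consistent with the discussion later in this section, giving a bandwidth of roughly 160 MHz.

import numpy as np

tau = 1.0e-9                      # radiative-recombination lifetime (s), assumed typical value
Y = 1.0                           # zero-frequency scaling constant (normalized)

f = np.logspace(6, 10, 400)       # 1 MHz to 10 GHz
H = Y / (1 + 1j * 2 * np.pi * f * tau)   # frequency response (7.5.5)

f_3dB = 1 / (2 * np.pi * tau)     # frequency at which |H| falls by 3 dB
print(f"3-dB modulation bandwidth = {f_3dB / 1e6:.0f} MHz")   # ~159 MHz
idx = np.argmin(np.abs(f - f_3dB))
print(20 * np.log10(np.abs(H[idx])))                          # ~ -3 dB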

Probability Distribution of the Emission

Within wave optics, the spontaneous emission from a light-emitting diode is treated as a bandlimited gaussian random process. Suppose that spontaneous emission with a bandwidth B is first photodetected to give a signal proportional to the intensity, then integrated over a time interval of duration T to form a sample. For a bandlimited source, the approximate number of independent degrees of freedom that contribute to the probability distribution of the sample is K = ⌈T B⌉ (cf. Section 6.5). For a single degree of freedom, the probability density function for the intensity is an exponential distribution (cf. (2.2.49)). For K degrees of freedom, the probability density function of the sum is the convolution of K exponential distributions, which is a gamma distribution. The probability mass function for this case under a photon-optics signal model is the Poisson transform of a gamma distribution, which is a negative binomial distribution (cf. (6.5.11)).

A typical integration time T is on the order of the radiative-recombination lifetime τ. A typical spectral width for a light-emitting diode is on the order of tens of terahertz, which means that the number K of independent contributions to the sample is extremely large. For this common case, the negative binomial probability mass function given in (6.5.11) reduces to a Poisson distribution (cf. Section 6.5.2). Accordingly, a broadband lightwave source such as a light-emitting diode can be modeled using Poisson statistics under a photon-optics signal model. The corresponding inverse Poisson transform goes to a constant.

Modulated Spectral Bandwidth

Amplitude modulation of a light-emitting diode by injecting a time-varying current into the diode junction produces an intensity-modulated lightwave signal. The phase is not controlled by the current source. The frequency response of the modulation process given in (7.5.5) is governed by the radiative-recombination lifetime τ within the diode,

Modulated Spectral Bandwidth Amplitude modulation of a light-emitting diode by injecting a time-varying current into the diode junction produces an intensity-modulated lightwave signal. The phase is not controlled by the current source. The frequency response of the modulation process given in (7.5.5) is governed by the radiative-recombination lifetime τ within the diode, 9 This term is analogous to the upper-state lifetime of an erbium-doped fiber amplifier.


which is on the order of nanoseconds. This means that the maximum modulation bandwidth for a direct-current-modulated light-emitting diode is on the order of hundreds of megahertz. This bandwidth is orders of magnitude smaller than the spectral width of the spontaneous emission, which is on the order of tens of terahertz. Accordingly, the spectral width σλ of the modulated lightwave is dictated by the spectral width of the phase noise due to the spontaneous emission and not by the bandwidth of the modulating signal.

7.5.2 Laser Diodes

The large spectral width of a light-emitting diode's output signal can be substantially reduced when the semiconductor source is placed inside an optical resonator, which is a device that provides optical feedback by using partially reflective or fully reflective surfaces. A laser diode is an oscillator constructed using gain and feedback, with the feedback provided by the optical resonator. The resulting structure is a semiconductor laser diode. The resonator structure provides the positive feedback required to create optical oscillation, called lasing. A laser diode has a spectral width, called the intrinsic linewidth, that is many orders of magnitude smaller than the spectral width of a light-emitting diode. Moreover, because of the reduced angular spread of the output light, the emission from a laser diode can be more readily coupled into a single-mode fiber than can that from a light-emitting diode.

The gain mechanism for a laser diode is the same gain mechanism as was discussed for a semiconductor optical amplifier in Section 7.4.3. The feedback required for oscillation depends on the allowed optical modes that resonate within the structure providing feedback. The quality of the resonance can be quantified by the photon lifetime τ of each allowed resonator mode, which is the mean time that a photon will remain in the resonator mode before leaving as emitted light or being lost to absorption or scattering. A longer photon lifetime permits more stimulated-emission events per photon, thereby reducing the number of spontaneous-emission events.

Lasing Conditions

The spectral characteristics of the emission from a laser diode are a strong function of the strength of the injected current that produces the distribution of the electron–hole pairs. At low injection currents, the net gain per unit length, γ = γint − αm (cf. (7.4.2)), is negative because the round-trip loss αm is larger than the internal round-trip gain γint produced by the injected current. This net loss means that the device is not lasing, the emitted light is due only to spontaneous emission, and the device behaves much like a light-emitting diode. This behavior is shown in Figure 7.16(b), noting that the effect of the resonator mode structure does change the emission spectrum.

For a sufficiently large injection current, the feedback for one or more resonator modes is positive because the gain per round trip is larger than the internal loss per round trip. At this injection current, called the threshold current ith, at least one resonator mode of the device lases, with the rate of stimulated emission increasing rapidly as the number of photons in the resonator increases. Moreover, the ratio of the coherent

Figure 7.17 Emission spectra (power in dBm versus wavelength): (a) a multimode laser diode; (b) a nearly single-frequency laser diode.

stimulated emission to the noncoherent spontaneous emission increases abruptly, resulting in the spectrum dramatically narrowing compared with the emission spectrum below the lasing threshold (cf. Figure 7.16(b)). Equilibrium is quickly reached, at which point the light leaving the device balances the net gain within the device.

Simple resonator structures allow several dominant lasing modes, as shown in Figure 7.17(a). In general, the gain and feedback depend on the specific mode, so each laser mode has a different threshold. More complex resonator structures permit only one dominant lasing mode, with a narrow spectral width as shown in Figure 7.17(b). This kind of laser is suitable for phase-synchronous modulation formats.

Direct-Current Modulation of Laser Diodes

Just as for a light-emitting diode, the lightwave power emitted from a laser diode can be directly controlled by modulating the injection current as given by i(t) = ibias + imod(t). The direct-current modulation of a laser diode is shown schematically in Figure 7.18 for an idealized operating characteristic. The bias current ibias is set above the lasing threshold so that the response time is dominated by the intensity-dependent rate of stimulated emission instead of by an intensity-independent rate of spontaneous emission, which is governed by the radiative-recombination lifetime. This bias current produces a small lightwave power Pmin corresponding to a transmitted space. The modulated current imod(t) produces a large lightwave power Pmax. The ratio of the minimum lightwave signal power Pmin to the maximum lightwave signal power Pmax is defined as the extinction ratio ex,

ex ≐ Pmin/Pmax.    (7.5.6)

Rate Equations

A light-emitting diode does not have a resonant cavity. The rate equation (7.5.4) describes how the emitted lightwave power P(t) depends on the time dynamics of the carrier density N(t).


Figure 7.18 Current modulation of a laser diode (output lightwave power, between Pmin and Pmax, versus input current, with the bias current ibias set above the threshold current).

A laser diode does have a resonant cavity. The positive feedback within the resonant structure and the corresponding stimulated-emission process result in the charge-carrier density N(t) depending on the lightwave power P(t). This dependence does not exist for a light-emitting diode. Two coupled equations, one describing the dependence of the lightwave power P(t) on the carrier density N(t), and one describing the dependence of the carrier density N(t) on the lightwave power P(t), comprise the rate equations for a laser diode. They are not given here.10 This set of coupled equations includes the nonlinear effect of gain saturation.

The set of coupled nonlinear differential equations can be linearized, producing a simple second-order linear system with a characteristic resonance frequency f0 called the relaxation oscillation frequency that depends on the properties of the material and on the structure of the resonator. The relaxation oscillation frequency characterizes the bandwidth of direct-current modulation of a laser diode. It also characterizes how intensity noise generated within the laser resonator is filtered by the dynamical response of the laser diode. This is discussed in Section 7.8.1.

Modulation-Induced Frequency Fluctuations

For a direct-current-modulated laser diode, the time-varying modulation causes a time-varying charge-carrier density N(t). In turn, this causes a time-varying change of the index within the resonator as charge carriers are swept into and out of the diode junction. This unintended time-varying change in the index produces a corresponding time-varying phase change φ(t) in the lightwave signal. The derivative of this phase change produces a frequency offset fd(t) = (1/2π)dφ(t)/dt that results in frequency fluctuations. The unwanted intrinsic frequency fluctuations caused by the modulation broaden the spectral width of the laser diode compared with that of an unmodulated laser diode. This extraneous broadening caused by direct modulation can be much larger than the bandwidth of the modulating signal or the spectral width of the unmodulated laser. Therefore,

10 Rate equations as applied to lasers are described in Verdeyen (1995).


for direct-current-modulated laser diodes, the spectral width from unintended frequency fluctuations often dictates the dispersive characteristics of the lightwave channel. These fluctuations can be avoided by using an external modulator. An external modulator separates the task of modulation into a separate device that is distinct from the device that generates the lightwave.

7.5.3 External Modulators

Two kinds of external modulators used in lightwave communication systems are electroabsorption modulators and electro-optic modulators. An electro-absorption modulator is straightforward and is described first, but only briefly. An electro-optic modulator is more complicated, but also more versatile. It is described second, and in more detail. An electro-absorption modulator is a semiconductor structure in which a current applied to the device changes the absorption characteristics, and thereby modulates the lightwave signal power. An undesired consequence is that the injected current changes the index. This means that modulating the power also modulates the phase of the lightwave signal. This phase change does cause undesired frequency fluctuations, but, because the electro-absorption modulator is not a resonant structure, these fluctuations are not as severe as for a direct-modulated laser diode. One attractive feature of an electro-absorption modulator is that it can be fabricated together with a laser diode on a single integrated substrate. An electro-optic modulator is based on the nonlinear electro-optic effect. It can modulate both the amplitude and the phase of the lightwave signal. In the electro-optic effect, a voltage V , which is applied transverse to the direction of signal propagation, changes the index by an amount ´n (V ). Over a distance L in an optical waveguide that supports a single spatial mode, the applied voltage produces a phase shift φ(V ) = ´ n(V )β L , where β is the propagation constant for the waveguide mode. The voltage-dependent phase change can be written as (7.5.7) φ(V ) = π VV , π

where Vπ is the voltage required to induce a phase shift of π radians in the lightwave signal. For a modulating voltage V (t ), the corresponding modulated lightwave signal is s(t ) = Ae iφ(V ( t )).

This phase-modulated signal has an in-phase signal component given by s I (t ) = A cos φ (V (t )) and a quadrature signal component given by sQ (t ) = A sin φ ( V (t )). The electro-optic modulator can be used to form a modulator structure known as a Mach–Zehnder interferometer, as shown in the inset of Figure 7.19. An unmodulated coherent lightwave source sin is split into two paths by a 3-dB coupler (cf. Section 7.1.1). The amplitude and the phase of the lightwave signal are controlled independently by applying a separate baseband signal in the form of a time-varying voltage to a segment of the waveguide in each path. Each baseband signal produces an independent phase shift . . given by φ1(t ) = φ(V1 (t )) and φ2(t ) = φ( V2(t )). The two phase-modulated signals are then recombined with the modulation response given by

330

7 Lightwave Components

cos2

V 2 V

π

π

Bias point V

Output lightwave power Input electrical signal (V)

φ1

φ2(V) Mach–Zehnder modulator

Figure 7.19 Modulation characteristic of an intensity modulator along with an input electrical waveform and an output lightwave power waveform. Inset: Mach–Zehnder modulator.

sout (t ) sin

= 21

³ iφ (t ) e

1

´ + e iφ (t ) , 2

(7.5.8)

where the factor of two is included to account for the two 3-dB couplers. A constant phase ψ , not shown, can be introduced to account for any difference between the two path lengths. Factoring out the common phase (φ1 (t ) + φ2 (t ))/2 in each path, (7.5.8) can be written as ³ ´ sout(t ) = ei(φ1 (t )+φ2 (t ))/2 cos (φ 1(t ) − φ2 (t ))/ 2 . (7.5.9) sin The first term, which is the phase, is controlled by the sum of the two modulating signals. The second term, which is the amplitude, is controlled by the difference in the two modulating signals. This kind of modulator is appropriate for a format that modulates both the magnitude and the phase. When φ1(t ) = −φ2 (t ), the complex exponential in (7.5.9) equals one. Then the device modulates only the amplitude and is called a balanced modulator. Using (7.5.7), the modulation response is

.

sout (t ) sin

= cos

¹ π V (t ) º 2 Vπ

,

(7.5.10)

where V (t ) = V1 (t ) − V2(t ) is the difference between the signals applied to the two paths. The voltage V (t ) then would be proportional to the arccosine of the desired lightwave signal. Although (7.5.9) includes both the amplitude and the phase, and in principle can be used to generate in-phase/quadrature modulation, it is often convenient to modulate the in-phase and quadrature components separately using two amplitude modulators

7.6 Noise in Lightwave Receivers

331

given by (7.5.10). This avoids the conversion from in-phase/quadrature modulation to magnitude/phase modulation. The response of a balanced modulator to the lightwave power P (t ) = 21 |s (t )|2 is Pout(t ) Pin

¹ º = cos2 π2 VV(t ) . π

(7.5.11)

This power-modulation response is shown in Figure 7.19. The modulating electrical waveform is biased at the midpoint of the positive slope of the power-modulation function as shown in Figure 7.19. Absent the use of arccosine-squared predistortion, the nonlinear cosine-squared dependence of the power-modulation function can cause amplitude distortion, but can also suppress some other forms of amplitude distortion that occur near the minimum or the maximum of the modulating signal. This instance of suppression is evident in Figure 7.19.

7.6

Noise in Lightwave Receivers A lightwave receiver is impaired by a combination of additive lightwave noise, photon noise, noise generated from the photodetection process, and additive electrical noise. Two other forms of noise that are generated within the photodetector are considered in this section. These are dark-current noise and photodetector-gain noise.

7.6.1

Dark-Current Noise

Dark-current noise is a discrete form of noise with a constant photoelectron arrival rate µ dark produced by electron–hole pairs that are generated when no light is incident on the photodetector. The dark current depends on the operating conditions such as temperature and supply voltage. It is independent of other noise sources and is modeled using photon optics as a point process described by a Poisson probability distribution (cf. Section 6.2.3) that is added to the signal generated within the photodetector. Dark current causes a constant-arrival-rate photoelectron noise term even in the absence of light, and cannot be rejected by a photon-counting receiver (cf. Section 9.5.2). 7.6.2

Internal-Gain Noise

The internal-gain process in an avalanche photodiode is described as a discrete random process shown schematically in Figure 7.5(b). This gain process has two independent forms of randomness. The first form of randomness, photon noise, is the random arrival time τ ² for each primary photoelectron. The second form of randomness, called internal-gain noise, is the random amplitude G for each electrical pulse generated from a primary photodetection event. The probability mass function of the random internal gain G in an avalanche photodiode is derived starting from the conditional probability mass function p(m|k) that m secondary photoelectron events are generated given k primary photodetection events.

332

7 Lightwave Components

The internal gain in an avalanche photodiode is based on the process of impact ionization (cf. Section 7.3.1). An impact ionization event occurs when an electron or a hole traverses a region with a large potential gradient, thereby acquiring significant energy. When this high-energy carrier impacts a lattice site, the acquired energy can be released in the form of a secondary electron–hole pair, thereby producing gain. The ratio of the probability of a hole creating a secondary electron–hole pair to the probability of an electron creating a secondary electron–hole pair is called the electron–hole ionization ratio κ . A local impact ionization event model permits any distance between ionization events. A nonlocal impact ionization event model posits a dead space defined as the minimum distance that a carrier must travel after each ionization event before it acquires sufficient energy to generate another secondary electron–hole pair.11 The conditional probability mass function p(m|k) that m secondary photoelectron events are generated in an interval given k primary photodetection events in that interval is expressed in terms of a conditional characteristic function (cf. (6.7.23)). For a local impact ionization model, closed-form expressions are known for p(m| k).12 However, the ∑ unconditional probability mass function p(m) = k p(k) p(m|k) requires the evaluation of an infinite summation that has no known closed-form solution. A probability mass function that approximates p(m) can be derived by replacing the conditional gain for each photodetection event by the mean gain. This approximation of p(m) is given by pm (m) =



1

2πσm

S(m)−3/ 2 exp

¿

À ( m − Es)2 − 2 . 2σm S(m)

(7.6.1)

When the discrete number of secondary photoelectrons m is treated as a continuous variable, this expression can be viewed as a skewed gaussian function13 that incorporates the effect of the current loops shown in Figure 7.5(b). This signal-dependent skew factor Á Â Á Â2 √ is S(m) = 1 + (m − Es)/νσm , where ν = ws F /( F − 1), F = G2 / G is the excess . noise factor, wÁ sÂ= E s/G is the expected value of the signal at the input Á Â to the internal gain . . process, G = G is the mean gain of the avalanche process, Es = m is the mean number of secondary photoelectrons at the output of the gain process, and σm2 is the variance. When S(m) is equal to one, the gain is not skewed. For this case, (7.6.1) reduces to a gaussian probability density function. The excess noise factor F satisfies F

2 =. ±ÁGÂ2² = κG + (1 − κ)(2 − G −1),

G

(7.6.2)

with κ being the hole–electron ionization ratio. When G = 1, then F = 1 for any value of the ionization ratio κ , thereby reducing to an expression for a conventional photodiode with no internal gain. 11 See Campbell, Demiguel, Ma, Beck, Guo, Wang, Zheng, Li, Beck, Kinch, Huntington, Coldren,

Decobert, and Tscherptner (2004) and Campbell (2008).

12 See McIntyre (1972) and equation (2) in Conradi (1972). 13 In statistics this function is called an inverse gaussian probability density function.

7.6 Noise in Lightwave Receivers

333

To determine the variance σ m2 of the distribution of the secondary photocounts after the internal gain process, use the characteristic function for p(m), which is14 C m (ω) = exp

¹

ws F

( F − 1)2

³

1−

¾

1 − 2iωG( F

º ´ − 1) − iω Fw−s G1 ,

(7.6.3)

where ws is the expected value of the signal at the input to the internal gain process. Using (2.2.17) and (7.6.3), the mean-squared value for p(m) is

»» »» 2 1 d » ±m ² = »» 2 dω2 C m(ω)»»» i 2

ω=0

= G2 ws (F + ws ) .

The variance is then

.

σm2 = ±m2 ² − ±m²2 ( ) = G2 ws ( F + ws ) − ws G 2 = wsG2 F = EsG F,

(7.6.4)

where ±m² = Es = ws G is the mean number of secondary photoelectrons at the output of the gain process. A plot of the probability mass function of the amplified signal given in (7.6.1) is shown in Figure 7.20. That probability mass function is significantly skewed, with the amount of skew related to the electron–hole ionization ratio κ . When one kind of charge carrier produces most of the ionization events, as is shown by curve (b) in Figure 7.20, the probability mass function becomes more symmetric but it is still significantly skewed –2 –4 noitubirtsiD ytilibaborP goL

–6 –8 (c)

(b)

(a)

–10 –12 –14 0

2

4 Output Carriers (x1000)

( )

6

8

Figure 7.20 The probability mass function pm m of the amplified output signal given in (7.6.1)

for (a) κ = 0.1, G = 25, and ws = 100, (b) the same as (a) but with κ probability density function with the same mean and variance as (b). 14 See Tang and Letaief (1998).

= 0.01, and (c) a gaussian

334

7 Lightwave Components

compared with a gaussian probability density function with the same mean and variance, as is shown by curve (c) in Figure 7.20.

7.7

Noise in Lightwave Amplifiers Spontaneous-emission noise in a lightwave amplifier has a discrete nature that is distinctly different from the noise in amplifiers used in lower-frequency systems. A photon-optics signal model is used to describe the discrete-energy nature of this noise. However, when a large-signal approximation is appropriate, a wave-optics noise source modeled as a circularly symmetric gaussian noise process (cf. Section 6.2.2) is used to describe the spontaneous-emission noise as a continuous random waveform. Our task is to understand both the discrete and the continuous noise models for a lightwave amplifier, including their differences and similarities.

7.7.1

Power Density Spectrum

For a lightwave amplifier that produces gain through the process of stimulated emission, the average number of noise photons per mode at the input to the amplifier has two contributions. The first contribution is from the quantum fluctuations in the incident lightwave. Section 6.1.2 states that the power density spectrum of lightwave noise for typical operating conditions is dominated by the quantum noise from vacuum-state fluctuations (cf. Figure 6.3). When a material that can provide gain interacts with a lightwave, these fluctuations result in spontaneous emission. The noise in each mode corresponds to an energy of h f /2 (cf. (6.1.12)). The second contribution to the noise is from the additional quantum fluctuations that occur within the material. These fluctuations are independent of the fluctuations in the incident lightwave and are also equivalent to half a photon per mode.15 Therefore, the input noise power density spectrum is the equivalent of one photon per mode, with the total expected number of photons per mode given by E + 1, where E is the expected number of signal photons in the mode generated by the modulation. The differential equation that describes the gain for the expected number of signal photons plus noise photons in a single mode without an attenuation term included has a form similar to (7.4.3). It is given by

or

d(E + 1) dz dE dz

= γint (E + 1)

= γint (E + 1),

(7.7.1)

where γint is the internal gain per unit length, and d(E + 1)/dz = dE/dz. The first term on the right is the amplification of the signal. The second term on the right is the amplification of the noise, which is amplified spontaneous emission. 15 See Caves (1982) and Milonni (1994).

7.7 Noise in Lightwave Amplifiers

335

With an attenuation term now included, this becomes dE dz

= γint (E + 1) − αm E.

(7.7.2)

The attenuation of the incident quantum noise is not included because the attenuation in the incident noise field is balanced by new noise fluctuations generated by the dissipated energy.16 The net effect is a constant noise source that is not attenuated. Now write (7.7.2) as dE = γ (E + nsp ) = γ (E + 1 + Namp ), (7.7.3) dz

where γ

= γint − αm is the net gain per unit length, and . γint = γint nsp = γ γ −α int

m

(7.7.4)

is defined as the spontaneous-emission noise factor. The factor nsp has a minimum value of one and characterizes the amount of noise added by the amplifier. This spontaneousemission noise factor can be partitioned into a noise term Namp

= nsp − 1 = γ α−m α int m

(7.7.5)

at the input to the amplifier that excludes quantum noise, which can be described classically, and a quantum noise term that has a value equivalent to one photon per mode. Rewrite (7.7.3) as d ( −γ z ) Ee = γ n sp e−γ z. dz

= 0 to z = L . This gives e−γ L Eout − Ein = −nsp (e−γ L − 1),

For an amplifier of length L , integrate each side from z

which reduces to Eout

= GE in + (Namp + 1)(G − 1),

(7.7.6)

where Ein is the expected number of signal photons at the input to the amplifier, Eout is the expected number of signal photons at the output of the amplifier, and G = eγ L is the amplifier gain over a length L . The first term G Ein on the right of (7.7.6) is the amplified input signal. The second term on the right of (7.7.6) is defined as N sp

=. (Namp + 1)(G − 1) = nsp (G − 1).

(7.7.7)

This term is the power density spectrum N sp of the noise at the output of a lightwave amplifier expressed in terms of the expected number of noise photons per spatial mode 16 This subtle effect is fundamental and is called the fluctuation–dissipation theorem. The application of this

theorem to lightwave systems is presented in Henry and Kazarinov (1996), Section VII C.

336

7 Lightwave Components

in a single polarization. As in the discussion in Section 6.1.1, this value is equated to a (single-sided) spontaneous-emission power density spectrum at the output of the lightwave amplifier. When the number of spontaneous-emission events is large, the discrete power density spectrum can be approximated by a continuous power density spectrum N sp ( f ) given by Nsp ( f ) = h f Nsp

= h f n sp (G − 1).

(7.7.8)

The spontaneous-emission noise power is given by (cf. (6.2.18)) Pn

=

̰ 0

Nsp ( f )d f .

(7.7.9)

The optimal performance of a lightwave amplifier using stimulated emission is achieved whenever the gain per unit length γ is large compared with the loss per unit length αm , and the total gain G is large. For these conditions, nsp approaches one (cf. (7.7.4)), G − 1 approaches G, and (7.7.7) reduces to Nsp

≈ G.

(7.7.10)

This expression states that the power density spectrum of the amplified spontaneousemission noise, expressed in terms of the expected number of noise photons, is approximately equal to the lightwave amplifier gain G. This power density spectrum at the output is equivalent to one noise photon at the input to an amplifier with gain G. A phase-insensitive lightwave amplifier must have at least this amount of noise. This is typically expressed in terms of the equivalent number of photons. This minimim amount of noise is called the quantum-noise limit. 7.7.2

Probability Distribution Functions

The mean and the variance both for the wave-optics signal model and for the photonoptics signal model of the probability distribution function of the amplifier output random variable E in Figure 7.21 are now derived. The development uses the probability distribution functions derived in Section 6.5.

Wave Optics

Within wave optics, the output sample E shown in Figure 7.21 is the directly photodetected lightwave energy, which is the integral of the squared magnitude of the sum of a constant signal and a bandlimited circularly symmetric gaussian noise process modeling spontaneous-emission noise. It will be shown in Section 10.6 that this kind of Optical Amplifier

Optical Filtering B

sin

Square-Law Photodetection | · |2

r(t)

Integrate and Sample T

E

no(t)

Figure 7.21 The lightwave components used to generate a sample for an amplified lightwave signal using direct photodetection.

7.7 Noise in Lightwave Amplifiers

337

energy demodulator is optimal for a lightwave source for which the phase varies rapidly compared with the duration of a symbol. The random variable E is the integral over an interval of duration T . The probability density function f ( E ) for the sample E is a noncentral chi-square probability density function (cf. (6.5.5)) with N = 2K degrees of freedom (cf. Section 6.5.1). The integer K is the number of independent exponentially distributed subsamples used to approximate the squared magnitude of the bandlimited gaussian noise over an interval of duration T . The number of subsamples K depends on the ratio of the integration time T to the coherence timewidth τc defined by the bandlimiting shown in Figure 7.21. It is K = ³T /τc ´. The mean and variance of E are given by (cf. (6.5.6))

Á EÂ = E + K N , s sp 2 σ = 2Es Nsp + K Nsp2 ≈ 2Es Nsp . E

(7.7.11a) (7.7.11b)

The second term in (7.7.11b) can be neglected under normal operating conditions, so σ E2 ≈ 2Es N sp . This is discussed further in Section 8.2.5. The signal term Es after lightwave amplification with gain G is (cf. Table 6.3) Es

= RG Pin T = RG E in ,

(7.7.12a)

where Pin = | sin | 2/2 is the lightwave signal power at the input to the amplifier, and E in = Pin T is the input lightwave signal energy over an interval of duration T . The photodetected mean noise in one polarization mode is (cf. (6.5.3) and (7.7.8))

. R N sp = Rh f n sp (G − 1),

N opt =

(7.7.12b)

where N sp is a summary notation for the mean lightwave noise. The first term of the variance in (7.7.11b) is generated by the lightwave signal mixing with the spontaneous emission in the square-law photodetector. This term is called signal–spontaneous-emission-noise or signal–noise mixing. The signal–spontaneousemission term 2E s Nsp is not affected by the integration time because the signal and the noise are correlated only during the coherence timewidth τc , so K = 1 for this term. The second term of the variance in (7.7.11b), which does not depend on the signal, is generated by the spontaneous emission mixing with itself, which is called spontaneous–spontaneous-emission noise.

Photon Optics

The probability mass function for photon optics is the Poisson transform of the probability density function for the wave-optics lightwave energy as was shown in Chapter 6. The Poisson transform of the noncentral chi-square probability density function is the Laguerre probability mass function (cf. (6.5.9)). The mean and variance given in (7.7.11) are replaced by (cf. (6.5.10))

Ám = E + K N , s sp 2 σm = Es + K Nsp + 2EsNsp + K N2sp .

(7.7.13a) (7.7.13b)

338

7 Lightwave Components

For ideal photodetection with η

= 1, the signal term Es is (cf. Table 6.3) Es = G Ein ,

(7.7.14)

where G = eγ L is the lightwave amplifier gain, and Ein is the expected number of signal photons at the input of the lightwave amplifier. The last two terms in (7.7.13b) correspond to the wave-optics terms given in (7.7.11b) for the signal–noise mixing term and the noise–noise mixing term. The first two terms in (7.7.13b) are the photon noise terms consisting of the term Es generated by the signal and the term K N sp generated by K independent subsamples used to describe the spontaneous-emission noise. 7.7.3

Noise Figure

The amount of noise added by an amplifier is quantified using the spectral noise figure. Several definitions are in use.

Spectral Noise Figure

The definition given in (2.2.81) of the spectral noise figure, repeated here, is FN ( f ) =

N in ( f ) + N a ( f ) , Nin ( f )

where N a ( f ) = Nsp ( f )/ G is the power density spectrum at the input of the lightwave amplifier, where Nsp ( f ) is the power density spectrum at the output of the lightwave amplifier as given in (7.7.8). The term N in( f ) = h f is the quantum-noise-limited power density spectrum at the input of the amplifier, which is equivalent to one noise photon as discussed at the beginning of Section 7.7. Accordingly, the basic spectral noise figure for an optical amplifier at a frequency f is FN ( f ) = 1 +

(

G −1 (G − 1) h f n sp hf

³ ´ = 1 + 1 − G −1 nsp .

)

(7.7.15)

The spectral noise figure does not depend on the input signal power. It is integrated over a bandwidth B to determine the noise figure F N . Inverting (7.7.15), the spontaneousemission-noise factor n sp is expressed in terms of the gain G and the noise figure FN as nsp = (F N − 1)G /(G − 1).

Alternative Noise Figure

An empirical alternative to this noise figure is often used. This is based on a definition of the signal-to-noise ratio that uses the photon number m as the signal instead of the lightwave signal s (t ). To derive the noise figure using this alternative definition of the signal-to-noise ratio, suppose that the incident lightwave signal power is a constant. Then the probability mass function of the number of photons over a time interval of duration T is a Poisson distribution. For the Poisson distribution, the expected number . of photons Ein = ±m² at the input to the amplifier is equal to the variance. The expected

7.7 Noise in Lightwave Amplifiers

339

number of photons at the output of the amplifier with gain G is Eout = G Ein. The signalto-noise ratio at the input using the photon number m as the signal is SNRin

2

2

signal = variance = EEin = Ein = EGout . in

In a similar way, SNR out = Eout /σm2 , where σm2 is the photon-number variance after lightwave amplification given in (7.7.13b). Therefore, the noise figure defined in (2.2.84) using the photon number as the signal is 2

FNP

/ Eout /σ m2 out ( ) Eout + 2Eout Nsp + K N sp 1 + N sp = G Eout ( ) K Nsp 1 + Nsp 1 + 2Nsp . + = G Ç G Eout Ä ÅÆ Ä ÅÆ Ç

SNRin = SNR =

Eout G 2

Signal-independent

(7.7.16)

Signal-dependent

This alternative noise figure FNP consists of both a signal-dependent term and a signalindependent term. This contrasts with the definition of the noise figure given in (7.7.15). The difference is a consequence of defining the signal-to-noise ratio in terms of the photon number m instead of the lightwave signal s (t ). This means that the alternative definition of the noise figure uses a “signal power” that is the square of the photon number. Because this is a fourth-order statistic with respect to the lightwave signal, it is not equivalent to the lightwave power density spectrum used for the standard definition of the noise figure. When the term Eout is much larger than the noise term K N sp, which is the typical case, then the signal-dependent terms in (7.7.16) can be discarded so that

1 + 2N sp 1 2nsp (G − 1) = + , (7.7.17) G G G where (7.7.7) has been used. Now the difference between (7.7.15) and (7.7.16) is insignificant. For either definition of the noise figure, the minimum noise figure as nsp approaches one is achieved for G much larger than one. Under these conditions, both (7.7.17) and (7.7.15) approach the same limiting value of two, which is the quantum-noise limit for any lightwave amplifier based on stimulated emission. FNP ≈

7.7.4

Nonlinear Phase Noise

Signal-dependent change in the index of refraction, called nonlinear phase, is the most significant intensity-dependent nonlinearity. This results in signal distortion and noise generation. The resulting signal distortion is discussed in Chapter 5. The consequential generation of noise is discussed in this section. The nonlinear phase transfers energy from the lightwave signal to the additive noise within the lightwave amplifier, thereby creating an intensity-dependent form of noise called nonlinear phase noise φ NL . This noise is correlated with both the signal and other lightwave noise that may be present.

340

7 Lightwave Components

The statistics of intensity-dependent phase noise are derived in this section, as well as the conditions under which this additional form of noise is comparable to the phase noise caused by spontaneous emission. For a segment of fiber of length L , the intensity-dependent nonlinear phase noise φ NL is given by (5.3.20) in terms of the fiber nonlinear coefficient γ (cf. (5.3.10)) and the effective fiber length L eff (cf. (5.3.19)). This is

φ ( L) = γ L eff P = 12 γ Leff |s + n|2 , NL

(7.7.18)

where the random lightwave power P at the output of the lightwave amplifier is expressed in terms of the lightwave signal s and the additive lightwave noise n. This expression states that the nonlinear phase noise is proportional to the squared magnitude of the sum of the signal and the additive noise. Section 7.4.5 states that an independent additive-noise source n j arises in each amplifier between fiber segments. This section now studies the phase noise generated in each segment of the span under the condition that the noises introduced in each span are independent. Temporarily neglecting linear dispersion, and referring to (7.7.18), the total nonlinear phase noise φ NL ( J ) after J fiber segments is determined by adding an independent zero-mean gaussian random variable n j in each segment, which is given by (cf. (5.3.20))

¹ J » ½ »º φ ( J ) = 21 γ L eff |s1 + n1|2 + |s1 + n 1 + n2 |2 + · · · + »s1 + n j »2 . NL

j =1

(7.7.19)

The accumulated noise in the J th segment is correlated with the noise in previous segments. Because each term in (7.7.19) is a noncentral chi-square random variable, the total nonlinear phase noise φ NL ( J ) is the sum of J nonidentical and dependent noncen2 ( J ) of φ ( J ) is difficult to tral chi-square random variables. Hence, the variance σNL NL calculate exactly. 2 An approximation for the variance σNL ( J ) can be developed by ignoring the dependence between segments. The random power P ( j ) for the j th fiber segment then can be approximated as P ( j ) = Ps + j P n ,

(7.7.20)

where the signal power Ps is constant because the gain is set to balance the attenuation in each fiber segment, and the expected noise power Pn per segment is given by (6.2.18). The nonlinear phase noise φ NL ( j ) for the jth fiber segment is given by (cf. (7.7.18))

φ ( j ) = γ L eff P ( j ). NL

(7.7.21)

Define σ P2 ( j ) as the variance of the total lightwave noise power at the output of the jth fiber segment. The total noise power j P n increases linearly in j , so the variance σ P2 ( j ) in the lightwave power increases as j 2 , with σ P2( j ) = j 2σ P2 . Similarly, because the random variables φNL ( j ) and P ( j ) are linearly related by (7.7.21), the variances are

7.7 Noise in Lightwave Amplifiers

341

2 ( j ) = (γ L )2σ 2( j ). Combining these statements, the variance σ 2 ( j ) of related by σNL eff P NL the nonlinear phase noise for the jth segment is

σ 2 ( j ) = j 2(γ Leff )2 σ 2, NL

(7.7.22)

P

where σ P2 is the noise-power variance in a single segment. Summing over J segments, 2 the accumulated nonlinear phase-noise variance σNL ( J ) is

σ ( J ) = σ (γ L eff) 2

NL

2

P

2

J ½ j =1

j2

= σ (γ ) ( + 1)( J + 1) J ≈ σ (γ ) , 1 2 L eff 2 2J 6 P 1 3 2 L eff 2 3J P

(7.7.23)

where the approximation is appropriate for large J. When the mean signal power Ps is much larger than the mean noise power Pn , (7.7.18) can be written as

±φ ( J )² ≈ ±φs ( J )² = J Ps γ L eff = Ps γ Ltotal . (7.7.24) Similarly, the variance σ 2 = 2Pn Ps + Pn2 of the random lightwave power P for a single segment can be approximated as 2 Pn Ps (cf. (6.2.15b)). Substituting σ 2 ≈ 2Pn Ps into (7.7.23), and using (7.7.24) to replace Ps ( J γ L eff )2 by ±φs ( J )²2 / Ps , the variance of the nonlinear phase noise ±φ ( J )² can be written as 2 (7.7.25) σ 2 ( J ) ≈ 2 ±φs ( J )² , NL

P

P

NL

NL

3 OSNR

where

Ps (7.7.26) J Pn is the optical-signal-to-noise ratio at the output of the span. The approximation given in (7.7.25) based on independent phase-noise terms is appropriate when the signal-to-noise ratio is large. When the signal-to-noise ratio is small, the neglected dependence of the fiber segments must be included for an accurate analysis.17 OSNR =

Linear Versus Nonlinear Phase Noise

2 of the nonlinear Under the approximations leading to (7.7.25), the variance σNL phase noise at the output of the span is proportional to the square of the mean intensity-dependent nonlinear phase shift ±φs ²2 produced by the signal, and is inversely proportional to the optical signal-to-noise ratio. 2 is comparable The nonlinear phase noise is a significant impairment whenever σNL 2 to the linear phase noise, denoted σ L , generated by the spontaneous emission. For a large signal-to-noise ratio, the marginal probability density function of the phase noise generated by spontaneous emission is well approximated by a gaussian density with a variance given by σ L2 = σ 2/ A 2 (cf. (2.2.36)). Assigning Ps = A 2/ 2 and σ 2 = J Pn leads to 1 J Pn = 2 OSNR , σ L2 = 2P s 17 See Ho (2005).

342

7 Lightwave Components

for the variance of the linear noise, where OSNR = Ps /( J Pn ) (cf. (7.7.26)). Now equate this expression to the right side of (7.7.25) and approximate ±φNL² by ±φs ² (cf. (7.7.24)) to show that the linear phase noise and the nonlinear phase noise are equal in strength when 2 ±φNL ²2 3 OSNR

1 = 2 OSNR . (7.7.27) √ Therefore, beyond the upper limit ±φ ² = 3/ 2 radians, the nonlinear phase noise dominates the linear phase noise. Because the mean nonlinear phase shift ±φ ² can be NL

NL

approximated as a linear function of the signal power (cf. (7.7.24)), this condition sets a practical limit on the maximum input√lightwave signal power Ps for a span that uses lightwave amplifiers. When the value 3/ 2 is exceeded, the variance of the nonlinear phase noise exceeds the variance of the phase noise from spontaneous emission. At this point, increasing the signal power also increases the total noise power proportionally and no longer produces a proportional increase in the signal-to-noise ratio. Indeed, for a sufficiently large signal power, the signal-to-noise ratio actually decreases.

7.8

Noise in Laser Transmitters The emission from an unmodulated laser is a combination of stimulated emission and spontaneous emission. The stimulated emission is generated by feedback in the resonant cavity. Absent modulation, stimulated emission is treated as a fixed power. The initial source of energy for the oscillation inside the resonator is spontaneous emission, which is the dominant source of light prior to lasing, but not after. Because the spontaneous emission is generated within a resonator both below the lasing threshold and above the lasing threshold, it is a nonadditive form of noise. Drawing on the material in Chapter 6, this section derives both the power density spectrum and the probability density function for the complex amplitude of a lightwave emitted from a laser. This section also derives equivalent expressions for the lightwave power. For a wave-optics signal model of a laser operating well above the lasing threshold, the light from the stimulated emission is treated as a constant signal and the spontaneous-emission amplitude is modeled as a circularly symmetric gaussian random variable. When deriving the power density spectrum for the complex amplitude of a laser, the amplitude fluctuations from the spontaneous emission within the resonator will be ignored in comparison with the phase fluctuations.

7.8.1

Power Density Spectra

The power density spectrum of the complex lightwave amplitude generated by a lightwave source such as a laser is a second-order statistic of the lightwave. The power density spectrum of the lightwave power is a fourth-order statistic of the lightwave. For direct photodetection, the power density spectrum of the electrical signal is proportional to the power density spectrum of the lightwave power. A random lightwave signal of

7.8 Noise in Laser Transmitters

343

sufficient amplitude, such as noise, may be described as a stationary, circularly symmetric gaussian random process. It was shown in Section 6.4 that, for such a waveform, the power density spectrum of the lightwave power can be expressed in terms of the power density spectrum of the complex lightwave amplitude (cf. (6.4.2)). In a resonator that provides feedback, however, the combination of stimulated emission and spontaneous emission noise cannot be treated as a circularly symmetric gaussian random process. Therefore, in contrast to the relationship presented in Section 6.4, there is no general relationship between the power density spectrum of the lightwave power emitted from a laser source and the power density spectrum of the complex lightwave amplitude. This means that for direct photodetection, there is no general relationship between the signal-to-noise ratio in the optical domain, as could be measured by an optical spectrum analyzer, and the signal-to-noise ratio in the electrical domain, as could be measured by an electrical spectrum analyzer. Accordingly, this section separately discusses both the power density spectrum of the complex lightwave amplitude of a laser and the power density spectrum of the lightwave power of a laser.

Power Density Spectrum of the Complex Amplitude

The power density spectrum of the complex amplitude of a laser with power P (t ) can be √ determined by writing the complex amplitude of the source as s (t ) = P (t )ei φ(t ) . Were a lightwave source to have no amplitude or phase fluctuations, it would be described by two constants: P (t ) = P and φ(t ) = φ. The autocorrelation function R s (τ) of this ideal source at a single point in space is (cf. Section 2.3.5) Rs (τ) = ±(s (t )s ∗ (t



= ± Pe = P,





+ τ))² Pe −iφ ²

which is independent of the time difference τ . Therefore, the corresponding power density spectrum of the lightwave is a scaled Dirac impulse at frequency f c given by S ( f ) = P δ( f − f c ). Oscillators at lower frequencies are often adequately modeled as such an idealized source, but this model can be inadequate for a lightwave source.

Random Phase Model A more refined model of a coherent lightwave source treats each spontaneous-emission event as an incremental phase perturbation of a constant complex-amplitude signal. These many random phase perturbations result in a random walk in the lightwave phase φ(t ) as a function of time. Each spontaneous-emission event independently changes the incremental phase in a random manner. For a laser operating well above the lasing threshold, a phase process described by a random walk is justified because the phase fluctuations within an oscillator cannot involve a net energy transfer. Therefore phase fluctuations caused by spontaneous-emission noise are more significant than are the amplitude fluctuations caused by spontaneous-emission noise. This leads to a random phase-noise process called a Wiener process.

344

7 Lightwave Components

An equivalent formulation describes the effect of the spontaneous emission in terms of a frequency-noise process f d (t ) that perturbs the lightwave frequency from the nominal frequency of the resonator. This frequency offset is modeled as a zero-mean white gaussian random process, which can be shown to be the derivative of a random-walk phase process as the time difference between the incremental phase offsets goes to zero. This time-varying phase broadens the spectral width of the emitted light. The autocorrelation function R f (τ) of the frequency-noise process is given by R f (τ) = ± f d (t ) f d (t

+ τ)² = C δ(τ).

(7.8.1)

The constant C is the (two-sided) power density spectrum of the frequency-noise process f d (t ) with units of hertz 2/ hertz = hertz. This is the rate of spontaneous emission. The integral of f d (t ), which is the phase-noise process φ(t ), given by

φ(t ) = 2π

Ãt 0

f d (τ)dτ,

(7.8.2)

is the Wiener process. When necessary to ensure that the random process s (t ) = √ Pe iφ(t ) is stationary, a random, uniformly distributed phase θ may be included as

an initial phase. The autocorrelation function Rs (τ) of the lightwave source is Rs (τ) = ±(s (t )s ∗ (t iφd (t )

+ τ))² ²,

= P ±e (7.8.3) where the term φd (τ, t ) = φ(t + τ) − φ(t ) is the phase difference of the laser over a positive time interval τ . This is written as à t +τ φd (τ, t ) = φ(t + τ) − φ(t ) = 2π f d (t µ )dt µ . (7.8.4) t

The same analysis holds for negative τ with the limits in (7.8.4) reversed. The incremental phase difference φd (τ, t ) is produced by the integration of a stationary, zero-mean, white gaussian random process over an interval of duration τ . Therefore, for a fixed τ , φd (τ, t ) is also a stationary gaussian random process (cf. Section 2.2.2) with respect to the time t, but φd (τ, t ) is a Wiener process with respect to the integration interval τ for any value of t . Define the phase noise φ d as a zero-mean gaussian random variable with variance

σd2 . The expectation ±eiφ ² in (7.8.3) is now Ã∞ iφ eiφ f (φ d )dφd ±e ² = −∞ à ∞ =√1 e iφ e−φ /2σ dφd 2πσd −∞ = e−σ /2 . d

d

d

d

2 d

2 d

2 d

(7.8.5)

The last line follows because this integral is an instance of the Fourier transform of a gaussian function, as given by (2.1.48) with t replaced by φd , and ω = 1.

7.8 Noise in Laser Transmitters

345

σd2 is equal to the expectation of the square so ȹ à t +τ º2 É fd (t µ )dt µ σ d2 = ±φ2d ² = 2π t à t +τ à t +τ 2 = (2π) ± f d (x ) f d ( y)²dx dy.

For a zero-mean process, the variance that

t

t

Using (7.8.1) and the sifting property of the Dirac impulse given in (2.1.2), the variance σd2 of the phase-noise process is a linear function of the integration interval τ given by

σ d2 = (2π)2 C τ for positive τ . The sign reverses for negative τ , so positive and negative τ

can be

combined into the single expression

σd2 = (2π)2C |τ|.

(7.8.6)

Now substitute (7.8.6) into (7.8.5) and substitute (7.8.5) into (7.8.3) to yield the autocorrelation function for the laser diode source, R (τ) = Pe −2π

2C

|τ| .

The Fourier transform pair given in Table 2.1, together with the shifting and scaling properties of the Fourier transform, shows that the power density spectrum S ( f ) of the laser output waveform above the lasing threshold has the form of a lorentzian function centered at frequency f c ,

S( f ) = P

B /2π

( f − f c )2 + ( B/2)2 ,

(7.8.7)

where B = 2π C. The full-width-half-maximum spectral width B of the power density spectrum S ( f ) is called the intrinsic linewidth. This rate dictates the width of the power density spectrum S ( f ) of the laser because fewer spontaneous emission events lead to a narrower spectral width.

Random Phase Plus Additive Gaussian Noise For a laser operating well above the lasing threshold, amplitude fluctuations in the complex lightwave signal s (t ), though small, do exist. They can be incorporated by including an additive circularly symmetric gaussian noise process n(t ). For this model, the emitted lightwave signal s (t ) is written as 18 s (t ) = Ae iφ(t )

+ n(t ).

(7.8.8)

The first term on the right of (7.8.8) represents the signal generated by the stimulated emission based on the random-phase model described in the previous section. The second term represents the additive noise generated by the spontaneous emission. The phase noise process φ(t ) varies slowly with respect to the complex noise process n(t ). The 18 For details, see Section 4.4 of Goodman (2015).

346

7 Lightwave Components

magnitude | n(t )| of this noise process is much less than A. This noise term affects both the amplitude and the phase of the total lightwave signal. For moderate power levels, the magnitude of the additive-noise term decreases as the photon lifetime in the laser resonator increases. This is because each photon within the resonator can generate many stimulated-emission events for each spontaneous-emission event.

Power Density Spectrum of the Lightwave Power

Fluctuations in the lightwave power, or, equivalently, in the directly photodetected electrical signal, are referred to as intensity noise. This kind of noise is characterized using a normalized covariance function r P (τ) for the laser power given by

. ± P(t ) P (t + τ)² − ± P ²2 . ± P ²2

r P (τ) = For τ

= 0, this reduces to r P (0) =

(7.8.9)

± P 2² − ± P ² 2 = σ 2 = 1 , ± P ²2 ± P ²2 SNR P

where SNR is the electrical signal-to-noise ratio after photodetection. The Fourier transform of r P (τ) is called the relative intensity noise (RIN) of the laser. It is given by

. RIN ( f ) = 2

̰

−∞

r P (τ)e−i2π f τ dτ,

f

≥ 0.

For consistency with common practice, this is written as a one-sided spectrum by including the factor of two. The power density spectrum of the laser intensity noise is not constant in frequency because the power density spectrum of the laser power is determined by the frequency response of the resonator. Because the relative intensity noise is defined in terms of the normalized covariance function given in (7.8.9), the photodetected electrical power density spectrum is NRIN ( f ) =

R2 ±P ²2 RIN( f ).

(7.8.10)

When specified in the electrical domain after photodetection, this power density spectrum includes the effect of shot noise. Setting NRIN equal to the power density spectrum N shot of the shot noise given in (6.7.8) produces the minimum shot-noise-limited relative-intensity noise RIN shot given by RIN shot

R

2e 2e = = = RN2±shot 2 P² R± P² ±i ² ,

(7.8.11)

where ±i ² = ± P ² is the mean electrical signal generated by direct photodetection. For example, using a photodetector with a responsivity of = 1 A/W, a laser with a power level of 1 mW has a shot-noise-limited relative intensity noise of approximately −155 dB/Hz. It cannot be smaller.

R

7.8 Noise in Laser Transmitters

7.8.2

347

Probability Density Functions

This section develops the probability density functions for the lightwave amplitude and the lightwave power emitted from a single-mode laser.

Probability Density Function of the Emitted Complex Amplitude

Over a time interval of a duration that is small compared with the coherence timewidth of the phase noise process φ(t ) given in (7.8.8), the probability density function f s (s ) of the complex lightwave amplitude s (t ) can be modeled as the sum of a constant bias Aei φ describing the stimulated emission and a circularly symmetric gaussian random variable n describing the effect of the spontaneous-emission noise. To the first order of approximation, when the squared magnitude of the bias increases, the variance of the noise decreases. For a laser operating well below the lasing threshold, the laser output is spontaneous-emission noise with f s (s ) given by a circularly symmetric gaussian random variable n. The variance of the spontaneous-emission noise below threshold is different than the variance of the noise above threshold because there is no competing stimulatedemission process below threshold.

Probability Density Function of the Power for a Single-Mode Laser

As the power in a single-mode laser transitions from well below the lasing threshold to well above the lasing threshold, the probability density function for the laser power begins as an exponential distribution (cf. (6.2.2)), then morphs into a noncentral chisquare probability density function (cf. (6.2.14)) when the power is somewhat above the lasing threshold, and then morphs further into a gaussian probability density function when the power is well above the lasing threshold. The evolution of the probability density function in the resonator is described by a stochastic differential equation, not given herein. This equation has a closed-form solution in terms of a parameter a given by19 f (P ) =



1 −( P −a) , P ≥ 0, √2π 1 + erf (a ) e 2

(7.8.12)

where P = P /( π Pth ) is a normalized output laser power in the mode, with Pth being the lightwave power at the lasing threshold. The parameter a characterizes the oscillation characteristics of the laser. It varies from a large negative value well below the lasing threshold, to zero at the threshold, to a large positive value well above the lasing threshold. The probability density function as a function of the normalized power P for several values of the parameter a is shown in Figure 7.22. The probability density function in (7.8.12) for a laser operating well below threshold is shown in Figure 7.22a. It approaches an exponential probability density function given by f (P ) ≈ 2| a|e−2|a |P ,

19 See Goodman (2015), and Risken (1970).

P ≥ 0,

(7.8.13)

348

7 Lightwave Components

0 noitubirtsid ytilibaborp goL

–2 –4 –6 (a)

–8

(b)

(c)

–10 –12 –14 0

2

4

6 P /(√π Pth)

8

10

12

Figure 7.22 Plots of the evolution of the probability density function of the laser power as a

function of the normalized laser power P . (a) A laser operating below the lasing threshold (a = −5) compared with the exponential probability density function given in (7.8.13) shown as a dashed line. (b) A laser at threshold (a = 0). (c) A laser operating above threshold (a = 5) compared with the gaussian probability density function given in (7.8.14) shown as a dashed line.

Á Â

Á Â2

with mean P = 1/2|a| and variance P = (1/2| a|)2 . The probability density function in (7.8.12) for a laser operating well above threshold, shown in Figure 7.22(c), approaches a gaussian probability density function given by f (P ) ≈

Á Â

√1π e−(P −a) , 2

P ≥ 0,

(7.8.14)

where P = a and σP2 = 1/2. The derivation of these expressions is asked for as an end-of-chapter problem.

Mode-Partition Noise

The probability density function of the power was derived for a single-mode laser in the previous subsection. When the laser resonator supports several lasing modes, the power can fluctuate between lasing modes even though the total power remains constant. Above the lasing threshold, the number of lasing modes depends on the structure of the resonator. Let Pmax and Pnext be the largest power in one mode and the nextlargest power in a different mode, respectively. The ratio Pmax / Pnext is called the modesuppression ratio. It is one measure of the spectral purity of the laser. The fluctuating exchange of power between lasing modes is a form of modedependent noise called mode-partition noise. In general, each lasing mode has a different gain γ , a different loss αm , and a different number of photons m. Consider a mode with an expected number of photons S that is less than the lasing threshold. The actual number of photons in this mode is a random variable with a probability density function that can be approximated by an exponential probability density function given in (7.8.13). The probability that the mode lases, denoted prob( P > Pth ), is the probability that the power P in that mode exceeds a threshold power, denoted Pth. This probability is determined by integrating (7.8.13),

7.9 References

prob( P

̰

√ − 1 2|a| e π √ − 2| a|/ π =e ,

> Pth ) =

349

−2|a| P dP (7.8.15)

where the lower limit for the integration is the normalized power parameter P = √ P /( π Pth ) for that mode at the lasing threshold at which P = Pth . As an example, consider a mode with a mean signal power below √ the lasing threshold such that a = −4 in (7.8.12). Then prob ( P > Pth ) = e−8/ π , meaning that for a mode characterized by a = − 4, the probability that the mode lases and causes a power fluctuation in the other modes is approximately 1%. When every member of a set of lasing modes is close to the lasing threshold condition, then a time-varying subset of modes will lase, causing fluctuations in the spatial power distribution even though the total power is nearly constant. Each lasing mode of a directly modulated laser has a different spatial structure and couples differently into the multiple spatial modes of a multimode fiber. When the distribution of the total laser power between the lasing modes is time-varying so that the power in each lasing mode fluctuates, the power launched into each fiber mode will fluctuate. Because each spatial mode of the fiber has a different mode-dependent group delay, this random spatial fluctuation can produce a random temporal fluctuation in the photodetected signal. This is a form of uncertainty called jitter.

7.9

References There is an immense body of literature discussing lightwave components and devices. The physics of lightwave components is presented in Chuang (2012), with components used for lightwave communication systems discussed in Venghaus and Grote (2017). The general characteristics of oscillators are covered in Lax (1967). The general theory of lasers is covered in Milonni (1988) and in Verdeyen (1995). Detailed treatments of semiconductor lasers are provided in Petermann (1988) and in Agrawal (1993), with issues of phase noise considered in Henry (1986). The application of semiconductor lasers to lightwave communications is discussed in Klotzkin (2013). The doping of rareearth atoms into an optical fiber is discussed in Poole, Payne, and Fermann (1985), with the application of erbium-doped fiber amplifiers to telecommunications presented in Mears, Reekie, Jauncey, and Payne (1987). The application of coupled-mode theory to passive photonic devices is covered in Liu (2005), with the waveguiding characteristics of couplers discussed in Okamoto (2006). The use of a Mach–Zehnder interferometer to improve the performance of a directional coupler is described in Jinguji, Takato, Sugita, and Kawachi (1990). Avalanche photodiodes are discussed in McIntyre (1966), in Campbell, Demiguel, Ma, Beck, Guo, Wang, Zheng, Li, Beck, Kinch, Huntington, Coldren, Decobert, and Tscherptner (2004), and in Campbell (2016). There is also an immense body of literature studying noise in lightwave devices. Noise in lasers is discussed in Goodman (2015). Nonlinear phase noise is covered in Gordon and Mollenauer (1990), Mecozzi (1994a), Mecozzi (1994b), and Ho (2005). The conditional probability mass function of the gain in an avalanche photodiode using a local ionization model was derived in McIntyre (1972) and verified by Conradi (1972).

350

7 Lightwave Components

The approximate form for the unconditioned probability mass function was proposed in Webb, McIntyre, and Conradi (1974). The probability mass function of the gain in an avalanche photodiode using a local ionization model is covered in Einarsson (1996). A comparison of the probability mass functions determined from different methods is provided in Personick, Balaban, Bobsin, and Kumar (1977). An analytical form for the probability mass function of an avalanche photodiode based on work by Baker (1996) is given in Tang and Letaief (1998). Fundamental sources of noise for phase-insensitive and phase-sensitive amplifiers are discussed in Takahasi (1965), in Caves (1982), and in Yamamoto and Haus (1986). The statistics of lightwave amplifiers is covered in Humblet and Azizoglu (1991). Raman amplification for lightwave communication systems is covered in Bromage (2004). The definition of a noise figure for lightwaves that is consistent with lower-frequency systems is considered in Haus (2000a). Noise in oscillators was studied by van der Pol (1926) and later by Lax (1967).

7.10

Historical Notes The method of balanced homodyne and heterodyne demodulation as applied to lightwave communications appears to have been first mentioned in a brief communication by Oliver (1961). The seminal breakthrough of laser action in semiconductors by four research groups in 1962 (Hall, Fenner, Kingsley, Soltys, and Carlson 1962; Nathan, Dumke, Burns, Dill, and Lasher 1962; Holonyak and Bevacqua 1962; and Quist, Rediker, Keyes, Krag, Lax, McWhorter, and Zeigler 1962) led to many other advances in lightwave transmitters. These include reliable room-temperature operation, laser emission at the minimum attenuation wavelength of a silica-glass optical fiber, increased modulation bandwidth for direct current modulation, and reduced spectral width for sources used for phasesynchronous modulation formats. A summary of these advances is given in Coleman (2012). Similar advances in photodiode technology are discussed in Campbell (2008) and in Beling and Campbell (2009). The seminal breakthrough of a practical erbium-doped fiber amplifier in Mears, Reekie, Jauncey, and Payne (1987) was followed by several other advances, including the demonstration of a low-crosstalk lightwave amplifier in Giles, Desurvire, and Simpson (1989). This led to the first commercial deployment of an erbium-doped fiber amplifier for a submarine system in the early 1990s and the first commercial wavelengthmultiplexed system in the mid 1990s. These developments are discussed in Zyskind, Nagel, and Kidorf (1997). The effect of phase noise on systems that use fiber amplifiers is discussed in Gordon and Mollenauer (1990). They showed therein that nonlinear phase noise caused by the Kerr nonlinearity mixing the spontaneous emission with the signal leads to a random nonlinear impairment that is difficult to compensate.

7.11

Problems 1 Three-dB coupler The governing equations for a symmetric directional coupler with inputs s1 t z and s2 t z are

(, )

(, )

7.11 Problems

ds1 (t , z) dz ds2 (t , z) dz

351

= −iκ s2(t , z ), = −iκ s1(t , z ),

where κ is the coupling coefficient between the modes in each waveguide, and each . mode has a z dependence given by e−i β z . The output signals are defined as z1 (t ) = . s1(t , L ) and z 2(t ) = s2 (t , L ). (a) Let the two inputs to the two paths of the directional coupler be s1 (t , 0) = s and s2(t , 0) = 0. Solve for z1 (t ) and z 2(t ) and determine the length L such that the two output signals are in phase quadrature. (b) Plot the power in each mode as a function of L and determine the minimum value of L that produces a 3-dB coupler. (c) Determine the minimum value of L that produces a power splitter with 10% of the lightwave power coupled into one path and 90% of the lightwave power coupled into the other path. (d) Now let the input to one path be s1(t , 0) = A, and let the input to the other path be s2(t , 0) = B. Show that for a proper choice of L, the output signals can be expressed as

± z (t ) ² 1 ± 1 =√ z2 (t ) 2

1 i i 1

²±

A B

²

,

which is the relationship for a 180-degree hybrid coupler given in (7.1.2). 2 Lossless couplers For a coupler to be lossless, the output power in the two output waveguides must equal the input power in the two waveguides so that

2. Lossless couplers
For a coupler to be lossless, the output power in the two output waveguides must equal the input power in the two input waveguides, so that

$$P_{\mathrm{in}_1} + P_{\mathrm{in}_2} = P_{\mathrm{out}_1} + P_{\mathrm{out}_2},$$

where $P = |s|^2$ is the root-mean-squared power and s is the lightwave signal. Let

$$\mathbf{s} = \begin{pmatrix} s_1(t) \\ s_2(t) \end{pmatrix}$$

be the vector of the two signals defined at either the input or the output of the coupler.
(a) Show that when the coupler is lossless,

$$\mathbf{s}_{\rm in}^\dagger \mathbf{s}_{\rm in} = \mathbf{s}_{\rm out}^\dagger \mathbf{s}_{\rm out},$$

where † denotes the conjugate transpose and $\mathbf{s}_{\rm out} = \mathbf{T}\mathbf{s}_{\rm in}$.
(b) Show that

$$\mathbf{T} = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}$$

does not satisfy the condition derived in part (a). This means that combining two spatially distinct input modes at the same carrier frequency into a single output mode cannot be implemented by a lossless transformation.
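The condition in part (a) holds for every input exactly when $\mathbf{T}^\dagger\mathbf{T} = \mathbf{I}$, that is, when T is unitary. A minimal numerical check, using the hybrid-coupler matrix from Problem 1 and the combining matrix from part (b):

```python
import numpy as np

# A lossless coupler must preserve total power for every input, which holds
# exactly when T is unitary (T† T = I).
T_hybrid = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)   # 180-degree hybrid, (7.1.2)
T_combine = np.array([[1, 1], [0, 0]])                 # part (b): mode combiner

for T in (T_hybrid, T_combine):
    print(np.allclose(T.conj().T @ T, np.eye(2)))      # True, then False
```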


3. Transit time in a PIN photodetector
Consider a PIN photodiode that is illuminated through the p-region, as shown in the figure below.

[Figure: a PIN photodiode of thickness w, illuminated through the p-region at x = 0 by photons of energy hf; the uniform field $E_x$ drifts holes (velocity $v_h$) toward x = 0 and electrons (velocity $v_e$) toward the n-region at x = w.]

The device is reverse-biased and is fully depleted so that the intrinsic region has no free charge in the absence of the lightwave signal. The electric field is uniform, with $E_x = V_x/w$, where $V_x$ is the bias voltage in the x direction and w is the device thickness shown in the figure. Electrons move through the n-region at velocity $v_e = \mu_e V/w$, and holes move through the p-region at velocity $v_h = \mu_h V/w$, where $\mu_e$ and $\mu_h$ are constants called the electron mobility and the hole mobility, respectively. For many semiconductors, $\mu_e$ is larger than $\mu_h$, and most of the carriers are generated near the illuminated region. The response time is then minimized by illuminating through the p-region, as shown in the figure, so that the photogenerated holes have a smaller distance to traverse. Using energy arguments, it can be shown that an electron generated at time t = 0 at distance x creates a current during its transit to the n-region given by

$$i_e(x,t) = \frac{e v_e}{w}\big(u(t) - u(t - (w-x)/v_e)\big),$$

where u(t) is the unit-step function (cf. (2.1.5)). This expression states that the photocurrent is nonzero for $0 \le t < (w-x)/v_e$. Similarly, the current due to the photogeneration of holes, which traverse the distance x back to the p-region, is

$$i_h(x,t) = \frac{e v_h}{w}\big(u(t) - u(t - x/v_h)\big).$$

Suppose the device is illuminated through the p-region by a short lightwave pulse having intensity P(t) = Eδ(t), where δ(t) is a Dirac impulse and E is the energy in the lightwave pulse.
(a) Compute the resulting photocurrent i(t), which, for E = 1, is the device impulse response h(t). Neglect the time required for photons to travel through the device. Note that the infinitesimal number of carriers (electrons or holes) generated in the infinitesimal interval dx at a distance x is given by

$$dN = \frac{E}{hf}\,\alpha\, e^{-\alpha x}\,dx,$$

where α is the absorption coefficient and E is the energy in the lightwave pulse. The infinitesimal electron and hole currents resulting from carriers generated in dx at a distance x are given by $di_{e,h}(x,t) = i_{e,h}(x,t)\,dN$, with the total photocurrent given by

$$i(t) = i_e(t) + i_h(t),$$

where $i_{e,h}(t) = \int_0^w di_{e,h}(x,t)$. Hint:

$$\int_0^w di_e(x,t) = \frac{1}{\alpha}\Big(e^{-\alpha(w - v_e t)} - e^{-\alpha w}\Big)u(t) + \frac{1}{\alpha}\Big(1 - e^{-\alpha(w - v_e t)}\Big)u(t - w/v_e).$$

(b) Plot the impulse response for the p-illuminated device. For comparison, also plot the impulse response for the n-illuminated device. Use a silicon photodetector with the following parameters: λ = 806 nm, α = 10⁵ m⁻¹, w = 30 µm, V = 10 V, $\mu_e$ = 0.15 m²/(V·s), and $\mu_h$ = 0.045 m²/(V·s). (Note that the quantum efficiency is $\eta = 1 - e^{-\alpha w} \approx 0.95$.)
(c) On a log scale, plot the frequency response H(f), which is the Fourier transform of the impulse response h(t). Note that the zero-frequency response H(0) should be equal to the responsivity ℛ (cf. (6.2.23)), where $\eta = 1 - e^{-\alpha w}$. Compare the −3 dB bandwidths of a device illuminated through the p-region and a device illuminated through the n-region.
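One way to produce the plots in parts (b) and (c) is to integrate the rectangular single-carrier current pulses over the exponential generation profile numerically rather than using the closed-form hint. The sketch below does this for the p-illuminated geometry described above (holes traverse the distance x); overall constants such as e and E/hf are dropped, so the vertical scale is arbitrary:

```python
import numpy as np

# Numerical impulse response of a p-illuminated PIN photodiode (part (b)).
alpha, w, V = 1e5, 30e-6, 10.0            # 1/m, m, V (silicon at 806 nm)
mu_e, mu_h = 0.15, 0.045                  # m^2/(V s)
v_e, v_h = mu_e * V / w, mu_h * V / w     # carrier drift velocities

x = np.linspace(0, w, 2001)               # generation depth
g = alpha * np.exp(-alpha * x)            # generation density (constants dropped)
t = np.linspace(0, 1.2 * w / v_h, 2001)   # holes have the longest transit here

def current(v, dist):
    # Each carrier generated at depth x contributes a rectangular current of
    # duration dist(x)/v; sum the rectangles weighted by g(x) over depth.
    tau = dist / v
    return np.trapz(g[None, :] * (t[:, None] < tau[None, :]), x, axis=1) * v / w

# Electrons drift to the n-side (distance w - x); holes to the p-side (distance x).
h = current(v_e, w - x) + current(v_h, x)
# For the n-illuminated device, replace g with alpha * exp(-alpha * (w - x)).
```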

4. The output of a balanced photodetector
The output of the balanced photodetector given in (7.3.2) is based on the coupling matrix T given in (7.1.5). Rederive the output of the balanced photodetector using the alternative form of the coupling matrix T given in (7.1.2). Comment on the result.

5. Image modes
Using a two-sided frequency representation, show that a mixer has an "input image" at frequencies of the form $f_c \mp 2 f_{\rm IF}$ appearing within the demodulated signal bandwidth.

6. Noise figure for a combination of passive and active components
Because lightwave signals have signal-dependent photon noise, any lightwave component that amplifies or attenuates the lightwave signal will change the photon-noise-limited signal-to-noise ratio. This means that for a photon-noise-limited lightwave signal, a noise figure can be defined even for a component that simply attenuates the signal. Using an ideal photon-noise-limited input lightwave signal described by a Poisson probability mass function and the definition of the noise figure given in (7.7.16), which uses the photon number m, determine the noise figure for the following components.
(a) A segment of optical fiber of length L with an attenuation coefficient κ.
(b) A segment of optical fiber of length L with an attenuation coefficient κ followed by a lightwave amplifier with a gain G much larger than one and a noise figure $F_{NP} = 2n_{sp}$.
(c) A lightwave amplifier with a gain G much larger than one and a noise figure $F_{NP} = 2n_{sp}$ followed by a segment of optical fiber of length L with an attenuation coefficient κ. (This is the reverse of part (b).) Comment on the difference between the results of part (b) and part (c).
(d) The cascade of two lightwave amplifiers is given. The first lightwave amplifier has a gain $G_1$ and a noise figure $F_1$. The second lightwave amplifier has a gain $G_2$ and a noise figure $F_2$. Solve for the noise figure using (7.7.15), which is based on the lightwave signal and not on the photon number. This gives the cascade rule for the noise figure, which can be written as

$$F_{\rm total} = F_1 + \frac{F_2 - 1}{G_1}.$$

(A numerical sketch of this cascade rule appears after Problem 8.)

This expression shows that the relative contribution of the second stage is reduced by the factor of $G_1$ with respect to the first stage. The noise contribution from each subsequent stage is reduced by the total gain of the stages that precede it. Therefore, the overall noise is dominated by the noise in the first stage of amplification.
(e) Repeat part (d) using the noise figure given in (7.7.16) under the condition that the gain of each amplifier stage is much larger than one. Comment on the result.

7. Dominant noise term in a lightwave amplifier
(a) Under what conditions is the signal–spontaneous-emission noise mixing term not the most significant noise term for a lightwave amplifier?
(b) Describe a system for which the conditions derived in part (a) are realized in practice.

8. Number of independent noise subsamples in terms of the symbol rate
Referring to Figure 7.21, suppose that the sample value E is formed using an integration time T that is equal to the reciprocal of the symbol rate R.
(a) Show that the number of independent subsamples K used to form the sample is approximately equal to B/R, where B is the bandpass bandwidth of an ideal lightwave noise-suppressing filter.
(b) Let the signal be a random equispaced binary sequence with an autocorrelation function given by (2.2.61). Determine K for an ideal bandpass lightwave noise-suppressing filter designed to pass 95% of the modulated signal power.
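As promised after Problem 6(d), here is a minimal numerical sketch of the cascade rule $F_{\rm total} = F_1 + (F_2 - 1)/G_1$. The numbers are illustrative, and the passive fiber span is modeled with a gain G < 1 and a noise figure 1/G, which is one common convention for a photon-noise-limited attenuator:

```python
import numpy as np

def cascade(stages):
    """Friis cascade rule. stages: list of (gain, noise figure), linear units."""
    F_total, G_running = 1.0, 1.0
    for G, F in stages:
        F_total += (F - 1) / G_running   # each stage divided by preceding gain
        G_running *= G
    return F_total

G, F = 100.0, 10**(5/10)                 # 20 dB gain, 5 dB noise figure
loss = 10**(-10/10)                      # a 10 dB span: gain 0.1, noise figure 10

amp_first = cascade([(G, F), (loss, 1/loss)])
span_first = cascade([(loss, 1/loss), (G, F)])
print(10*np.log10(amp_first), 10*np.log10(span_first))
```

Running the sketch shows that placing the amplifier before the lossy span yields a much lower overall noise figure than the reverse order, which is the point of parts (b) and (c).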

9. Lightwave amplifier noise terms
For a wavelength of λ = 1500 nm, let $n_{sp}$ = 1.25, $B_o$ = 0.25 nm, $B_N$ = 25 GHz, G = 30 dB, $F_N$ = 5 dB, and R = 50 Ω. The output lightwave signal is measured with a photodetector that has a responsivity of 0.8 A/W.

(a) Suppose the stimulated-emission cross section $\sigma_e$ is equal to the absorption cross section $\sigma_a$. What is the ratio of the mean upper-state density $N_2$ to the mean lower-state density $N_1$ that will produce $n_{sp}$ = 1.25?
(b) Determine the incident lightwave power $P_{\rm in}$ for which the following criteria hold.
i. The power density spectrum of the shot noise generated by the signal is equal to the power density spectrum generated by thermal noise.
ii. The signal–noise mixing term is equal to the noise–noise mixing term.
(c) For what value of the input power $P_{\rm in}$ does neglecting all noise terms except the signal–noise mixing term result in a relative error in the total electrical power that is less than 1%?

10. Optimal biasing of a balanced modulator (requires numerics)
(a) Illustrate and explain two different configurations for achieving an approximately linear response between the applied voltage and the output lightwave power for a balanced intensity modulator with a response given in (7.5.11) and repeated here:

$$\frac{P_{\rm out}(t)}{P_{\rm in}} = \cos^2\!\left(\frac{\pi}{2}\,\frac{V(t)}{V_\pi}\right).$$
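Parts (b) and (c) below call for numerics. A minimal sketch of the harmonic-distortion computation at the quadrature bias point follows; the definitions match the problem statement, and the FFT is used to obtain the Fourier-series coefficients:

```python
import numpy as np

# Total harmonic distortion of the quadrature-biased response:
#   I_out(t) = cos^2(A cos(2 pi t) - pi/4) - 1/2.
def thd(A, N=4096):
    t = np.arange(N) / N
    I = np.cos(A * np.cos(2*np.pi*t) - np.pi/4)**2 - 0.5
    C = np.fft.rfft(I) / N               # Fourier-series coefficients
    P = np.abs(C[1:])**2                 # power per harmonic (DC removed)
    return 1 - P[0] / P.sum()            # fraction of power off the fundamental

for A in (0.01, 0.1, 0.5):
    print(A, thd(A))
```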

(b) Distortion in an external modulator. At the optimal point for linear operation, determine the total harmonic distortion for the cosinusoidal input $V_{\rm in}(t) = A\cos(2\pi t)$, where $A = \pi V/2V_\pi$. The total harmonic distortion is the proportion of the total power contained in frequencies of the output lightwave power signal that are not the frequency of the cosinusoidal input signal. This output signal is $I_{\rm out}(t) = \cos^2(A\cos(2\pi t) - \pi/4) - 1/2$, where the factor of 1/2 removes the zero-frequency term. The distortion may be determined by numerically evaluating the first Fourier-series coefficient of the output waveform and determining the proportion of the optical power contained in this component relative to the total power in the output signal.
(c) Plot the harmonic distortion as a function of $A = \pi V/2V_\pi$ over the range 0.01 < A < 0.5 and comment on the result.
(d) Can V(t) be predistorted to prevent this distortion? How might this be accomplished?

11. Light-emitting-diode noise statistics
A light-emitting diode has a 3 dB spectral bandwidth of 40 nm at 850 nm and a mean power of P. This lightwave source is incident on a photodetector with a responsivity of ℛ = 0.5 A/W.
(a) Derive the probability mass function p(m) for the number of photoelectrons over an integration time of T.
(b) For what values of E = PT can this source be modeled using a Poisson probability distribution such that the number of photoelectrons is within 5% of the number of photoelectrons calculated using the exact probability distribution?


(c) Based on the results of part (b), and in a regime for which the data rate is greater than 1 Mb/s and the power is less than 1 W, is the approximation of p(m) by a Poisson distribution appropriate?

12. Characteristics of a laser diode
An idealized laser diode is described by conditions that relate both the lightwave power $P_L$ to the injected current $i_{\rm in}$ and the injected current to the applied voltage $V_{\rm in}$ as follows:

$$P_L = 0.1\,i_{\rm in} \quad\text{for } i_{\rm in} < 5\ \text{mA},$$
$$P_L = 1.5\,i_{\rm in} - 7 \quad\text{for } i_{\rm in} > 5\ \text{mA},$$
$$i_{\rm in} = 0.1\big(e^{V_{\rm in}/0.5} - 1\big) \quad\text{for } V_{\rm in} > 0\ \text{V},$$
$$i_{\rm in} = -0.1 \quad\text{for } V_{\rm in} < 0\ \text{V},$$

where $P_L$ is the laser power in milliwatts (mW), $i_{\rm in}$ is the current in milliamps (mA), and $V_{\rm in}$ is the voltage in volts (V).
(a) Determine the lasing-threshold current and voltage.
(b) Determine the differential resistance $dV_{\rm in}/di_{\rm in}$ at the lasing-threshold current and at twice the lasing-threshold current.
(c) Determine the ratio of the lightwave power out of the laser to the input electrical power ($P_L/(i_{\rm in}V_{\rm in})$) for $i_{\rm in}$ = 3 mA and $i_{\rm in}$ = 10 mA.
(d) A 4 mA peak-to-peak sinusoidal signal plus a bias current $i_{\rm bias}$ is applied to the laser diode. Sketch $P_L$ versus $i_{\rm in}$ for $i_{\rm bias}$ = 4 mA and $i_{\rm bias}$ = 8 mA. Comment on the result.

13. Power density spectrum of a multimode laser
A multimode laser has a power density spectrum S(f) that is modeled by

$$S(f - f_0) = g_1(f - f_0)\sum_{\ell=-\infty}^{\infty} L(f - f_0 - \ell\,\Delta f),$$

where $g_1(f)$ is a gaussian function of the form

$$g_1(f) = e^{-f^2/2\sigma^2}$$

and L(f) is a lorentzian function of the form

$$L(f) = \frac{B/2\pi}{f^2 + (B/2)^2},$$

where B is the full-width-half-maximum spectral width of L(f). A plot of the spectrum is shown below for σ = 1, B = 1/10, and Δf = 1.
(a) Show that the summation can be written as a convolution,

$$\sum_{\ell=-\infty}^{\infty} L(f - f_0 - \ell\,\Delta f) = L(f) * \sum_{\ell=-\infty}^{\infty}\delta(f - f_0 - \ell\,\Delta f),$$

where δ(t) is a Dirac impulse.

[Figure: the modeled power density spectrum S(f − f₀) plotted against f − f₀ for σ = 1, B = 1/10, and Δf = 1.]

(b) Using the Fourier-transform pair (cf. Table 2.1)

$$\Delta f \sum_{\ell=-\infty}^{\infty}\delta(f - \ell\,\Delta f) \;\longleftrightarrow\; \sum_{\ell=-\infty}^{\infty}\delta\!\left(t - \frac{\ell}{\Delta f}\right)$$

and the convolution property of the Fourier transform, derive the autocorrelation function R(τ).
(c) Plot both S(f) and R(τ) for σ = 1, $B_{\rm pass}$ = 1/10, and $f_0$ = 1. Comment on the result.
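A minimal numerical sketch of this spectrum and its autocorrelation, using the stated values σ = 1, B = 1/10, and Δf = 1, with the infinite comb truncated to a finite number of lines:

```python
import numpy as np

# Gaussian envelope times a comb of lorentzian lines (f0 = 0 for simplicity).
sigma, B, df = 1.0, 0.1, 1.0
f = np.linspace(-4, 4, 4001)

g1 = np.exp(-f**2 / (2 * sigma**2))
L = lambda x: (B / (2*np.pi)) / (x**2 + (B/2)**2)
S = g1 * sum(L(f - l*df) for l in range(-6, 7))

# A rough autocorrelation via the inverse FFT of the sampled spectrum.
R = np.fft.ifft(np.fft.ifftshift(S))
```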

14. Characterization of a laser diode
A common resonator structure for a laser diode is a Fabry–Pérot resonator. This is a resonator constructed using two parallel reflective surfaces. The spacing between the allowed frequencies of a resonator of length d is given by $\Delta f = c_0/2nd$, where $c_0$ is the speed of light in free space and n is the index of refraction. This value of Δf is called the free spectral range of the resonator.
A semiconductor laser is fabricated with a Fabry–Pérot resonator of length d = 250 µm and an index n = 3.5.
(a) What is the free spectral range of the resonator?
(b) Determine the number of possible lasing modes over a −3 dB bandwidth of 0.1 nm.
(c) What is the length d of the resonator for which only one mode can lase over this bandwidth?
(d) When the power density spectrum of the relative intensity noise has a constant value of −145 dB/Hz over the frequency range of 0 to 2 GHz, determine the electrical noise power from the relative intensity noise over an integration time T = 1 ns for a mean lightwave signal power of 1 mW and a responsivity ℛ = 1 A/W.
(e) Compare this noise power with the thermal-noise power for $T_0$ = 290 K generated over the same frequency range. Comment on the result.
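The arithmetic in parts (a) and (b) is short; a sketch follows, in which the operating wavelength used to convert the 0.1 nm bandwidth to hertz is an assumption, since the problem does not state it:

```python
# Free-spectral-range arithmetic for parts (a) and (b).
c0 = 2.998e8                      # speed of light, m/s
n, d = 3.5, 250e-6                # index and resonator length
lam = 1.55e-6                     # assumed operating wavelength, m

fsr = c0 / (2 * n * d)            # free spectral range, Hz
bw = 0.1e-9 * c0 / lam**2         # 0.1 nm converted to Hz at wavelength lam
print(fsr / 1e9, bw / fsr)        # FSR in GHz; resonator modes within the band
```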

15. Balanced photodetection
Explain in detail the method of balanced photodetection and compare this method


with the use of a mixer that directly multiplies the two lightwave signals. Illustrate your answer with your own version of a diagram both for balanced photodetection and for a mixer.

16. Dark current
Let $\mu_{\rm dark}$ be the stationary dark-current arrival rate within a photodetector.
(a) Using this value, modify the power density spectrum of the emission $N_{\rm opt}$ generated by direct photodetection given by (6.5.3) and repeated here:

$$N_{\rm opt} \doteq \mathcal{R} P_n \tau_c = \mathcal{R} N_{sp}$$

to include the effect of the dark current in the photodetector.
(b) Modify the characteristic function $C_r(\omega)$ of the sample value r given in (6.7.16) and repeated here:

$$C_r(\omega) = \exp\left(\int_{-\infty}^{\infty} R(\tau)\Big(e^{i\omega h(t-\tau)} - 1\Big)\,d\tau\right)$$

to include the effect of the dark-current arrival rate $\mu_{\rm dark}$.
(c) Determine the mean and the variance of the probability density function for the sample value r when the signal photogeneration rate is given by $R_s(t)$.

17. Noise terms
A lightwave signal generated from a direct-current-modulated laser diode has a power P = −23 dBm and a relative intensity noise of −120 dB/Hz. This signal is incident on a photodetector with a responsivity of 0.5 A/W. The output of the photodetector is connected to an electrical amplifier with a noise-equivalent bandwidth $B_N$ = 15 GHz and an equivalent root-mean-squared thermal-noise current of $\sigma_i$ = 4 µA at the input to the electrical amplifier. The amplified signal is then integrated over a time interval T and sampled.
(a) Determine the variance of the sample value due to shot noise.
(b) Determine the variance of the sample value due to the relative-intensity noise.
(c) Determine the variance of the sample value due to the thermal noise.
(d) Determine the total variance in the sample value.
(e) Determine which noise source has the largest contribution to the overall variance and calculate the relative error in evaluating the root-mean-squared noise when only the most significant noise source is used. Is this a good approximation?
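A rough numerical comparison of the three noise contributions in Problem 17, computed as power density spectra multiplied by the noise-equivalent bandwidth; this is one plausible reading of the variances, and the book's exact expressions may carry additional factors:

```python
import numpy as np

e = 1.602e-19
P = 10**(-23/10) * 1e-3          # -23 dBm in watts
R, BN, rin_db = 0.5, 15e9, -120.0
I = R * P                        # mean photocurrent, A

var_shot = 2 * e * I * BN                    # shot noise
var_rin = I**2 * 10**(rin_db/10) * BN        # relative-intensity noise
var_th = (4e-6)**2                           # thermal noise (4 uA rms, given)

total = var_shot + var_rin + var_th
print(var_shot, var_rin, var_th)
print(1 - np.sqrt(max(var_shot, var_rin, var_th) / total))  # rms relative error
```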

18. Distributed amplification
Show that distributed amplification optimizes the ratio $(G - 1)/\log_e G$ appearing in (7.4.11) by proving that G = 1 achieves the minimum. Plot the loss in gain in dB against the gain per segment for uniformly spaced lumped amplifiers.

19. Relative-intensity noise
Suppose that the responsivity of a photodetector at 290 K is 1 A/W and that the sample value is generated by integrating over a time interval T that is less than the reciprocal of the


relaxation oscillation frequency of a laser diode so that the power density spectrum of the relative-intensity noise can be treated as a constant.
(a) Determine the lightwave power required so that the shot-noise-limited relative-intensity noise is equal to the thermal noise.
(b) When the mean laser power is 1 mW, which source of noise is the largest?
(c) When the excess relative intensity noise is 20 dB greater than the shot-noise-limited relative-intensity noise, for what value of the lightwave signal power is the relative-intensity noise equal to the thermal noise?

20. Intensity-noise probability density function
The probability density function of the laser power is given in (7.8.12), and repeated here as

$$f_P(\overline{P}) = \frac{2}{\sqrt{\pi}}\,\frac{e^{-(\overline{P}-a)^2}}{1 + \mathrm{erf}(a)}, \qquad \overline{P} \ge 0,$$

where $\overline{P} = P/(\sqrt{\pi}\,P_{\rm th})$.
(a) Using erf(x) = −erf(−x), erf(x) = 1 − erfc(x), and $\mathrm{erfc}(x) \approx (1/x\sqrt{\pi})\,e^{-x^2}$ for large x (cf. (2.2.20)), show that for a much less than 0, the probability density function $f_P(\overline{P})$ approaches an exponential probability density function with mean $\langle P\rangle = \sqrt{\pi}\,P_{\rm th}/2|a|$ and variance $\langle P\rangle^2$.
(b) Using the same approximations as in part (a), show that for a much greater than 0, the probability density function $f_P(\overline{P})$ approaches a gaussian probability density function with mean $\langle P\rangle = a\sqrt{\pi}\,P_{\rm th}$ and variance $\pi P_{\rm th}^2/2$.

21. Mean and variance of an avalanche photodiode probability mass function
Starting with the characteristic function $C_m(\omega)$ for the output distribution of an avalanche photodiode given by (7.6.3) and repeated here,

$$C_m(\omega) = \exp\left[\frac{w_s F}{(F-1)^2}\left(1 - \sqrt{1 - 2i\omega G(F-1)}\right) - \frac{i\omega\, w_s G}{F-1}\right],$$

and using (2.2.17), show that the mean of the probability mass function is equal to $E_s = w_s G$, and that the variance is given by

$$\sigma_m^2 = G^2 w_s(F + w_s) - (w_s G)^2 = w_s G^2 F = E_s G F.$$
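The moments can be checked numerically by differentiating the characteristic function at ω = 0; a minimal sketch with illustrative values of $w_s$, G, and F:

```python
import numpy as np

ws, G, F = 100.0, 40.0, 5.0        # illustrative values

def C(w):
    A = ws * F / (F - 1)**2
    return np.exp(A * (1 - np.sqrt(1 - 2j*w*G*(F-1))) - 1j*w*ws*G/(F-1))

h = 1e-6
m1 = (C(h) - C(-h)) / (2*h) / 1j                 # first moment E[m]
m2 = -(C(h) - 2*C(0) + C(-h)) / h**2             # second moment E[m^2]
print(m1.real, ws*G)                             # mean = ws G
print((m2 - m1**2).real, ws*G**2*F)              # variance = ws G^2 F
```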

22. Avalanche photodiode noise
An avalanche photodiode is characterized by a quantum efficiency η = 0.8, a mean gain ⟨G⟩ = 40, an impact ionization ratio κ = 0.4, and a noise-equivalent bandwidth $B_N$ = 15 GHz. An incident lightwave datastream consists of equally likely marks and spaces at a rate of 25 Gb/s at a wavelength of 1500 nm. Each symbol is of duration 40 ps. The mean lightwave signal power incident on the photodetector is −30 dBm.
(a) Determine the excess noise factor F.
(b) Determine the ratio of the detected lightwave signal energy $E_s$ in a single pulse to the power density spectrum of the spontaneous emission $N_{\rm opt}$ generated by direct photodetection in a single pulse.


(c) What is the change in the excess noise factor when the ionization ratio decreases to κ = 0.2?
(d) Suppose that the output of an avalanche photodiode is connected to an electrical amplifier with an equivalent input root-mean-squared current noise $\sigma_i$ = 4 µA. Plot the mean electrical signal for a single pulse as a function of the mean gain ⟨G⟩, and determine the value of the gain that maximizes $E_s/N_{\rm opt}$.

23. Inverse gaussian probability density function
The inverse gaussian probability density function is given by

$$f(x: \mu, \beta) = \sqrt{\frac{\beta}{2\pi}}\; x^{-3/2}\, e^{-\beta(x-\mu)^2/2\mu^2 x}.$$

The probability mass function given in (7.6.1) is

$$p(m) = \frac{S(m)^{-3/2}}{\sqrt{2\pi\sigma_m^2}}\exp\left(-\frac{(m - E_s)^2}{2\sigma_m^2 S(m)}\right).$$

Show that, using the variable transformation z = S(m), the probability mass function f(z) is equivalent to an inverse gaussian probability density function given by $f(x: 1, \nu^2)$.

8

The Electrical Channel

The physical description of the lightwave channel, which has been the primary subject of the book up to this point, now will be seen as the foundation upon which a lightwave communication system is built. Accordingly, the book has reached a major transition at which point the language and physical description of photonics changes into the language and functional description of telecommunications. This transition from a physical description of the lightwave channel to a functional description of an information channel will take place in two stages, which are presented in this chapter and the next chapter. At a functional level, the input to the transmitter is a sequence of user data values that becomes a modulated lightwave waveform coupled into the fiber. In turn, the receiver recovers a reconstructed sequence of user data values by processing the modulated lightwave waveform coupled out of the fiber. The transmitter and the receiver are composed of functional blocks. The transmitter includes an encoder, described in Chapter 13, and a modulator, described in Chapter 10. The receiver includes a demodulator, described in Chapter 10, and a decoder, described in Chapter 13. When appropriate, the combination of the encoder/modulator and the demodulator/decoder, studied together, is conventionally called a modem. To analyze a modem, a communication system can be partitioned into a hierarchy of channel models as was shown in schematic form in Figure 1.4. That figure is redrawn as Figure 8.1 using a wave-optics signal model as appropriate to this chapter. The partitioning shown in Figure 8.1 is only notional, with a flexible choice of the interface between the functions included in each channel model. For the expository purposes of this book, the functions are partitioned into the nested set of channel models shown in Figure 8.1, recognizing that a practical implementation of a modem may distribute the required functionality differently across the several boundaries that define the channel models. A lightwave channel model, shown in Figure 8.1(a), has a lightwave modulation a(z , t ) with input a(0, t ) at z = 0 and output a( L , t ) at z = L (cf. (3.3.47)). An electrical channel model, shown in Figure 8.1(b), is the main topic of this chapter. The electrical channel has a single time-varying electrical waveform at its input and a single time-varying noisy or impaired electrical waveform at its output, perhaps sampled. An information channel, shown in Figure 8.1(c), is the topic of Chapter 9. An information channel has an input that is a sequence of logical symbols and an output that is another sequence of logical symbols. Finally, not shown in Figure 8.1, the encoder and decoder,

362

8 The Electrical Channel

(a) Lightwave Ch annel Input lightwave waveform

Output lightwave waveform

(b) Electrical Ch annel Input electrical waveform

Electrical/optical conversion (E/O)

Lightwave channel

Optical/electrical conversion (O/E)

Output electrical waveform

Electrical channel

Detection

Detected logical values

(c) Inform ation Channel Encoded logical values

Interpolation/ modulation

Figure 8.1 (a) A lightwave channel modeled using wave optics. (b) An electrical channel modeled

using wave optics. (c) An information channel.

discussed in Chapter 13, make the information channel into a reliable communication channel. The first phase of our transition to an information channel is a transition from the lightwave channel to a discrete-time electrical channel. The electrical channel then contains internally the physical mechanisms of lightwave propagation, dispersion, and distortion presented in Chapters 3, 4, and 5 along with the statistical models of uncertainty and noise presented in Chapters 6 and 7, but now described at the level of the electrical waveform. The electrical channel models are expressed, in part, in the language of system theory, which is the language normally used to design and analyze communication systems. As seen from its input and output, the electrical channel subsumes all aspects of the lightwave channel and includes the electrical-signal/lightwave-signal conversion at the transmitter and the lightwave-signal/electrical-signal photodetection process at the receiver. The components that perform these functions were described in Chapter 7. The electrical channel converts an electrical waveform into a lightwave at the transmitter and converts a lightwave into an electrical waveform at the receiver. The electrical waveform may be described as a passband signal, a real-baseband signal, or a complexbaseband signal, and may be described using either continuous time or discrete time. Three discrete-time input/output electrical channel models are developed in this chapter. The first channel model is an additive white gaussian noise model. This model is appropriate for phase-synchronous demodulation using wave optics when the shot noise can be treated as an additional additive white gaussian noise source (cf. (6.7.8)). The second model is a signal-dependent variance model that can incorporate signal-dependent shot noise and the mixing of the signal and noise during direct photodetection. This model is appropriate for phase-asynchronous demodulation using wave optics. The third model is a discrete counting model. This model is appropriate for photon optics limited by photon noise.

8.1 The Lightwave Channel

363

The second phase of the transition, which goes from a discrete-time electrical channel model to a probabilistic information channel model, is the topic of Chapter 9. The mathematical abstraction of an information channel is used in later chapters to address many important aspects of a communication system, including the probability of a detection error, and the maximum amount of information – as expressed by the channel capacity – that can be reliably conveyed. Taken together, Chapters 8 and 9 form a pedagogical bridge between the physical channel model of a lightwave communication system and the corresponding probabilistic information channel model of an information communication system.

8.1

The Lightwave Channel The earlier physical description of a lightwave channel is now enriched by a functional description. Topics that are described from a physical point of view in earlier chapters are now revisited and described using the mathematical models of system theory. This alternative system-level description forms the basis of the electrical channel model. Because the fiber can have several polarization and spatial modes, it can simultaneously support more than one input and more than one output. Therefore, multiple electrical channels may be supported by one fiber.

8.1.1

Linear Single-Input Single-Output Lightwave Channels

A single-input single-output channel transmits and receives a single waveform. A linear channel can be described using the language and methods of linear time-invariant system theory. The correspondence between this linear system-level model and the physical characteristics of the fiber is shown schematically in Figure 8.2. The single modulated lightwave signal is coupled into one or more spatial modes of the fiber. The signal launched into a single mode experiences pulse spreading as (a) Physic al Channel Model Input lightwave waveform

Output lightwave waveform

(b) Systems-level Ch annel Model Impulse response

Time domain

sin (t)

h(t)

Frequency domain

S in ( f )

H( f )

s out (t) = h (t )

s in (t)

S out ( f ) = H ( f )S in ( f )

Transfer function

Figure 8.2 (a) A physical channel model of a lightwave channel. (b) An equivalent linear system

channel model.

364

8 The Electrical Channel

described in Chapter 4. The linear system model of this lightwave channel is based on the complex signal envelope a(z , t ), which has a temporal spectrum A(z , f ).1 The propagation of a (z, t ) over a distance L can be functionally described as a linear, timeinvariant system with a frequency-domain input A(0, f ) and a frequency-domain output A( L , f ). The lightwave signal sout (t ) at the output is



. √2 a (L , t ),

sout (t ) =

(8.1.1)

where the factor of 2 converts the root-mean-squared amplitude of the complex envelope a( L , t ) into a peak amplitude of the lightwave signal. The complex-baseband transfer function H ( f ) at a distance L is defined as the ratio of the output frequency-domain signal A( L , f ) to the input frequency-domain signal A(0, f ). Using expressions developed in Section 4.3.3, the transfer function for a narrowband lightwave signal in a single mode is approximated by expanding the dispersion relationship2 β( f ) using a power series (cf. (4.3.1)) up to the second-order term. This gives (cf. (4.3.16))

. A(L , f ) = Sout ( f ) = e−(κ/2+iβ +i (2π f /v )+i2π β f )L , (8.1.2) A(0, f ) Sin ( f ) .√ where sin (t ) =√ 2a(0, t ), with Fourier transform Sin ( f ), is the input lightwave signal, . and sout(t ) = 2a (L , t ), with Fourier transform Sout ( f ), is the output lightwave signal at a distance L. For a single mode, the transfer function H ( f ) given in (8.1.2) is an all-pass filter because the amplitude of H ( f ) does not depend on the frequency. Substituting v g = L /τ (cf. (4.3.3)) into (8.1.2), the complex-baseband transfer H( f ) =

function is written as

0

g

2

2

2

2 2 H ( f ) = H0e −i2πτ f e−i2π β 2 L f ,

(8.1.3)

2 h(t ) = h 0ei (t −τ) / 2β 2 L

(8.1.4)

where τ is the modal group delay and β2 is the group-velocity dispersion coefficient . defined in Chapter 4. The term H0 = e−κ L /2 e−iβ 0 L is the constant, mode-independent, complex amplitude of the transfer function at distance L , where e−κ L is the channel transmittance (cf. (3.1.3)).3 The generalized inverse Fourier transform together with its shifting and scaling properties applied to (8.1.3) gives the complex-baseband impulse response

= H √1/2π iβ

for a single mode, where the constant h0 0 2 L is the complex amplitude of the impulse response h(t ). Every mode of a multimode fiber has an impulse response in the form of (8.1.4), but each mode has its own modal group delay τ and group-velocity dispersion coefficient

ω in radians/second which is used in the description of the propagation of lightwave signals with the frequency f in hertz, which is used in the description of the propagation of electrical signals. 2 Recall that the dispersion relationship β( f ) is the propagation constant β when it is written as a function of f . 3 Other analyses may use a mode-dependent transmittance.

1 This chapter replaces the angular frequency

8.1 The Lightwave Channel

365

β2. The two exponential terms in (8.1.3) will be discussed separately in the next two subsections, concluding with a discussion of the general expression given in (8.1.13a).

Mode-Dependent Group Delay

Mode-dependent group delay, considered in this subsection, is the effect in each mode caused by the first-order, linear phase term e−i2πτ f of the narrowband transfer function given in (8.1.3). The second-order, quadratic phase term in (8.1.3) is considered in the next subsection. For a signal-modulated coherent lightwave carrier that is coupled into M modes of a multimode fiber and jointly photodetected, the transfer function is the sum of the transfer functions of all modes, each mode weighted by the proportion am of the lightwave amplitude in that mode. A single modulated pulse pin (t ) at the input is decomposed by the spatially orthogonal modal structure of the multimode fiber into the weighted superposition pin (t ) =

M ±

m =1

am x (t )

(8.1.5)

of the pulses am x (t ) in the mth orthogonal mode at the fiber input, where x (t ) is a unit-energy transmitted pulse. The term am x (t ) represents the lightwave signal in the mth spatially orthogonal mode (cf. (2.3.33)). Using (8.1.5), the total pulse energy E p is given by Ep

= = =

²∞

| pin(t )|2 dt ² M ± N ± ∗ ∞ −∞

m =1 n =1 M

±

m =1

am an

−∞

|x (t )|2 dt

|am |2 dt ,

(8.1.6)

where the orthogonality of the modes has been used. The transfer function Hm ( f ) for the mth mode of a linear fiber up to the linear phase term is the product of two terms. One term is the mode-dependent linear phase term e−i2πτm f , where τm is the modal group delay for the mth mode. The other term, the same for all terms, is a complex constant H0 that describes the phase shift and the attenuation at the carrier frequency. The corresponding impulse response for the mth mode is a translated Dirac impulse hm (t ) = H0 δ(t

− τm),

(8.1.7)

with the time origin at the receiver aligned to the time origin at the transmitter by suppressing the propagation delay of the carrier frequency. The output pulse of the mth mode is pm (t ) = hm (t ) ± am x (t ). The Dirac impulse distributes across the sum so that, for multiple modes, the total output lightwave pulse pout (t ) is the sum of the complex pulse amplitudes in all spatially orthogonal modes. Thus

366

8 The Electrical Channel

pout(t ) =

M ±

− τm ) ± am x (t ) ´ ³ ± M = H0 am δ(t − τm ) ± x (t ) m =1

H0δ(t

m =1

= h(t ) ± x (t ), where h ( t ) = H0

M ± m =1

(8.1.8)

am δ(t

− τm )

(8.1.9)

is the impulse response of the multimode lightwave channel considering only the intermodal dispersion caused by the mode-dependent group delay of the spatially orthogonal modes. The spatial distribution of each mode in the transverse (x , y ) plane has been suppressed for brevity, but it must be considered when calculating power with multiple modes because the spatial integral describing the power given in (3.3.48) is over the squared magnitude of a sum of spatially orthogonal terms. This is discussed further in an end-of-chapter exercise. A typical modulated datastream at the input of a multimode fiber has the form sin (t ) =

∞ ±

j =−∞

s j x (t

− jT )

(8.1.10)

composed of a sequence of copies of the pulse x (t ). Each copy of x (t ) is translated in time by the intersymbol interval T and amplitude-modulated by the pulse amplitude s j . This results in a noiseless modulated datastream s (t ) at the output of a multimode fiber given by s(t ) = h(t ) ± sin(t )

= H0

∞ ±

j =−∞

sj

M ±

m =1

am x (t

− τm − j T ),

(8.1.11)

where the impulse response h(t ) for the mode-dependent group delay is given by (8.1.9). This expression ignores wavelength-dependent group delay. For the purpose of the electrical channel model, (8.1.11) describes the effect of linear intermodal dispersion on a lightwave waveform using a coherent carrier.

Wavelength-Dependent Group Delay

The wavelength-dependent group delay is the effect caused by the second-order, quadratic phase term of the narrowband transfer function given in (8.1.3). The wavelength-dependent group delay within a single mode is described by the impulse 2 response h (t ) = h0 eit /2β2 L , as given in (8.1.4), but with the modal group delay τ temporarily suppressed. Rewrite (8.1.2) as S( f ) = H ( f )Sin( f ). The convolution property of the Fourier transform converts this product in the frequency domain into a convolution in the time domain. Therefore,

8.1 The Lightwave Channel

367

s (t ) = h(t ) ± sin(t )

= h0

²∞

−∞

±2 sin (t ± )ei(t −t ) /2β 2 L dt ± .

(8.1.12)

Both h(t ) and sin (t ) are complex, in general, with the output s (t ) consisting of both a real part and an imaginary part, as given by (2.1.4). Expression (8.1.12) is the system-level model of the intramodal dispersion in a single mode caused by wavelength-dependent group delay. Both the transfer function H ( f ) given in (8.1.3), with the modal group delay suppressed, and the impulse response h(t ) given in (8.1.12) have the functional form of a quadratic phase shift. Quadratic phase functions in space appear frequently in optics to describe imaging. Therefore, an apt analogy is that linear dispersion “defocuses” a pulse in time as the pulse propagates in space. For typical values of the group-velocity dispersion coefficient β2, the fiber length L , 2 of the input pulse x (t ), the phase will vary and the root-mean-squared timewidth Trms slowly near the center of the input pulse, but will vary rapidly away from the center of that pulse so that, away from the center, its effect averages to a negligible value. This is the time-domain version of the well-known principle of stationary phase used in optics to validate ray theory when diffraction is negligible. Thus, for values of β2 L near zero, h (t ) will contribute only near the origin. When there is little group-velocity dispersion, β2 L goes to zero, and h(t ) behaves as a shifted Dirac impulse, as should be expected from (8.1.3). Indeed, for β2 equal to zero, H ( f ) is constant and h(t ) can be replaced by a Dirac impulse.

Combined Mode-Dependent Group Delay and Wavelength-Dependent Group Delay

When multiple modes are under discussion, the group-velocity dispersion coefficient β2 for the mth mode is written as β2 (m ). By including the group-delay term τm for each mode, the impulse response used in (8.1.12) is modified to read h(t ) = h0

M ± m =1

2 am ei( t −τm ) / 2β2 (m ) L ,

(8.1.13a)

where am is a mode-dependent amplitude. This extends (8.1.4) to multiple modes and includes both intermodal and intramodal dispersion. Then s (t ) = h(t ) ± sin(t )

= h0

∞ ±

j =−∞

sj

M ±

m =1

am x (t

− j T ) ± ei(t −τ ) /2β (m)L , m

2

2

(8.1.13b)

where h (t ) is given by (8.1.13a). Expression (8.1.13a) is the combined effect of intermodal dispersion between modes caused by mode-dependent group delay and the intramodal dispersion in each mode caused by wavelength-dependent group delay within the mode. This expression completely describes the linear electrical channel model for a single polarization up to the

368

8 The Electrical Channel

quadratic terms in the narrowband expansion of the dispersion relationship β(ω). For M larger than one, h(t ) will exhibit both amplitude and phase variations. For large M, the individual summands in (8.1.13) will not be resolved in the channel output because of additional effects such as intermodal coupling. 8.1.2

Multiplex Channels

Multiple independent datastreams may be transmitted within the same physical lightwave medium using different modes, different wavelengths, different polarizations, or different guiding cores. A lightwave channel in which the individual waveforms are separately processed at the channel output is called a multiplex channel. A lightwave channel in which individual waveforms are intermingled by the channel during transmission and jointly processed at the output is called a multi-input multi-output (mimo) channel. The subchannels of a multiplex channel can be defined in time, space, wavelength, or other signaling degrees of freedom such as polarization or spatial eigenmodes. Within a single channel in space, multiple subchannels can be defined using either time-division multiplexing or wavelength-division multiplexing. In time-division multiplexing, the subchannels are defined by different time intervals. In wavelength-division multiplexing, the subchannels are defined by different-wavelength subcarriers. Time-division multiplexing using a single carrier has no effect on the physical structure of the channel. Wavelength-division multiplexing using multiple subcarriers does. The wavelength subchannels of a multiplex channel may each be supported by an individual electrical channel. Taken together, these subchannels can be regarded as a single multiplex electrical channel. This section describes several physical properties of lightwave propagation in an optical fiber that can be used to provide a functional description of a multiplex electrical channel. In general, the output of a subchannel may depend on the signal in other subchannels because of interchannel interference or crosstalk during propagation. This interdependence may be regarded as an incidental impairment to be treated as such, or it may be of central importance such that the channel must be treated as a mimo channel, as discussed in Section 8.1.3. Figure 8.3 introduces the topic of multiplexing using polarization subchannels as an example. Figure 8.3(a) shows a single-input single-output lightwave channel within a single polarization mode of a single spatial mode that supports two polarization modes. As the lightwave signal propagates, part of the signal power in the modulated polarization mode can couple into the other polarization mode because of imperfections in the fiber. Each polarization mode can have a different modal group delay, so differential modal group delay between the two polarization modes could exist at the channel output (cf. Section 4.6). The photodetected electrical signal for a receiver that does not separate the two polarization modes simply sums the two lightwave signals in the two polarization modes, resulting in a broader pulse due to the differential delay. A better option, even with a single transmitted modulated signal, may be to use a polarization demultiplexer to form two polarization subchannels before individual photodetection, as is shown by the one-input two-output system in Figure 8.3(b). Two

8.1 The Lightwave Channel

369

(a) Photodetection (O/E)

Modulator (E/O) Polarization beamsplitter

(b)

O/E

E/O

O/E Polarization combiner

(c) E/O

O/E

E/O

O/E Lightwave channel Electrical channel

Figure 8.3 (a) A single-input single-output physical channel. (b) A single-input two-output

channel in which each polarization mode is photodetected separately. (c) A two-input two-output channel.

electrical subchannels are required. These form one block electrical channel containing two subchannels with the same signal on each subchannel but with different amplitude, noise, and delays. Subsequent processing of the two electrical signals first compensates the polarization components for the rotation of the optic axes and the differential modal group delay, then recovers the single signal from the pair of polarization components. This system outperforms a single-output system that photodetects both polarization components together. A third option uses the two polarization modes separately to transmit two waveforms with separate datastreams, and with the two polarization modes separately demodulated at the receiver. This two-input two-output channel is shown in Figure 8.3(c). The two photodetected polarization modes may be different than the two polarization modes used for transmission of the two modulation waveforms. The two detected modes then contain a mixture of the signals in the two transmitted polarization modes. Methods to recover the two individual transmitted signals from the two electrical signals are discussed later in Section 8.1.3 of this chapter, and in Sections 12.6 and 12.7 of Chapter 12. Other kinds of multi-input multi-output lightwave channels include a space-multiplex channel, shown schematically in Figure 8.4(a), in which a subchannel is defined by a separate fiber core within the same cladding. The channel shown in Figure 8.4(b) is a mode-multiplex channel in which several subchannels are defined, each using one or more spatial modes or mode-groups within the same multimode fiber. Methods to recover each transmitted mode-multiplex signal from the multiple received mode-multiplex signals of a few-mode fiber are discussed in Section 12.7.

370

8 The Electrical Channel

(a)

(b)

Figure 8.4 (a) A four-input four-output channel using four separate fiber cores within the same

cladding. (b) A two-input two-output channel using two spatial modes within the same multimode fiber.

Polarization-Multiplex Channel

A functional two-input two-output channel model for a physical channel that uses polarization multiplexing is shown in Figure 8.3(c). In an ideal cylindrical waveguide, the two orthogonal polarization modes that describe the state of polarization are arbitrary and equivalent. For this ideal fiber, using any pair of orthogonal polarization modes as a basis, the channel matrix H describing the matrix transformation between the two channel inputs and the two channel outputs would be diagonal. However, random imperfections and stress cause birefringence that breaks the cylindrical symmetry. The resulting channel depends on the polarization-dependent effects that are included in the channel model. The rate at which these effects change over time might also affect the channel model. A basic channel matrix for a polarization-multiplex channel models the birefringence of a segment of fiber of length L as a unitary transformation expressed by the unitary matrix function U(ξ, χ) given by (2.3.59). This matrix transformation can be viewed as an arc traversing the surface of the Poincaré sphere with the longitude 2ξ and the latitude 2χ varying from the input polarization state to the output polarization state. Frequencydependent polarization effects are not considered in this model. Over a time interval for which the polarization transformation does not change, U(ξ, χ) can be modeled as an unknown matrix parameter to be estimated, as discussed in Section 12.6. The polarization state can be regarded as constant over a time interval typically much longer than the symbol interval. With the narrowband polarization-independent transfer function H ( f ) included (cf. (8.1.3)), the two-by-two channel transfer-function matrix H( f ) for a polarizationmultiplex channel can be written as

H( f , ξ, χ) = H ( f )U(ξ, χ), (8.1.14) showing the normal matrix H( f , ξ, χ) factored into a polarization-dependent, frequency-independent lossless unitary matrix U(ξ, χ) and a polarization-independent, frequency-dependent scalar transfer function H ( f ) given in (8.1.3). This elementary model can be augmented by additional polarization-dependent effects. The effect of frequency-dependent, polarization-dependent differential modal group delay ±τ (cf. (4.6.10)) for a linearly birefringent fiber is expressed as a diagonal matrix in the principal polarization-state basis. This effect is included by defining a polarization-dependent linear phase term P( f ) given by

8.1 The Lightwave Channel

P( f ) =. ± U(ξ ± , χ ± )

µ

ei π f ±τ/2 0

0

e−iπ f ±τ/2



±U† (ξ ± , χ ± ),

371

(8.1.15)

where the unitary matrix ±U(ξ ± , χ ± ) is used to transform the principal polarization-state basis defined for this segment of fiber to the polarization-state basis defined at the input to the fiber segment (cf. (2.3.62)). The average modal group delay τ that produces the phase term e−i2πτ f remains in the common complex-baseband scalar transfer function H ( f ). Then (8.1.14) becomes

H( f , ξ, χ) = H ( f )P( f )U(ξ, χ).

(8.1.16)

H( f ) = H ( f )P( f )AU(ξ, χ).

(8.1.17)

Because P( f ) is a unitary matrix, the resulting channel matrix is still a normal matrix, but has a more general form than (8.1.14). The channel matrix H( f ) for a channel model that also includes frequencyindependent, polarization-dependent loss (cf. Section 4.6.4) can be written as (cf. (4.6.20)) This expression has now been augmented by polarization-dependent, frequencyindependent differential loss between the polarization modes and is described by a nonnegative-definite hermitian matrix A. The polarization-independent loss can be incorporated into the term H0 in the scalar transfer function H ( f ) given in (8.1.3)). When there is no polarization-dependent loss, A reduces to the identity matrix I, and (8.1.17) reduces to (8.1.16). With polarization-dependent loss included, the channel matrix is no longer a normal matrix. The analysis of this kind of channel matrix is discussed in Section 11.4.3.

Space-Multiplex Channel

A space-multiplex channel, shown in Figure 8.4(a), consists of spatially separated cores within the same cladding structure. The modes of each core in isolation can be determined using the methods presented in Chapter 3. The coupling of signal energy between cores depends on the proximity of the cores. Multiple bends and twists in the fiber cause weak coupling. In principle, these effects could be analyzed using coupled-mode theory (cf. Section 3.4) were the bends and twists fully known. However, such an analysis is not realistic. The random impairments over the length of a segment cause the complex amplitude in the i th core to couple into the j th core, generating crosstalk from the many small random perturbations. Under the condition that the crosstalk does not depend on the phase of the carrier, by asserting the central limit theorem, the crosstalk is described as a zero-mean, circularly symmetric gaussian random process with variance σi2j depending on the distance between the two cores. This means that for each polarization component, the crosstalk coupled from the i th core into the j th core can be modeled as a circularly symmetric gaussian distribution. The statistics of the crosstalk power has additional degrees of freedom even for a single-mode fiber because each core supports two polarization modes with the in-phase and quadrature signal components in one polarization mode coupling to the in-phase and

372

8 The Electrical Channel

quadrature signal components in the other polarization mode. Because the power is the squared magnitude of the complex amplitude, the coupled power for each polarization mode has two degrees of freedom described by an exponential probability density function or, equivalently, a central chi-square probability density function with two degrees of freedom. Accordingly, the crosstalk power coupled from the i th core to the j th core is a central chi-square random variable with four degrees of freedom.4 This random variable has a probability density function of the form of xe− x (cf. (2.2.44) and Figure 2.8).

Mode-Multiplex Channel

The spatial modes of a multimode optical fiber can, in principle, be used as a single channel or as a multiplex channel as shown schematically in Figure 8.4(b). The evident theoretical approach is to separate the modes into subchannels using an ideal spatial mode filter, then to use the multiple modes as a mode-multiplex channel. However, a multimode fiber may support as many as several hundred modes. It is impractical or impossible to separate so many modes at the receiver. Consequently, for a modemultiplex channel, “a few-mode fiber” can be designed to support only a few modes. Another possibility is to ignore the mode structure and to empirically partition the spatially varying light pattern at the output face of the fiber into regions so as to create a multiplex channel. For this purpose, the output fiber face is coarsely pixelated, with a separate complex-baseband electrical signal photodetected for each pixel. A training sequence (cf. Section 12.5) may be used, in principle, to estimate the output spatial pattern, both magnitude and phase, that occurs in response to the location of a sequence of small light spots approximating spatial impulses illuminating different locations on the input face of the fiber. Spatial patterns at the input face of the optical fiber, that can be resolved at the output face of the fiber by the individual complex patterns they produce, provide additional degrees of signaling freedom in each modulation interval. In this way, the random speckle discussed in Section 8.1.4 may be converted, in part, to a set of useful spatial subchannels or to a mimo channel. This is similar to the separation of polarization modes, but is more complex because of the size of the channel matrix describing the mode coupling. Nevertheless, in principle, this type of multiplex channel has an inherently larger channel capacity than a single-mode fiber.

8.1.3

Multi-input Multi-output Channels

A multi-input multi-output channel is a variation of a multiplex channel in which the multiple datastreams are mingled in transit and are treated collectively at the receiver. Rather than process the channel outputs separately so as to form several received subchannels, the mimo channel is regarded as a single channel with multiple inputs and multiple outputs served by a single transmitter and a single receiver. The modulation format can be designed accordingly. The receiver has the burden of using the dependences among the multiple outputs to improve performance. 4 This probability density function can also be expressed as a gamma distribution as defined in (2.2.50).

8.1 The Lightwave Channel

373

To illustrate this distinction, consider the dual-polarization channel shown in Figure 8.3(c). When the two input signals and the two output signals are considered as comprising two separate subchannels and are processed individually at the receiver, the channel is regarded as a polarization-multiplexed pair of interfering channels. Alternatively, when the two signals are considered as one block signal that is processed at the receiver using multi-user detection, the same physical channel is regarded as a polarization mimo channel, and the resulting electrical mimo channel has different characteristics than those of the polarization multiplex channel. A functional description of a mimo lightwave channel with two inputs and two outputs is sufficient for this discussion. Let sin (t ) be a block waveform with components s1 (t ) and s2(t ) that are the input modulation waveforms for the two lightwave subchannels. The output block waveform s(t ) is given by s(t ) = h(t ) ± sin (t )

µ

¶ µ ¶ = hh 11((tt )) hh 12((tt )) ± ss1 ((tt )) µ h 21(t ) ± s22(t ) + h (t ) ±2 s (t ) ¶ = h 11(t ) ± s1 (t ) + h12(t ) ± s2(t ) . 21 1 22 2

(8.1.18a)

(8.1.18b)

The matrix h(t ), whose elements are the scalar impulse responses hi j (t ) from the j th input subchannel to the ith output subchannel, is called the time-domain channel matrix or the channel impulse-response matrix. Using the linearity and the convolution property of the Fourier transform, the output block waveform S( f ) in the frequency domain is S( f ) = H( f )Sin ( f )

¶µ S ( f ) ¶ H11 ( f ) H12( f ) = H ( f ) H ( f ) S 1( f ) µ H21 ( f ) S ( f22) + H ( f )S2 ( f ) ¶ = H11 ( f ) S1 ( f ) + H12( f )S2( f ) . 21 1 22 2 µ

(8.1.19a)

(8.1.19b)

The matrix H( f ), whose elements are the scalar transfer functions Hi j ( f ) for the jth input subchannel to the i th output subchannel, is called the frequency-domain channel matrix or the channel transfer-function matrix or the block transfer function. Each transfer-function element Hi j ( f ) of the channel transfer-function matrix is the Fourier transform of the corresponding complex-baseband impulse response hi j (t ) for the channel impulse-response matrix. The diagonal elements of the channel matrix, either time domain or frequency domain, describe the response of each subchannel output to the corresponding subchannel input. The off-diagonal elements of the channel matrix describe the linear redistribution of the lightwave signal energy between the two subchannels. The offdiagonal terms represent linear interchannel interference or crosstalk in those situations for which the diagonal terms are considered to be the dominant terms. In other applications, all four terms may have equal status, consisting of a single channel with multiple inputs and multiple outputs.

374

8 The Electrical Channel

A continuous-time mimo channel with no frequency dependence in the channel matrix is called a memoryless continuous-time mimo channel. It can be written as s(t ) = Hsin (t ),

(8.1.20)

where H is now a matrix of complex values that does not depend on time or frequency.

8.1.4

Channel Statistics in Time and Space

In general, the channel matrix of a multimode fiber is random and slowly changing because of unavoidable perturbations caused by environmental changes and random launch conditions. The power launched by the light source into an individual mode of a multimode fiber may be time-varying, either randomly or intentionally. In general, the launched time-varying modal amplitudes {s j (0)} in the fiber are fluctuating due to the power fluctuations in the multiple modes of the lightwave source as discussed in Section 7.8.2. For the mth mode, the complex field in one polarization component at the output face of the fiber denoted Um (r, t ) = am (z , t )Ut m (r, ψ) (cf. (3.3.46)) is given by a combination of both the random initial complex amplitude s m (0) in that mode, as discussed in Section 7.8.2, and the perturbations caused by random mode coupling. The ∑ total complex field envelope is U (r, t ) = m U m (r, t ). This combined time-varying randomness produces a time-varying, spatially varying, complex amplitude at the output face of the fiber. The varying intensity of this complex amplitude is called speckle. Because speckle is caused by the spatial interference between two or more modes, it does not occur for a monochromatic source launched into a single-mode fiber.

Coherence Regions

Two speckle patterns at the output face of the same multimode fiber for two different input distributions of the spatial power are shown in Figure 8.5. A speckle pattern is characterized by the coherence region Acoh , which, for that speckle pattern, is a region of typical size over which the lightwave field is highly correlated in space. Various approximations to the speckle pattern are possible. A speckle pattern might be approximated as a random mosaic of independent coherence regions, each region described by a single random variable such as a circularly symmetric gaussian random variable. The size of a typical coherence region depends on the number of fiber modes, the distribution of the power in those modes, and how the modes couple. The dependence of the coherence regions on the modal-power distribution is shown in Figure 8.5, which shows an image of the lightwave intensity at the output face of the same multimode fiber for two different modal-power distributions. For the same fiber, a lightwave source that couples into a few modes at the fiber input will produce a coarse-grained speckle pattern at the fiber output with a few large coherence regions. A lightwave source that couples into many modes will produce a fine-grained speckle pattern that has many small coherence regions. Similarly, the temporal structure of a time-varying speckle pattern can be slowly varying or rapidly varying as it responds to the spatial and temporal lightwave variations at the input face of the fiber.

8.1 The Lightwave Channel

375

Figure 8.5 Two different speckle patterns generated using two different launch conditions at the

output face of the same multimode fiber. Left: A uniform illumination at the input face of the fiber. Right: A nonuniform illumination in which the lightwave signal is coupled only to the central region of the fiber core. The circle indicates an estimate of the size of a typical coherence region coh.

A

Spatial Statistics

The intensity of the lightwave varies in space and in time across the output face of the fiber. It can be observed by a pixelated array of direct photodetectors. The phase of the lightwave also varies in space and in time across the output face of the fiber. The phase distribution is not visible in an intensity plot and is not visible by direct photodetection. The spatial phase distribution can be observed by imaging the output face of the fiber onto a pixelated array of balanced photodetectors, thereby creating a coherent multi-input multi-output phase-synchronous electrical channel. The effect of random mode coupling can be analyzed using a formalism that is similar to the earlier analysis of polarization-mode dispersion (cf. Section 4.6.3). Regard a long segment of the fiber as partitioned into J independent subsegments, each subsegment with a length equal to the correlation length of the random mode coupling (cf. (4.6.15)). Under this approximation, the number of independent subsegments J is equal to the ratio of the overall length of the fiber segment to the correlation length of the mode coupling. The sequence of couplings between the modes over the length of the fiber is approximated by a sequence of independent random channel matrices H j for j = 1, . . . , J, where the size of the matrix depends on the number of modes that interact. The overall channel matrix H is approximated by the product of J such independent · random matrices with H = Jj=1 H j . When the number of such subsegments is large, the central limit theorem can be asserted. Then, the joint temporal and spatial properties of a random lightwave field at a time instant t and spatial location r are approximated by a multivariate gaussian random variable (cf. (2.2.28a)) characterized by a spatiotemporal covariance matrix V. The simplest spatiotemporal covariance matrix and the only one considered in this book factors into two terms that separately describe the spatial statistics and the temporal statistics. In such a case, the spatial and temporal characteristics are independent and the

376

8 The Electrical Channel

overlap

Component 1

Component 2

1

2

coh

A1 and A 2, showing the A overlap and the coherence region Acoh for the case in which Acoh < Aoverlap < A1 < A2 .

Figure 8.6 Two lightwave components with cross-sectional regions

overlap region

channel is called a space–time-separable channel. For a space–time-separable channel, the coherence region and the coherence timewidth are independent. Within one coherence region, the complex amplitude is described by a single circularly symmetric gaussian random variable for all points within the region. Accordingly, the probability density function f ( P ) of the lightwave power in a single polarization of that region is an exponential distribution given by (cf. (2.2.49)) f (P) =

1 − P / Pcoh e , Pcoh

(8.1.21)

where Pcoh is the expected lightwave power in one coherence region.

Modal Noise

Now consider one lightwave component with a face of cross-sectional area A1 placed end-to-end with a second lightwave component with a face of cross-sectional area A2 , as shown in Figure 8.6. Let the numerical aperture of the two components be equal and define Aoverlap as the region of overlap between the two lightwave components, where the overlap region is smaller than the cross-sectional region A1 but larger than the coherence region Acoh , as is shown in Figure 8.6. Because the overlap region does not contain the entire cross-sectional region A1 of the first component, there will be temporal fluctuations in the lightwave signal coupled into the second component as the time-varying speckle pattern at the output face of the first component shifts in space across the cross-sectional region A overlap of the overlap region. This fluctuation is called modal noise. When the overlap region A overlap is larger than the coherence region A coh, several coherence regions contribute to the field coupled into the second component. When the mean power Pcoh is the same in each of the M coherence regions within the overlap region, the probability density function of the total lightwave power in a single polarization mode within the overlap region Aoverlap can be determined by approximating the spatial integration over the overlap region by the sum of M independent, exponentially distributed random variables – one random variable for each coherence region Acoh within the overlap region Aoverlap. The resulting probability density function is a central chi-square distribution with 2 M degrees of freedom or, equivalently, a gamma probability density function with parameter k equal to N /2. As M increases with the

8.2 Lightwave Demodulation

377

total expected power ² P ³ = M Pcoh in the overlap region held constant, the gamma distribution approaches a gaussian distribution (cf. Figure 2.8(b)). The summation over the M spatial coherence regions within the overlap region Aoverlap has the same form as the summation of the individual temporal coherence intervals over a time duration T given in (6.5.2). In that expression, each exponentially distributed random variable E k corresponds to a subsample from one interval of a duration equal to the coherence timewidth τc , with the total number of independent subsamples given by K = ´T /τc µ. For a stationary temporal random process, the number of independent subsamples K used to derive the probability distribution is fixed because the coherence timewidth τc is a constant. Similarly, for a stationary spatial random process, the number of spatial coherence regions within an overlap region would be fixed as well. However, due to environmental conditions, or incidental dependence on the modulation, the number of modes that can couple from one component to another component can vary. This leads to a change in the size of a spatial coherence region, which results in a change in the number M of spatial coherence regions within a fixed overlap region. Because the number M of spatial coherence regions is not constant in time, the spatial statistics are time-varying. When there is strong mode coupling, the elements Hi j of the channel matrix H describing that mode coupling can be approximated as independent, identically distributed zero-mean circularly symmetric gaussian random variables. With phasesynchronous demodulation to an electrical waveform, the block channel seen by the electrical system has the same form but is translated in frequency. Random multi-input multi-output channels have been widely studied for lowerfrequency communication systems, and, under appropriate conditions, techniques derived for those lower-frequency systems5 can be applied to a mode-multiplex lightwave system.

8.2

Lightwave Demodulation Lightwave demodulation first transfers the modulation on the lightwave onto an electrical waveform, as described in this chapter, then converts that electrical waveform into a sequence of logical symbols, as described in Chapter 9. At some point in this process, usually in the electrical domain, the waveform is passed through one or more detection filters as mentioned briefly in this chapter, and in detail in the next. An electrical waveform can be generated from a modulated lightwave by balanced photodetection or by direct photodetection. Both methods use a photodetector, as is described in Section 7.3, but henceforth usually described with the responsivity R suppressed. 6 An electrical waveform that is generated by balanced photodetection may be linear or nonlinear with respect to the lightwave complex-amplitude waveform s(t ) as 5 For example, see Zheng and Tse (2003). 6 In this and subsequent chapters, when possible, the lightwave power and energy refer to the effective

power and energy as adjusted for the photodetector responsivity

R or the quantum efficiency η .

378

8 The Electrical Channel

discussed in Section 8.2.1. An electrical waveform that is generated by direct photodetection of a noncoherent lightwave is linear with respect to the lightwave power waveform P (t ). This is discussed in Section 8.2.2. An electrical waveform generated by the direct photodetection of a coherent lightwave that has intersymbol interference is not linear with respect to the lightwave complex-amplitude waveform and is not linear with respect to the lightwave power waveform. This is discussed in Section 11.5.1. Lightwave demodulators can also be classified as phase-synchronous or phaseasynchronous when referencing the role of the carrier. Phase-synchronous demodulation requires knowledge of the phase of the carrier and uses either heterodyne or homodyne demodulation. Phase-asynchronous demodulation does not use knowledge about the phase of the carrier and usually uses direct photodetection. Phase-asynchronous heterodyne demodulation is an alternative method that can yield better performance at a cost of greater complexity. 8.2.1

Demodulation of the Lightwave Complex Amplitude

Generating a passband electrical waveform whose complex amplitude has a linear relationship with the complex amplitude of the received lightwave s (t ) requires a linear demodulation process. This process can be implemented using balanced photodetection (cf. Section 7.3.2). Now the physical description of this demodulation process is replaced by a functional description. The passband signal ¸ s(t ) for any modulation format can be expressed as

¸s (t ) = s (t )cos(2π f c t ) − s (t )sin(2π fc t ) = As (t )cos(2π fc t + φs (t )) = Re[s (t )ei2π f t ], (8.2.1) where As (t ) and φs (t ) are the magnitude and phase of the passband waveform, respectively, and s (t ) and s (t ) are the in-phase and quadrature signal components as I

Q

c

I

Q

modulated onto the cosine and sine carrier components, respectively. The corresponding complex-baseband signal s (t ) is given by s (t ) = sI (t ) + isQ (t )

= As (t )eiφ (t ), s

(8.2.2)

in two equivalent forms.

Functional Description

Linear modulation refers to the process of converting a complex-baseband signal s (t ) into a passband signal ¸ s (t ) for which s (t ) is the complex envelope. A functional block diagram of a modulator for a complex-baseband signal is shown in Figure 8.7. The phase reference in the modulator is an oscillator that provides both an in-phase carrier and a quadrature carrier at frequency fc . The two mixers up-convert the two components of the complex-baseband signal s (t ) to the two components of the complex envelope of the passband signal ¸ s (t ) at frequency f c .

8.2 Lightwave Demodulation

Baseband signal for the I-component

sI (t )

cos(2 π f ct)

s I (t) + isQ (t) Complex baseband signal

2

379

o

- 90

Baseband signal for the Q-component

sQ (t )

+

+ –

s I (t)cos(2 πf c t) - s Q (t )sin(2 πf c t)

Figure 8.7 Functional block diagram of a phase-synchronous modulator.

s I (t )

Passband signal

Baseband signal for the I-component

cos(2 π f ct )

sI (t)cos(2 π f c t) - sQ(t)sin(2πf ct)

90 o

sI(t) + isQ(t) 2

sQ(t)

Baseband signal for the Q-component

Figure 8.8 Functional block diagram of a phase-synchronous demodulator.

Linear demodulation refers to the process of converting a passband signal ¸ s (t ) 7 into a complex-baseband electrical signal s (t ). A functional block diagram of a phase-synchronous demodulator is shown in Figure 8.8. The frequency reference in the demodulator, which may be offset in phase by θ , is called the local oscillator. Then the baseband electrical signal is s (t )eiθ , requiring an eventual rotation by θ in subsequent processing. The term ei θ is suppressed in this chapter. The estimation of θ is discussed in Chapter 12. The power in the local oscillator signal can be made much larger than the power in the incident lightwave signal (or in the lightwave noise), so as to provide a form of amplification called mixing gain. The demodulation from passband to baseband may instead take place in two steps as shown in Figure 8.9. The first step down-converts a passband lightwave waveform at frequency f c into a passband electrical waveform at frequency f IF using a lightwave local oscillator at frequency f LO with the nonzero frequency difference f c − f LO called the intermediate frequency f IF . This process is called heterodyne demodulation. It downconverts a lightwave passband waveform to an electrical-passband waveform using a single balanced photodetector, which consists of two physical photodetectors. A filter, at passband, may be inserted at this point. Then a second step down-converts the electrical passband waveform to an electrical complex-baseband waveform, now using an electrical local oscillator with a frequency equal to f IF .

()

7 The notation s t refers to both a noiseless lightwave signal and an electrical signal generated by balanced

photodetection. The physical units are different and any scaling constant has been suppressed. When noise is included, r¯ (t ) may be used to differentiate the noiseless lightwave and electrical signals.

380

8 The Electrical Channel

s I (t )

Passband signal sI(t)cos(2πfc t)

Baseband signal for the I-component

cos(2 πfIF t)

- sQ (t)sin(2 πfct) cos(2 πfLOt)

90 o

sI(t) + isQ (t) 2

sQ(t)

Baseband signal for the Q-component

Figure 8.9 Functional block diagram of a two-step phase-synchronous demodulator.

Detailed Description

The two-step down-conversion to complex baseband uses only one balanced photodetector and allows additional filtering at the intermediate frequency f IF in the electrical domain before a final homodyne demodulation. In principle, several subcarriers can be simultaneously down-converted to a common intermediate frequency using a single local oscillator. Then each subcarrier can be segregated by an electrical filter and separately homodyne-demodulated to complex baseband. The demodulation from passband to baseband may instead take place in one step. In this case, the lightwave local oscillator frequency f LO is equal to the carrier frequency f c . This type of demodulation is called homodyne demodulation. When only one signal component is homodyne-demodulated, a real-baseband electrical waveform is produced. When both signal components are homodyne-demodulated, a complex-baseband electrical waveform is produced. A separate balanced photodetector is required for each signal component. Noise in the lightwave signal must be considered as described next. The spectral noise in the electrical signal N 0 may include the additive spontaneous emission noise in the lightwave prior to balanced photodetection, the shot noise generated during demodulation and photodetection, and the additive thermal noise after balanced photodetection. It may also include the dark-current noise in the photodetector and the noise on the local oscillator signal. First, an expression for additive lightwave noise is discussed. This expression is used to derive the ratio E p / N 0 of the pulse energy E p to the power density spectrum N0 for heterodyne and homodyne demodulation.8 For binary modulation formats, E p = E b . When only additive lightwave noise is considered, the expression for E p / N0 is shown to be the same for both methods of demodulation, as is the case for lower-frequency systems. Then, at the end of the subsection, a semiclassical description of shot noise is included in the analysis. This form of noise is different for heterodyne and homodyne demodulation, leading to different expressions for E p / N0 . 8 The physical units of E are energy per unit resistance. The physical units of N are A2 /Hz, which is the p 0

electrical noise power density spectrum per unit resistance. The reference to unit resistance can be suppressed. The expected number Ep of signal photons is E p / h f . The expected number Nsp of noise photons is N sp/ h f . A full discussion on units is given in the notation section of the book.

8.2 Lightwave Demodulation

381

Additive Lightwave Noise The statistical properties of the down-converted electrical passband noise process ¸ nIF (t ) include those properties due to the incident spontaneous emission noise within the band of the lightwave noise-suppressing filter. Consequently, the electrical-noise power density spectrum N0 caused by spontaneous emission is proportional to the lightwave-noise power density spectrum N sp at the output of the optical noise-suppressing filter as given by (7.7.8). With the responsivity = ηe/ h f explicitly included (cf. (6.2.23)), the electrical-noise power density spectrum N0 due to spontaneous emission noise can be written in two equivalent forms as

R

N0

=. R P

= 2R2 P Nsp = 2ei Nsp , LO

(8.2.3)

LO

where i LO LO is the electrical signal generated from the lightwave power in the local oscillator, where N sp = N sp (ηe/ h f ) = eN sp when η = 1, and where Nsp = N sp / h f is the expected number of noise photons (cf. (7.7.7)). The electrical thermal noise N elec comes after the mixer and does not experience mixing gain. Lightwave noise does experience mixing gain. Therefore, for a sufficiently large local oscillator mixing gain, the term Nelec can be ignored compared with the electrical noise generated from the mixing of the additive lightwave noise with the local oscillator. The validity of this statement is considered as an end-of-chapter exercise.

R

Heterodyne Demodulation Heterodyne demodulation transfers the modulation on the lightwave waveform onto an electrical waveform at passband. A phase-synchronous heterodyne demodulator based on balanced photodetection (cf. Section 7.3.2) using a 180 ◦ hybrid coupler (cf. Figure 7.2(a)) is shown in Figure 8.10. Referring to Figure 8.10(a), in the presence of lightwave noise no (t ), the received complex noisy lightwave signal (s (t )+ no (t )) ei2π f c t (a)

Optical s(t) +no (t) sLO (t)

(b)

Balanced photodetector

Electrical

r˜ (t )

Electrical demodulation

s˜ (t )

r(t) = rI(t) + irQ(t)

B n(t)

-fc fc Lightwave signal before noise filtering

-fc fc Lightwave signal after noise filtering

-fIF f IF Electrical signal after heterodyne demodulation

Figure 8.10 Functional block diagram of phase-synchronous heterodyne demodulation of a

lightwave signal. (a) Block diagram of a lightwave passband demodulator. (b) Signal and noise spectra after each stage along with the passband signal bandwidth B.

382

8 The Electrical Channel

is first optically filtered to suppress the noise outside the spectral bandwidth of the lightwave signal. For a single mode, the complex lightwave signal s (t ) at the output of the lightwave channel is s (t ) = h(t ) ± sin (t ) (cf. (8.1.12)), where h(t ) is the complexbaseband impulse response of the lightwave channel including the effect of the noise filter shown in Figure 8.10(b). The spectrum of the noisy lightwave signal before and after filtering is shown in Figure 8.10(b) together with the spectrum of the received output lightwave signal s (t ). Using (7.3.1) with a = (s (t ) + no (t )) ei2π f c t and b = A LO ei2π f LO t , where f LO = fc − f IF , the noisy passband electrical waveform ¸ r (t ) can be written as (cf. (7.3.2))

¹ º ¸r (t ) = A Re (h (t ) ± sin (t ) + no (t )) ei2π f t ¹ º = A Re (s (t ) + no (t )) ei2π f t = A |s (t )|cos (2π f t + φs (t )) + A ¸n (t ). IF

LO

IF

LO

LO

IF

LO

IF

(8.2.4a) (8.2.4b) (8.2.4c)

The complex envelope s (t ) = |s (t )|ei φs (t ) and the passband noise process ¸ n IF (t ) = Re[no (t )ei2π f IF t ] are now centered at the intermediate carrier frequency fIF = f c − f LO . The noise or interference in the electrical domain may now be filtered from this passband electrical waveform. The electrical pulse energy E p in the passband electrical pulse ¸ p(t ) after heterodyne demodulation is (cf. (2.1.74)) Ep

²∞

¸p2(t )dt −∞² ∞ = 2i | p(t )|2 cos 2(2π f t + φs (t ))dt −∞ = 2i W p ,

=

LO

IF

LO

. R

(8.2.5)

R

where iLO = A 2LO /2 = PLO is the electrical signal generated from the lightwave local oscillator power. The last line follows » ∞ because the rapidly varying cosine carrier averages to one-half and W p = 21 −∞ | p(t )|2 dt is the photocharge due to the received lightwave pulse s (t ). Because the electrical pulse energy E p is proportional to the local oscillator signal i LO , balanced heterodyne photodetection provides mixing gain controlled by the strength of the local oscillator. The wave-optics electrical pulse energy in (8.2.5) is reconciled with photon optics using W p = eWp . Making this substitution in (8.2.5) and setting the quantum efficiency η equal to one for convenience so that expected number Wp of photoelectrons in a pulse is equal to the expected number Ep of photons in a pulse9 gives Ep

= 2ei

LO Ep

.

9 The term W in italic font denotes the real-valued photocharge within wave optics. The term p

(8.2.6)

= W p /e = Ep /η in sans-serif font denotes the real-valued mean number of photoelectron events within photon optics. These relationships are listed in Table 6.2.

Wp

8.2 Lightwave Demodulation

383

Writing the electrical noise power density spectrum as N 0 = 2ei LO Nsp (cf. (8.2.3)), which ignores thermal noise, and using (8.2.6) for the electrical pulse energy E p at passband after heterodyne demodulation, the ratio E p / N0 is Ep N0

2ei = 2ei

LO Ep

LO N sp

= NEp . sp

(8.2.7)

Homodyne Demodulation Homodyne demodulation transfers the modulation on the lightwave waveform onto an electrical waveform at real or complex baseband. For a homodyne-demodulated complex-baseband signal, each of the two components requires a separate balanced photodetector. Therefore, to produce a complex-baseband signal requires two balanced photodetectors. For a homodyne-demodulated real-baseband signal one balanced photodetector is sufficient. For homodyne demodulation, the local oscillator may be phase-locked with the incident lightwave. In this case, the frequency and phase offset between the local oscillator and the incident lightwave can be estimated and corrected in the complex-baseband signal after balanced photodetection. Small phase or frequency offsets can be corrected in the complex-baseband signal, provided that both signal components are available. Methods of phase estimation are discussed in Section 12.2. The demodulated signal for the homodyne demodulation of either the in-phase or the quadrature lightwave signal component is determined by setting f IF equal to zero and setting φs equal to zero or π/2 in (8.2.4c). The electrical energy E p in the demodulated real-baseband pulse A LOs (t ) after homodyne demodulation is (cf. (8.2.6)) Ep

= 2i = 4ei

²∞

LO

−∞

LO Ep

p2 (t )dt

,

(8.2.8)

where i LO is an electrical describing the lightwave power in the local oscilla» ∞ psignal 2(t )dt is the expected number of photons needed for ideal tor, and Ep = (1/2e) −∞ photodetection. The electrical pulse energy given in (8.2.8) for homodyne demodulation to a realbaseband pulse is twice as large as the electrical pulse energy in the passband pulse generated using heterodyne demodulation given in (8.2.6). The noise power density spectrum N0 is also doubled compared with (8.2.3) so that E p / N0 is Ep N0

4ei = 4ei

LO Ep

LO Nsp

= NEp , sp

(8.2.9)

which is the same expression as for heterodyne demodulation. The simultaneous homodyne demodulation of both the in-phase and the quadrature signal components to form a complex-baseband signal is based on balanced photodetection of each signal component (cf. Figure 7.2(b)). A functional description is shown in Figure 8.11. The noisy demodulated in-phase and quadrature components at complex baseband are

384

8 The Electrical Channel

(a) Optical s(t) + no (t) sLO(t)

Electrical rI(t)

Balanced photodetector

2

rI(t) + irQ (t)

rQ(t)

(b) In-phase

cos( f LOt) Filtered lightwave signals

B sin( fLOt)

Quadrature

Figure 8.11 (a) Functional block diagram of homodyne demodulation to a complex-baseband

signal. (b) Signal spectra for the in-phase and quadrature signal paths. Left: the signal spectrum for each signal component before mixing. Right: the baseband signal components.

r I(t ) = 21 ALO Re [s(t ) + no (t )] ,

(8.2.10a)

rQ(t ) = 12 ALO Im [s(t ) + no (t )] .

Within the wave-optics model, the complex-baseband waveform consists of the two real waveforms that together are regarded as one complex waveform. That is r (t ) = rI (t ) + ir Q(t )

= 21 A = 12 A = 12 A

LO LO LO

(h(t ) ± sin (t ) + n o(t )) (s (t ) + n o(t )) ¹ ( )º (s (t ) + no (t )) + i s (t ) + n o (t ) . I

I

Q

Q

(8.2.11)

For the same local oscillator power per signal path, the signal term Ep and the noise term N sp in (8.2.9) are halved as compared with a homodyne demodulator designed for one signal component to a real-baseband signal, but the value of E p / N 0 remains unchanged. In summary, when only additive lightwave noise is considered, heterodyne and homodyne demodulation produce the same value of E p / N 0. The parallel statement is not true when shot noise is considered. This is discussed next.

Shot Noise The shot noise in the down-converted electrical signal depends on the method of demodulation of the lightwave signal. Shot noise is a semiclassical depiction of quantum noise and is discussed in Chapter 15 using quantum optics. This section treats shot noise using a semiclassical large-signal analysis. Because the shot noise depends on the kind of demodulation, the expressions for E p / N 0 are modified compared with the expressions when only additive lightwave noise is considered. Indeed, for a sufficiently large local

8.2 Lightwave Demodulation

385

oscillator mixing gain, the electrical additive noise term Nelec can be ignored compared with the lightwave noise. For phase-synchronous demodulation with a large mixing gain, the local oscillator power is much larger than the signal power or the noise power. Under this condition, the local oscillator generates the largest contribution to the shot noise so that the shot noise from the lightwave signal and the lightwave amplifier noise can be neglected. In this large-signal regime, the shot noise in the down-converted electrical signal is accurately modeled as an independent additive white gaussian noise source with an electrical power density spectrum given by (cf. (6.7.8)) Nshot

= 2ei .

(8.2.12)

LO

Combining the two lightwave noise sources given in (8.2.3) and (8.2.12) yields N0

= N sp + N shot = 2ei

LO

e

(N + 1) , sp

(8.2.13)

where the power density spectrum of the additive lightwave noise given in (8.2.3) is now denoted as Nspe . Using the modified expression for N0 that includes shot noise, the expression for E p / N0 for heterodyne demodulation given in (8.2.7) is modified to read Ep N0

= 2ei 2ei(N E+p 1) = N Ep+ 1 . sp sp LO

LO

(8.2.14)

When the mean number of noise photons N sp generated by spontaneous emission noise is much less than one, (8.2.14) reduces to Ep N0



Ep Nshot

= Ep.

(8.2.15)

This expression states that the electrical signal-to-noise ratio for heterodyne demodulation – limited solely by the shot noise generated by balanced photodetection – increases linearly with the number of photons Ep in a pulse. Equivalently, the shot noise generated during the heterodyne mixing process is equivalent to one photon or N 0 = 1. Subsequent down-conversion in the electrical domain does not generate additional shot noise. Similarly, under the same large-mixing-gain condition as used for heterodyne demodulation, the shot noise generated for homodyne demodulation does not change and depends only on the power in the local oscillator. Using the modified expression for N0 that includes shot noise, the expression for E p / N0 for homodyne demodulation given in (8.2.9) is modified to read Ep N0

4ei LO Ep LO N sp + 2ei LO

= 4ei

= N +Ep 1/2 . sp

(8.2.16)

When the mean number of noise photons Nsp generated by spontaneous emission noise is much smaller than one, (8.2.16) reduces to Ep N0



Ep N shot

= 2Ep .

(8.2.17)

386

8 The Electrical Channel

This expression states that the electrical signal-to-noise ratio for the homodyne demodulation of one signal component to a real-baseband signal – limited solely by the shot noise generated by balanced photodetection – is twice that of heterodyne demodulation given in (8.2.15) and is equivalent to half a photon or N0 = 1/2. This is the minimum amount of quantum noise in a classical lightwave. The origin of the shot noise in the expressions for E b / N 0 is described properly as a form of quantum uncertainty in Chapter 15 using the quantum-optics signal model. That chapter shows that the shot noise generated in phase-synchronous modulation depends on the number of modes that interact in the demodulation process. This form of “mixing” shot noise is different than the “counting” shot noise generated by direct photodetection because the two methods of demodulation are different. One method uses balanced photodetection. The other method uses direct photodetection. For a system limited only by the shot noise generated from the mixing of the signal with the local oscillator in balanced photodetection, the power density spectrum for heterodyne demodulation to a passband signal is twice as large as homodyne demodulation to a real-baseband signal. The difference arises because two modes actually contribute to the shot noise in the demodulated electrical signal – the signal mode and an additional image mode that interacts with the signal within the receiver. The trigonometric identities used to describe the mixing in heterodyne demodulation show that two lightwave signals can contribute to the electrical signal at the intermediate frequency f IF = f c − f LO , where f c is taken to be larger than fLO . The first signal is centered at the carrier frequency fc . The second signal is centered at the frequency 2 f LO − fc because f LO − (2 f LO − f c ) = f IF . This frequency is called an image mode or image frequency. Any noise remaining in this image mode after bandpass filtering will fold into the demodulated signal centered at f IF . The quantum-optics signal model of Chapter 15 shows that the vacuum-state fluctuations in this image mode are always present and cannot be removed by filtering. These fluctuations interact with the signal mode during the mixing process, thereby increasing the noise in the demodulated passband electrical signal. The mixing of the local oscillator with the vacuum-state fluctuations together with the quantum fluctuations in the signal mode itself produce a noise power density spectrum N shot equal to one in accordance with (8.2.15). The use of balanced photodetection means that this form of shot noise does not, in general, have the same statistical properties as the shot noise generated using direct photodetection. This distinction is discussed in Chapter 15. For homodyne demodulation to a real-baseband signal, an image mode is not present because the carrier frequency fc is equal to the frequency of the local oscillator f LO , so 2 f LO − f c = f c . Therefore, there is only a single mode at frequency f c that contributes to the shot noise in the demodulated electrical signal. For homodyne demodulation to a complex-baseband signal, the shot-noise power density spectrum is twice as large as homodyne demodulation to a real-baseband signal. This comes from an additional mode interacting with the signal within the receiver. The additional mode occurs because the incident lightwave signal must be split into the two quadrature receiver paths. 
A power splitter for a lightwave signal always has two input

8.2 Lightwave Demodulation

387

spatial modes, one mode with the signal, the other with no signal but with vacuum fluctuations (cf. Section 6.1.2). Each output port of the splitter provides a linear combination of these two input modes (cf. (7.1.1)). Each of the two output modes has vacuum-state noise from the input mode that contained no signal. This splitting is not required for homodyne demodulation to a real-baseband signal. The quantum origin of the noise in this additional mode as well as in the image mode seen in heterodyne demodulation is discussed in detail in Section 15.6.

8.2.2

Demodulation of the Lightwave Intensity

Phase-asynchronous demodulation of the lightwave intensity generates an electrical waveform without using knowledge of the phase of the carrier. This limitation is unavoidable for a noncoherent carrier. In this case, the corresponding electrical channel using direct photodetection is not linear in the complex lightwave amplitude, but may be linear in the real lightwave power. In general, linearity in the lightwave power requires that the random phase of the carrier rapidly varies compared with the modulation. A lightwave with such a carrier is called a noncoherent carrier. In general, to achieve linearity in the power, the power density spectrum of the time-varying phase of the noncoherent carrier should be much wider than the spectrum of the modulation. This means that the transmitted power density spectrum of an intensity-modulated noncoherent source is dominated by the carrier phase variations and not by the modulating waveform.

Noncoherent Superposition

Consider the lightwave signal in a single spatial mode at the fiber output that is the sum of two lightwave complex amplitudes s1 (t ) and s2 (t ) with modulated time-varying intensities and random time-varying phases φ1(t ) and φ2(t ). When this complex field envelope is directly photodetected, the received electrical signal is

¼¼ ¼2 s1 ( t ) + s2 ( t ) ¼ ½ = P1(t ) + P2(t ) + 2 P1(t ) P2(t ) cos(φ1(t ) − φ2 (t )). (8.2.18) √ √ The two lightwave signals are given by s1 (t )= 2P1 (t )eiφ (t ) and s2 (t )= 2P2 (t )eiφ (t ) r (t ) =

1 2

1

2

(cf. (2.3.27) and (2.3.29)). When the random phases φ1 and φ2 are independent and uniformly distributed on [0, 2π) for any value of t, the cross term in (8.2.18) has an ensemble average equal to zero. For this case, the expected electrical signal ²r (t )³ is linear in the lightwave power, but there may be a large variance in that signal due to the cross term. However, for a phase that varies rapidly over the duration of the pulse, the cross term varies rapidly compared with the other terms and its frequency spectrum is much wider than that of the other terms. Because it averages to zero over time, the variance of the cross term can be made nearly equal to zero at the output of an appropriate time-domain filter with little effect on the other terms.

388

8 The Electrical Channel

When the random phase does not vary significantly over the duration of the pulse, the cross term cannot be filtered. The variance is 2P1(t ) P2(t ) because ²cos 2(φ 1(t ) − φ2(t ))³ = 1/2. For this case, the statement that the electrical signal ²r (t )³ is linear in expectation in the lightwave power does not imply that any realization of the electrical signal r (t ) is linear in the lightwave power even when the two random phases φ1 and φ2 are independent. When, however, the random phase does vary rapidly over the duration of the pulse, the cross term in (8.2.18) has a variance after filtering nearly equal to zero and can be ignored. The realization r (t ) of the electrical signal is then linear in the lightwave power. When the random phases φ1 and φ 2 are dependent, the situation is more complicated. Two pulses separated by a time ±t modulated onto a common noncoherent carrier each have a random phase defined using a common phase noise process φ(t ) describing the noncoherent carrier at two times that differ by ±t . When ±t is smaller than the coherence timewidth τc of the noncoherent carrier, the random variables φ 1 and φ2 defined at two time instants separated by ± t are correlated. Any overlap of the two received pulses will add in the complex amplitude. When ± t is larger than the coherence timewidth τc , φ1 and φ 2 can be regarded as independent random variables, and the received pulses add in power. When ± t is comparable to τc , the sum cannot be described either in terms of the complex amplitude or in terms of the power. This partially coherent case is discussed briefly in Section 11.5.1. For a fully noncoherent carrier with a wavelength autocorrelation function given by a scaled Dirac impulse (cf. (4.5.5)), the intersymbol interference due to the dispersion and filtering of the passband signal in the channel is linear in the power. Intersymbol interference can be removed by suitable prefiltering or postfiltering of the noncoherent carrier after the modulation has taken place, provided that the initial modulation uses a Nyquist pulse. In contrast, intersymbol interference caused by filtering of the baseband signal prior to modulation onto the noncoherent carrier will not be linear in the lightwave power of the individual pulses because the overlapping pulses that produce the intersymbol interference have a common phase and so do not add in power.

Phase-Asynchronous Demodulation

The straightforward phase-asynchronous demodulator uses direct photodetection with the demodulated electrical waveform proportional to the lightwave power. A less straightforward alternative phase-asynchronous method is heterodyne demodulation of the lightwave signal followed by envelope detection of the resulting passband electrical signal. These two kinds of demodulators are shown in Figure 8.12.

Intensity Demodulation A direct-photodetection demodulator is shown in Figure 8.12(a). For an ideal phase-insensitive lightwave amplifier with a gain G (cf. (7.7.10)) followed by direct photodetection with η = 1, the mean value N sp of the photodetected noise is approximately equal to G (cf. (7.7.10)). Similarly, the mean number Ep of photons in a pulse is

8.2 Lightwave Demodulation

(a)

Optical

±s(t )

Lightwave amplifier

Electrical

Optical/electrical conversion Optical noise limiting filter

Square-law photodetector

389

r(t)

Baseband filter

Sample rk

±n o (t )

(b)

±s(t )

Lightwave amplifier n o(t )

Optical noise limiting filter

Balanced photodetector

L

r±(t )

Passband filter

Envelope detection

Sample rk

Figure 8.12 Phase-asynchronous demodulators. (a) Ideal lightwave amplification followed by

direct photodetection. (b) Lightwave heterodyne demodulation followed by electrical envelope detection.

equal to G Ein , where Ein is the mean number of photons in the pulse at the input to the lightwave amplifier. The ratio Ep / Nsp is Ep Nsp

≈ GGEp = Ein .

(8.2.19)

This expression shows that the expected number of photons Ein in a pulse at the input to an ideal lightwave amplifier followed by direct photodetection is equal to the value of the electrical quantity E p / N 0, a real number that would have been obtained using asynchronous heterodyne demodulation. This number is given by E p / N 0 = Ep /(Nsp + 1) (cf. (8.2.14)). When a lightwave amplifier is used, Nsp is much larger than one and E p / N0 ≈ Ep /N sp. Now use (8.2.19) to replace Ep /Nsp with Ein to give E p / N 0 ≈ Ein .

(8.2.20)

This expression relates the number of photons Ein at the input to the lightwave amplifier for direct-photodetection demodulation to the electrical signal-to-noise ratio for asynchronous heterodyne demodulation.

Asynchronous Heterodyne Demodulation Demodulation of a phase-asynchronous waveform by heterodyning is shown in Figure 8.12(b) as an alternative to direct photodetection. Although this demodulator is more complex, which may make it unattractive, the approach does make it possible to use a better filter in the electrical domain prior to envelope detection. This could provide an improved detection statistic compared with direct photodetection, an advantage that is discussed in Section 10.5. The noisy asynchronous passband electrical waveform ¸ r (t ) at frequency f IF after balanced photodetection has a complex envelope given by r (t ) = s (t )eiφ(t ) + n (t ).

(8.2.21)

390

8 The Electrical Channel

The complex envelope s (t ) is multiplied by a term eiφ(t ) that represents the random carrier phase at the receiver. The term n(t ) is an additive circularly symmetric gaussian noise process with an electrical power density spectrum N 0 (cf. (8.2.13)). This additive noise term contains a combination of local oscillator shot noise, additive thermal noise, and spontaneous-emission noise, if present. For each realization of the rapidly varying carrier phase, the transform S( f ) of s (t ) is spread to the Fourier transform of s (t )eiφ(t ) . A detection filter y(t ) prior to envelope detection must have a bandwidth large enough to pass the phase noise. Otherwise, the signal will be lost along with the phase noise. A filter can be used for pulse shaping only if the carrier phase is varying sufficiently slowly within a symbol interval that it can be represented as a uniform random variable φ over the duration of the pulse. This would be the case for orthogonal signaling.

Asynchronous Orthogonal Signaling A modulation waveform may use multiple pulse shapes, here denoted s² (t ) for ² = 1, . . . , L . In the simplest case, these pulses are orthogonal and jointly Nyquist pulses with respect to the symbol interval T . As will be shown in Section 9.4, for slowly varying random phase, optimal demodulation uses first the bank of matched filters y² ± (t ) = s²∗± (T − t ) then envelope detection for ² ± = 1, . . . , L, where ²± indexes the matched filters. Suppose that the ²th of the L complex symbols s² (t ) is received in additive noise. A copy of the received signal is filtered by s²∗± (T − t ) for ²± = 1, . . . , L. The L filters comprise a bank of matched filters, and the magnitudes of the outputs are¼sampled. ¼¼ For ¼ orthogonal waveforms, the magnitude of the complex envelope sample r ² ± after the ²± th matched filter is

¼¼ ¼¼ r ² ± = |r² (t ) ± y² ± (t )|t = T ¼¼ ¼¼ ² ∞ ²∞ iφ ∗ ∗ ¼ s² (τ)s²± (t − τ)dτ + n(τ)s²± (t − τ)dτ ¼¼ = ¼e −∞ −∞ t=T ¼¼ ¼¼ iφ = ¼2E p e δ²²± + n¼ , (8.2.22) »∞ »∞ where 2E p = −∞ | s (t )|2 dt, so that −∞ s² (t )s∗²± (t )dt = 2E p δ ²²± for a set of orthogonal waveforms. Expression (8.2.22) states that for the L − 1 matched filter outputs for which ²± does not equal ², the sample is simply the magnitude of the noise |n |. For the single matched filter for which ² is equal to ²± , the sample is the magnitude of the signal plus

noise. The L -ary detection process then asserts that the filter with the largest output magnitude corresponds to the transmitted symbol. Ideal binary on–off keying is a simple example of orthogonal signaling with asymmetric pulse energies. No signal is transmitted for a space, which is trivially orthogonal to the mark. The receiver uses a single detection filter matched to the waveform of a mark with E p = E 1. When a mark is transmitted, the complex sample r 1 is r1

¼ ¼ = ¼¼2E1 eiφ + n¼¼ ,

(8.2.23)

8.2 Lightwave Demodulation

391

which is a ricean random variable with a probability density function given by (2.2.33). When a space is transmitted, the complex sample r 0 is |n |, which is a Rayleigh random variable with a probability density function given by (2.2.34). Orthogonal signaling – both coherent and noncoherent – is an energy-efficient, but bandwidth-inefficient modulation format with moderate complexity. It is suitable for a slowly varying noncoherent channel. When the signaling waveforms are not pairwise orthogonal, either by design or because of dispersion in the channel, the Kronecker impulse δ ²²± is replaced by the correlation coefficient ρ ²²± (cf. (10.1.5)) between the two waveforms. For this case, the appropriate detection functions are the magnitudes of various linear combinations of the nonorthogonal waveforms. Nonorthogonal waveforms are discussed in Section 10.4.4. 8.2.3

Demodulation of Pulse Intensity in Multiple Modes

A single noncoherent lightwave pulse x (t ) = | x (t )|eiφ(t ) launched into a multimode fiber with dispersion will result in a superposition of overlapping copies of the transmitted pulse being seen at the receiver. These copies of the pulse at the receiver will have different modal group delays. A single pulse modulated onto a noncoherent lightwave at the input of a multimode fiber results in a superposition of multiple delayed copies of that noncoherent waveform at the output face of the fiber. The copies of the pulse traveling in different modes have different spatial structures across the (x , y ) plane, and the modes may couple differently into a photodetector whose aperture has a finite spatial extent (cf. Section 8.1.4). This section will ignore spatial considerations. It provides a basic understanding of modal interactions. For an input lightwave pulse x (t ), the output lightwave pulse in the mth spatial mode is given by am x (t −τm ) (cf. (8.1.9)), where the scaling constant H0 is here absorbed into am and τm is the group delay in the mth mode. Using (8.1.9), the total noiseless output electrical pulse p (t ) is p(t ) =

|x (t ) ± h(t )|2 ¼M ¼¼2 1 ¼¼ ± = 2 ¼¼ am x (t − τm )¼¼¼ , m =1 1 2

(8.2.24)

where the complex constant am is the proportion of the lightwave signal in the mth mode. Expanding this expression and explicitly inserting the noncoherent lightwave carrier phase for each mode (cf. (8.2.18)) gives p(t ) =

=

1 2

¾± M

m =1

M ±

m =1

+ 12

am | x (t

Fm Pin (t M ± M ± m =1 n ¶=m

− τm )|e

i φ(t −τm )

¿¾± M

n= 1

a∗ | x (t − τn )| e−iφ(t −τn ) )

¿

n

− τm ) am an∗ | x (t

− τm )||x (t − τn )|ei(φ(t −τ )−φ(t −τ )), n

m

(8.2.25)

392

8 The Electrical Channel

where Pin (t ) = | x (t )|2 /2 is the lightwave power, and Fm = |am | 2 is the proportion of the lightwave power in the mth mode (cf. (8.1.9)). The spatial orthogonality of the modes is not considered. Each summand in the first term of (8.2.25) is the power in a single mode. Each summand in the second term of (8.2.25) is the product of two complex pulses in two different modes generated from a single launched pulse. These pulses overlap at the receiver, causing modal interference. For a fully noncoherent carrier, for which φ(t ) is white noise, the expectation of the random phase in the cross term of (8.2.25) is proportional to δ(τn − τm ) (cf. (8.2.18)). The double summation then vanishes. For a random phase in (8.2.25) that is varying rapidly compared with the modulation, the cross term is real and will tend to average to zero in subsequent filtering. Thus, it can be ignored in the signal. It suffices to write the electrical pulse as p(t ) =

M ± m =1

Fm Pin(t

= Pin (t ) ±

M ± m =1

− τm ) Fm δ(t

− τm ).

(8.2.26)

With only mode-dependent group delay considered, the output intensity pulse in each mode is a shifted form of the input lightwave power waveform Pin (t ), with the shift depending on the mode. Then (8.2.26) is the output pulse for an electrical channel impaired only by mode-dependent group delay, with the electrical impulse response ∑ F δ(t − τ ). Collectively, these modal delays cause delay given by helec (t ) = M m m =1 m spread and intermodal dispersion in the composite intensity pulse generated from the portion of the delayed pulse in each mode. For a partially coherent carrier, the interfering terms have dependent phases that need not average to zero, thereby invalidating (8.2.26). Interference of partially coherent multimode copies of a pulse is considered in Section 11.5.1.

8.2.4

Demodulation of Pulse Intensity in a Single Mode



A single noncoherent lightwave pulse s (t ) = 2Pin (t )eiφ(t ) = | s(t )| eiφ(t ) in a single spatial mode will experience a wavelength-dependent group delay that depends on the power density spectrum S λ (λ) of s (t ). The electrical pulse x (t ) at the input to the electrical channel is proportional to the transmitted lightwave pulse power Pin (t ). The mean directly photodetected electrical pulse p(t ) at the electrical channel output is related to the electrical pulse x (t ) at the electrical channel input by p(t ) = x (t ) ± helec(t ),

(8.2.27)

where the received electrical pulse p(t ) is proportional to the received lightwave pulse power Pout (t ). The impulse response helec (t ) is the channel response as seen by the electrical channel due to the wavelength-dependent group delay for a fully noncoherent lightwave carrier. This section derives an expression for helec (t ).

8.2 Lightwave Demodulation

393

For a noncoherent lightwave carrier, the wavelength-dependent group delay in a single mode can be described by simply replacing the expressions for the mode-dependent group delay by the analogous expressions for the wavelength dependence. For each mode, replace the mode-dependent group delay τm by the wavelength-dependent group . delay τλ = λsλ , where sλ is the slope of the wavelength-dependent group delay in that mode evaluated at the carrier wavelength (c.f. (4.4.12)). The normalized power density spectrum of the lightwave signal Sλ (λ)/ P in each wavelength λ (cf. (4.5.4)) replaces the proportion Fm of the power in mode m. Finally, replace the summation over m by an integration over λ . In this way, the continuous equivalent of (8.2.26) for the photodetected electrical pulse p(t ) in a single mode is written as p(t ) =

²

1 ∞ Sλ (λ) Pin (t P −∞

− λsλ)dλ,

(8.2.28)

where the time t is measured with respect to the group delay τm for the mth mode, Pin (t ) is the input pulse power, and sλ = dτ/dλ|λ=λc is the slope of the wavelength-dependent group delay evaluated at the carrier for the mth mode (cf. Figure 4.8(a)). Substituting . the wavelength-dependent group delay τλ = λsλ into (8.2.28) gives p(t ) = where dλ = dτλ /sλ and

²∞

−∞

helec (τλ ) Pin (t

.

h elec(τλ ) =

− τλ)dτλ,

1 Sλ (τ λ /sλ ) Psλ

(8.2.29a)

(8.2.29b)

is the impulse response seen by an electrical channel due to the wavelength-dependent group delay of the intensity pulse in a single mode. Expression (8.2.29) holds for each mode of a multimode fiber. Because the slope sλ of the wavelength-dependent group delay can depend on the mode (cf. Figure 4.8(a)), h elec(τλ ) can vary from mode to mode. The electrical impulse response helec(τ λ ) is a scaled form of the power density spectrum S λ (λ) of the modulated lightwave pulse and is markedly different than the impulse response for the wavelength-dependent group delay of the amplitude using a coherent carrier (cf. (8.1.4)). Although examination of (8.2.29) might suggest that the received electrical pulse p(t ) does not depend on the phase φ(t ) of the noncoherent lightwave carrier, the power density spectrum Sλ (λ) given by (8.2.28) does depend on the random phase φ(t ) of the lightwave carrier as given by (7.8.2). Therefore, in a dispersive channel, p(t ) does depend indirectly on the statistics of φ(t ) even though square-law photodetection is phase-insensitive. The combined effect of mode-dependent group delay and wavelength-dependent group delay on a single pulse for a noncoherent carrier with direct photodetection is shown schematically in Figure 8.13. The received pulse waveform for this channel is compared with other channels in Table 8.1. As an example, let the transmitted pulse power be given by Pin(t ) = rect(t / T ), and let the normalized power density spectrum of the modulated lightwave pulse be given by

394

8 The Electrical Channel

Table 8.1 Received pulse waveforms excluding scaling constants for a single-input pulse for several channels

Kind of channel

Lightwave channel

Intermodal dispersion (multiple modes) pout (t ) =

M ± m =1

am x (t

Intramodal dispersion (single mode)

− τm )

(8.1.12)

(8.1.8) with (8.1.9) Electrical channel a

p(t ) =

M ±

m =1

pout (t ) =

Fm Pin(t

− τm )

p( t ) = (8.2.28)

(8.2.26)

1 P

²∞ −∞

²∞ −∞

2 x (τ)ei( t −τ) /2β 2 L dτ

Sλ (λ) Pin( t − λsλ )dλ

a Single-pulse response using a noncoherent carrier with direct photodetection.

Mode 1

τ3 τ2

Mode 2

Sum powers

τ1

Mode 3

Output pulse

Input

Output

Figure 8.13 A single input pulse coupled into several modes in a multimode fiber. For a

noncoherent carrier and direct photodetection, the output power is the sum of the power in each mode, with the impulse response varying from mode to mode.

Sλ (λ) P

= (σ σ/2λ)/22π+ λ2 ,

(8.2.30)

λ

where σλ is the spectral width of the modulated lightwave pulse. Substituting Pin (t ) and (8.2.30) into (8.2.29a) gives p(t ) =

=

³

²

´ ³ ´

1 ∞ t − τλ τ rect Sλ λ dτλ sλ −∞ T sλ ² t +T /2 1 σλ/2π dτλ . sλ t −T /2 (σλ /2)2 + (τλ /sλ )2

The standard indefinite integral

²

1

a2 + x2

dx

= a1 arctan

Àx Á a

(8.2.31)

+C

shows that, for the transmitted pulse rect(t / T ), the received electrical pulse including the effect of dispersion is given by

³ 2t + T ´ 1 ³ 2t − T ´ p(t ) = arctan π σλsλ − π arctan σ λsλ . 1

(8.2.32)

8.2 Lightwave Demodulation

395

1 0.1 esluP tuptuO detcetedotohP

0.8 0.5

0.6

1

0.4 0.2 0 –2

–1

0

1

2

Normalized Time (t/T) Figure 8.14 Dispersed output intensity pulse for a rectangular intensity pulse for several values of the dispersion parameter σ λ sλ / T .

Figure 8.14 plots the received electrical pulse given in (8.2.32) for three values of the normalized dispersion parameter σλ sλ / T , which characterizes the effect of the wavelength-dependent group delay on the received pulse shape. This figure shows that, for a large value of σλ sλ / T , a significant portion of the energy in the received electrical pulse resides outside the modulation interval T . This temporal energy redistribution produces intersymbol interference in a sequence of intensity pulses. The consequences and treatment of intersymbol interference are discussed in Chapter 11. 8.2.5

Cumulative Electrical Power Density Spectrum

S

The cumulative power density spectrum r ( f ) of the directly photodetected and filtered electrical waveform used to generate the detection statistic is studied in this section. The noisy electrical detection filter y(t ) used to generate the detection statistic is described by a transfer function Y ( f ). Using the power density spectrum for the directly photodetected electrical signal before the detection filter given by (6.4.17), the power density spectrum r ( f ) after the detection filter is

S

À

Á

Sr ( f ) = eR P + R2 S ( f ) |Y ( f )|2 . (8.2.33) The first parenthesized term, eR P, on the right is a white electrical power density P

spectrum due to the shot noise from the total mean lightwave power P . The second parenthesized term, 2 P ( f ) on the right is proportional to the power density spectrum P ( f ) of the lightwave power as given in (6.4.7) and repeated here,

S

RS

S ( f ) = P 2 δ( f ) + 2Ps Sn ( f ) + Sn ( f ) ± Sn ( f ), P

(8.2.34)

396

8 The Electrical Channel

where P_s is the mean signal power and S_n(f) is the power density spectrum of the complex amplitude of the lightwave noise. The first term on the right of (8.2.34) is the total mean power. The second term on the right is the signal mixing with the noise, and the last term is the noise mixing with itself. Suppose that the power density spectrum S_n(f) of the complex amplitude of the lightwave noise is a constant N_sp over the passband bandwidth B of a lightwave noise-suppressing filter so that S_n(f) = N_sp rect(f/B). Then^10

S_n(f) \star S_n(f) = \begin{cases} B N_{sp}^2 (1 - |f|/B) & \text{for } |f| \le B \\ 0 & \text{otherwise.} \end{cases}

Substituting this expression into (8.2.34) gives

S_P(f) = P^2 \delta(f) + 2 P_s N_{sp} \, \mathrm{rect}(f/B) + B N_{sp}^2 (1 - |f|/B).   (8.2.35)

Finally, substituting (8.2.35) into (8.2.33) and setting R equal to one gives the filtered electrical power density spectrum S_r(f) as

S_r(f) = \underbrace{P^2 |Y(0)|^2 \delta(f)}_{\text{mean signal}}
       + \underbrace{e P |Y(f)|^2}_{\text{shot noise}}
       + \underbrace{N_{th} |Y(f)|^2}_{\text{thermal noise}}
       + \underbrace{2 P_s N_{sp} \, \mathrm{rect}(f/B) |Y(f)|^2}_{\text{signal–noise mixing}}
       + \underbrace{B N_{sp}^2 (1 - |f|/B) |Y(f)|^2}_{\text{noise–noise mixing}},   (8.2.36)

where a filtered thermal noise term N_th|Y(f)|^2 has been included (cf. (2.2.64)), with N_th denoting, for this example, the white thermal noise power density spectrum. A schematic plot of the five terms of the power density spectrum before the electrical filter is shown in Figure 8.15, where the two-sided arrows for the thermal noise and the shot noise denote a white power density spectrum. A plot of the power density spectrum for each of the four noise terms before the detection filter as a function of the average amplified lightwave signal power P_s is shown in Figure 8.16 for values that are typical of a high-gain, low-noise erbium-doped fiber amplifier.

Figure 8.15 Schematic plot of the terms in the power density spectrum of a noisy lightwave signal

before electrical filtering.

10 For a polarization-insensitive photodetector, there is an additional factor of two because the noise in both

polarizations is detected.

Figure 8.16 Electrical noise terms for the direct photodetection of an optically amplified lightwave signal at λ = 1550 nm, for G = 30 dB, n_sp = 1, an electrical noise figure F_N = 5 dB, an optical noise bandwidth B = 0.1 nm, and an output resistance of R = 50 Ω.

The signal–noise mixing term is the most significant term in a large-signal regime because that term scales linearly with the signal. The second largest term in this regime is the shot noise from the signal. The ratio of these two terms can be found to be approximately twice the gain G of the lightwave amplifier, as is asked for in an end-of-chapter exercise. It is also evident that the thermal noise term and the noise–noise mixing term are independent of the signal power P_s and can be ignored for a high-gain lightwave amplifier followed by a low-noise electrical amplifier. With no lightwave amplification and an ideal laser source, there is only thermal noise and shot noise. Otherwise, when the laser source is nonideal, the shot-noise term is replaced by the power density spectrum (cf. (7.8.10)) of the laser intensity noise. For frequencies smaller than the relaxation oscillation frequency f_0, which characterizes the linearized, small-signal bandwidth of a laser diode, the power density spectrum of the relative intensity noise N_RIN(f) is nearly constant. In this regime, the relative intensity noise can be treated as an additive white gaussian noise source with a variance that depends on the signal power.
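The scaling of these noise terms with signal power can be illustrated numerically. The following Python sketch is a rough model in the spirit of Figure 8.16; the responsivity R = e/hf, the relation N_sp = n_sp hf (G − 1), and the thermal-noise level are assumptions for illustration, not values from the text. It evaluates the terms of (8.2.36) at f = 0 and confirms that the ratio of the signal–noise mixing term to the shot-noise term approaches 2(G − 1)n_sp, approximately twice the gain.

import numpy as np

h, c, e = 6.626e-34, 2.998e8, 1.602e-19
f_c = c/1550e-9                    # carrier frequency at 1550 nm
G, n_sp = 1e3, 1.0                 # 30 dB gain, ideal inversion (assumed)
R = e/(h*f_c)                      # unit-efficiency responsivity (assumed)
N_sp = n_sp*h*f_c*(G - 1)          # spontaneous-emission power density spectrum
B = 12.5e9                         # 0.1 nm optical bandwidth near 1550 nm
N_th = 1e-21                       # placeholder thermal-noise level

for P_dBm in (-40, -20, 0):
    P_s = 1e-3*10**(P_dBm/10)      # amplified signal power in watts
    shot = e*R*(P_s + N_sp*B)      # shot noise from signal plus amplifier noise
    sig_x_noise = 2*R**2*P_s*N_sp  # signal-noise mixing term at f = 0
    noise_x_noise = R**2*B*N_sp**2 # noise-noise mixing term at f = 0
    print(P_dBm, "dBm  signal-noise/shot ratio:", sig_x_noise/shot)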

8.3 Discrete-Time Electrical Channels

The two-way conversion between a waveform channel and a discrete-time channel lies at the interface between the information channel and the electrical channel both at the transmitter and at the receiver. This combined task of interpolation at the transmitter and sampling at the receiver could be described within either the information channel or the electrical channel. For the purpose of this book, it is convenient to describe these conversions as part of the electrical channel, recognizing that drawing the boundary between the electrical channel and the information channel is a matter of pedagogical convenience.



Figure 8.17 (a) A waveform channel. (b) A simple equivalent discrete-time channel. (c) A

discrete-time channel using a combination of noise filtering and digital processing.

This section describes three discrete-time electrical channel models. Two of these models are derived from the continuous-time waveform wave-optics models discussed in the previous section. The third discrete-time model is based on photon optics. These three discrete-time models will be used in Chapter 9 to develop several information channels. The relationship between a waveform channel and a discrete-time channel is shown schematically in Figure 8.17. Depending on our choice of the partition between the information channel and the electrical channel, the input to a discrete-time channel is a sequence {s j } of symbols generated from the coded-modulation process. Within the discrete-time electrical channel, this input sequence generates the continuous-time baseband waveform s (t ) by the process of interpolation using an appropriate transmit pulse x (t ) for this purpose. Correspondingly, after propagation through a waveform electrical channel, which is internal to the discrete-time electrical channel, the output sequence is derived by sampling the filtered baseband electrical waveform. The resulting sequence of real or complex values is the output of the discrete-time electrical channel. This sequence of samples is passed to the information channel, which determines the most likely sequence of logical symbols using an appropriate method of detection as described in Chapter 9. A properly formulated discrete-time channel is generated from an electrical waveform channel and incorporates all of the relevant properties of the continuous-time electrical waveform. For a linear, bandlimited channel, this conversion is readily accomplished using the sampling theorem (cf. (2.1.79)). For a nonlinear channel, which may generate new frequency components, this conversion is problematic and is discussed in Section 14.6. The discrete-time channel has a sequence {s j } of real or complex numbers at its input and a sequence {r j } of real or complex numbers at its output. Two examples are shown in Figures 8.17(b) and (c). Once the dependence of the output on the input has been defined, the discrete-time channel can be regarded as a black box whose internal


structure is not of interest. Only the probabilistic relationship between the discrete-time output and the discrete-time input is relevant to the needs of the information channel.

8.3.1 Interpolation and Sampling

The relationship between a sequence {s_j} of real or complex numbers generated from the coded-modulation process and a real-baseband or a complex-baseband waveform s(t) used for modulation can be derived from the Nyquist–Shannon sampling theorem (cf. Section 2.1.3) used in reverse as applied to interpolation. The sampling theorem can be read as an interpolation theorem stating that a baseband waveform s(t) with a spectrum S(f) that is zero outside a baseband bandwidth W is produced from the sequence {s_j} of samples using a time-scaled sinc pulse for interpolation so that (cf. (2.1.80))

s(t) = \sum_{j=-\infty}^{\infty} s_j \, \mathrm{sinc}(2W t - j),   (8.3.1)

where W = 1/2T_s is the Nyquist sampling rate. Then s(jT_s) = s_j, so the interpolation preserves the samples. A sinc pulse is one example of a Nyquist pulse, which is any pulse that is zero at all nonzero integers and has the value one when its argument is zero. Other Nyquist pulses are also used for interpolation and are studied in Section 9.2.2. The scaling property of the Fourier transform shows that, for a sample interval T_s, the corresponding spectrum of a sinc pulse is a rect function over a baseband bandwidth W = 1/2T_s. For this case, the transmission of the sequence of real or complex values {s_j} occurs at a rate of R = 1/T_s samples per second using a baseband waveform with a bandwidth W = R/2 = 1/2T_s. This is the minimum bandwidth of a waveform that can transmit R symbols per second.

The Nyquist–Shannon sampling theorem (cf. (2.1.80)) states that a sampling rate of twice the (one-sided) signal bandwidth is sufficient to recover the entire waveform. Indeed, this sampling rate is necessary if the waveform is unknown except for the bandwidth. Practical implementations may use a rate higher than the Nyquist rate to simplify subsequent processing, as when the detection filter is placed after the sampler. For a bandlimited waveform with bandwidth W, any sampling rate R_s at the receiver that is greater than the Nyquist rate R = 2W produces a sample sequence that is a sufficient detection statistic. This sequence is called a sufficient detection statistic because it retains all of the relevant information contained in the received continuous waveform that is needed to correctly decide the most likely transmitted symbol.

An example of the spectrum of a sampled waveform is shown in Figure 8.18. Figure 8.18(a) illustrates that the frequency-domain sampling images do not overlap when sampled at the Nyquist rate. This waveform can instead be sampled as is shown in Figure 8.18(b) at a rate R_s that is less than the Nyquist rate of 2W without producing significant aliasing. Furthermore, for a waveform known to be a sequence of pulses with no intersymbol interference, the sampling rate can be set to the pulse rate, but this requires synchronization between the pulses and the samples.
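The interpolation rule (8.3.1) is easily demonstrated numerically. The following Python sketch (using NumPy, whose sinc is defined as sin(πx)/(πx); the sample values are arbitrary) interpolates a short sequence and verifies that the resulting waveform passes through the original samples.

import numpy as np

Ts = 1.0                                        # sample interval, W = 1/(2 Ts)
s = np.array([0.5, -1.0, 1.0, 0.25, -0.5])      # sample sequence {s_j}

def interpolate(t, s, Ts):
    # s(t) = sum_j s_j sinc(2W t - j) with 2W = 1/Ts, as in (8.3.1)
    j = np.arange(len(s))
    return np.sum(s[None, :]*np.sinc(t[:, None]/Ts - j[None, :]), axis=1)

t = np.linspace(-2.0, 6.0, 1601)
waveform = interpolate(t, s, Ts)
samples = interpolate(np.arange(len(s))*Ts, s, Ts)
print(np.allclose(samples, s))                  # True: the samples are preserved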


Figure 8.18 (a) Spectrum of a sampled waveform using the Nyquist rate R_s = 2W_max. (b) Spectrum of the same waveform using a sampling rate R_s < 2W_max that is less than the Nyquist rate, with the rate chosen so as to produce a small amount of aliasing.

For a random datastream, the power in the aliased frequency components caused by the reduction in the sampling rate may be treated as an additional noise source. This additional source of noise may be an acceptable cost of reducing the sampling rate in high-data-rate systems.

8.3.2 Conventional Discrete-Time Channels

Three conventional discrete-time electrical channel models used to analyze lightwave communication systems are described in this section: an additive white gaussian noise channel, a direct-photodetection channel, and a Poisson channel. These three channels recapitulate, as a summary of the chapter, the models developed in the preceding sections.

Additive White Gaussian Noise Channel

A single-input single-output waveform channel with additive white gaussian noise may be an appropriate model for a lightwave channel that uses ideal phase-synchronous modulation and demodulation. For this channel, the transmitted and received electrical signal is proportional to the corresponding lightwave signal (cf. Section 8.2.1). The baseband electrical waveform, possibly complex, is given by r(t) = s(t) + n(t), with the electrical noise power density spectrum N_0 given by (8.2.3). To form an equivalent discrete-time model, the noisy received waveform r(t) is projected onto a detection function y(t). Optimal forms of the detection function y(t), realized as a detection filter, are discussed in Chapter 9. For a real-valued electrical signal, this projection at time jT can be written as (cf. (2.1.65))

r_j = \int_{-\infty}^{\infty} \big( s(t) + n(t) \big) \, y(t - jT) \, dt = s_j + n_j,   (8.3.2)

where s_j is the sample value for the jth symbol interval, n_j is a zero-mean gaussian random variable, and the integration interval is synchronized with the symbol interval. Expression (8.3.2) is a discrete-time model of a memoryless channel in additive white gaussian noise, with r_j being the detection statistic. Expression (8.3.2) can be implemented by a linear filter y(t) followed by a sampler. For the jth sample, the transmitted symbol s_j can be any element s_ℓ, possibly complex, of the signal constellation (cf. Section 10.1.1), now with ℓ indexing the L points


of the signal constellation. Therefore, the probability density function of the detection statistic r_ℓ is conditioned on the transmitted symbol s_ℓ. For additive white gaussian noise, this conditional probability density function f(r|s_ℓ) is a conditional gaussian distribution with a mean that depends on s_ℓ. The variance depends on the noise but not on the transmitted symbol. The form of the detection function y(t) controls both the mean and the variance, and is chosen to minimize the probability of a detection error. Section 9.4.2 shows that, for this form of noise, the optimal form for y(t) produces a detection statistic r_ℓ that has a mean value equal to the received symbol energy E_ℓ and a variance equal to the electrical noise power density spectrum N_0. A multi-input multi-output channel can be described by the vector equivalent of (8.3.2) given by (cf. (8.1.20))

r_j = H s_j + n_j,   (8.3.3)

where s j is the block of input symbols at time j, r j is a block of noisy symbols, n j is a block of independent, identically distributed noise samples, and H is the channel matrix. An additive white gaussian noise model is also appropriate for an electrical channel that uses a noncoherent lightwave carrier with direct photodetection and a constant electrical power density spectrum N0 , which can be a combination of thermal noise and intensity noise (cf. Section 7.8.1). The detection statistic for this channel is proportional to the received lightwave power.
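A discrete-time additive white gaussian noise channel of the form (8.3.2) can be simulated directly. The sketch below (Python with NumPy; the four-point constellation and the noise level are arbitrary choices for illustration) generates a transmitted sequence and the corresponding noisy detection statistics.

import numpy as np

rng = np.random.default_rng(1)
constellation = np.array([-3.0, -1.0, 1.0, 3.0])  # four-point real constellation
s = rng.choice(constellation, size=8)             # transmitted symbols {s_j}
sigma = 0.4                                       # noise standard deviation (assumed)
n = sigma*rng.standard_normal(s.shape)            # zero-mean gaussian noise samples
r = s + n                                         # r_j = s_j + n_j, as in (8.3.2)
print(np.c_[s, r])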

Direct Photodetection with Additive Lightwave Noise

A direct photodetection system with additive lightwave noise is the second discrete-time channel. The lightwave channel model is r(t) = s(t) + n_o(t), where r(t) is the received noisy lightwave waveform, s(t) is the received noise-free waveform, and the lightwave noise n_o(t) is modeled as a white gaussian noise process with a power density spectrum N_0 given by (7.7.8) as N_sp. Neglecting scaling constants, one detection statistic r_j for this kind of channel is generated by integrating the squared magnitude of the received noisy lightwave waveform over one symbol interval T, with

r_j = \int_{-T/2}^{T/2} |s(t) + n_o(t)|^2 \, dt = \int_{-\infty}^{\infty} |s(t) + n_o(t)|^2 \, \mathrm{rect}((t - jT)/T) \, dt.   (8.3.4)

The integration corresponds to a rectangular detection filter y(t) = rect(t/T). For this model, the conditional probability density function f(r|E_ℓ) for a signal constellation of L nonnegative points indexed by ℓ is a noncentral chi-square distribution with 2K degrees of freedom (cf. (6.5.5)), with each signal point equal to the photodetected lightwave energy E_ℓ. In contrast to the additive-noise channel model, the variance of f(r|E_ℓ) depends on the signal E_ℓ (cf. (6.5.6b)).
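The statistic (8.3.4) can be formed numerically by discretizing the symbol interval. The following sketch (Python with NumPy; the discretization of the white lightwave noise and the parameter values are assumptions for illustration) integrates the squared magnitude of a constant-envelope signal plus circularly symmetric gaussian noise over one interval.

import numpy as np

rng = np.random.default_rng(2)
T, K = 1.0, 64                        # symbol interval and samples per interval
dt = T/K
E_sig = 1.0                           # noise-free signal energy in the interval
s = np.sqrt(E_sig/T)*np.ones(K)       # constant complex envelope (assumed shape)
N0 = 0.05                             # lightwave-noise power density (assumed)
sig = np.sqrt(N0/(2*dt))              # per-quadrature standard deviation
n = sig*(rng.standard_normal(K) + 1j*rng.standard_normal(K))
r = np.sum(np.abs(s + n)**2)*dt       # detection statistic of (8.3.4)
print(r)                              # noncentral chi-square with 2K degrees of freedom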

Poisson Channel

The third discrete channel model corresponds to a photon-optics channel with photon noise (cf. Section 7.3.2). The Poisson arrival rate of the signal photons in each interval is determined by the transmitter. A discrete detection statistic can be generated within a


photodetector that has internal gain by a nonlinear thresholding operation. This threshold is designed to produce a best estimate of the number of detected photons over a detection interval T_s by balancing the probability of a false detection event with the probability of a missed detection event. The detection interval T_s need not be equal to a symbol interval T. This nonlinear photon-counting process replaces a charge-integration process, and can avoid the effect of thermal noise added after direct photodetection. The noise generated by dark current (cf. Section 7.3.1) cannot be removed. For an ideal photon-noise-limited channel, the thresholded detected photon counts are summed over a time interval T_s to produce a detection statistic equal to the total number of photocounts m within time interval T_s. The probability mass function of this random variable depends on the mean count E_ℓ within T_s for ℓ = 0, ..., L − 1. The corresponding conditional Poisson probability distribution p(m|E_ℓ) for m is

p(m|E_\ell) = \frac{E_\ell^m \, e^{-E_\ell}}{m!}.   (8.3.5)

This channel is inherently discrete and is discussed in Section 9.5.2.
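The Poisson channel of (8.3.5) is equally direct to simulate. The sketch below (Python with NumPy; the mean count is arbitrary) compares the analytic probability of a given count with its empirical frequency.

import numpy as np
from math import factorial, exp

def poisson_pmf(m, E):
    # Conditional probability p(m|E) of (8.3.5) for a mean count E
    return E**m*exp(-E)/factorial(m)

rng = np.random.default_rng(3)
E = 4.0                                  # mean photocount in the interval Ts (assumed)
counts = rng.poisson(E, size=200000)     # simulated photocount statistics
print(poisson_pmf(3, E), np.mean(counts == 3))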

8.4 References

Linear system-level models of lightwave propagation are discussed in Einarsson (1996). The parallel between the temporal impulse response of an optical fiber and Fraunhofer diffraction is discussed in Saleh and Irshid (1982b) and in Azana and Muriel (2000). Statistical models of both linear and nonlinear interference and noise are covered in Zweck and Menyuk (2009), in Carena, Curri, Bosco, Poggiolini, and Forghieri (2012), and in Poggiolini (2012). Multi-input multi-output communication systems are presented in Tse and Viswanath (2005) in the context of wireless communication systems. The linearity of fiber-optic channels with respect to the lightwave power was addressed in Personick (1973b) and later in Saleh and Irshid (1982a). Multi-input multi-output communication systems based on space-division multiplexing are presented in Richardson, Fini, and Nelson (2013). The transmission of several independent datastreams through a few-mode fiber is discussed in Randel, Ryf, Sierra, Winzer, Gnauck, Bolle, Essiambre, Peckham, McCurdy, and Lingle (2011). The recovery of two independent intensity-modulated signals in a mode-multiplex system was demonstrated by Stuart (2000), having been proposed in Berdague and Facq (1982). Multi-input multi-output channel models of polarization-multiplex systems are presented in Forestieri and Prati (2004), Han and Li (2005), and Roudas, Vgenis, Petrou, Toumpakaris, Hurley, Sauer, Downie, Mauro, and Raghavan (2010). Orthogonal wavelength-division multiplexing is presented in Goldfarb, Li, and Taylor (2007). Statistical models of strong mode coupling are given in Ho and Kahn (2011) and in Mecozzi, Antonelli, and Shtaif (2012). Modal noise was first reported by Epworth (1978). The distinction between shot noise generated in phase-synchronous receivers using balanced photodetection and the shot noise generated in direct photodetection is discussed in Yuen and Chan (1983). A comprehensive discussion of speckle is given in Goodman (2007). Mode-coupling in a low-loss multicore fiber is discussed in Hayashi, Taru, Shimakawa, Sasaki, and Sasaoka (2011).


8.5 Historical References

The half-photon difference in the shot noise between homodyne and heterodyne demodulation apparently was first discussed in a series of brief communications circa 1962. Oliver (1961) discussed balanced homodyne and heterodyne demodulation as was mentioned in the historical notes for Chapter 7. The difference of half a photon between the shot noise for homodyne demodulation and that for heterodyne demodulation was attributed to the mean of the squared sinusoidal term in heterodyne demodulation that is not present in homodyne demodulation in Haus, Townes, and Oliver (1962). Later, using a quantum-optics signal model, Shapiro and Wagner (1984) attributed the difference to the fact that, for heterodyne demodulation, vacuum-state fluctuations in an image mode mix with the signal mode, increasing the shot noise. This history is discussed further in Chapter 15.

8.6 Problems

1 Electrical noise for balanced photodetection and direct photodetection
An ideal lightwave amplifier operating at 1500 nm with a spontaneous emission noise factor n_sp = 1 and gain G = 23 dB produces spontaneous emission noise that is photodetected using an ideal balanced photodetector (η = 1) with a local oscillator power P_LO.
(a) Derive an expression for the electrical noise power from the spontaneous-emission noise from the lightwave amplifier after two filters. The first filter is an optical noise-suppressing filter treated as an ideal bandpass optical filter with a transfer function given by H(f) = rect(f/B) at complex baseband. The second filter, applied after balanced photodetection, is an electrical filter with an impulse response h(t) = rect(t/T) at real baseband.
(b) For a temperature T_0 of 290 K, determine the required local oscillator power P_LO such that the electrical thermal noise power is 20 dB less than the electrical noise power generated from the lightwave amplifier noise. For this local oscillator power, determine the total electrical noise power when T = 1/B = 100 ps.
(c) For the same amplifier and the same values of B and T, determine the lightwave signal power such that the electrical noise power using direct photodetection is equal to the electrical noise power using balanced photodetection. For this calculation, use (8.2.36), including only the signal–noise-mixing term and the thermal noise term.
(d) Comment on the result with regard to the noise power using direct photodetection as compared with the noise power using balanced photodetection.

2 Propagation of a chirped gaussian pulse
An input lightwave pulse s(t) is given as

s(t) = A e^{-t^2/2\sigma_c^2},


where 1/\sigma_c^2 = (1 + iK)/\sigma_{in}^2 is complex, with the constant K called the chirp parameter. The corresponding real-passband lightwave pulse is

\hat{s}(t) = A e^{-t^2/2\sigma_{in}^2} \cos\big( 2\pi f_c t + (K/2\sigma_{in}^2) t^2 \big),

with the instantaneous frequency given by

f = \frac{1}{2\pi} \frac{d\theta(t)}{dt} = f_c + \Big( \frac{K}{2\pi\sigma_{in}^2} \Big) t,

where θ(t) is the argument of the cosine function. When K is positive, increasing time corresponds to increasing frequency. This is called blue-shifting. When K is negative, increasing time corresponds to decreasing frequency. This is called red-shifting. These two kinds of chirped pulses are shown in the figure below. (The ratio of the carrier frequency to the spectral width is small enough to show the effect of the chirp.) The pulse passes through a fiber with a transfer function at a distance z = L given by (8.1.3), which is repeated here:

H(f) = H_0 \, e^{-i 2\pi \tau f} \, e^{-i 2\pi^2 \beta_2 L f^2}.

(Figure: a red-shifted chirp and a blue-shifted chirp, each plotted as a function of time t.)

(a) Determine the input spectral content S(f) of the chirped pulse at z = 0.
(b) Determine the root-mean-squared width Δω_rms of the magnitude of the spectrum S(f) in terms of K and σ_in².
(c) Determine the output spectral content S_out(f) of the chirped pulse at z = L.
(d) Using the properties of the Fourier transform, determine the output lightwave signal s_out(t) and the output timewidth σ_out.
(e) Show that the square of the ratio of the output timewidth σ_out to the input timewidth σ_in can be written as

\frac{\sigma_{out}^2}{\sigma_{in}^2} = \big(1 + K(z/L_D)\big)^2 + (z/L_D)^2,

where L_D = σ_in²/β_2 is the dispersion length (cf. (5.3.23)).
(f) Show that when β_2 and K have the same sign, the pulse timewidth increases monotonically with the distance L.


(g) Show that when β_2 and K are opposite in sign, the pulse comes to a “focus” as the pulse propagates in z, with the minimum timewidth occurring at a distance given by

z_{min} = \frac{|K|}{1 + K^2} L_D.

3 Variance in the photodetected output
With the expected power ⟨P⟩ collected by direct photodetection held constant, show that the output signal-to-noise ratio is proportional to the number M of coherence regions at the output face of the fiber. (Note: the mean and variance of the gamma probability density function are M⟨P⟩ and M⟨P⟩², respectively.)

4 Modal noise for a single photodetector
The output light of a multimode fiber is collected using a single direct photodetector that has an overlap region A_overlap whose area is equal to the total area of the region A_fiber of the output face of the fiber including the core and the cladding.
(a) Is there modal noise when there is no mode-selective attenuation? Explain.
(b) Is there modal noise when the photodetector is misaligned and collects only a portion of the power in the fiber and there are no other mode-selective attenuation mechanisms? Explain.

5 Channel matrix for polarization multiplexing
(a) Show that the diagonal matrix in (8.1.15) is unitary.
(b) Using the result from part (a), and the fact that the product of two unitary matrices is unitary, show that the channel matrix H(f) given in (8.1.16) is a normal matrix that is the product of a scalar function H(f) and a unitary matrix M.
(c) Determine the inverse of H(f) given in part (b) in terms of H(f) and M.
(d) Does this procedure work for the channel matrix H(f) given in (8.1.17)? Explain.

6 Output pulse for a fiber that supports two spatial modes
Consider a fiber that supports two spatial modes. The output lightwave pulse in the first mode before photodetection is a unit-amplitude gaussian pulse with unit variance. The output lightwave pulse before photodetection in the second mode is a unit-amplitude gaussian pulse also with unit variance, but delayed in time by a value equal to one-half the variance. Determine an expression for the electrical signal energy E in the following cases.
(a) The pulses in each mode are noncoherent.
(b) The pulses in each mode are coherent.
Comment on the results.

7 Amplitude-phase coupling in a dispersive fiber
Suppose that a lightwave signal s(t) at the input to a dispersive fiber is sinusoidally phase-modulated so that

s(t) = e^{i\mu \sin(2\pi f t)},


where μ is the modulation index and f is the modulation frequency with period T = 1/f. This periodic signal can be expressed in terms of an exponential Fourier series given by

s(t) = e^{i\mu \sin(2\pi f t)} = \sum_{n=-\infty}^{\infty} F_n \, e^{i n 2\pi f t},

with the Fourier-series coefficients F_n given by J_n(μ), the Bessel function of the first kind and order n.
(a) Derive an expression for the output lightwave signal s(t) at a distance L in terms of the Fourier-series coefficients and the complex-baseband transfer function given in (8.1.3).
(b) By equating terms of the same frequency, determine an expression for the output lightwave power P = |s(t)|²/2 at frequency f.

8 Distance for the electrical waveform to be linear in the lightwave power for a dispersive multimode fiber

(a) Using the figure in Problem 7 of Chapter 4 and (4.4.10), and setting N_1 ≈ n_1 = 1.5 and V = 5, determine the modal group-delay difference per unit length between the LP01 mode and the LP11 mode.
(b) Considering only the modal delay difference determined in part (a), estimate the smallest length L for which the output electrical signal after direct photodetection is linear in the lightwave power.

9 Output pulse shape from wavelength-dependent group delay
A noncoherent lightwave carrier has a normalized power density spectrum f_λ(λ) in the wavelength λ given by

f_\lambda(\lambda) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\lambda^2/2\sigma^2}.

This noncoherent carrier is amplitude-modulated with the transmitted pulse shape given by rect(t/T).
(a) Determine the output pulse shape as a function of σ/T.
(b) Plot the output pulse shape for σ/T = 0.5, for σ/T = 1, and for σ/T = 2.

10 Spatial statistics
This problem compares the statistics of the spatially integrated multimode lightwave signal with the statistics for a radio-frequency communication system that uses an antenna to collect the received electromagnetic radiation.
(a) Suppose that an antenna has an effective collection area defined by a radius that is much smaller than the carrier wavelength λ_rf of the incident radio-frequency signal. Determine the probability density function of the spatially integrated signal and determine the variance of the distribution.
(b) Repeat for a multimode lightwave channel using an effective radius for the fiber of Mλ_c, where λ_c is the lightwave carrier wavelength and M is much larger than one. Determine the variance of the distribution as a function of M.
(c) Which system has the larger variance?


(d) How does this variance affect the performance of the communication system?
(e) Discuss the relationship between wavelength, aperture size, and spatial coherence.

11 Modal partitioning of the energy
A colleague provides the following faulty analysis. An input lightwave pulse described by an amplitude p(t) is equally launched into two spatial modes of an ideal lossless fiber that supports only these two modes. There is no significant group dispersion, so the shape of the pulse remains unchanged. The modal delays are τ_1 and τ_2, with |τ_1 − τ_2| much larger than the pulse width. The faulty analysis states that the total input pulse is p_in = \frac{1}{2} p(t) + \frac{1}{2} p(t), with the total pulse energy E_p at the input given by \int_{-\infty}^{\infty} |p(t)|^2 \, dt. The output is stated as

p_{out} = \frac{1}{2} p(t - \tau_1) + \frac{1}{2} p(t - \tau_2),








with the energy at the output given by

E_p = \int_{-\infty}^{\infty} \Big| \frac{1}{2} p(t - \tau_1) + \frac{1}{2} p(t - \tau_2) \Big|^2 dt
    = \frac{1}{4} \int_{-\infty}^{\infty} |p(t)|^2 \, dt + \frac{1}{4} \int_{-\infty}^{\infty} |p(t)|^2 \, dt
    = \frac{1}{2} \int_{-\infty}^{\infty} |p(t)|^2 \, dt,

apparently losing half the signal energy even though the fiber is lossless.
(a) What is the flaw in the analysis?
(b) Using a correct analysis, what happens when |τ_1 − τ_2| is larger than the pulse width?
(c) Using a correct analysis, what happens when |τ_1 − τ_2| is smaller than the pulse width?
(d) Discuss how the analysis must be modified in the presence of mode coupling.

(, )





13 Pixelated local oscillator for a multimode fiber Suppose that a balanced photodetector is designed to isolate the energy in a single spatial coherence region of a multimode fiber. The lightwave signal at the output face of the fiber is described by a coherence region coh and a coherence timewidth c . The local oscillator field has a finite amount of power PLO and is pixelated across the output face of the fiber, with a spatial resolution LO such that every coherence region coh is comprised of one or more of the pixelated output regions. Within each of these pixelated regions, the phase difference between the lightwave field and the local oscillator can be maintained precisely.

A

τ

A

A

408

8 The Electrical Channel

Define the mixing efficiency as the proportion of the lightwave signal at the output face of the fiber that is correctly demodulated. (a) Describe and justify a strategy to distribute the local oscillator power across the output face of the fiber so as to maximize the mixing efficiency. (b) Discuss the relationship between the mixing efficiency and the ratio of the coherence region to the spatial resolution of the local oscillator field. (c) Discuss why the mixing efficiency can change over time. (d) Given a fixed spatial resolution ALO for the local oscillator field, discuss and justify the design of a multimode fiber that can (i) maximize the mixing efficiency and (ii) minimize the variance in the mixing efficiency. Are these two designs the same? Why or why not? 14 Demodulating with spatially orthogonal local oscillator fields Suppose that a balanced photodetector is designed to separately demodulate each of the spatial modes of a multimode fiber by spatially matching the magnitude and phase distribution of a separate local oscillator field to each separate mode supported by the fiber. (a) Discuss the design of this kind of demodulator for an ideal fiber with no mode coupling under the constraint that the total local oscillator power allocated for all spatial modes is finite. (b) Discuss the effect of mode coupling. (c) Discuss the efficient management of the received energy in such a system. 15 Signal-to-noise ratio The electrical power in the signal at the output of a detection filter is determined by integrating (8.2.36), where y t is the impulse response of the detection filter. Suppose that the received optical pulse po t before the noise-suppressing lightwave filter is given by po t 2Ps rect t T , where Ps is the lightwave signal power. (a) An ideal noise-suppressing lightwave filter has a frequency response at complex baseband given by rect f B . Determine B in terms of the pulse width T such that 95% of the lightwave energy in the pulse is passed by the noise-suppressing lightwave filter. (b) Ignoring the effect of the noise-suppressing lightwave filter on the lightwave pulse shape, determine an expression for the power density spectrum of the directly photodetected electrical pulse pe t . (c) Expression (8.2.36) treats the lightwave signal power Ps as a constant. Modify that expression for the electrical signal pulse pe t used in this problem. (d) Using a matched filter (cf. Section 9.4.2) with y t pe T t and including only the signal–noise-mixing term from (8.2.36), determine an expression for the electrical signal-to-noise ratio after the detection filter in terms of Ps , N sp , and the pulse duration T . (e) Repeat part (d) for a system without an optical amplifier, modifying the shotnoise term to include a relative-intensity noise NRIN term (in dB/Hz) that is treated as a constant spectral density.

( )=



()

() (/ )

(/ )


() ()= ( − )

8.6 Problems

409

16 Lightwave amplifier noise terms Show that, for a high-gain lightwave amplifier with gain G much larger than one, the ratio of the signal-noise term, which is the fourth term in (8.2.36), to the shot-noise term, which is the second term in (8.2.36), is approximately 2G (cf. Figure 8.16). Using this result, is shot noise a significant contribution to the overall noise power density spectrum? 17 Subchannels A lightwave communication system uses two subcarrier subchannels, each with bandwidth B, and separated by a guardband of width B . Discuss several reasons why a single carrier with bandwidth 2B B might not have been used instead. Such reasons might depend on the type of fiber or on the specific lightwave components used.



±

9 The Information Channel

A discrete information channel can be regarded as a black box, accepting symbols at its input from an input alphabet and producing symbols at its output from an output alphabet, which is not necessarily the same as the input alphabet. The channel input is a sequence of logical symbols, each symbol characterized by a common probability distribution on the set of input symbols. The output of the information channel is a sequence of symbols, either noisy logical symbols or a probabilistic description of those symbols. A conditional probability distribution links the output symbol to the input symbol. The information channel subsumes all aspects of the electrical channel discussed in Chapter 8. This means that the electrical channel resides completely inside the information channel and is not visible, as such, from the input or the output of the information channel. The passage from the physical lightwave channel to the information channel, begun in Chapter 8, is completed in this overlapping chapter by describing the conversion of a sequence of information symbols to the electrical waveform that becomes the input to the electrical channel, and the conversion from the received waveform at the output of the electrical channel to the sequence of logical information symbols that becomes the output of the information channel. Appending these conversions to the input and output of an electrical channel results in an information channel. The input to an information channel is the output of an encoder. A sequence of user data symbols enters the encoder, which transforms that sequence into another sequence of codeword symbols that is appropriate for transmission through the information channel. The output from the information channel is the input to a decoder. The decoder takes the sequence of sensed symbols from the information channel and transforms that sequence into the sequence of user data symbols, and rarely with errors. The encoder and decoder are not considered herein as part of the information channel. They are discussed in Chapter 13. The information channel could instead be defined to include portions of the encoding/modulation process at the transmitter or portions of the demodulation/decoding process at the receiver. Then the encoding and decoding functions may be partially absorbed into the definition of the information channel. In some instances, portions of the encoder or decoder may be mingled with the input or the output of the information channel, so the separation of the functions is not sharp in practice. Our definition of an


information channel is for expository convenience, and is not intended to prohibit the mingling of the system functions in other ways or to imply that a specific partitioning must be used for implementation.

The output of an information channel viewed locally in time is regarded as a sequence of random symbols. An information channel is memoryless if there is no dependence between successive output symbols whenever there is no dependence between successive input symbols. Any long-term memory in the sequence that may be intentionally introduced by the encoder is not visible in the short term. It can be seen only at the block level, and is recognized as such only by the decoder. After the detection process, each output symbol of a memoryless information channel is described by a single-letter probability distribution p_r(r) on the channel output alphabet. Any short-term memory introduced by the channel into the received electrical waveform, such as intersymbol interference, studied in Chapter 11, has been removed prior to the output of a memoryless information channel. The proper alignment of the output sequence to the input sequence requires synchronization of time between the receiver and the transmitter, which is addressed in Chapter 12. For the purposes of this chapter, the input and output sequences are aligned.

An information channel is converted to a discrete-time electrical channel by incorporating a modulator within the transmitter, and a demodulator and a detector within the receiver. The notional relationship between an information channel and a discrete-time electrical channel is shown schematically in Figure 9.1. The modulation process interpolates the sequence of input symbols to form a real-baseband waveform, a complex-baseband waveform, a passband waveform, or a point process corresponding to a modulated photon stream. The topic of modulation is introduced in this chapter, and is discussed more fully in Chapter 10.


Figure 9.1 (a) A discrete-time channel. (b) An information channel. (c) A probability model of an information channel.


This chapter describes detection techniques based on wave optics and on photon optics, delineating the circumstances under which each method of detection is appropriate. The detection process is appended to the output of the noisy electrical channel and transforms the received waveform into a sequence of independent noisy symbols or samples. This sequence of noisy symbols is then passed to the decoder. Detection techniques based on real waveforms or on point processes are derived and applied to each of the three discrete-time channels presented in Section 8.3. These three channels are a discrete-time additive white gaussian noise channel, a discretetime direct-photodetection channel with additive lightwave noise, and a point-process channel with Poisson statistics.

9.1 Prior and Posterior Distributions

The probability distribution on the set of output symbols of a discrete memoryless information channel is conditioned by the corresponding channel input symbol and by only that symbol. In turn, for a well-designed encoder, the input to the information channel, which is the sequence of symbols at the output of the encoder, will appear to be random and independent. The dependences within the stream of channel symbols inserted by the encoder can be seen only at the level of the codewords, and so are recognized as such only by the decoder. The probability of each codeword symbol, seen in isolation, at the input to the channel from an ideal encoder is described by a prior probability distribution p_s(s), called simply a prior, and with components called prior probabilities. When appropriate, the prior may be written as a vector p, with the components being the prior probabilities of the individual symbols. In turn, the probability of each channel output symbol, seen in isolation, is described by the posterior probability distribution p_r(r) on the output, called simply the posterior. The optimal choice for the prior on the input symbols is developed in Chapter 14 using the methods of information theory. Practical methods of encoding and decoding are discussed in Chapter 13.

The encoding process that generates the code sequence and the decoding process that recovers the user data are not here considered to be part of the information channel. This separation allows the functions of encoding and decoding to be studied separately. Other definitions of the information channel can be given for the same communication system. For example, the modulation process may be removed from the definition of the information channel and included as part of the encoder. The functions included in the information channel depend on which attributes of the communication system are considered fixed when designing the encoder and decoder, and which attributes are considered flexible.

At the receiver, the detection process converts each discrete-time sample of the electrical waveform, which may be real or complex, into a symbol that is sent to the decoder. This symbol is called a detection statistic. A detection statistic can have many forms, ranging from a hard decision to soft information. A detection statistic that retains all relevant information contained in the received analog waveform is called a sufficient detection statistic.


The sequence of discrete symbols or numbers, real or complex, recovered from the received waveform by the detection process is then sent to the decoder. The decoder determines the transmitted codeword from that sequence or, equivalently, determines the user dataword represented by that codeword. The information channel can be viewed either from the transmitter or from the receiver. When viewed from the transmitter, the memoryless information channel is concisely summarized as a conditional probability distribution p_{r|s}(r|s), abbreviated p(r|s), that the symbol r will be received when the symbol s is transmitted.^1 This conditional distribution depends on the channel model as shown in Figure 9.1(c). When viewed from the receiver, the information channel is summarized as a conditional probability distribution p_{s|r}(s|r), abbreviated p(s|r), that when the symbol r is received, the symbol s was transmitted. The set of symbols at the channel output need not be the same as the set of symbols at the channel input. This is the case, for example, when the channel output is a quantized sample value. Using the Bayes rule, the combination of the information channel model and the prior distribution on the input symbols can be expressed as a joint probability distribution p(s, r),

p(s, r) = p_s(s) \, p(r|s) = p_r(r) \, p(s|r),   (9.1.1)

where p_s(s) is the prior probability for the transmitted input symbol s, and p_r(r) is the posterior probability for the received output symbol r. It follows immediately from (9.1.1) that

p_r(r) = \sum_s p_s(s) \, p(r|s),   (9.1.2)

and so the Bayes rule becomes

p(s|r) = \frac{p_s(s) \, p(r|s)}{\sum_s p_s(s) \, p(r|s)}.   (9.1.3)


For the same physical channel, the expressions for the probabilities based on continuous-wave optics are different than the expressions based on discrete-photon optics because these are different models and use different methods of detection.
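The chain of probabilities in (9.1.1) through (9.1.3) can be traced with a small numerical example. The sketch below (Python with NumPy; the binary prior and the three-letter channel law are invented for illustration) forms the joint distribution, the posterior on the output, and the Bayes posterior on the input.

import numpy as np

prior = np.array([0.5, 0.5])                  # p_s(s) for a binary input
p_r_given_s = np.array([[0.80, 0.15, 0.05],   # p(r|s = 0)
                        [0.05, 0.15, 0.80]])  # p(r|s = 1)
joint = prior[:, None]*p_r_given_s            # p(s, r), as in (9.1.1)
p_r = joint.sum(axis=0)                       # p_r(r), as in (9.1.2)
p_s_given_r = joint/p_r                       # Bayes rule (9.1.3); columns sum to one
print(p_s_given_r)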

9.2 Methods of Modulation

At the transmitter, the information channel receives a sequence of logical symbols from the encoder and converts that sequence into a sequence of real or complex electrical pulses for the electrical channel. At the receiver, the information channel receives a sequence of noisy real or complex samples taken from the electrical waveform and converts this sequence into a sequence of logical symbols for the decoder. This back-and-forth conversion is the task of modulation and demodulation as introduced in this chapter. A more detailed study of modulation and demodulation continues in Chapter 10.

1 For continuous output alphabets, the probability distribution p(r|s) is replaced by the probability density function f(r|s).


Figure 9.2 (a) A signal constellation of real values. (b) A signal constellation of nonnegative real values. The minimum distance d_min is indicated in each.


9.2.1 Signal Constellations

The most straightforward interface between the information channel and the electrical channel at the transmitter is a continuous-time real or complex waveform consisting of a sequence of pulse amplitudes uniformly spaced in time that are interpolated by an appropriate pulse shape. Each component of the sequence of real or complex numbers used to define the modulated waveform is restricted to a finite set of real or complex numbers called the signal constellation. In this chapter, only real signal constellations, as shown in Figure 9.2, are considered. Complex signal constellations are considered in Chapter 10.

Figure 9.2 shows two real signal constellations, each with four points. Figure 9.2(a) shows a four-point antipodal signal constellation. Figure 9.2(b) shows a four-point on–off signal constellation, as may be used for intensity modulation. These are the four-point multilevel extensions of the two-point antipodal signal constellation and the two-point on–off signal constellation, which are not shown. Each signal constellation is judged by the smallest euclidean distance d_ij between any two distinct signal points i and j. This smallest euclidean distance is called the minimum distance d_min of the signal constellation. It is defined as

d_{min} \doteq \min_{i \neq j} d_{ij}.   (9.2.1)
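The minimum distance of (9.2.1) is computed over all distinct pairs of constellation points. A minimal sketch for real constellations follows (Python with NumPy; the two constellations correspond to those of Figure 9.2, with illustrative spacings).

import numpy as np

def minimum_distance(points):
    # Smallest euclidean distance d_min of (9.2.1) over distinct pairs
    p = np.asarray(points, dtype=float)
    d = np.abs(p[:, None] - p[None, :])
    return d[d > 0].min()

print(minimum_distance([-3, -1, 1, 3]))   # four-point antipodal constellation
print(minimum_distance([0, 1, 2, 3]))     # four-point on-off constellation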

The minimum distance is evident in the two signal constellations shown in Figure 9.2. In these two examples, every point has a nearest neighbor at distance dmin . For some other signal constellations, not every point has a nearest neighbor at distance dmin. In either of the two cases shown in Figure 9.2, each of the four points can be labeled with two bits. Then a sequence of two-bit numbers is equivalent to a sequence of real numbers drawn from the specified signal constellation. This sequence of real numbers can be described as a waveform w(t ) on continuous time using Dirac impulses as given by

w(t) = \sum_{j=-\infty}^{\infty} s_j \, \delta(t - jT),   (9.2.2)




Figure 9.3 Channel response: (a) to an impulse and (b) to a datastream.

where T is the symbol interval, and the symbol s_j in the jth interval is a point of the L-point signal constellation^2 {s_0, s_1, ..., s_{L−1}} that is specified by the user data. The elements s_j of the encoded sequence must come from the chosen signal constellation, but there may be constraints on the allowable sequence patterns to control or eliminate errors at the receiver. These sequence constraints will be discussed in Chapter 13. For this chapter, the s_j at time jT can be any value in the signal constellation. The abstract representation of the datastream given in (9.2.2) must appear as a continuous waveform at the transmitter, and must then appear at the receiver as a corresponding continuous waveform that is sampled. An obvious way to form a continuous waveform is to replace each impulse by a suitable transmit pulse shape x(t) as shown in Figure 9.3(a). For a modulated datastream, shown in Figure 9.3(b), this replacement can be expressed mathematically as

s(t) = w(t) \star x(t) = \sum_{j=-\infty}^{\infty} s_j \, x(t - jT).   (9.2.3)

This waveform, called amplitude-shift keying, is the transmitted waveform. A pulse of the waveform need not be confined to an interval of duration T . Overlap is permitted and often desired. The waveform at the transmitter s(t ) in (9.2.3) is not the waveform that is eventually of interest; it is the waveform at the receiver from which the datastream must be recovered that is of interest. For a linear system, let q(t ) represent the composite of the transmitted pulse x (t ), the physical channel impulse response h(t ), and the intentional filtering y (t ) in the receiver shown in Figure 9.3(a), so that


2 The ℓth letter of the signal constellation is denoted s_ℓ. The jth symbol of the modulated sequence, which is an element of the signal constellation, is denoted s_j. This redundant usage is used for brevity and should cause no confusion. Thus, for example, E is the fifth letter of the Roman alphabet and the second letter of the word “receive.”


q(t) = x(t) \star h(t) \star y(t).   (9.2.4)

The desired impulse response q(t) at the location where the waveform is sampled is called the target pulse. The noise-free electrical signaling waveform^3 r̄(t) used for sampling in the receiver can be written in terms of the target pulse as

r̄(t) = w(t) \star q(t) = \sum_{j=-\infty}^{\infty} s_j \, q(t - jT).   (9.2.5)


Figure 9.3 shows the generation of r̄(t) from q(t). The sampler at time kT will see only the desired sample s_k if q(t) is a Nyquist pulse, as described next.

9.2.2 Nyquist Pulses

The samples of the received filtered waveform r̄(t) have no interference from other symbol intervals when the target pulse q(t) is a (scaled) Nyquist pulse. This is any pulse with finite energy that has the value zero at all nonzero integers (or integer multiples of T) and has the value one when its argument is zero (cf. Section 8.3.1). Applying this property to (9.2.5) gives r̄(kT) = s_k so that the sample of the received signal is equal to the transmitted symbol. Working backwards from the receiver to the transmitter using the Nyquist pulse q(t) as the target pulse implies that the Fourier transform X(f) of the transmitted pulse x(t) must be X(f) = Q(f)/H′(f), where Q(f) is the Fourier transform of the Nyquist pulse q(t), and H′(f) is the transfer function of the complete linear system given by h′(t) = h(t) \star y(t) (cf. (9.2.4)). The transmitted pulse x(t) and the transmitted waveform s(t), given by

s(t) = \sum_{j=-\infty}^{\infty} s_j \, x(t - jT),   (9.2.6)


are not seen in the receiver where the waveform is sampled. Only the corresponding waveform r̄(t) given in (9.2.5) based on the target pulse q(t) is seen at the sampler. While it is a convenient and common practice to speak of the transmitted pulse x(t) as if it were a Nyquist pulse, the requirement that the sample has no interference from other symbols means that the Nyquist property is actually required at the receiver. A sinc pulse (cf. (2.1.40)) satisfies sinc(k) = 0 for nonzero k, and is an often-mentioned example of a Nyquist pulse because it leads to the smallest bandwidth of the interpolated baseband waveform. However, interpolation using a sinc pulse is computationally intensive because sinc(t) decays slowly as 1/t (cf. (2.1.40)) and thus a large number of terms must be summed to accurately produce the baseband waveform. Recalling that the sum of 1/n over n is divergent also alerts us to the concern that sinc interpolation is quite sensitive to instability or amplitude saturation. To reduce the pulse spread and also reduce the instability, a Nyquist pulse with a more confined time duration is used, but at the cost of a larger bandwidth. One class of

3 The term r̄(t) is used for the noise-free received waveform. The term r(t) is used for the noisy received waveform.


unit-interval (T = 1) Nyquist pulses with various bandwidths and timewidths is the set of pulses with raised cosine spectra given by

q(t) = \frac{\sin(\pi t)\cos(\beta \pi t)}{\pi t \big( 1 - (2\beta t)^2 \big)},   (9.2.7)

where β is a parameter in the range [0, 1] that controls the temporal duration of the pulse. For β = 0, the pulse reduces to a sinc pulse. For large t and a nonzero value of β , this pulse q (t ) eventually decays as t −3 . The spectrum Q( f ) is

Q(f) = \begin{cases} 1 & \text{for } |f| \le (1-\beta)/2 \\ \frac{1}{2}\Big( 1 + \cos\big( (\pi/\beta)\big( |f| - (1-\beta)/2 \big) \big) \Big) & \text{for } (1-\beta)/2 \le |f| \le (1+\beta)/2 \\ 0 & \text{otherwise.} \end{cases}   (9.2.8)

The pulse has a two-sided total bandwidth (1 + β), in contrast to 1 for a sinc pulse. Figure 9.4 plots several pulses with raised cosine spectra in frequency. Inspection of the frequency plots suggests that, for each such pulse, the portion of the spectrum outside the unit bandwidth interval could be translated left and right and added so as to fully fill in the spectrum for |f| ≤ 1/2 to exactly form a rectangle. This is an instance of a general frequency-domain property of Nyquist pulses that states that q(t) is a Nyquist pulse if and only if Q(f) satisfies

\sum_{k=-\infty}^{\infty} Q(f + k) = 1 \quad \text{for } |f| \le 1/2,   (9.2.9)


where k takes integer values. This statement, which the reader is asked to prove in an end-of-chapter exercise, can be clearly seen in Figure 9.4(b) by observing overlapping translated copies of Q(f).
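Both the construction (9.2.7) and the time-domain Nyquist property can be checked numerically. The sketch below (Python with NumPy; the guard against the removable singularity at |t| = 1/(2β) is an implementation detail) evaluates the raised-cosine pulse at the integers.

import numpy as np

def raised_cosine(t, beta):
    # Unit-interval Nyquist pulse with the raised cosine spectrum, (9.2.7)
    t = np.asarray(t, dtype=float)
    if beta == 0:
        return np.sinc(t)                     # sinc pulse as the limiting case
    denom = 1.0 - (2.0*beta*t)**2
    singular = np.abs(denom) < 1e-12          # removable singularity at |t| = 1/(2 beta)
    q = np.sinc(t)*np.cos(np.pi*beta*t)/np.where(singular, 1.0, denom)
    return np.where(singular, (np.pi/4)*np.sinc(1.0/(2.0*beta)), q)

k = np.arange(-5, 6)
for beta in (0.0, 0.5, 1.0):
    print(beta, np.allclose(raised_cosine(k, beta), k == 0))  # one at k = 0 only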

9.2.3 Detection

The output of a linear channel with additive noise is the noisy version of (9.2.5) given by

r(t) = w(t) \star q(t) + n(t) = \sum_{j=-\infty}^{\infty} s_j \, q(t - jT) + n(t),   (9.2.10)



Figure 9.4 Time and frequency plots of a raised-cosine spectrum Nyquist pulse as a function of β (β = 0, 1/2, and 1).


where n(t) is the additive noise. The function of detection is to recover a suitable estimate of s_j from the received noisy waveform r(t). The detected sequence of symbols forms the senseword or the sensed noisy codeword at the output of the information channel, which then becomes the input to the decoder. A memoryless senseword is one for which the successive senseword symbols are independent. Specifically, a memoryless senseword has no intersymbol interference. A system can avoid memory in the senseword by design of the information channel. Alternatively, memory in the senseword can be accommodated, or even exploited, as when it is treated by a sequence estimator in the receiver, as discussed in Chapter 11. Possibly, a combination of methods is used, perhaps leaving some residual memory ignored.

Two types of detected symbols are considered. In soft-decision detection, the kth component of the senseword is a sample r_k, real or complex, or a quantized form of that sample. In hard-decision detection, the detection process decides on a symbol from a discrete output alphabet based on the received sample and on prior knowledge about the possible channel inputs. The output of hard-decision detection is a sequence of symbols, each generated by a hypothesis-testing procedure that is used to form the senseword. Hypothesis testing is quantified by the probability of a detection error p_e. In soft-decision detection, the detection process replaces the received sample with an equivalent or nearly equivalent value, such as a quantized value. The probability of a detection error p_e in the sensed symbol is not meaningful for soft-decision detection because the notion of a symbol error is not relevant. Accordingly, an information channel that uses hard-decision symbol detection is not the same as an information channel that uses soft-decision symbol detection. These are regarded herein as two different information channels, with different performance advantages and with different disadvantages.
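A minimal sketch of hard-decision detection for a real constellation in additive gaussian noise follows (Python with NumPy; minimum-distance decisions, which are maximum-likelihood for equal priors in gaussian noise, and arbitrary parameter values).

import numpy as np

def hard_decision(r, constellation):
    # Decide the nearest constellation point for each received sample
    idx = np.argmin(np.abs(r[:, None] - constellation[None, :]), axis=1)
    return constellation[idx]

rng = np.random.default_rng(4)
constellation = np.array([-3.0, -1.0, 1.0, 3.0])
s = rng.choice(constellation, size=100000)       # transmitted symbols
r = s + 0.5*rng.standard_normal(s.shape)         # soft values r_k
s_hat = hard_decision(r, constellation)          # senseword symbols
print("symbol error rate:", np.mean(s_hat != s))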

Partial-Response Signaling

To produce the Nyquist pulse q (t ) as a target pulse at the channel output, the intersymbol interval cannot be small in comparison with the reciprocal of the overall bandwidth of the channel because Q ( f )/ H ²( f ) is not well-behaved if H ² ( f ) goes to zero faster than does Q ( f ). Because the input pulse x (t ) satisfies X ( f ) = Q ( f )/ H ² ( f ), this places a limit on the signaling rate. To circumvent this limit, the notion of a Nyquist pulse is generalized to the notion of a partial-response pulse. A partial-response pulse is an alternative target pulse to a Nyquist pulse that uses interference instead of trying to eliminate it. The interference is thereby managed such that the sample used for detection now depends on the current data value sk , a previous data value such as sk −1 , and perhaps others. This form of controlled interference manages the effect of interference, rather than removing it. A duobinary partial-response pulse is a target pulse q (t ) with the property that q (0) = q (1) = 1, and q(k ) is equal to zero at every integer k other than zero or one. A duobinary partial-response pulse is easily constructed as the sum of a Nyquist pulse and a delayed copy of that Nyquist pulse. A simple example of a duobinary pulse with the intersymbol interval scaled by T is

9.2 Methods of Modulation

q (t ) = sinc (t / T ) + sinc(( t

419

− T )/ T )

= T T− t sinc(t / T ).

This duobinary pulse decays as t −2 for large t . Any Nyquist pulse, such as those given in (9.2.7), can be used in a similar construction to form other partial-response pulses. The duobinary partial-response pulse results in intentional interference in each received sample from the previous modulation interval with the noisy received sample r k at time kT given by rk

=

∞ ±

j =−∞

s j q (kT

− j T ) + n ²(kT )

= sk + sk −1 + n ²k ,

(9.2.11)

where n ²k = n² (kT ) is a sample of a signal-independent, additive-noise source after the detection filter. This noise model is appropriate for phase-synchronous systems or for direct-detection systems that do not have spontaneous emission noise (cf. Section 1.6). Signal-dependent noise models are discussed in Section 10.5. One obvious choice for the kth detection statistic for a duobinary pulse is r k − ´ sk −1 , where ´ sk −1 is the detected value of the previous symbol, which is meant to cancel out sk −1. This demodulation technique is called decision-feedback demodulation and is analyzed in Section 11.2.5. In this case, the receiver would implement partial-response decision-feedback demodulation. Two other conventional partial-response pulses are a dicode partial-response pulse and a modified duobinary partial-response pulse. A dicode pulse satisfies q (0) = −q (1) = 1, and q (t ) is equal to zero for every other integer k. A modified duobinary pulse satisfies q (0) = −q (2) = 1, and q (t ) is equal to zero for every other integer k. For a dicode pulse, the kth sample is r k = sk − sk −1 + n²k . For a modified duobinary pulse, the kth sample is r k = sk − sk−2 + n²k . Each of these examples of a partial-response pulse can be demodulated using decision feedback (cf. Section 11.2.5). Because the decision feedback of ´ sk may sometimes be in error, this incorrect feedback will increase the probability of detection error for the next symbol, possibly leading to a run of errors. The decision-feedback detection statistic is not a sufficient statistic. An alternative method, called partial-response maximum-likelihood demodulation, exploits rather than cancels out the partial-response interference. The method of partial-response maximum-likelihood demodulation is discussed in Chapter 13. 9.2.5

Sampler Response

The received signal in signal-independent additive noise at the input to the electrical channel is r (t ) =

∞ ±

j =−∞

s j x (t

− j T ) ± h(t ) + n(t ),

(9.2.12)

420

9 The Information Channel

where h(t ) is the impulse response of the channel. This signal is first filtered by a detection filter y(t ). The output of the detection filter is then sampled to form the detection statistic, as shown in Figure 9.3. An ideal sampler is instantaneous and measures the amplitude of the signal at a single point. For a Nyquist pulse in additive gaussian noise, this instantaneous sample is a sufficient statistic. A physical sampler, however, is not instantaneous. It has a finite response time, and so can integrate energy from other symbols, thereby causing interference. When this finite response time of the sampler is a significant influence, the response time can be modeled as part of the target pulse q (t ) so that the sampler can be treated as if it were instantaneous even when it is not. Consider the important instance of a sampler that integrates the signal over an interval of duration T ² . Even an interval much smaller than the interpulse interval T can have a noticeable effect on the sample value. Then the noiseless sample is rk

=

µ kT +T ² /2 kT −T ² /2

r¯ (t )dt

=

µ

∞ 1 r¯ (τ)rect((kT T ² −∞

− τ)/ T ² )dτ,

(9.2.13)

where r¯ (t ) is the noiseless received waveform. This expression, however, is equivalent to the convolution expression

¶ (9.2.14) = r¯ (t ) ± T1² rect(t / T ² )¶t =kT , so the sampler can be treated as if r¯ (t ) were passed through the virtual filter with impulse response rect(t / T ² )/ T ² and then instantaneously sampled at t = kT . As T ² goes to zero, rk

this virtual impulse response goes to a Dirac impulse, and the detection statistic goes to an instantaneous sample. The virtual received pulse q² (t ) including the effect of the finite sampling interval is (cf. (9.2.4)) q² (t ) = x (t ) ± h(t ) ± y(t ) ± rect(t / T ² )/ T ²

= q (t ) ± rect(t / T ² )/ T ² .

(9.2.15)

This pulse goes to q (t ) as T ² goes to zero. This virtual pulse q² (t ) is the target pulse that should be a Nyquist pulse. With this target pulse, the finite sample interval will not cause interference from other symbols. A pulse q(t ) with Fourier transform Q ( f ) = Q ²( f )/sinc(T ² f )

(9.2.16)

before the integrating sampler will cause the integrating sampler to appear as an instantaneous sampler of the virtual pulse q ² (t ). Of course, the virtual pulse q ² (t ) does not actually exist. It is an artifice to enable the design of the actual pulse q (t ) suitable for an integrating sampler. Because this is simply an alternative way of implementing a filter, there is no reduction in the output signal-to-noise ratio of the sample. The method can be used to replace an instantaneous sampler with an integrator. It can be modified to replace an integrator over the symbol interval T by an integration over a shorter interval T ² . As an example, Figure 9.5 plots the spectrum Q ( f ) of the required pulse q (t ) for which the virtual Nyquist pulse q ² (t ) after an integrating sampler is sinc (t ) with a

9.2 Methods of Modulation

q( τ)

1

1

Integration window rect(t – τ )

0.5 –4

–1

421

1

q ¢(t) = sinc(t) 0.5 4

–4

τ

1

–1

4 t

–0.2

–0.2

()

Figure 9.5 Waveform q t of a pulse before an integrating sampler that produces a virtual Nyquist

sinc pulse after the integrating sampler.

Fourier transform Q ² ( f ) = rect( f ). The division of Q ( f ) by sinc(T ² f ) does cause amplification at the band edges of Q ² ( f ), which may be undesirable. This effect can be reduced by decreasing the integration window T ². The target pulse then approaches a sinc pulse and the spectrum of the target pulse approaches a rectangle as the integration window shown in Figure 9.5 approaches a Dirac impulse. Instead, these amplified edges can be suppressed by using an integration window that does not have an abrupt start and stop, replacing (9.2.13) with rk

=

µ∞

−∞

r¯ (τ)w(kT

− τ)dτ,

(9.2.17)

where w(t ) differs from a rectangular integration window by rounding the corners. An example of such a window is a raised-cosine pulse with nonzero β (cf. Figure 9.4). Such a tapered integration using the window w( t ) is not the same as a virtual rectangular pulse, and may be more suitable. The lesson of this section as given in (9.2.16) leads to another important conclusion. Any training method for learning a detection filter from the channel output that targets a Nyquist pulse after an integrating sampler will implicitly accommodate the integrating sampler into its learned detection filter. It may appear that an integrating sampler should integrate across the entire interpulse interval T in order to maximize the signal-to-noise ratio. This is not true. Instead, Section 9.4 shows that in gaussian noise, optimally filtered, any integration time T ² is allowed, provided that q(t ) is designed so that the pulse q ² (t ) in (9.2.15) is a Nyquist pulse. The appropriate detection filter will gather the energy into an interval suitable for the actual duration of the integration. This condition is easiest to satisfy when T ² is small compared with T . Section 9.5.2 asserts that this holds even for a photon-counting receiver with a suitably designed detection filter in the optical domain. For a shorter integration interval, the detection filter gathers the energy, resulting in a larger arrival rate of signal photons for the shorter interval. This theoretical pulse-shaping filter, however, is in the optical domain, prior to the photodetector. Complex signal processing in the optical domain, such as the implementation of an optically matched filter that can be adjusted to match the channel, is not currently used because of practical issues in the fab-

422

9 The Information Channel

rication and control of such devices. However, to approach the theoretical performance limits of future systems, optical-domain signal processing may be required. Specifically, it is likely that some form of optical signal processing that mimics features of conventional electrical signal processing will be required for high-rate quantum-lightwave communication systems as discussed in Chapter 16.

9.3

Methods of Reception Detection is the receiver function that converts noisy samples into discrete symbols. Detection is the first step in recovering the user datastream. The discussion of this function, which began in Chapter 8, usually concerns a detection filter followed by measurements that generate the detection statistic for each symbol. A study of detection filters is given in Section 9.4. Methods of detection from a single independent detection statistic for a binary signal are studied in Section 9.5, and methods of detection for a multilevel signal are studied in Section 9.6. Methods of detection from multiple dependent samples are studied in Chapter 11. The physical implementation of a detection technique depends on the properties of the lightwave signal that convey the information. Different signal models describe different properties. Wave optics describes the continuous-phase property of a lightwave signal, but does not describe the discrete-energy property. Photon optics describes the discrete-energy property of a lightwave signal, but does not describe the phase property. Accordingly, a system based only on wave optics or photon optics provides an incomplete description of the properties of a lightwave signal that could be used to convey information. An information-theoretic channel model based on a complete description of the properties results in a capacity that can be larger than that of a model based on only the properties of either wave optics or photon optics alone. Lightwave-detection techniques are usually designed on the basis of a judicious understanding of the discrete and continuous properties of a lightwave signal but without mingling these properties in the detector. A detector that does mingle properties of both wave optics and photon optics, known as a displacement receiver, is discussed in Section 9.5.4. Chapter 14 shows that as the signal in additive lightwave noise becomes large, the capacity of an information channel using detection methods described by the discrete photon-optics signal model approaches the capacity using detection methods described by the wave-optics signal model, as given by the famous Shannon capacity formula for the additive gaussian noise channel. As the signal level with no additive lightwave noise becomes small, the channel capacity approaches the capacity based on detection methods described by photon optics. Accordingly, as the mean signal level is reduced, the optimal method of detection changes from a detection method that predominantly uses the continuous nature of a lightwave signal to a detection method that predominantly uses the discrete-energy nature of a lightwave signal. This transition is not sharp, with the optimal detection method using a combination of wave-optics properties and photon-optics properties, and perhaps other properties that can be described only by using quantum optics.

9.4 Detection Filters

423

This chapter begins the development of these concepts by discussing binary detection methods for a real signal constellation in additive white gaussian noise. After a discussion of the detection filter used to form the detection statistic, detection for channels with noise based on the wave-optics signal model and for channels with noise based on the photon-optics signal model are presented. Then a detection method that mingles the two signal models is described. Chapter 11 builds on the methods developed using wave optics to include various forms of interference caused by effects such as dispersion.

9.4

Detection Filters The study of detection filters starts with the elementary case of a binary antipodal signaling baseband waveform with no intersymbol interference in the presence of additive white gaussian noise n(t ) with variance σ 2 = N 0/2. At passband, this signaling waveform becomes binary phase-shift keying. It will be shown that when a detection filter is used that maximizes the sample signal-to-noise ratio, the probability of a detection error for antipodal signaling with an equiprobable prior in additive white gaussian noise is given by pe

= 21 erfc

·¸

E b / N0

¹

,

(9.4.1)

where E b is the mean energy per bit. In general, for binary signaling, E b is defined as

=. p0 E0 + p1 E1, and p1 satisfying p0 + p1 = Eb

(9.4.2)

with prior probabilities p0 1. For antipodal signaling, E 0 = E 1 , and for any prior, the mean energy per bit E b is equal to the mean energy per pulse E p and E b = E 0 = E 1 = E p . This statement is not true for other modulation formats. Expression (9.4.1) for the binary error rate, which is derived as (9.5.18) and shown in Figure 9.6, is a precursor for the probability of a detection error for many other signaling formats. These probabilities of error can be expressed as modifications to (9.4.1). The detection filter that underlies this basic expression is the matched filter, which is derived later in this section. The mean bit-energy-to-noise ratio E b/ N 0 or mean pulse-energy-to-noise ratio E p / N0 in (9.4.1) is an appropriate parameter of matched-filter detection as shown in (9.4.1). More general variations of (9.4.1) will arise for other modulation formats or other detection filters that are not matched filters, or for other forms of noise. The corresponding parameters inside the erfc function for other situations use a variety of notations – some traditional – and some appropriate to a specific situation that will be studied in this chapter and the next. For the basic case of antipodal signaling (or binary phase-shift keying), alternative notations are

º

2E b N0

= σA = √γ = d2min σ = Q.

(9.4.3)

424

9 The Information Channel

–1 –2 ep 01 goL

–3

–4 –5 0

2

4

6

8

10

Eb /N0 (dB) Figure 9.6 Probability of a detection error for antipodal signaling in additive white gaussian noise.

Antipodal signaling uses the two signal points ³ A with the euclidean distance dmin 2 /4σ 2 is the sample between the points given as dmin = 2A. The term γ = dmin signal-to-noise ratio. The term is a catch-all term whose meaning varies with the situation. For example, with signal-independent noise, the variances σ 02 and σ12 are √ equal and the term is equal to γ . For noise that generates a signal-dependent variance, such as shot noise, σ02 and σ12 are not equal and the term is equal to √ 2 2 γ , where γ = dmin /4σ AV is an effective sample signal-to-noise ratio with the mean noise level given by σ AV = (σ0 + σ1 )/2. For either of these two cases, is given as dmin/(σ0 + σ1). The detection filter that maximizes the sample signal-to-noise ratio γ for a gaussian sample statistic with only additive white noise is the matched filter, which will be developed in Section 9.4.2. The detection filter that maximizes the effective signalto-noise ratio γ (or ) for a signal-dependent gaussian noise sample is developed in Section 9.4.3. These filters maximize the effective signal-to-noise ratio but may create intersymbol interference.

Q

Q

Q

Q

Q

9.4.1

Linear Detection Filters

A detection filter is used to prepare for a sample statistic to be taken from the demodulated baseband waveform. For a real signal constellation and no intersymbol interference, each received pulse is to be summarized by a sample statistic consisting of a single real number. That number is compared with one or more thresholds to make a decision about the transmitted symbol. The sample statistic depends on the signal model and can be controlled through the transformation that converts the demodulated baseband waveform into the detection statistic. In most cases, this tranformation can be implemented using a linear detection filter. In cases such as photon counting, the transformation also involves a nonlinear thresholding operation.

9.4 Detection Filters

425

In the simplest case, the sample statistic is a single gaussian random variable obtained by sampling at time kT , for each k. The output of a single linear detection filter is designed to control the signal mean and the noise variance. Modulation formats that use several different pulse shapes, such as orthogonal signaling, do require multiple detection filters given by the set { y± (t )}. Linear filtering followed by sampling at time kT can be described mathematically as the projection of the received noisy waveform r (t ) onto the set of detection functions { y± (t )} translated in time by kT for the kth pulse, with the ±th detection statistic at time kT given by rk±

=

µ∞

−∞

r (t ) y± (kT

− t )dt ,

(9.4.4)

where k indexes the time, and ± indexes the detection filter. For amplitude-shift-keyed modulation formats, there is only one detection filter, and the subscript ± is superfluous. In this chapter, the received noisy waveform r (t ) is a real function of time, and the detection filter is a real function as well. The elementary detection filter y (t ) = rect(t / T ) is called the integrating detection filter. This is because rk

=

µ∞

−∞

r (t )rect(( kT

− t )/ T )dt =

µ (k +1/2)T (k −1/2) T

r (t )dt ,

(9.4.5)

which can be implemented as an integrator. For an integrating detection filter σ 2 = N 0T /2 (cf. (2.2.70)). While this detection filter (in the form of an integrator) may be simple to implement, in general, it does not maximize the signal-to-noise ratio γ of a sample at the filter output. 9.4.2

Detection Filters for Additive White Noise

This section derives the form of a detection filter that maximizes the signal-to-noise ratio γ of the sample in additive white noise, not necessarily gaussian. Later, Section 10.2.3 shows that, for additive white noise that is also gaussian, maximizing the signal-to-noise ratio also minimizes the probability of a detection error. For a single pulse p(t ) in additive noise n (t ), a detection filter with impulse response y (t ) has a sample statistic r k at time kT given by the sampled convolution (cf. 9.4.4) rk

=

µ∞

µ−∞ ∞

r (t )y (kT

− t )dt

= p(t ) y(kT − t )dt + n² (kT ) −∞ = sk + n²k , (9.4.6) » ∞ where sk = −∞ p(t ) y(kT − t )dt is the kth filtered sample, and n²k = n² (kT ) is the kth filtered noise sample with variance

σ = 2

µ

N0 ∞ 2 y (t )dt 2 −∞

=

µ

N0 ∞ 2 y (kT 2 −∞

− t )dt .

(9.4.7)

426

9 The Information Channel

The signal-to-noise ratio given by

γ at the filter output at time T for the single pulse p(t ) is

¼µ ∞ s2

−∞µ

γ = σ2 =

p ( t ) y( T

N0 ∞ 2 y (T 2 −∞

− t )dt

½2

− t )dt

,

(9.4.8)

where (9.4.6) and (9.4.7) have been used, and k is set to one. This ratio is to be maximized by the choice of the detection filter y(t ). The Schwarz inequality (cf. (2.1.71)) applied to p(t ) and y(T − t ) states that

¼µ ∞

−∞

p(t )y (T

½2 µ ∞ µ∞ − t )dt ≤ p2 (t )dt y2 (T − t )dt , −∞

−∞

(9.4.9)

with equality when y (T − t ) equals p(t ) or, restated, when y(t ) = p(T − t ). This is often written in noncausal form as y (t ) = p(−t ) with the time offset by T , or, to make the filter causal, suppressed, but implied. Referring to (9.4.9), γ is maximized when the received electrical pulse p(t ) is correlated with a copy of itself. Were the received pulse complex-valued, as will be the case in Chapter 10, the detection filter would be equal to the complex conjugate of the time-reversed copy of the pulse. Then y(t ) = p ∗(−t ). In this chapter, only real pulses are considered. A filter of the form y (t ) = p∗ (−t ) is called a matched filter. The filter y (t ) is optimal when it is matched to the received pulse p(t ). The sample value is then equal to the energy E p in the pulse p(t ). Setting y (t ) = p(T − t ), (9.4.8) becomes

γ=

»∞

2 −∞ p (t )dt

N0 /2

where Ep

= s2 =

0

µ∞

is the energy in the received pulse p(t ), and

p = 2E , N

−∞

p2 (t )dt

σ 2 = N 0/ 2,

(9.4.10)

(9.4.11a)

(9.4.11b)

is the variance of the noise in the sample at the filter output. A matched filter maximizes the signal-to-noise ratio of the sample, but does not necessarily eliminate intersymbol interference. There is no interference at the output samples if and only if p(t ) is such that p(t ) ± p(−t ) is a Nyquist pulse. Satisfying this condition is discussed in Chapter 10. A received pulse p(t ) with the special property that p(t ) ± p(−t ) is a Nyquist pulse also has the important property that, for white noise at the input, the output noise samples are uncorrelated, and hence, for gaussian noise, are independent. To show this, let nk = p∗ (−t ) ± n(t )| t =kT be a sample of the filtered noise. Then

9.4 Detection Filters

´nk nk ² µ = = =

¾µ ∞ µ

p(t

−∞µ ∞ ∞

−∞ µ−∞ ∞µ ∞ −∞µ −∞

− kT )n(t )dt

µ∞

p(t ² − k ² T )n(t ² )dt ²

427

¿

−∞ À Á p(t − kT ) p(t ² − k ² T ) n(t )n(t ² ) dt dt ²

p(t

− kT ) p(t ² − k ² T )δ(t − t ² )dt dt ²

∞ = N20 p(t − kT ) p(t −∞ = 0 for k ±= k ² ,

− k ² T )dt (9.4.12)

where the last equality holds if and only if p(t ) ± p(−t ) is a Nyquist pulse. √ For binary antipodal modulation, using (9.4.11a) gives s1 = −s0 = E b , where 2 E b = E p is the mean energy per bit. Using dmin = d102 = (s1 − s0 )2 = 4Eb and 2 2 σ = N 0/2, the sample signal-to-noise ratio γ = dmin /4σ 2 for a matched filter for antipodal modulation is 2

d10 γ = 2N .

(9.4.13)

0

The probability of a detection error is given by (cf.(9.5.18)) pe

=

1 2

erfc

= 12 erfc

¼Â ·¸

½

/4N0 ¹ E b / N0 . 2 d10

(9.4.14a) (9.4.14b)

This fundamental expression, stated earlier in (9.4.1) for signal-independent noise, is √ √ equivalent to (9.5.18a) with A = E b and σ = N0 /2 (cf. (9.4.11b)). The matched filter maximizes the signal-to-noise ratio for white noise, which is noise with a power density spectrum N ( f ) = N 0/ 2. Maximizing the signal-to-noise ratio minimizes the probability of a detection error only when the noise is white and gaussian. When the noise is gaussian, but not white, a whitened matched filter given by Y ( f ) = C P ∗ ( f )/ N ( f ), where C is any constant, maximizes the signal-to-noise ratio. This filter removes the correlation of the noise samples, but, in general, results in intersymbol interference in the signal. When the noise is white but nongaussian, the optimal linear filter that maximizes the signal-to-noise ratio is again the matched filter. A nonlinear function may be a better detector in theory for nongaussian noise, but it is less robust because it is more sensitive to imprecision in the noise model. A linear filter does not itself vary with the signal amplitude, and so is usually preferred.

9.4.3

Detection Filters for Signal-Dependent Noise

The optimal detection filter for a single pulse with no intersymbol interference when there is a combination of signal-dependent noise and signal-independent noise is the topic of this section. The optimal detection filter that minimizes the probability of a detection error varies from an integration over the duration of the pulse, when there is

428

9 The Information Channel

only signal-dependent noise such as shot noise modeled using Poisson statistics, to a matched filter, when there is only signal-independent white noise.4 When there is only signal-dependent noise due to shot noise, the mean and the variance are determined by Campbell’s theorem (cf. (6.7.18)). The signal-to-noise ratio γ is then

(» ∞ p(t )y (T − t )dt )2 γ = σ 2 = »−∞ , (9.4.15) ∞ 2 −∞ p(t )y (T − t )dt where the nonnegative real pulse p(t ) is proportional to the time-varying arrival rate R(t ), or, equivalently, proportional to the received power P (t ) in the lightwave pulse (cf. s2

(1.2.5)). Because p(t ) is nonnegative, the numerator can be written as

¼µ ∞

−∞

½2 ½2 ¼µ ∞ ¸ ¸ p(t ) p(t ) y (T − t )dt . p(t )y (T − t )dt = −∞

(9.4.16)

Now apply the Schwarz inequality (cf. (2.1.71) to write

¼µ

∞¸ −∞

½2 µ ∞ ¸ µ ∞ (¸ ¸ )2 ( p (t ))2 dt p(t ) p(t ) y(T − t )dt ≤ p(t ) y(T − t ) dt µ−∞ µ ∞ −∞ ∞ = p(t )dt p(t ) y2 (T − t )dt . −∞

−∞

Substituting this expression into (9.4.15) gives

γ≤

µ∞

−∞

p(t )dt ,

(9.4.17)

for all choices of the detection filter y(t ). The inequality is satisfied with equality when y (t ) = rect(t / T ), where T is large enough that p(t )rect(t / T ) = p(t ). This optimal detection filter simply integrates the received pulse over a time interval of duration T with the sample s given by (9.4.5). The statement that the optimal filter for a single pulse is an integrator when there is only signal-dependent-noise described by Campbell’s theorem is consistent with the equivalent statement based on photon optics. For photon-optics detection, a harddecision detection statistic is given by the photon count over the symbol interval T with the detection statistic described by a Poisson probability distribution as discussed in Section 9.5.2. Therefore, when only signal-dependent Poisson noise is present, the optimal method to generate the detection statistic is to count events, employing photon optics, or to integrate, employing wave optics. Although the integrating filter maximizes the sample signal-to-noise ratio, it does introduce intersymbol interference unless the pulses do not overlap. 4 See Chapter 5 of Einarsson (1996).

9.4 Detection Filters

9.4.4

429

Detection Filters for General Noise

When only signal-independent noise is present, the optimal linear detection filter that maximizes γ is a matched filter. When only signal-dependent shot noise is present, the optimal detection filter that maximizes γ is an integrating filter. When both forms of noise are present, the shape of the detection filter y(t ) varies between these two limiting cases. An appropriate compromise for the intermediate case is the filter y( t ) =

p(T − t ) p(T − t ) + b

(9.4.18)

defined over the duration of the pulse, where b is a shape parameter that depends on the relative contribution of the two noise sources. The specific form of an optimal detection filter y (t ) depends on the criteria used for optimization. The optimal detection filter that maximizes the sample signal-to-noise ratio γ can be determined using variational calculus (cf. Section 6.1.1). An end-ofchapter exercise discusses the derivation and the optimality of (9.4.18) for one optimality criterion. Another criterion that balances noise and intersymbol interference is presented in Section 11.5.2. Figure 9.7 plots the detection filter given in (9.4.18) for several values of the shape parameter b for a received pulse p(t ) equal to cos(π t )rect(t ). When there is no signalindependent noise or background noise, b equals zero. This corresponds to an ideal shot-noise-limited system. When the signal-independent noise is the most significant noise source, b is large, so the term p(T − t ) in the denominator can be discarded compared with b. For this case, the detection filter y (t ) approaches a matched filter with y (t ) ≈ p(T − t ). The behavior is evident in Figure 9.7. For the special case in which p(t ) = rect(t / T ), the detection filter is an integrator regardless of the division between the signal-independent noise and the signaldependent noise. Such a pulse is appropriate only when the data rate is small compared 1

esnopseR retliF dezilamroN

0.5 b b b b –1/2

10 1 10–1 10–2 0 Time

1/2

Figure 9.7 The detection filter that maximizes the sample signal-to-noise ratio for a cosine pulse

as a function of the shape parameter b.

430

9 The Information Channel

with the channel bandwidth so that there is no significant intersymbol interference due to dispersion. Chapter 11 studies detection in the presence of intersymbol interference.

9.5

Detection of a Binary Signal Three instances of the detection of a binary signal are presented in this section. These are the detection of a binary wave-optics signal, the detection of a binary photon-optics signal, and the detection of a binary signal that mingles the wave-optics property of phase with the photon-optics property of discrete energy.

9.5.1

Detection of a Binary Wave-Optics Signal

The output of a linear wave-optics lightwave channel is first converted to a noisy electrical waveform, then passed through a detection filter y(t ) and sampled. In the simplest case, the noisy electrical waveform consists of the sum of the modulated signal and additive white noise n(t ), usually gaussian. The sample value r k for each k is rk

=

∞ ±

j =−∞

s j q (kT

− j T ) + n² (kT ),

(9.5.1)

where q (t ) = p(t ) ± y (t ) is the target pulse (cf. (9.2.4)), and n² (kT ) = n²k is the kth noise sample. When q (t ) is a Nyquist pulse, the sample is r k = sk + n²k . When the Nyquist pulse q (t ) also has the form p(t ) ± p(−t ) with p (−t ) as the detection filter, the noise samples n²k are uncorrelated random variables and so, when the noise is gaussian, they are independent. Otherwise, when the noise is nongaussian, the noise samples may be dependent. The kth detection statistic is generated from a single sample of a filtered baseband waveform, as shown in Figure 9.1(a). This section discusses the minimization of the probability of symbol detection error pe under the conditions that q (t ) is a Nyquist pulse and the noise samples n ²k are independent. The case in which q (t ) is not a Nyquist pulse and the case in which the noise samples n²k are dependent are studied in Chapter 11.

Decision Regions

The probability that the discrete-time memoryless channel has the real value r as the channel output given that the real value s± is transmitted at the channel input is described by the conditional probability density function f (r | s± ) for each s± , where ± indexes the L points of the signal constellation. A binary modulation format has a two-point signal constellation with real values s0 and s1 . Hard-decision detection on each channel output uses the single sampled real number r to decide between two hypotheses: hypothesis H0 is that s0 was transmitted; hypothesis H1 is that s1 was transmitted. A deterministic detection rule divides the set of real numbers R into two decision regions, 1 and 0 . If r ∈ 1 , then hypothesis H1 is chosen. If r ∈ 0 , then hypothesis H0 is chosen. Suppose that s0 is transmitted. If r ∈ 0 , we correctly decide in favor of H0, but if r ∈ 1, we incorrectly decide in favor of H1. The conditional probability p1|0 of

R

R

R

R

R

R

9.5 Detection of a Binary Signal

431

this incorrect decision is given by the integration of the conditional probability density function f (r | 0) over all possible values r ∈ 1 , p1|0

=

R

µ

R1

f (r | 0)dr

= 1 − p0|0 = 1 −

µ

R0

f (r | 0)dr,

(9.5.2a)

where p0| 0 is the conditional probability of a correct decision, and p0|0 + p1| 0 = 1. Similarly, suppose that s1 is transmitted, then the conditional probability p0|1 of detecting s0 is p0|1

=

µ

R0

f (r | 1)dr

= 1 − p1|1 = 1 −

µ

R1

f (r | 1)dr,

(9.5.2b)

where f (r |1) is the conditional probability density function on r given that s1 is transmitted, and p1| 1 is the conditional probability of a correct decision.

Maximum Symbol Posterior Detection

For binary modulation, the prior is given by p = ( p(s0), p(s1)) , abbreviated ( p0 , p1), with the first component being the prior probability of transmitting symbol s0 and the second component being the prior probability of transmitting symbol s1 . Then p0 + p1 = 1. The total probability of a detection error pe is determined by weighting the conditional error probability by the prior probability with pe

.

= p0 p1|0 + p1 p0|1 = 1 − p0 p0|0 − p1 p1|1 = 1 − pc ,

(9.5.3a) (9.5.3b) (9.5.3c)

where pc = p0 p0|0 + p1 p1|1 is the probability that the correct hypothesis is chosen. Substituting the last expression for p1|0 given in (9.5.2a) and the first expression for p0| 1 given in (9.5.2b) into (9.5.3a) gives pe

= p0 −

µ

R0

( p0 f (r |0) − p1 f (r |1))dr.

(9.5.4)

To minimize the probability of a bit error, maximize the integral in (9.5.4) by the choice of 0. The maximum occurs when the region 0 includes every r for which p0 f (r |0) > p1 f (r |1) and the region 1 includes every r for which p0 f (r | 0) < p1 f (r | 1). Ties are so unlikely as to be of no consequence, and can be put into either region. This is called the (conditional) maximum-posterior decision rule. The maximum-posterior condition defining 0 can be written as

R

R

R

p0 f (r |0) p1 f (r |1)

R

> 1.

(9.5.5)

R

Values of r satisfying the opposite inequality are placed in 1. Using the Bayes rule given in (2.2.9) written in terms of the conditional probability density functions for binary detection, the numerator and denominator of (9.5.5) are, respectively,

432

9 The Information Channel

p0 f (r | 0) = p(r ) f (0|r ),

(9.5.6a)

p1 f (r | 1) = p(r ) f (1|r ).

(9.5.6b)

The conditional probability density function f (s± |r ) is called the posterior probability distribution on symbol s± given output symbol r . Substituting (9.5.6) into (9.5.5) and canceling out the common term p(r ), define u (r ) as the ratio of the posterior probability density functions on the left side of (9.5.5),

. f (0|r ) . f (1|r )

u(r ) =

(9.5.7)

Expression (9.5.7) is simpler than (9.5.5), but can be computed only if p0 and p1 are meaningful and known. Then, using (9.5.5) and (9.5.7), the optimal detection rule in terms of u (r ) is choose H0 if u(r ) ≥ 1,

(9.5.8)

choose H1 if u(r ) < 1,

where H0 is the assertion “s0 was transmitted” and H1 is the assertion “s1 was transmitted.” When the prior is not available, it is customary to use an equiprobable prior as discussed in the next section. A graphical depiction of the decision regions for a gaussian probability density function for an equiprobable prior is shown in Figure 9.8.

Maximum-Likelihood Detection

When a prior is not available, the posterior conditional f (s± |r ) cannot be computed. Only the prior conditional f (r |s± ) is known. The likelihood function is defined as

²(s±; r ) =. f (r |s±).

(9.5.9)

The right side is the conditional probability density function that the value r is received, given that the value s± was transmitted. It is considered to be a function of the unknown channel output r for each possible value of the specified input s± . For the binary case ± is an element of {0, 1}. The left side of (9.5.9) is the likelihood function ²(s±; r ). It is considered to be a function of the unknown channel input s± for an observed channel output r . The mathematical functions on the two sides of (9.5.9) are the same, but the interpretations are different because the variables considered to be known and unknown are reversed. The notation f (r | s± ) denotes a function of r parameterized by s± . The notation ²(s± ; r ) denotes a function of s± parameterized by r, but actually identical to

f(r|0)

u(r) > 0 u(r) < 0

f(r|1)

f(r|0) = f(r|1) r

Figure 9.8 The decision regions

prior.

R0 and R1 for binary hypothesis testing for an equiprobable

9.5 Detection of a Binary Signal

433

f (r |s± ). The likelihood function depends only on the probabilistic model of the information channel, and does not include the prior. It can be used even when the prior probabilities are not known. Each term on the right of (9.5.6) is a joint probability density function f (s± , r ) which can be written as

.

f (s± , r ) = ps± ²(s± ; r ).

.

(9.5.10)

Using r = p0/ p1 to denote the ratio of the prior probabilities, the ratio u (r ) of the posterior probability density functions given in (9.5.7) can be written as u (r ) =

f (0, r ) f (1, r )

= rλ(r ),

(9.5.11)

where the likelihood ratio λ(r ) is defined as

²(0; r ) . λ(r ) = ²( 1; r )

(9.5.12)

Lu (r ) = Lr(r) + Lλ (r ),

(9.5.13)

.

Define Lλ (r ) = loge λ(r ) as the log-likelihood ratio. Using this expression, u (r ) given by (9.5.11) satisfies where L u (r ) = loge u (r ) and Lr (r) = log e r. The appropriate detection rule depends on whether the prior probabilities are specified at the receiver. When the prior probabilities are specified, hypothesis testing based on maximizing the posterior probability using (9.5.8) is optimal. When the prior probabilities are not specified, maximizing the log-likelihood function is used. Maximum-posterior detection and maximum-likelihood detection are equivalent when the prior probabilities are equal. Chapter 14 shows that, for a binary modulation format with equal-energy symbols in signal-independent noise, an equiprobable prior yields the maximum information rate. For this case, optimal detection based on the likelihood function is sufficient. For modulation formats that use unequal symbol energies, as when there is a constraint on the mean energy or when there is signal-dependent noise, unequal prior probabilities usually are needed to achieve the maximum information rate, and so should be used. For this case, optimal detection is based on the maximum-posterior probability. Maximumlikelihood detection based on an equiprobable prior may still be used, but the symbol error rate will be poorer.

Binary Symmetric Detection

For many probability density functions, the equation u(r ) = 1 has a single solution for r , which is called the detection threshold ³. For an equiprobable prior, u(r ) = λ(r ) and maximum-likelihood detection is optimal. The threshold ³ is the value of r , if unique, satisfying

²(1; r ) = ²(0; r ).

(9.5.14)

434

9 The Information Channel

(a)

(b) f(r|0)

p

f(r|1)

1 −ρ 1

1 ρ

ρ

p(0|1)

p(1|0)

Θ

1–p

r

0

1 −ρ

0

Figure 9.9 (a) Error probabilities for a continuous binary symmetric channel. (b) A discrete

binary symmetric channel with error probabilities denoted by ρ .

In general, however, (9.5.14) can have multiple solutions, leading to more than one threshold. For an equiprobable prior and all errors of equal importance, the thresholds are determined by equating the two likelihood functions or equating the log-likelihood ratio L λ (r ) to zero. The decision regions R0 and R1 are then determined by the thresholds, and the probability of a detection error is again given by (9.5.3). When the threshold is set so that the conditional probabilities p0|1 and p1| 0 of a detection error are equal and independent of the other symbols, the information channel is called a binary symmetric channel. A binary symmetric information channel, depicted graphically in Figure 9.9(b), results from maximum-likelihood detection when signaling in additive white gaussian noise. For this case, the conditional probability density functions f (r |0) and f (r |1) on the sample r given by

√ 1 e−(r −s ) /2σ , 2πσ 1 f (r |1) = √ e−(r −s ) / 2σ 2πσ f (r |0) =

0

1

2

2

2

2

(9.5.15a) (9.5.15b)

are the two conditional distributions that correspond to the two transmitted symbol values s0 and s1. For both distributions, σ 2 is the variance of the gaussian noise.5 These probability density functions are shown in Figure 9.9(a). Referring to that figure, the single hard-decision threshold ³ at which f (r |0) equals f (r | 1) is the mean of the two signal levels,

³ = (s1 + s0)/2.

(9.5.16)

This threshold gives equal values of p1|0 and p0|1. The probability of a detection error pe is (cf. (9.5.3a) pe

= p0 p1|0 + p1 p0|1 = ( p0 + p1 ) p1|0 = p1|0 = p0|1,

(9.5.17)

with the probability of a bit detection error pe then called the crossover probability and also denoted by ρ . The crossover probability ρ reduces the gaussian channel to a discrete binary symmetric information channel as shown in Figure 9.9(b). 5 The symbol s is used both for the transmitted symbol value and for the noiseless received symbol value.

For balanced photodetection, the received value is determined from the received lightwave amplitude. For direct photodetection, the received value is determined from the received lightwave power.

9.5 Detection of a Binary Signal

435

When s1 = −s0 = A, the binary signaling is antipodal. Recalling that the minimum distance dmin of a signal constellation is the smallest euclidean distance between any two points of the signal constellation gives dmin = 2A. Substituting (9.5.15a) into (9.5.2a) and using the change of variable r ² = (r + A )/σ gives pe

µ∞

√1 e−r ² /2 dr ² A/σ ·2π√ ¹ 1 = 2 erfc A/ 2σ · √ ¹ = 12 erfc dmin /2 2σ ·¸ ¹ = 12 erfc γ/2 ,

= p1|0 =

2

(9.5.18a) (9.5.18b) (9.5.18c)

2 where γ = A 2/σ 2 = dmin / 4σ 2 is the sample signal-to-noise ratio. The mean value A 2 and the variance σ depend on the method of sampling at the receiver. When the sample is taken at the output of a matched filter, as derived in Section 9.4, the sample signal-tonoise ratio is equal to 2E p / N 0. For binary modulation with an equiprobable prior, E p is √ equal to E b , so the argument of the erfc function in this case is also written as E b / N0 . This form is also given in Section 9.4 as (9.4.14).

Binary Asymmetric Detection

A continuous binary asymmetric channel using gaussian noise with unequal variances for the two input symbols is depicted graphically in Figure 9.10(a). Because the variances of the two gaussian distributions are unequal, this describes a channel that has a signal-dependent conditional probability distribution. For direct photodetection, this signal-dependent term can be described by a variance σ ±2 that depends on the mean s0 or s1 , which is added to the signal-independent term σ 2 used in (9.5.15). The total variance σ±2 is then

σ±2 = σ 2 + σ 2 (s± ) ± ∈ {0, 1}. (9.5.19) For the specical case of shot noise, σ 2 (s± ) = s± (cf. (6.7.5)), which is a consequence of the properties of Poisson statistics. Another form of noise that produces a signaldependent variance occurs when the lightwave signal and the lightwave noise are mixed within a square-law photodetector. This form of noise is called mixing noise and is discussed further in Section 10.5.3. (a)

(b) f(r|0)

p

p(1|1) 1

f(r|1)

p(0|1)

p(1|0)

Θ

1 p(0|1)

r

1–p

p(1|0)

0

0 p(0|0)

Figure 9.10 (a) Error probabilities for a continuous binary asymmetric channel. (b) A discrete

binary asymmetric channel.

436

9 The Information Channel

The two conditional probability density functions with variances in Figure 9.10(a), are

σ02 and σ12 , shown

√ 1 e−(r −s ) /2σ , 2πσ0 1 e−(r −s ) / 2σ , f (r |1) = √ 2πσ1 f (r |0) =

0

1

2

2 0

(9.5.20a)

2

2 1

(9.5.20b)

where s0 and s1 , respectively, are the two means. The two error probabilities p0|1 and p1|0 , shown in Figure 9.10(a) as shaded areas, depend on the threshold. The threshold can be set to make the two probabilities equal, resulting in a binary symmetric channel. This, however, is not the maximum-likelihood decision and does not result in the smallest error probability under an equiprobable prior. For the case shown in Figure 9.10(a), minimizing the error results in a binary asymmetric channel as shown in Figure 9.10(b). For the binary symmetric case with an equiprobable prior discussed earlier, the single threshold occurs when f (r | 0) = f (r | 1). For the binary asymmetric case, possibly with an unequal prior, the situation is more complicated. Now the maximum-posterior decision regions with prior p = ( p0, p1 ) are defined by the solution to L u (r ) = 0 (cf. (9.5.13)) written as

Ä Ã ¼ p f (r |0) ½ p0 σ1e−(r −s ) /2σ 0 loge = 0. = loge p1 f (r | 1) p1 σ0e−(r −s ) /2σ 0

1

2

2 0

2

2 1

This reduces to

(r − s1)2 − (r − s0 )2 + log ( p σ ) − log ( p σ ) = 0. (9.5.21) e 0 1 e 1 0 2σ12 2σ02 Unless σ02 and σ12 are equal, this is a quadratic equation in r and has two solutions given as the two thresholds  ² ( )³ s0 σ12 − s1σ 02 ³ σ 12σ02 (s1 − s0)2 − 2(σ12 − σ02 ) loge ( p1σ0 ) − log e ( p0σ1 ) ³= . σ12 − σ02

(9.5.22) The maximum-likelihood decision regions are obtained from (9.5.22) by setting p0 and p1 each equal to one-half. The two solutions given by (9.5.22) and the corresponding decision regions R0 and R1 (shown in Figure 9.11 for several gaussian probability density functions) yield a disjoint decision region R1 defined by two thresholds. Three examples of thresholds are shown for three pairs of means (s0, s1) given by (0, 1), (0, 4), and (0, 9), and a pair of variances (σ02 , σ12 ) equal to (1, 16) for all three examples. The symbol s1 is detected whenever the sample is larger than the higher threshold or smaller than the lower threshold. The symbol s0 is detected whenever the sample falls between the two thresholds. For large expected signal values, the probability of a signal value smaller than the lower threshold is not significant, as is evident in Figure 9.11 for f (r | 9), so this threshold can be ignored. For this common case, only the significant threshold ³

9.5 Detection of a Binary Signal

f(r | 0)

ytilibaborP

f(r | 1)

f(r | 4)

mark

mark

mark

space

mark

space

mark

space

437

mark

f(r | 9) Sample Value r Figure 9.11 Bottom three curves: conditional gaussian probability density functions with σ 1 = 4 for three means (1, 4, and 9). The two thresholds using equal prior probabilities are shown as dashed lines. The corresponding decision regions for a mark are also shown. Top curve: a zero-mean conditional gaussian probability density function with σ 0 = 1.

given by the positive sign in (9.5.22) need be used. The validity of this approximation is discussed in an end-of-chapter exercise. The conditional probabilities p1|0 and p0|1 of a detection error when using a single threshold are p1| 0 = p0| 1 =

µ∞

µ ³³ −∞

f (r | 0)dr

=√

f (r | 1)dr

=

1

µ∞

2 2 e−(r −s0 ) / 2σ0 dr,

2πσ0 ³ µ³ 2 1 2 e−(r −s1 ) / 2σ1 dr, √ 2πσ1 −∞

(9.5.23a) (9.5.23b)

which can be expressed in terms of the complementary error function (cf. (2.2.19)). To express p1|0 , change the variable of integration to r ² = (r − s0)/σ0 , dr ² = dr /σ0 in (9.5.23a), and therefore change the lower limit to (³ − s0 )/σ0 to yield p1| 0 =

1





µ∞

−r ³−s 0 e σ0

²2 / 2

dr ²

½ ¼ (9.5.24) = 21 erfc ³√− s0 . 2σ0 · √ ¹ = 21 erfc (s1 − ³)/ 2σ1 . Substituting these expressions into

In a similar way, p0|1 (9.5.3a) gives the probability of a detection error as pe

= p0 p1|0 + p1 p0|1 ¼³ − s ½ p ¼s − ³½ p0 1 0 + 2 erfc √1 . = 2 erfc √ 2σ0 2σ1

(9.5.25)

438

9 The Information Channel

A binary symmetric channel is generated when σ1 = σ0 = σ , and ³ = (s1 + s0)/2 (cf. (9.5.16)), which recovers (9.5.18a) with A = (s1 − s0 )/2. This case corresponds to noise that is signal-independent. For a large signal-to-noise ratio, the second term inside the square root in (9.5.22) can be neglected. The two thresholds then reduce to

³ = σ1σs0 ³³ σσ0 s1 . 0

(9.5.26)

1

The dominant threshold is the one with the plus sign both in the numerator and in the denominator. The other threshold will be ignored. Substituting (9.5.26) into (9.5.25), the probability of a detection error is pe

=

where

¼

½

(σ1 s0 + σ0s1√)/(σ0 + σ1) − s0 p0 erfc 2 ¼ s − (σ s +2σσ0 s )/(σ + σ ) ½ p1 + 2 erfc 1 1 0 √ 0 1 0 1 2σ1 ¼ ½ = p0 +2 p1 erfc √ s1 − s0 2(σ1 + σ0 ) =

Q is defined as

1 erfc 2

¼Q½ √ ,

(9.5.27)

2

(9.5.28) Q =. σs1 −+ sσ0 . 1 0 Because s1 − s0 = dmin , the value Q is equal to dmin /2σ , where σ = (σ1 + σ0)/2. . Define the effective sample signal-to-noise ratio as γ = Q2 . Then ½2 ¼ 2 (9.5.29) γ = dmin2 = σs1 +− σs0 , 1 0 4σ which now has the same form as γ (cf. (9.4.3)), but with σ replaced by σ . Then ·¸ ¹ 1 ¼ dmin ½ 1 , (9.5.30) pe = erfc γ /2 = 2 erfc √ 2 2 2σ which has the same form as (9.5.18b), but with σ replaced by σ . This also has the same form as (9.5.18c) but with the signal-to-noise ratio γ replaced by the effective signal-to-noise ratio γ . When there is only signal-independent noise, σ1 equals σ0 , γ equals γ , σ equals σ , and (9.5.30) is equivalent to (9.5.18b) or (9.5.18c). Thus with AV

AV

AV

AV

AV

AV

AV

a signal-dependent variance and a large signal-to-noise ratio, the same formula works simply by averaging the two standard deviations. Using the approximate form of the erfc function given in (2.2.20) gives the approximation 1 −γ / 2 pe ≈ √ (9.5.31) πγ /2 e ,

which shows that the probability of a detection error is dominated by an exponential function of the modified sample signal-to-noise ratio.

9.5 Detection of a Binary Signal

439

Because each summand on the first line of (9.5.27) has equal arguments in the erfc function, the threshold defined by the underlying approximation produces a binary symmetric information channel for which p0|1 and p1|0 are equal (cf. (9.5.25)). Without the approximation used herein, the information channel is asymmetric, with the optimal threshold depending on the prior probability p1 of transmitting s1 . This asymmetric information channel approaches a symmetric channel for large values of .

Q

9.5.2

Detection of a Binary Photon-Optics Signal

The output of a lightwave channel under the photon-optics model is a time-varying discrete point process. The received wave-optics waveform r (t ) is transformed into a point process by a photon-counting photodetector (cf. Figure 7.9). The nature of the photon-optics detection problem depends in part, on the coherence properties of the waveform. The nature of the detection problem also depends on the nature of the intersymbol interference. The nature of the intersymbol interference depends on the nature of the coherence. The lightwave waveform r (t ) at the input to the photodetector consists of a sequence of amplitude-modulated copies of the pulse p (t ) in noise. The simplest received pulse is a rectangular pulse rect(t / T ) with no intersymbol interference. For this pulse, the detection statistic is the number m of events counted in an interval of duration T . While dispersion is always present, it is not included in this initial discussion, so the statistics of a point process can be studied alone. Linear dispersion will be included later in the section. In each modulation interval, the signal arrival rate R is random as determined by the prior. The received point process is a Poisson point process whose arrival rate is the sum of the signal arrival rate in that interval and the noise arrival rate. When there is no channel noise and the signal arrival rate is fixed, the probability mass function of the photon count is a Poisson distribution. When the signal arrival rate is random in each modulation interval, the photon count is described by the Poisson transform (cf. Section 6.3) of the continuous probability distribution of the lightwave energy E defined over the interval T . As described within the photon-optics signal model, the Poisson mean E (cf. Table 6.3) is the signal in an interval and the probability distribution f (E) on the mean is the prior. When the range of E is unrestricted, taking any value in the nonnegative reals, the maximum-entropy probability distribution on E is given by the exponential probability density f (E) = (1/S)e−E/S with mean S (cf. (6.3.10)). The probability mass function on the resulting point process is the Poisson transform of f (E). It is a Gordon distribution (cf. Section 6.3.4), which is the maximum-entropy probability mass function on the point process. When the mean E is restricted to only the points of a real signal constellation, the prior f (E) is restricted to only those points. For ideal binary on–off keying with no signal transmitted for a space, the prior on the mean number of counts E is restricted to the two-point signal constellation {0, E1}. Then the probability distribution f (E) for E can be written as f (E) = p0δ(E) + p1δ(E − E1),

(9.5.32)

440

9 The Information Channel

corresponding to the discrete prior p = ( p0, p1 ). Accordingly, using E 1 = h f E1 for the detected lightwave energy in a pulse, the transmitted wave-optics intensity-modulated binary waveform is an on–off-keyed binary waveform s (t ) =

1 T

∞ ±

j =−∞

s j rect((t

− j T )/ T ),

(9.5.33)

where s j ∈ { 0, E 1} with the prior given as p = ( p0 , p1), and T −1 rect (t / T ) is a unitenergy rectangular pulse. The corresponding time-discrete point process is a mixture of two Poisson point processes.

Ideal Detection of a Binary Photon-Optics Signal

Ideal photon-optics detection for on–off-keyed binary modulation is based on a received signal that has photon noise, but no other noise. This means that, for a received rectangular pulse p(t ) equal to rect(t / T ), the arrival rate of signal photons R for a mark is a constant during the pulse interval T . No signal photons are received for a space. The probability density function f (E) for the mean number of counts E is a pair of Dirac impulses (cf. (9.5.32)). For f (E) = δ(E − E1 ), the conditional probability mass function for detection of a mark is a Poisson distribution with mean E1 . A plot of the Poisson probability distribution with E1 = 5 is shown in Figure 9.12. The mean number of counts Ep generated from a single pulse is Ep

=

µ∞

−∞

() ,

R t dt

(9.5.34)

where R(t ) is the time-varying arrival rate of photons (cf. (1.2.5)) and Ep = E p / h f is the expected number of counts in the pulse. Binary photon counting for a noiseless rectangular pulse has a count for a space equal to zero, because E0 = 0, and a count for a mark equal to E1, in expectation.6 For hard-decision detection, the number of received counts m is compared with a detection threshold to determine the most likely transmitted symbol. For any detection threshold

Threshold = 0

0

2

4

6 8 Number of Photons

10

12

Figure 9.12 Probability mass function of an ideal noise-free photon-counting receiver with E

= 5.

6 A mode with no photons does have vacuum-state fluctuations (cf. (6.1.10)), but they do not contribute to

the expected number of counts.

9.5 Detection of a Binary Signal

p1

441

1 ± e±E1 1

1 e±E1

p0

0

0 1

Figure 9.13 An ideal asymmetric photon-counting channel.

between zero and one, the conditional probability p1|0 that a mark is detected when a space is transmitted is zero. The conditional probability p0| 1 that a space is detected when a mark is transmitted is

.

p0|1

= (E01!)

0

e−E1

= e−E , 1

recalling that 0 ! = 1. Therefore, with a single threshold set to separate zero from nonzero, the ideal photon-counting channel is a binary asymmetric channel (cf. Figure 9.13). When zero counts are recorded, it is asserted that a space was transmitted. When one or more counts are recorded, it is asserted that a mark was transmitted. For an equiprobable prior, which is not the optimal prior, the mean number of counts Eb per bit is given by Eb

= 21 E0 + 21 E1 = 12 E1.

(9.5.35)

For an equiprobable prior, the probability of a detection error given by (9.5.3a) is pe

= 21 p0|1 = 12 e−E = 12 e−2E .

(9.5.36a)

1

b

(9.5.36b)

This means that in the absence of additive noise, the expected number of photons ( + E1)/2, per bit required to achieve a probability of bit error pe is 1 − 2 loge (2 pe ). For pe = 10−9, this is about ten photon counts on average or about 20 photon counts for a mark.

Eb , which is E0

Detection in the Presence of Noise Counts

For practical photon-optics detection, the received signal has other noise in addition to the signal-dependent photon noise. Background noise sources such as spontaneous emission and dark current can be treated as a Poisson process with rate R0 . For these sources of noise, the conditional probability mass function p(m|0) for a space is a Poisson distribution with a nonzero mean. When a mark is transmitted, a constant arrival rate ´R is added to the noise, with the total arrival rate given as R0 + ´R. Accordingly, in the time interval of duration T , the mean number of counts E1 is E1

= E0 + ´E,

(9.5.37)

442

9 The Information Channel

Threshold (Θ = 10)

E0 = 5

p1|0 0

5

10

15

20

25

30

35

30

35

E 1 = 20 p0|1 0

5

10

15 20 Number of Photons

25

Figure 9.14 Probability mass functions for E1 threshold ³.

= 20 and E0 = 5 along with the optimal discrete

where E0 is the mean number of counts due to noise. Because E1 is the sum of two Poisson processes, it is also a Poisson process (cf. (6.2.30)). An example with probability mass functions E0 = 5 and E1 = 20 is shown in Figure 9.14. The terms contributing to the probability of error are circled. The maximum-likelihood threshold is the value at which the conditional probability mass functions are equal (cf. (9.5.14)), 1

1 ( E1 )m e−E = (E0)me−E , m! m! 1

0

which need not be an integer. Accordingly, the largest integer ³ used to determine p0|1 is given by

³=

Å

E1 log e E1

− E0 Æ . − loge E0

(9.5.38)

Similarly, the smallest integer used to determine p1| 0 is given by ³ + 1. Therefore, we can regard ³ as a discrete threshold with

³ (E )m ± 1 e−E , m! m=0 ∞ 1 ± (E0 )m −E e . p1|0 = 2 m! m=³+1 p0|1

= 21

(9.5.39a)

1

(9.5.39b)

0

Rewriting (9.5.39b) in terms of a finite sum, the probability of a detection error for an equiprobable prior is pe

= 21 p1|0 + 12 p0|1 Ã ± Ä ± ³ (E )m ³ (E )m 1 1 0 1 − E e +2 e−E = 2 1− m! m! m=0 m=0 ³ · ¹ ± = 21 − 21 m1! Em0e−E − Em1e−E , m= 0 0

0

(9.5.40)

1

1

which is the discrete equivalent of (9.5.4) for an equiprobable prior.

(9.5.41)

9.5 Detection of a Binary Signal

0

443

E0 = 10

–2 –4

E0 = 1

ep

–6

goL

E0 = 0 . 1

–8

E0 = 0.01

–10

E0 = 0

–12 0

5

10

15

20

25

30

Expected Number of Signal Photons (E 1) Figure 9.15 The probability of a detection error pe as a function of the expected value of the mark

=

E 1 for several values of the noise E0 . The E0 0 curve is given by (9.5.36b). The shape of the curves for E0 0 is a consequence of the discrete threshold defined in (9.5.38).

±=

A plot of (9.5.41) as a function of E1 for several values of E0 is shown in Figure 9.15. Referring to that figure, a small amount of Poisson noise results in a significant increase in the probability of a detection error. For the case of no noise mentioned in the previous subsection, E0 = 0. If E1 = 20, then pe = 10 −9. Instead, for an expected noise level of just one photon E0 = 1, Figure 9.15 shows that pe ≈ 10 −4 for the same value of E1 = 20. The probability of a detection error has increased by five orders of magnitude! The rapid increase in the probability of a detection error as the noise level increases implies that a practical photon-counting receiver operates far from the noise-free limit.

Detection in the Presence of Bandlimited Noise

A different photon-optics discrete noise model is based on a directly photodetected wave-optics bandlimited gaussian noise process. Because the noise is bandlimited, it is not white. It has a coherence timewidth τc . Referring to Section 6.5.1, this bandlimited gaussian noise process can be approximated as a process defined on a sequence of independent intervals, each of duration τc , during which the noise has a constant, but random, value. This noisy lightwave is photodetected and accumulated over a symbol interval of duration T . The probability mass function of the sample is the Poisson transform of the integral of the noisy wave-optics signal. The corresponding wave-optics distribution of the energy is a noncentral chi-square probability density function (cf. (6.5.5)) with N = 2K degrees of freedom, where K = ¶T /τc · is the number of independent exponentially distributed subsamples used to approximate the squared magnitude of the bandlimited gaussian noise over an interval of duration T . A method of asynchronous demodulation based on this noise model is discussed in Section 10.6. The Poisson transform of the noncentral chi-square probability density function is the Laguerre probability mass function used within photon optics (cf. (6.5.9)). The

444

9 The Information Channel

Laguerre distribution depends on the number of coherence intervals K forming the detection statistic in an observation interval of duration T = K τc . For a modulated on–off-keyed binary signal with rectangular pulses, the mean number of signal counts is either zero or E1. The conditional probability mass function p(m|1) for a signal designating a mark is a Laguerre distribution characterized by the mean E1 + N sp and the number K = ¶T /τc · of coherence intervals in the observation interval of duration T . The conditional probability mass function p(m| 0) for a null signal designating a space is a negative binomial distribution (cf. (6.5.11)) characterized by the mean number N sp of additive noise photons and the number K of coherence intervals. When N sp is much smaller than one, the Laguerre distribution approaches a Poisson distribution with mean E1. This can be seen by ignoring the term Nsp (eiω − 1) in the characteristic function of the Laguerre distribution given in (6.5.8). The resulting characteristic function is that of a Poisson distribution with mean E1 (cf. (6.2.27)). In the same way, when the number of coherence intervals K in an observation interval is large and Nsp is small, the negative binomial probability mass function approaches a Poisson probability mass function with mean K Nsp (cf. Section 6.5.2). Therefore, when the additive noise N sp is small compared with one, the approximate probability of a detection error can be determined using an analysis based on the Poisson probability mass function. 9.5.3

Binary Detection for a Dispersive Channel

Because of pulse selection at the transmitter or because of channel dispersion, the received lightwave may experience intersymbol interference which is then transferred onto the electrical waveform r (t ). The nature of the intersymbol interference depends on the coherence properties of the lightwave and on the method of conversion to an electrical signal. The case of a coherent carrier is straightforward. A balanced photodetector simply transfers the waveform, including the intersymbol interference, onto the electrical waveform. This is a linear process. The nature of the intersymbol interference for a noncoherent lightwave is described as three separate cases. The carrier phase can be slowly varying or rapidly varying as compared with the symbol duration. For a rapidly varying carrier phase, the nature of the intersymbol interference depends on whether the intersymbol interference is already present in the modulating waveform prior to transfer onto the lightwave or whether the intersymbol interference is created by dispersion of the lightwave after modulation. For a coherent lightwave, suppression of intersymbol interference can be implemented by filtering either in the optical domain before photodetection or in the electrical domain after balanced photodetection of a coherent lightwave. Other than the effect of shot noise, these are mathematically equivalent for balanced photodetection. This is because the electrical pulse is proportional to the lightwave pulse. For direct photodetection, however, this is not true. Because direct photodetection is a nonlinear operation, intersymbol interference cannot be completely suppressed by filtering after direct photodetection. With difficulty, intersymbol interference after direct photodetection can

9.5 Detection of a Binary Signal

445

only be partially managed directly. It can be managed indirectly by sequence detection on a nonlinear trellis as discussed in Section 11.3. Intersymbol interference can be partially managed prior to direct photodetection by using dispersion-controlled fiber in the optical domain (cf. Section 4.5.3). Moreover, in principle, a pulse-shaping filter that controls both the amplitude and the phase of the lightwave signal, but only if the phase is coherent for longer than a symbol interval, could be fabricated in the optical domain. This managing of intersymbol interference in the optical domain prior to direct photodetection is more flexible, but is also more complex. The condition that the phase must be stable over at least a symbol interval is a requirement of noncoherent demodulation, which is discussed in Section 10.5.1. Some pulse shaping for direct-photodetection intensity modulation can be implemented in the electrical domain. The exemplar case is binary on–off keying. In the absence of intersymbol interference, the intensity r (t ) is simply integrated over the pulse duration. Thus the sample is rk

=

µ kT +T /2 kT −T /2

r (t )dt

=

µ∞

−∞

r (τ)rect((kT

− τ)/ T )dτ,

(9.5.42)

where the integrand could be, instead, a photon-event counter. Instead of the rectangular function rect(t ), a detection window y (t ) may be used to assign different weights to photons in different parts of the detection interval. Including this detection window, the real, nonnegative target pulse q ²(t ) that is sampled is given by (9.2.15). Thus rk

=

µ kT +T /2 kT −T / 2

r (t ) y(t )dt

=

µ∞

−∞

r (τ) y(kT

− τ)dτ.

(9.5.43)

The detection window y(t ) will give less weight to regions with more intersymbol interference.

9.5.4

Displacement Detection of a Binary Signal

A method of binary demodulation/detection based on the dual wave/particle nature of a lightwave signal, called a displacement receiver, is described in this section. A displacement receiver is based on both the wave properties and the particle properties of light. A displacement receiver cannot be described using only continuous wave optics because that signal model cannot represent the discrete-energy property of a lightwave signal. A displacement receiver cannot be described using only discrete photon optics because that signal model cannot represent the phase of a lightwave signal. Though both wave optics and photon optics are incomplete on their own, properties of both signal models can be used in the analysis of a particular system. Here, a method of detection is described that can be explained only by using both the wave-optics concept of phase and the photon-optics concept of photon counting. Consider a binary signal constellation that transmits the complex lightwave pulse ³ p(t ) at a frequency f c using a coherent carrier with the positive sign for one data value and the negative sign for the other data value. Each data value, and hence each

446

9 The Information Channel

±p(t) +

Detected symbol

Photon counting

sLO(t) Figure 9.16 A functional block diagram of a displacement receiver.

sign, is equally likely. The expected number of photons per bit Eb over a symbol interval T is the same for both signs, and the signs cannot be distinguished by simply counting photons. For a memoryless, photon-noise-limited channel, the received pulse is equal to the transmitted pulse. At the receiver, a local oscillator signal sL O (t ) is added to the incident pulse as shown in Figure 9.16. The local oscillator signal is equal to p(t ) in amplitude, phase, polarization, and time. The sum of the two pulses sL O (t ) ³ p(t ) is the input to a photon-counting receiver (cf. Section 9.5.2). When − p(t ) is transmitted, the sum of the local oscillator signal and the incident signal is zero, and no counts are recorded (E0 = 0). However, when p(t ) is transmitted, the sum of the local oscillator signal and the incident signal is 2 p(t ), and 4Eb counts are recorded over time T . The expected number of counts per bit after adding the local oscillator and subsequent photon counting is (4Eb + 0)/2 = 2Eb , and is twice as large as the result of ideal photon counting for an equiprobable prior when a matched local oscillator is not used. The probability of a detection error given by (9.5.36a) is modified to read pe

= 21 e −4E , b

(9.5.44)

which shows an increase in the exponent by a factor of two. The displacement receiver uses properties of a lightwave signal that are expressed using two different signal models. The matching of the lightwave local oscillator signal s L O (t ) to the pulse waveform p(t ) is based on the wave-optics concept of phase. This shifts or displaces the antipodal signal constellation to a nonnegative signal constellation amenable to photon counting. Photon counting is based on the photon-optics concept of discrete energy. It is shown in Chapter 10 that a displacement receiver achieves a lower probability of detection error than does homodyne demodulation based solely on wave optics, which is not sensitive to the discrete-energy nature of a lightwave signal. It is also shown that the displacement receiver has a lower probability of a detection error than does photon counting using equal prior probabilities with on–off keying based solely on photon optics, which is not sensitive to the phase of the lightwave signal. A comprehensive analysis of a displacement receiver requires a signal model that encompasses both wave optics and photon optics. This is the quantum-optics signal model, which is fully developed in Chapter 15. Using this model, demodulation/detection methods based on the complete set of properties of a lightwave signal can achieve a lower probability of a detection error than for any demodulation method based on wave optics or photon optics alone. Some of these methods cannot be expressed by any combination of the properties of wave optics, and photon optics, and are discussed in Chapter 16. The displacement receiver is one method within this larger class of admissible methods.

9.6 Detection of a Multilevel Signal

9.6

447

Detection of a Multilevel Signal A multilevel modulation format is an extension of the binary signal constellation to a signal constellation with L real points, where L is larger than two, usually a power of two. Signal constellations with L = 4 are shown in Figure 9.2. Each signal constellation shows equally spaced points, which is often preferred but is not a requirement. This section discusses multilevel wave-optics detection and multilevel photon-optics detection. Much of the discussion is a straightforward extension of the binary case to the L-ary case. For signal-dependent noise with small signal levels, a complete analysis would require the examination of an expression like (9.5.21) for each pair of signal levels. Instead we will go immediately to the large-signal approximations for the threshold and the error rate given in (9.5.30) for the binary case.

9.6.1

Detection of a Multilevel Wave-Optics Signal

A decision rule for a real-valued multilevel modulation format associates each signal value s± with one of L regions that partition the real line. These are denoted by ± , for ± = 0, . . . , L − 1, each region consisting of one or more real line segments. If r ∈ ±, the hypothesis H± “the symbol s± was transmitted” is asserted. For maximum-posterior detection, the decision regions are defined as follows. For each value of r , there are L joint probability densities, denoted f (s± , r ) (cf. (9.5.10)). The ± th detection region ± is determined by the set of values of r for which the probability f (s± , r ) is largest. When, as for the usual case, each resulting decision region ± is a distinct interval of the real line, the threshold ³ ± separating decision region ± from adjacent decision region ±+1 is determined by the value of r for which (cf. (9.5.11))

R

R

R

R

f (s± , r ) f (s±+1 , r )

= 1,

R

R

(9.6.1)

where 0 ≤ ± ≤ L − 1. For an equiprobable prior, the joint densities can be replaced by the conditional densities f (r |s± ). Using (9.5.3) along with the multilevel generalization of (9.5.4) gives pe

= 1 − pc µ L −1 ± = 1 − p±

R

±=0



f (r |s± )dr,

(9.6.2)

where the set of decision regions { ± } is determined by the set of thresholds {³± } as defined by (9.6.1). For a sufficiently high signal-to-noise ratio, a real-valued multilevel modulation format in gaussian noise with signal-dependent variance can be treated using the methods discussed in Section 9.5.1. For multilevel modulation, the threshold ³± given in (9.5.26) and the corresponding effective signal-to-noise ratio γ given in (9.5.29) are modified to read

448

9 The Information Channel

(9.6.3a) ³± =. σ±+σ1s± ++σσ±s±+1 , ±+1 ± ¼ ½2 γ ± =. σs±+1 +− σs± . (9.6.3b) ±+1 ± The probability of a correct decision pc (s1 ) when transmitting s1 is µ³ 1 e−(r −s ) / 2σ dr pc (s1 ) = √ 2πσ1 −∞ ·¸ ¹ (9.6.4) = 1 − 21 erfc γ 1 /2 , where ³ 1 = (σ2s1 + σ1 s2)/(σ1 + σ2), and (erfc(− x ) + erfc(x ))/ 2 = 1 has been used to state the second line. Similarly, the probability of a correct decision pc (s2) when 1

1

2

2 1

transmitting s2 is

µ³ √1 e−(r −s ) /2σ dr 2πσ2 ³ ·¸ ¹ ·¸ ¹ = 1 − 12 erfc γ 1 /2 − 21 erfc γ 2 /2 , and so the probability pe (s2 ) of a detection error is (cf. (9.6.2)) ·¸ ¹ 1 ·¸ ¹ pe (s2) = 21 erfc γ 1 /2 + 2 erfc γ 2/2 . 2

pc (s2 ) =

2

2

2 2

1

(9.6.5)

(9.6.6)

For a modulation format that has L real signal amplitudes, each of the L − 2 middle symbols is incorrectly detected in two ways, so the probability pe is given by the sum of the two terms given in (9.6.6). The two end states have a single term for pe = 1 − pc , where pc is given in (9.6.4). In total there are 2L − 2 terms. Grouping the terms by a common value of γ ± leads to L − 1 pairs of equal terms for an equiprobable prior. Therefore, the probability of an incorrect decision is pe

=

1 L

L −1 ±

±=1

erfc

·¸ ¹ γ ± /2 .

(9.6.7)

If all terms in the summation in (9.6.7) are identical, then the information channel is symmetric, with the probability of a detection error given by pe where γ

·¸ ¹ = L −L 1 erfc γ / 2 ,

(9.6.8)

= Q2 may be used for the second expression (cf. (9.5.29)).

Signal-Independent Noise

The condition that γ ± is equal to γ for all ± can be satisfied by uniformly spacing the mean values s± when σ±2 is equal to σ 2 for all ±. The conditional gaussian probability densities on a channel gaussian noise of variance σ 2 are given √ with−(signal-independent r −s± ) 2 /2σ 2 . This is a signal-independent noise channel with by f (r |s± ) = (1/ 2πσ) e γ = γ (cf. (9.5.29)). For this case, (9.6.3a) reduces to ³ ± = (s±+1 + s± )/2, which is a generalization of the binary case (cf. (9.5.16)).

9.6 Detection of a Multilevel Signal

ytilibaborP ytilibaborP ytilibaborP ytilibaborP

s1

Θ1

p

p1

2|1

s2 p1|2

449

r

Θ2

p2

p3|2 s3

p2|3

p3

r

Θ3

p4|3 s4

p3|4

p4

r

r

Figure 9.17 Conditional gaussian probability density functions for a symmetric multilevel real-valued modulation format with signal-independent noise. Only the nearest-neighbor error events are depicted.

An example of a 4-ary signal constellation s± ∈ {−3A, − A, A, 3 A} is shown in Figure 9.17. There are three thresholds ³± halfway between adjacent signal values. These thresholds are −2A, 0, and 2 A. For an equiprobable prior, minimizing the squared euclidean distance (r − s± )2 over the set of signal points {s± } maximizes the probability pc of a correct decision and therefore minimizes the probability of a detection error pe given in (9.6.2). √ For large L , the error probability pe approaches erfc ( γ/2), which is twice the value given in (9.5.27). This is because the most significant terms in (9.6.7) are the L − 2 middle terms for which an incorrect decision can be either of the two nearest neighbors. These nearest-neighbor error events are shown in Figure 9.17. For binary transmission, L is equal to two, and there are no middle terms. Using (9.6.4),

= 1 − pc ¸ = 21 erfc( γ/2), which agrees with (9.5.30) with γ = γ . pe

(9.6.9)

Signal-Dependent Noise

When σ±2 is equal to s± for all ±, the condition that γ ± is equal to γ for all ± can be satisfied by uniformly spacing the square roots of the mean values s± . The conditional √ 2 gaussian probability density for this case is given by f (r |s± ) = (1/ 2π s± )e( r −s ± ) / 2s± . For this signal-dependent noise channel, using σ±2 = s± in (9.6.3) gives (9.6.10a) ³± =. σ±+σ1s± ++σσ±s±+1 = √s±s±+1, ±+1 ± (9.6.10b) Q± =. σs±+1 +− σs± = √s±+1 − √s±. ±+1 ± √ Expression (9.6.10a) states that the threshold ³± is the geometric mean s± s±+1 of the two adjacent signal levels. Expression (9.6.10b) states that Q ± does not depend on ±

450

9 The Information Channel

(a)

s1

(b)

ytilibaborP

ytilibaborP

p 2|1

p1

s1 p 2|1

p1

r

r

p 1|2

ytilibaborP

ytilibaborP

Θ1

s2 p 3|2

p2

Θ1 p 1|2

s2

p2

p 3|2

r

r

Θ2

Θ2 ytilibaborP

ytilibaborP

s3 p 2|3

p 4|3

p3

s3

p 2|3 p 3

p 4|3

r

r

s4

p 3|4

p4

r

Signal Level

ytilibaborP

ytilibaborP

Θ3

Θ3 p 3|4

s4 p4

r

Signal Level

Figure 9.18 Conditional gaussian probability densities for a multilevel real-valued modulation format with signal-dependent variance σ ±2 = s ± . Only nearest-neighbor error events are depicted. √ (a) An optimal distribution of uniformly spaced root mean values s± ∈ {1, 2, 3, 4} that produces a symmetric information channel. (b) An ill-advised distribution of uniformly spaced mean values s ± ∈ {3, 6, 9, 12} chosen to have the same mean amplitude as the values used in part (a). This distribution produces an asymmetric information channel.





when the difference s±+1 − s± is the same for all ±. This condition is satisfied when √ the values s± are uniformly spaced. Then the information channel is symmetric, with the probability of a detection error given by (9.6.8). The condition that produces a symmetric channel for this case is shown in √ Figure 9.18(a) with s± ∈ { A , 2A , 3 A , 4A } being uniformly spaced. Then s± ∈ { A, 4A, 9 A, 16 A}. This signal constellation leads to a symmetric information channel with the conditional probability of each detection error being independent of the transmitted symbol. The contrasting case is shown in Figure 9.18(b). Now the signal values themselves are uniformly spaced, with s± ∈ {3A , 6A , 9A , 12 A } chosen to produce the same mean amplitude as the signal constellation shown in Figure 9.18(a) with the threshold given √ in (9.6.10a) using σ± = s± . For this case, the uniform spacing of the amplitudes leads to an asymmetric channel that has a significantly larger probability of a detection error than the signal constellation shown in Figure 9.18(a). Combining the signal-dependent noise case with the signal-independent noise case, the optimal spacing of the points of a real-valued signal constellation varies from a

9.6 Detection of a Multilevel Signal

451

uniform spacing of the signal amplitudes when there is only signal-independent noise to a uniform spacing of the square roots of the amplitudes when there is only signaldependent noise with that noise treated using a gaussian distribution with a variance equal to the mean. For a mixture of noise sources, the optimal spacing of the signal points must be determined using numerical methods.

Symbol Errors and Bit Errors

The relationship between the symbol error rate given in (9.6.8) and the bit error rate depends on the mapping from the bits to the symbols. For example, when using a fourlevel system with bits mapped by s1 = 00, s2 = 01, s3 = 10, and s4 = 11, whenever the symbol labeled s3 is transmitted and the symbol labeled s2 is detected, then a single symbol error corresponds to two bit errors. The mean number of bit errors can be reduced slightly for a real signal constellation by labeling the states using a Gray code. A Gray code for a four-level modulation format is the following: s1 = 11, s2 = 10, s3 = 00, s4 = 01. For this code, bit patterns mapped into nearest-neighbor symbols differ by only one bit. Because nearest-neighbor detection errors are the most likely, this kind of detection error generates one bit error per symbol error. Therefore, for an L-ary modulation format, the bit error rate ρ is approximately

ρ ≈ logpe L ,

(9.6.11)

2

where the symbol error rate pe is given by (9.6.8). In particular, for a four-level system that uses a Gray code, the bit error rate is approximately pe / 2. For the signal constellations defined on the complex plane in Chapter 10, labeling the states using a Gray code is not always possible.

9.6.2

Detection of a Multilevel Photon-Optics Signal

A nonnegative multilevel real signal constellation is appropriate for a photon-optics stream. For ideal multilevel photon counting for which the only source of noise is pho. ton noise, let E± = ´m± µ be the expected number of counts for symbol s± in the interval of duration T for ± = 0, . . . , L − 1 with E±

= s± Ep,

(9.6.12)

where Ep is the expected number of counts generated from a unit pulse and s± is nonnegative. Each corresponding conditional probability mass function is a Poisson distribution p(m| E± ) =

(E±)m e− ± , m! E

(9.6.13)

with mean value E ± . The thresholds are determined using (9.6.1) for each value of ±. The conditional Poisson probability mass functions for L = 4 along with the thresholds ³± are shown in Figure 9.19. The solution for the threshold ³± has the same form as (9.5.38) with

452

9 The Information Channel

E1 = 10

0

Θ1 = 16

10

20

30

40

50

60

70

80

90

100

50

60

70

80

90

100

m

Θ2 = 34

E2 = 25 0

10

20

30

40

Θ3 = 58

E3 = 45 0

m

10

20

30

40

50

60

70

80

90

100

10

20

30

40

50

60

70

80

90

100

m

E4 = 75 0

m

Figure 9.19 The probability mass functions for the number of counts for a four-level

photon-counting receiver with an equiprobable prior. Also shown are the optimal thresholds ³± used to minimize pe .

Å

Æ − E± ³ ± = log E − log E . e ±+1 e ± E±+1

(9.6.14)

For an equiprobable prior and signal levels E± that are uniformly spaced in amplitude, the numerator of (9.6.14) is a constant, so the set of thresholds ³± has an inverse logarithmic dependence. As a consequence, pe is different for each transmitted symbol, so the resulting information channel is asymmetric. If, instead, every signal level E± is large so that the Poisson probability mass function can be approximated by a gaussian probability density function (cf. Figure 10.20), then a uniform spacing of the square roots √ E± of the signal levels produces a symmetric information channel (cf. Figure 9.18(b)). For any set of thresholds, the probability of a detection error pe is determined using the discrete equivalent of (9.6.7), which can be written as pe

= 1 − L1

±

L −1

pc (m|E± )

±=0 L −1 ³(±+1) 1 ± ±

=1− L

±=0

(E±)m e−E± , m! m=³± +1

(9.6.15)

9.7 Noise Models for Intensity Detection

453

where ³0 + 1 = 0 and ³ L = ∞. As an example, the threshold ³ 3 between the probability mass function with mean E3 and the probability mass function with mean E 4 is determined by equating the conditional probability mass functions for the two symbols

(E3 )m e−E = (E4)m e −E . m! m! 3

4

(9.6.16)

Using the expected values E3 and E4 shown in Figure 9.19, the solution for ³3 is

Å

Æ Å 75 − 45 Æ − E3 ³3 = log E − log E = log 75 − log 45 = 58. e 4 e 3 e e The values for the other thresholds are ³1 = 16 and ³2 = 34, with pe given by (9.6.15) using L = 4, 3 ³(±+ ) 1 ± ± (E± )m −E± e ≈ 0.05. pe = 1 − 4 m! ±=0 m =³± +1 E4

1

9.7

Noise Models for Intensity Detection The detection of an intensity-modulated signal is an instance of the detection of a realvalued signal. The detected intensity samples are intrinsically nonnegative, such as those generated by direct photodetection. The photodetected electrical baseband signal r (t ) is equal to the squared magnitude of the total lightwave amplitude at the channel output. When the total lightwave amplitude in a single polarization is the sum of a lightwave signal s (t ) and additive lightwave noise no (t ), the directly photodetected electrical signal r (t ) is (cf. (1.6.2)) r (t ) =

|s (t ) + n o(t )|2 + nshot (t ) + ne (t ), (9.7.1) where nshot (t ) is shot noise and ne (t ) is electrical noise added after photodetection. Expression (9.7.1) has three forms of noise. The additive lightwave noise n o(t ), such 1 2

as lightwave amplifier noise, is independent of the lightwave signal but becomes mixed with the signal in the square-law photodetection process, thereby leading to a form of mixing noise in the demodulated electrical signal. The shot noise nshot (t ) is signaldependent noise caused by random photon arrival times. The power density spectrum of this noise source is proportional to the lightwave power (cf. (6.7.8)). Finally, the additive electrical noise n e (t ) comes from the thermal noise and is added after photodetection. These three noise sources are discussed briefly here and in detail in Section 10.5.3. Each source of noise is characterized by a different probability distribution. When one form of noise is dominant, that distribution alone is sufficient to analyze the probability pe of a detection error. Otherwise, a numerical analysis may be required for an accurate representation of the combined effect of multiple noise distributions. A pragmatic alternative method, as implied in this section and used in Section 10.5, is to simply fit each form of noise with an approximating gaussian distribution and add the variances. For binary intensity signaling, the generic form of the probability of a detection error pe depends on the effective sample signal-to-noise ratio γ and is given by

454

9 The Information Channel

(√ )



Q

pe = 21 erfc γ /2 (cf. (9.5.27)), with γ = determined to fit the particular noise model. For additive noise, this form is exact. For signal-dependent shot noise and mixing noise, this form is an approximation. This method enables multiple noise sources to be compared because each noise source simply changes the effective value of γ . 9.7.1

Additive Electrical-Noise Model

When the only noise is white gaussian noise ne (t ) added to the electrical signal after direct photodetection, the probability pe of a detection error is determined by the integral under the tail of a conditional gaussian probability density function. The expression for pe for multilevel intensity modulation (√ is )given in (9.6.8). When L equals two, this reduces to the expression pe = 21 erfc γ /2 for binary on–off keying (cf. (9.5.30)). 9.7.2

Signal-Dependent Shot-Noise Model

When the only noise is shot noise, which is always signal-dependent, the probability of a detection error is determined by using a conditional Poisson probability mass function. The expression for pe for multilevel intensity modulation given in (9.6.15) reduces to the expression for binary on–off keying when L equals two (cf. (9.5.36)). This form of analysis must be used to obtain accurate results for small signal levels. For large signal levels, the Poisson probability mass function may be approximated by a gaussian probability density function with σ±2 = s± . Then working with the parameter ¸ ± = γ ± for the ±th symbol (cf. (9.5.28)) gives

Q

(9.7.2) Q± = σs±+1 +− σs± = √ss±+1 −+ s√± s = √s±+1 − √s± . ±+1 ± ±+1 ± A symmetric information channel with Q± = Q for all ± is generated by spacing √ the square roots s± uniformly. Then the amplitude values s± ∈ { A , 4A , 9A , . . .} are

quadratically spaced. This situation is shown in Figure 9.18(a). Indeed, when the values s± ∈ {0, A , 2A , 3A } are uniformly spaced, an asymmetric information channel is formed, with ± varying as a function of ±. This is a variation of the case shown in Figure 9.18(b).

Q

9.7.3

Signal–Noise Mixing Model

When the only electrical noise is generated by the lightwave amplifier noise no (t ) mixing with the lightwave signal s (t ) within the photodetector, the probability of a detection error is determined using a conditional distribution with a signal-dependent variance. The variance σ±2 is no longer simply equal to the mean value s± as would be the case for signal-dependent shot noise. The conditional probability density function for this signal–noise mixing model is discussed in Section 10.5.3. When the conditional probability density function is approximated by a gaussian probability density function, the value of for ideal photodetection mixing noise is given by

Q

9.10 Problems

√E − √E Q = σ + σ = √1 F 0 , 1 0 s1 − s0

455

(9.7.3)

NP

where E 1 is the mean number of photons for a mark, E0 is the mean number of photons for a space, and FNP is the noise figure of the lightwave amplifier (cf. (7.7.17)). The derivation of this expression is asked for as an end-of-chapter exercise. The simple form of (9.7.3) is a consequence of approximating the true conditional distributions with gaussian distributions. Although the noise sources are different, under the asserted approximation, (9.7.3) is a scaled version of the for an ideal shotnoise-limited system given in (9.7.2) for binary intensity signaling with ± set equal √ to zero. The value of is reduced by FNP to account for the lightwave amplifier noise.

Q

Q

9.8

References Optimal detection based on a continuous probability density function is presented in Helstrom (1968). The application of detection theory to communication systems is discussed in Benedetto and Biglieri (1999) and in Haykin (2001). Aspects of optimal detection based on the discrete probability mass functions encountered in optics are discussed in Saleh (1978). The introduction of the parameter into lightwave communications appears in Personick (1973a), where it is based on a binary symmetric channel. Detection theory applied to lightwave communication systems is covered in Gagliardi and Karp (1976), in Einarsson (1996), and in Kazovsky, Benedetto, and Willner (1996). Detection of point processes is discussed in Snyder (1975), Saleh (1978), and Gagliardi and Karp (1976). A discussion of optical matched filters is given in Humblet (1991).

Q

9.9

Historical Notes The matched filter for maximizing the signal-to-noise ratio was introduced by North (1943). Interference-free signal design for bandwidth-limited channels was introduced by Nyquist (1928). Partial-response signaling was proposed by Lender (1963, 1964) and generalized by Kretzmer (1966). It has seen widespread use in magnetic recording and communication systems. The displacement receiver was proposed by Kennedy (1973a).

9.10

Problems 1 Test for a Nyquist pulse in the frequency domain Prove that q t is a Nyquist pulse if and only if the Fourier transform Q f satisfies

()

∞ ±

n =−∞

( )

Q( f

where n ranges over the integers.

+ n) = 1

for | f |

< 1/ 2,

456

9 The Information Channel

2 Approximate form for the probability of a symbol error The large-argument expansion (valid asymptotically in x) of the complementary error function7 as x goes to infinity is

Ç ± È ∞ − x2 m 1 × 3 × ¸ ¸ ¸ × (2m − 1 ) 1+ (−1) . erfc( x ) ≈ √ e πx (2x 2)m m =1 1

Using this expression, show that pe (γ ) =

·¸ ¹ 1 γ /2 erfc 2

≈ √πγ1 /2 e−γ /2,

which is a form of (2.2.20). 3 Integrating sampler (requires numerics) (a) Compute Q f satisfying

( )

Q( f ) =

Q²( f ) sinc (T ² f )

for the raised-cosine pulse given in (9.2.8). This pulse is shown in Figure 9.4 as the target pulse Q ² ( f ). Do so for the following values of β: 0, 0.5, and 1. (b) Compute the inverse Fourier transform q(t ) of Q ( f ) for the same values of β . Note that for β = 0 the pulse should agree with Figure 9.5. (c) Directly calculate q (t ) in the time domain by convolving the pulses determined in part (b) with a rect function, thereby validating the results of part (b). 4 Photon counting Photon-counting demodulation of a binary intensity waveform is described by the expected number of counts E1 for a mark and the expected number of counts E0 for a space. (a) Derive an expression for the probability of a detection error pe . (b) Derive an expression for the probability of a detection error pe based on a gaussian approximation to the Poisson probability mass function. (c) Derive an expression for the difference in pe between part (a) and part (b) as a function of E0 and E1 . (d) Plot the discrepancy for E0 5 as a function of E1 over the range 10 100 . Comment on the relative error as a function of E1. (e) Repeat for other values of E0 and derive a practical rule for the validity of the gaussian approximation as a function of E1 and E0 .

=

[ ,

]

5 Exact and approximate thresholds (a) Derive the threshold expression for unequal 0 and 1 as given in (9.5.22). 2 (b) Show, for s1 s0 2 much larger than 2 12 0 log e 1 0 , that p1|0 and p 0|1 are approximately equal, which demonstrates that the channel is approximately a binary symmetric channel.

( − )

σ σ (σ −σ ) (σ /σ )

7 See equation (7.1.23) in Abramowitz and Stegun (1965).

9.10 Problems

457

(c) Suppose that s0 is equal to 100. A shot-noise-limited system with equal prior probabilities has its conditional probability density functions approximated by gaussian densities with ± given in (9.6.10b). Plot the relative error as a function of s1 between the probability of a detection error pe using the threshold given in (10.6.10a) and the probability of a detection error pe using a Poisson probability mass function.

Q

6 Gaussian probability density function with signal-independent and signal-dependent variances Let the expected sample value s1 when a mark is transmitted be equal to 200. Let the expected sample value s0 when a space is transmitted be equal to 20. The system has additive signal-independent gaussian noise characterized by 2 900, and signal2 dependent noise characterized by ± s± , where s± is the expected sample value. Using (9.5.27) and (9.5.28), determine the following. (a) The probability of a detection error pe and the threshold when only the signaldependent additive noise term is included. (b) The probability of a detection error pe and the threshold when only the signalindependent additive noise term is included. (c) The probability of a detection error pe and the threshold when both noise terms are included. (d) According to this analysis, which noise source is more significant?

σ =

σ =

³

³

³

7 Minimum distance The minimum distance dmin , defined as the smallest distance between two possible symbols at the receiver, depends on the method of demodulation. (a) Derive an expression for dmin for antipodal modulation using balanced photodetection. (b) Derive an expression for dmin for on–off-keyed modulation using direct photodetection. (c) For the same lightwave power, which method of demodulation produces the largest value of dmin ? (d) Is this comparison meaningful without considering noise? 8 Thresholds for a multilevel system (a) A multilevel system with L levels, with ± being a constant, is indexed by . Show that, for this system, ± is a constant and that the minimum probability of a detection error pe is achieved for uniformly spaced signal levels s± . s± , supposing that (b) Now consider an ideal shot-noise-limited system with ± the values of the square root s± of the expected signal levels are uniformly spaced. Show that for this system ± is again a constant that does not depend on . (c) Show that the uniform spacing of the values of the square root of the signal levels for a shot-noise-limited system produces the minimum probability of a detection error pe .

σ =σ

γ



±

±

σ =√

γ

458

9 The Information Channel

9 Detection filters This problem compares the performance of an integrating detection filter with the performance of a matched filter for a pulse as given by

p(t ) = A cos

¼πt ½ T

for |t | ≤

T . 2

The matched filter y (t ) for the pulse p(t ) is p(−t ). (a) Find the value of A such that the detection filter y(t ) produces the same expected sample value as a sample obtained by integrating over the interval T . (b) Using Campbell’s theorem (cf. (6.7.21)), determine the signal-dependent variance for the sample value using the cosine detection filter. (c) Using (9.4.7), determine the signal-independent variance. (d) Determine the ratio cos / rect for the following systems. i. A signal-dependent noise-limited system with σ±2 = s± . ii. A signal-independent noise-limited system with σ 2 being a constant. (e) What is the optimal filter impulse response when the most significant noise source is shot noise? (f) What is the optimal filter impulse response when the most significant noise source is signal-independent noise?

Q Q

10 Whitened matched filter Show that for a noise power density spectrum N f , the matched filter should be replaced by a filter with transfer function

( )

Y( f ) =

S ∗( f ) . N( f )

This filter is called the whitened matched filter. 11 Detection of a pulse in partially coherent noise This problem studies the detection of a pulse in partially coherent circularly symmetric gaussian noise indirectly by analyzing the detection of a pulse from N independent intensity measurements. (a) Detection of a pulse from N measurements of the intensity of a signal in circularly symmetric gaussian noise consists of two hypotheses:

H0 : r ±

= |n ±|2 , ± = 1, . . . , N , H1 : r ± = |s + n± | 2, ± = 1, . . . , N .

Hypothesis H0 is characterized by N independent Rayleigh distributions. Hypothesis H1 is characterized by N independent ricean distributions. State the likelihood ratio. ∑ N −1 log I (s √r /σ 2) (b) Show that the optimal decision rule compares ² = ± 0 ±=0 with a threshold ³, where I0 (x ) is the modified Bessel function of the first kind. ∑ −1 r (c) Show that for small signal-to-noise ratio, the test reduces to comparing N ±=0 ± with a threshold. (d) What does this problem imply about Figure 6.6?

459

9.10 Problems

12 Optimal detection filter for combined signal-independent and signal-dependent noise Let z t y T t be the desired impulse response for a detection filter that maximizes for a system that has a combination of signal-independent noise and signal-dependent noise. Let the two received pulses at the input to the detection filter be s1 p t and s0 p t . (a) Using (9.4.6) and the expression for for a sample with both additive noise and shot noise, S1 S0 dmin

() = ( − ) Q ()

()

Q

Q = σ +σ = ¸ 2 −¸ 2 , 0 1 σ + S1 + σ + S0

(b) (c)

(d) (e) (f)

derive an expression for the numerator of the sample signal-to-noise ratio in terms of p(t ), z(t ), and si for i = 0, 1. In this expression, the Si are the sample values from the signal after the detection filter, and σ 2 is the variance of the filtered additive white gaussian noise with a (two-sided) power density spectrum N 0/2. Repeat for the denominator. Replace z (t ) by z (t ) + δ x (t ) in the expression for the denominator of , where x (t ) is an arbitrary function. Expand the square-root functions for the noise terms, keeping only terms up to first order in δ , using σi = (σ 2 + si )1/2 in the final expression. Note that this term depends on z (t ). 1 −1 Define − δ = σδ rδ as the inverse of the perturbed form of , where σδ = σ1δ + σ2δ . Determine an expression for each perturbed noise term σ±δ. . Determine an expression for the inverse of the perturbed signal rδ−1 = (r 1δ − r0δ )−1 and expand this term keeping only terms of order zero and order δ . 1 −1 equals zero, the expression is at a stationary point with When P = − δ − respect to the functional form of z (t ). Show that P can be written as

Q

Q

Q

Q

Q

P

µ∞ x (t )Y (t )dt = 0, = r −δ r 1 0 −∞

where Y (t ) = σ1−1 (s1 p(t ) + N0/ 2)z (t ) + σ 0−1 (s0 p(t ) + N 0/2)z (t )

− (s1 − s0)p(t )Q−1 .

(g) If P is to equal zero for an arbitrary choice of x (t ), then Y (t ) must equal zero. Using the relationship, show that an implicit expression for the filter function z(t ) is

(s1 − s0) p(t ) (s1σ0 + s0σ1 ) p(t ) + ( N0/ 2)(σ0 + σ1 ) = K ² p(pt )(t+) b ,

z (t ) = K

giving the expressions for K ² and b. This expression has the same form as (9.4.18), but it gives z (t ) implicitly because σi depends on z (t ).

460

9 The Information Channel

13 Probability density function of an arbitrary detection filter Let j t be a set of orthonormal basis functions determined from the eigenfunctions of an appropriate eigenvalue problem as discussed below. (a) Following the same steps as were used to derive (6.6.7) and replacing the integration over the symbol interval of duration T with a detection filter y t , show that the filtered output signal r t can be written as

{ψ ( )}

r (t ) =

() ∞ ∞± ± j =1 k =1

()

b j b∗k

µ∞

−∞

ψ j (t )ψk∗ (t ) y(t )dt .

(b) How would one impose a modified orthogonality condition on the set of basis functions that can accommodate an arbitrary detection filter? This orthogonality condition must account for the waveform r (t ) used for sampling being generated by the convolution of the demodulated electrical waveform with the detection filter y (t ) instead of an integration over the symbol interval T . Show that the modified orthogonality condition on the set of basis functions {ψ j (t )} can be written as

µ∞

−∞

ψ j (t )ψk∗ (t ) y(t )dt = δ jk .

(c) Use the modified orthogonality condition to show that (cf. (6.6.1)) bk

(d)

=

µ∞

−∞

ψk∗ (t ) y(t )n (t )dt .

À Á Form b b∗ , using the result from part (b) to yield j k

µ

T/2

− T/2

R n (t 1 − t2 )ψk (t2 ) y(t2 )dt2

= λk ψk (t1 ).

(e) Show that when y(t ) = rect(t / T ), this expression reduces to (6.6.4). 14 Sensitivity of the probability of a detection error This problem quantifies the sensitivity of the probability of a detection error pe to the value of . (a) Using (9.5.30), determine the value of that produces pe 10 −9. (b) Determine pe when the value of determined in part (a) is halved, and comment on the result. (c) Let . Expand the approximate expression for pe given in 0 (9.5.31), keeping only terms up to the first order in . (d) Using this expansion and a nominal value of 0 9, determine the change in the probability of a detection error when the value of changes by 5%.

γ

γ

γ = γ + δγ

γ

=

δ γ =

(γ )

γ

15 Signal-dependent noise terms For spontaneous-emission noise, the scaling constant C for the signal-dependent variance generated from the mixing of the lightwave signal and the spontaneousemission noise within a square-law photodetector is not equal to one as it was for signal-dependent shot noise.

461

9.10 Problems

(a) Using (6.7.21b) and (7.6.4), show that, for an avalanche photodiode, C

ÀG Á2 F, where F is the excess noise factor.

=

ÀG2 Á =

(b) Using (7.7.17) and solving for the power density spectrum Nsp of the spontaneous-emission noise in terms of the noise figure, show that C ≈ G 2 FNP for a lightwave amplifier in which the mixing of the signal and the spontaneous emission noise in the direct-photodetection process is the most significant noise contribution. 16 Unequal prior probabilities Consider two systems. The first system defines the threshold using the ratio of the posterior probability density functions u r given in (9.5.7) based on a known prior. The second system defines the threshold using the likelihood ratio r . (a) Derive an expression for the relative error in the probability of a detection error using r compared with using u r as a function of the ratio r of the prior probabilities p0 p1 when the two conditional probability density functions are gaussian probability density functions with mean values of s0 and s1 , and unit variance. (b) Plot the relative error over the interval 1 r 10 for s0 1 and s1 equal to (i) 2, (ii) 3, and (iii) 6. Comment on the result with regard to the dependence of the relative error on the prior probability ratio and the signal-to-noise-ratio.

()

λ( )

/

λ( )

()

<
(2r − 1)/ r (cf. (14.5.3)). Within a continuous-energy description of an additive white gaussian noise channel, this requirement for small spectral rate efficiency to allow small E b / N0 cannot be circumvented. Within a discrete-energy photon-optics description of a

476

10 Modulation and Demodulation

lightwave channel with no additive noise, however, this minimum requirement does not apply. This difference is discussed in detail in Section 14.5.

10.2

Phase-Synchronous Demodulation A passband channel with a phase-synchronous demodulator is treated by the modulation and demodulation process as a complex-baseband channel. The probability of a detection error is derived in this section for several modulation formats used on an additive white-gaussian-noise complex-baseband channel with a matched filter and with no intersymbol interference. The probability of a detection error of a modulation format based on a complex signal constellation depends in large part on the minimum distance dmin of the signal constellation. The resulting expressions provide a reference for other modulation formats with other forms of impairment and noise.

10.2.1

Demodulation of Binary Formats

The probability of a detection error in additive complex circularly symmetric gaussian noise for a binary signaling format with equal prior probabilities using a matched filter is given by (9.4.14a) and repeated here as pe

= 21 erfc

¸½

2 / 4N d10 0

¹

.

The power density spectrum N 0 of the additive gaussian noise is given in (8.2.13). Using the relationship between the distance and the expected energy per bit E b shown in Figure 10.3 gives (cf. (9.4.14b))

⎧ 1 erfc »²E / N ¼ ⎪⎪ 2 b 0 ⎨ 1 »² ¼ pe = 2 erfc E b /2N0 ⎪⎪ ⎩ 1 erfc »² Eb /2N 0¼ 2

(BPSK)

(10.2.1a)

(Orthogonal)

(10.2.1b)

(On–off Keying).

(10.2.1c)

More complicated variants of these expressions will be developed for more complicated modulation formats and for other noise models.

10.2.2

Demodulation of Multilevel Real-Valued Formats

The probability of a detection error for a multilevel real-valued signal constellation is discussed in Section 9.6. For pulse-amplitude modulation with L levels, pe is 2 determined by substituting γ = dmin /2N0 (cf. (9.4.13)) into (9.6.8), which gives pe

=

L −1 erfc L

¸½

2 dmin

¹

/ 4N 0 ,

where the minimum distance dmin is shown in Figure 10.7.

(10.2.2)

10.2 Phase-Synchronous Demodulation

477

Multilevel pulse-amplitude modulation will be compared with binary antipodal modulation with the bit energy E b for the two modulation formats constrained to be equal. Each binary symbol encodes one bit, and each multilevel symbol encodes log2 L bits. A multilevel signaling pulse with energy E that conveys log 2 L bits uses a normal2 . ized energy per bit of E b . Substitute E = Eb log2 L into (10.1.13) and solve for dmin 2 into (10.2.2) gives Substituting the resulting expression for dmin pe

=

L −1 erfc L

¾¿

3 log2 L E b L 2 − 1 N0

À

(10.2.3)

for the probability of a symbol error. For L = 2, this expression reduces to the antipodal bit error rate pe given in (10.2.1a). For L larger than two, the relationship between the symbol error rate and the bit error rate depends on the encoding of bits into symbols. For Gray-coded symbols, the bit error rate is approximately pe / log2 L (cf. (9.6.11)). Considering this effect produces a factor that is outside the argument of the erfc function, so this factor is of less significance. Uncoded multilevel signaling increases the ratio of the transmitted bit rate to the spectral bandwidth, but, for a fixed pe , does so at the cost of a larger energy per bit. To see this, compare (10.2.3) with (10.2.1a). To make the arguments of the erfc function equal in the two cases, the ratio E b/ N0 must increase by 10 log10[( L 2 − 1)/(3 log2 L )] decibels either by increasing Eb or by decreasing N 0. This factor is the primary energy penalty. It can be offset by coding, as described in Chapter 13. There is also an energy adjustment needed to offset the coefficient outside the erfc function, but this effect is much smaller and often ignored in initial comparisons.

10.2.3

Detection of Multilevel Complex-Valued Formats

The detection of a point of a complex signal constellation is an extension of the detection of a point of a real signal constellation discussed in Section 9.5. For the simple case of hard-decision detection, each noisy complex sample is mapped into the closest complex point of the signal constellation. This mapping can be described as a partition of the two-dimensional complex plane into the L decision regions R± corresponding to the L complex points of the signal constellation. This partitioning of the complex plane is more elaborate than the partitioning of the real line. As for the scalar case studied in Chapter 9, the probability pe of a symbol detection error for the complex-baseband sample r in gaussian noise is minimized by minimizing the exponent of the gaussian function |r − s± | 2. The minimization is over the set of complex points {s± } in the signal constellation. Working with the vector form of the log-likelihood function L ² (s± ; r) (cf. (9.5.13)) and discarding the constant N 0 because it does not affect the minimization gives

L² (s±; r) = −loge ²(s±; r) = |r − s± |2 .

(10.2.4a)

Neglecting | r| 2 because it also does not affect the minimization, the log-likelihood function can be reduced to

478

10 Modulation and Demodulation

L² (s±; r) = |s±| 2 − 2 Re[r · s±].

(10.2.4b)

Ás± = argmin±|r − s±|2 .

(10.2.5)

For the case of PSK, the term | s± |2 can be ignored because each point of the signal constellation has the same magnitude. For this case, minimizing the squared euclidean distance requires maximizing the correlation Re[r · s± ]. For other signal constellations, the term | s± |2 must be retained. Section 9.4.2 showed that a matched filter maximizes the sample signal-to-noise ratio γ . Therefore, for an additive white gaussian noise channel, maximizing the signalto-noise ratio γ is equivalent to minimizing the probability of a detection error pe . This equivalence is not true for nongaussian noise or noise that is signal-dependent. Minimizing pe in additive gaussian noise requires evaluating the squared euclidean distance | r − s± |2 between the received complex symbol r and each possible transmitted symbol s± , with the most likely transmitted symbol Á s± given by1

The more likely of the two symbols s± and s± ² is determined by which of the two half-planes the symbol r lies in, with the two half-planes defined by the perpendicular bisector of the line joining s± and s± ² . The intersection of all half-planes corresponding to all neighbors of the point s± defines the decision region for s± . This will be discussed in the next subsection. For the case in which the prior probabilities are known and unequal, this minimization is replaced by minimizing a weighted distance, with the weights determined by the prior. Minimizing the euclidean distance between r and s± given in (10.2.4a) is intuitively appealing and provides significant insight. Because of its robustness, it can be used even when optimality is not an issue. Instead, for channels with nongaussian noise, minimizing the probability of a detection error must be expressed in terms of minimizing the negative logarithm of the likelihood function, which is given by

.

d2 (r, s± ) = L ² (s± ; r) = −loge ²(s± ; r).

(10.2.6)

It is suggestive in such a case to call d(r, s± ) a distance2 which is to be minimized. Although minimizing d (r, s± ) is optimal when the noise is well modeled, using this metric for detection can be sensitive to flaws in the model, with robustness then being an issue.

Bounding the Probability of a Detection Error

For a complex signal constellation in gaussian noise, the decision regions, called Voronoi regions,3 are the generalization to the complex plane of the decision regions on the real line defined in Section 9.5.1. Decision region R± is the intersection of L − 1 halfplanes of the complex plane. Each half-plane is formed by the perpendicular bisector

(·) is the symbol s± in the signal constellation for which |r − s±|2 is a minimum. 2 While the word “distance” is used here to describe this term, it does not generally have the formal properties of a geometric distance. 3 Whimsically called “nearest post office” regions.

1 The output of the function argmin ±

10.2 Phase-Synchronous Demodulation

479

of the line connecting s± to one of the other points of the signal constellation. These regions are shown in Figures 10.8(c)–(e) for each pair of signal points of a four-point signal constellation that includes the point s1 . Each pairwise bisector establishes an error probability that the corresponding neighbor is more likely than the correct point. The union bound, described below, states that the sum of these pairwise error probabilities is an upper bound on the probability of a detection error. The union bound is a basic inequality of probability theory. Given a set of points and a probability distribution on that set of points, define the event E1 as any specified subset of these points. Define the probability p(E1) as the sum of the probabilities of all points in that subset. A second event E2 has a probability p(E2 ) defined in the same way. The probability of the union of these two events, denoted p(E1 ∪ E2 ), is the sum of the probabilities of all points that are in either or both of E1 and E2 . Clearly, any point that is in both E1 and E2 has its probability added into p(E1 ∪ E2) only once, but the probability of that point is added into both p(E1 ) and p(E2). Therefore p(E1 ∪ E2 ) ≤ p(E1) + p(E2 ), with equality only when the events E1 and E 2 have no common points and so are described as disjoint. In general, this discussion extends to the statement

¾Â À ± K K p Ek ≤ p(Ek ). k =1

(10.2.7)

k =1

This inequality is known as the union bound. Now suppose that the complex point si is transmitted. Let Vi j be the event that the point s j is more likely than si conditional on the channel output r, and thus generates a detection error. Using the union bound, the probability that some point s j for j ³ = i is more likely than si satisfies

⎞ ⎛ L −1 LÂ − 1 ⎟⎟ ± ⎜⎜ p(Vi j ). Vi j ⎠ ≤ p(e|si ) = p ⎝ j =0 j ³=i

j =0 j ³= i

(10.2.8)

The unconditioned probability pe of a symbol detection error for a random choice of si using an equiprobable prior is pe

=

±1 L −1 i =0

L

p(e|si ) ≤

1 L

±±

L −1 L −1 i =0 j =0 j ³= i

p(Vi j ).

(10.2.9)

This expression is depicted graphically in Figure 10.8 for a constellation of four signal points. This figure is an excellent illustration of both the power and the weakness of the union bound. The exact probability of a detection error is the integration of the conditional probabilities over the shaded region shown in Figure 10.8(b). The sum of the three probabilities p(Vi j ) derived from the shaded regions shown in Figures 10.8(c)–(e) is an easily stated upper bound. Each of the half-plane regions in Figures 10.8(c)–(e) is easy to evaluate. That being said, there is a great deal of double-counting. In fact, the term arising in Figure 10.8(d) is completely superfluous and can be easily omitted from the union bound. Yet, in a more complicated problem, especially sequence detection in

480

10 Modulation and Demodulation

S2

S1

S3

S4

(a)

S2

S1

S3

S4

V12

V13

V14

(b)

S2

S1

S1

S1

V13

V14 S4

S3

V12 (c)

(d)

(e)

Figure 10.8 (a) A four-point signal constellation in additive gaussian noise along with the optimal

decision regions. (b) The probability of a detection error pe when the symbol s1 is transmitted is the probability that r lies within the shaded region. (c)–(e) The pairwise error probabilities p(Vi j ) of transmitting s1 and receiving s2 through s 4. The sum of these probabilities upperbounds the error depicted in (b).

Chapters 11 and 13, it may be tedious or impractical to recognize the superfluous regions in the union bound, and so some regions may be redundantly included or incorrectly ignored. Decision regions that satisfy the minimization given in (10.2.5) and the corresponding probability of a pairwise detection error p (Vi j ) can be determined by choosing, for each pair of points, a coordinate system with the origin midway between the two points and with the x axis along the direction connecting the two points as is shown in Figure 10.9(a). In this coordinate system, the y axis defines the boundary between the pairwise decision regions, with each circularly symmetric gaussian probability density function equal to the product of two one-dimensional gaussian distributions. The probability p(Vi j ) that the point s j is more likely than si and thus generates a detection error is the integral over the pairwise decision region for s j . In this coordinate system, the integral in the y direction runs from −∞ to ∞ and integrates to one. The remaining integral in the x direction is the same form as (9.5.18) and using σ 2 = N 0/2 gives p(Vi j ) =

.

»½ 2 ¼ 1 erfc di j /4N0 , 2

(10.2.10)

where di2j = |si − s j |2 is the squared euclidean distance between the two points. Substituting (10.2.10) into the union bound (10.2.9) gives pe



1 2L

±±

L −1 L −1 i =0 j =0 j ³=i

erfc

»½

di2j /4N0

¼

.

(10.2.11)

10.2 Phase-Synchronous Demodulation

y

(a)

481

(b)

x

2

1

3

4

j

i

Figure 10.9 (a) Two circularly symmetric gaussian probability densities functions centered at si

and s j , the two decision regions, and the coordinate system used to evaluate the probability of a detection error. (b) Decision regions for QPSK.

Because the erfc function is monotonically decreasing, an upper bound for each term of (10.2.11) can be obtained by substituting dmin (cf. (9.2.1)) for di j into the argument of the erfc function given in (10.2.11). Now the two summations are over L ( L − 1) identical terms so that pe



L−1 erfc 2

¸½

2 dmin

¹

/4N0 .

(10.2.12)

A tighter but imprecise form of (10.2.11) can be derived by recalling that erfc (x ) ≈ 2 e− x for large x (cf. (2.2.20)), which is a rapidly decreasing function of x. This means that the most significant error terms in (10.2.11) are those for which d is equal to dmin , so the transition from (10.2.11) to (10.2.12) can be rather conservative. It can be pragmatically tightened simply by discarding the smaller terms, accepting that the result is then only an approximate bound. To this end, let n± be the number of neighbors of the ± th point at distance dmin . With 2 2 di j replaced by dmin , the inner summation in (10.2.11) is dominated by the n± identical terms so that (10.2.11) is approximated by pe



±

L −1

²

1 2L

²

n erfc 2

n ± erfc

±=¸ 0

½

2 dmin

¸½

2 dmin

¹

/4N 0 ,

/4N0

¹

(10.2.13)

L −1 where n = (1/ L ) ±= 0 n ± is the average number of points at distance d min. To recover (10.2.12), the distance to every neighboring point must be n, which is impossible for a set of complex signal points when L is larger than three. For every other case, the error terms on the right side that are not at distance dmin are neglected, so (10.2.13) as such is only an approximate upper bound. For a large signal-to-noise ratio, dmin is large, pe is small, and (10.2.13) approaches the true error probability. The form of (10.2.13) provides insight into the properties of a good signal constellation. Because dmin is inside the erfc function whereas n is outside, maximizing dmin is the primary design objective, with the mean number of nearest neighbors being of secondary importance. For other channels that have a combination of additive gaussian noise, signal-dependent noise, and phase noise, and possibly with unequal prior

482

10 Modulation and Demodulation

probabilities, (10.2.13) might not be an appropriate approximation, and the probability of a detection error must be determined using a different method. Statement (10.2.13) regarding the probability of error for symbol-by-symbol detection is a forerunner of a statement on the probability of error for the detection of sequences, which is discussed in Section 11.3.

Quadrature Amplitude Signaling

The probability of a symbol detection error for a square quadrature-amplitudemodulated (QAM) signal constellation can be determined by observing that the QAM signal constellation is two pulse-amplitude-modulated (PAM) signal constellations in √ phase quadrature. Each one-dimensional PAM signal constellation has M = L signal points, with an expected energy per symbol that is half the expected energy per symbol of the QAM signal constellation. Accounting for this factor of two and inverting 2 between two adjacent signal points for (10.1.13), the squared minimum distance dmin one quadrature component is

=

2 dmin

Using E

6E = − 1 L − 1.

6E M2

(10.2.14)

2 into (10.2.13) yields = Eb log2 L and substituting dmin ⎛¿ ⎞ ¾¿ À 2 dmin n ¯ n¯ 3 log L E b 2 ⎠ = erfc pe ² erfc ⎝ , 2 4N 2 2( L − 1) N 0

(10.2.15)

0

where n¯ is the mean number of nearest neighbors. Plots of pe versus E b / N0 for several values of L are shown in Figure 10.10(a). Because both scales are logarithmic, the curves for larger values of L can be obtained from the curves for L = 2 by first translating horizontally to account for the change in the argument of the erfc function, then moving the curve vertically to account for any change in the mean number of nearest neighbors multiplying the erfc function. For L > 4, the approximate probability of a detection error for QAM is (a)

(b)

0 –2

L= 32

–2 L = 16

L = 64

–6

L=2

–4

goL

e p goL

L= 4

ep

L=2

–4

L=4

L= 8

L= 16

–6

–8 –10

0

–8 0

5

10 Eb /N 0 (dB)

15

20

–10

0

5

10 E b/N 0 (dB)

Figure 10.10 (a) Union bounds on the probability of a symbol detection error for

15

quadrature-amplitude modulation. (b) Probability of a symbol detection error for multilevel phase-shift keying.

20

10.2 Phase-Synchronous Demodulation

pe

≈2

¾√

L −1



À

¾¿ erfc

L

3 log2 L E b 2(L − 1) N0

À

483

.

The derivation of this expression is asked for as an end-of-chapter exercise.

Multilevel Phase Signaling

Because phase-shift keying is a symmetric format, the probability of a correct decision pc (±) in additive gaussian noise does not depend on ±. The exact expression for pc is pc

=

³ π/ L

−π/ L

f φ (φ)dφ,

where f φ (φ) is given in (2.2.35). The probability density function f φ (φ) of the phase is plotted in Figure 2.7(b). The

à Ä2

à Ä

shape of f φ (φ) depends on the parameter F = A /2σ 2 , where A is the mean amplitude and σ 2 is the variance of one component of a circularly symmetric gaussian random variable. For a matched √ filter output, the mean value is equal to the square root of the symbol energy E , with the variance σ 2 given by N 0/2, so that . Ã Ä2 F = A / 2σ 2 = E / N0 . The exact probability of a detection error for L -ary phase-shift keying is obtained by using (9.6.2) and setting pc (±) equal to pc for each ±, so that pe

=1−

1 L

L ±

±=1

pc

= 1 − pc .

(10.2.16)

When L = 4, the modulation format is quadrature phase-shift keying (QPSK). This modulation format corresponds to two BPSK signals in phase quadrature, each with an energy per bit given by E b = E /2. The exact expression in (10.2.16) can be approximated by using the union bound. For L -ary phase-shift keying, the minimum distance dmin is determined using the geometry shown in Figure 10.1(d). This is dmin where φL

=2



E sin(π/L ),

(10.2.17)

= 2π/ L. Substituting E = Eb log2 L into (10.2.17) yields ² dmin = 2 E b log2 L sin(π/ L ).

(10.2.18)

Using (10.2.13) for L larger than two with the number of nearest neighbors n equal to two gives pe

≈ erfc

¾¿

Eb log 2 L sin(π/ L ) N0

À

(10.2.19)

as the approximate probability of a detection error for L -ary phase-shift keying.

484

10 Modulation and Demodulation

10.2.4

Demodulation with Phase Noise

Phase-synchronous demodulation requires that the phase of the carrier be known. Methods to estimate carrier phase are described in Chapter 12, but the phase is never estimated perfectly. The phase reference has a time-varying residual phase error, φe (t ), called phase noise. This phase noise, in effect, causes a time-varying rotation eiφe (t ) of the signal constellation in the complex plane. In the presence of phase noise with no other impairments, the random complex sample value r for a received pulse p(t ) at the peak output of a matched filter is (cf. (9.4.11a)) r

=

³

T

0

s± | p(t )| 2ei φe ( t ) dt .

(10.2.20)

When the phase noise φe (t ) is approximately constant over the symbol interval T , the phase noise in each symbol interval can be treated as a random phase error denoted by the random variable φe . This phase error produces a random rotation as is shown in Figure 10.11. For binary phase-shift keying, a rotation eiφe reduces the projection of the euclidean distance d between two signal points onto the real line by cos φe as is shown in Figure 10.11(a). Using (10.2.1a) and setting E = E b , where E b is the mean energy per bit, the conditional probability p(e|φe ) of a detection error is p(e|φe ) =

1 2

erfc

»²

E b / N 0 cos φe

¼

.

The unconditioned probability of a detection error is then determined by averaging over the probability density function f (φe ) for the random phase error φe . For a phase error that is not too large, the probability density function f (φ e ) for the phase error is well approximated by a zero-mean gaussian probability density function with variance σe2, as will be shown in Section 12.2. The unconditioned probability of a detection error is Rotated signal with a phase error

Error when

| φe | > π

2



E

Decision boundary

φe





- E

E



E cos φe

(a)

(b)

Figure 10.11 (a) When the phase noise is slowly varying, it produces a random rotation of the

signalÅconstellation of a symbol interval T . (b) An error occurs for binary phase-shift keying Å ÅÅ when Å φe Å > π/2.

10.2 Phase-Synchronous Demodulation

(a)

(b) 40o

–2

30o

–4

20o

–6

15o

–8

–10 –12

rorrE tiB a fo ytilibaborP goL

rorrE tiB a fo ytilibaborP goL

0

0o 0

5

10o

10 Eb /N 0 (dB)

0

20 o

–2

15 o

–4

10 o

–6

7.5o

–8

–10

15

–12

20

485

0o 0

5

10 Eb /N0 (dB)

5o 15

20

/

Figure 10.12 (a) Probability of a bit error for BPSK as a function of E b N 0 for several values of the root-mean-squared phase error e expressed in degrees. (b) Probability of a bit error for

σ

QPSK with phase noise.

pe

=

³∞ −∞

p(e|φe ) f (φe )dφe

= √1 2 2πσe

³∞

−∞

e−φe / 2σe erfc 2

2

»²

¼

E b / N0 cos φe dφe .

(10.2.21)

The probability of a detection error for BPSK given in (10.2.21) is plotted in Figure 10.12(a) for several values of the root-mean-squared (rms) phase error expressed in degrees. Figure 10.12(a) shows that, for BPSK at a bit error rate of 10 −10, an rms phase error of 10◦ can be offset by increasing the signal-to-noise ratio by about 1 dB. However, at a bit error rate of 10−10 , an rms phase error of 15◦ cannot be offset by any reasonable increase in the signal-to-noise ratio.

Error-Rate Floor

In the limit of additive noise going to zero, a detection error for binary phase-shift keying due to phase noise occurs when the phase rotation is greater than π/2 as is shown in Figure 10.11(b). Modeling the phase error in a sample as a gaussian random variable with variance σe2 , the probability of a detection error in the absence of additive noise is pe

=1− √

1

³ π/2

2πσe −π/2

2 2 e−φe /2σe dφ = erfc

¾

À π/ 2 ² 2 , 2σe

(10.2.22)

where (2.2.19) has been used. The limiting error probability due to phase noise describes an error-rate floor. As the additive signal-to-noise ratio increases, the probability of error pe given by (10.2.21) asymptotically approaches the error-rate floor given by (10.2.22) and becomes flat. The error-rate floor can be seen in Figure 10.12(a) for an rms phase error σ e of 20◦. For a fixed rms phase noise σe that is independent of the signal-to-noise ratio, this error-rate floor cannot be reduced by increasing the signal energy because the signal energy does not directly affect the phase error. Indirectly, however, the phase error can be decreased by increasing the signal energy if the phase is estimated from the received modulated signal. When the phase is so estimated, which is discussed in Section 12.2, increasing the signal energy reduces the variance of the estimated phase error and thereby indirectly reduces its effect on the probability of a detection error.

486

10 Modulation and Demodulation

10.2.5

Demodulation with Shot Noise

Shot noise in the demodulated electrical signal is a form of quantum noise. It has several consequences in the semiclassical theory of lightwaves that can be significant. Quantum noise arises from fundamental energy fluctuations in a mode (cf. (6.1.3)). This form of noise is fully described within quantum optics as is discussed in Chapter 15. Within semiclassical optics, “counting” shot noise, which occurs when a lightwave signal is directly photodetected, may be viewed as a form of noise caused by random photon arrivals (cf. Section 6.2.3). In contrast, this section discusses “mixing” shot noise, which is generated by balanced photodetection. The two forms of shot noise are two different manifestations of quantum noise described in Chapter 15. The shot noise generated by balanced photodetection has different properties than those of the shot noise generated by direct photodetection because balanced photodetection mixes two lightwave signals. Specifically, the quantum-optics description of balanced photodetection shows that the “counting” shot noise terms caused by direct photodetection are canceled out, leaving only the “mixing” shot-noise terms (cf. (15.5.4)). This section discusses a semiclassical description of “mixing” shot noise generated in balanced photodetection. When a balanced photodetector has a large mixing gain, the noise generated by either the spontaneous emission lightwave noise or the shot noise is typically much larger than the thermal noise. Therefore, the thermal noise added to the electrical signal after balanced photodetection is neglected, as is usually appropriate. Moreover, within a semiclassical description, the shot noise in the demodulated electrical signal is dominated by the shot noise due to the local oscillator. When a real signal is homodyne-demodulated, the shot noise is equivalent to half a photon of noise in the sample (cf. (8.2.17)). When a real or complex signal is heterodyne-demodulated, the shot noise is equivalent to one photon of noise in the sample (cf. (8.2.15)). These facts will be discussed in this section.

Binary Modulation Formats

To include shot noise in balanced photodetection, the expression for the power density spectrum from the spontaneous emission N0 = 2ei LO Nsp (cf. ((8.2.3)) is augmented by the expression for the power density spectrum from the shot noise Nshot = 2ei LO (cf. (8.2.12)). For a large mixing gain, the shot noise can be accurately modeled as an independent additive white gaussian noise process. Combining these two noise sources gives (8.2.13), which, repeated here, is N0

= Nsp + Nshot = 2ei (Nsp + 1), e

LO

where N sp is the photon noise count from spontaneous emission.

Homodyne Demodulation For binary phase-shift keying with homodyne demodulation, the mean energy per bit E b is equal to the pulse energy E p . Recall (8.2.16), which states that E p / N 0 = Ep /(Nsp + 1/2). Substituted into (10.2.1a) and replacing Ep with Eb , this gives

10.2 Phase-Synchronous Demodulation

pe

=

1 erfc 2

¾¿

À

Eb

N sp

487

(homodyne BPSK),

+ 1/2

(10.2.23)

where Eb is the mean number of detected photons in a bit and Nsp is the mean number of noise photons given in (7.7.7). Neglecting the one-half in the denominator of (10.2.23) – which is due to “mixing” shot noise – recovers the wave-optics expression (10.2.1a) with N sp ≡ N0 . Neglecting the spontaneous emission term N sp compared with one-half gives pe

»² ¼

≈ 21 erfc

2Eb

(homodyne shot-noise limit)

(10.2.24)

≈ √ 1 e−2E , 2π Eb b

where the approximation (2.2.20) is valid for large expected values. This expression is the shot-noise limit as the spontaneous emission goes to zero for the homodyne demodulation of binary phase-shift keying. It shows that the effect of the shot noise for balanced photodetection is the equivalent of half a photon.

Heterodyne Demodulation For heterodyne demodulation of binary phase-shift keying, because of the presence of an image mode (cf. Section 7.3.2) centered at frequency 2 f LO − f c , an expression different than (10.2.23) is obtained. Any input signal or noise within an image mode will mix into the output signal at frequency f IF , thereby impairing the signal. The quantum-optics analysis of the image-mode noise, as given in Section 15.5.3, shows that even when the image mode contains no signal, the vacuum-state fluctuations in the image mode and the lightwave signal are mixed in a heterodyne demodulator, and so affect the shot noise in the down-converted signal at the intermediate frequency. The mixing of the image mode replaces the one-half by a one in the denominator of (10.2.23). Setting E p equal to E b, then substituting (8.2.14) into (10.2.1a) gives pe

=

1 erfc 2

¾¿

Eb Nsp

+1

À

(heterodyne BPSK).

(10.2.25)

The probability of a detection error when only shot noise is considered is determined by setting Nsp equal to zero. This is pe

≈ 21 erfc

»² ¼ Eb

(heterodyne shot-noise limit)

(10.2.26)

≈ √ 1 e−E . π Eb b

Comparing (10.2.24) and (10.2.26), the shot noise for ideal heterodyne demodulation is twice as large as the shot noise for ideal homodyne demodulation of only one signal component to a real-baseband signal because of the noise added by the vacuum-state fluctuations in the image mode. This is discussed in detail in Section 15.5.3. For additive spontaneous emission noise N sp that is much larger than one, the demodulator is limited by that additive noise. In this large-signal regime, homodyne and

488

10 Modulation and Demodulation

0

–5

Heterodyne

ep

Displacement receiver

goL

–10

Homodyne Photon counting (equi± probable prior)

–15

–20

2

4

6

8

Eb (dB)

10

12

14

Figure 10.13 The probability of a detection error for several shot-noise-limited binary modulation

formats.

heterodyne demodulation of binary phase-shift keying have the same performance. Otherwise, when N sp is small, the homodyne demodulation to a real-baseband electrical signal and the heterodyne demodulation to a passband electrical signal have different performance because of the different levels of shot noise. These different levels are caused by the different numbers of modes that mix to produce shot noise for each type of balanced detector. However, for the joint homodyne demodulation to a complex-baseband electrical signal, the level of shot noise is the same as heterodyne demodulation to a passband electrical signal, with the two demodulators having the same performance for all signal levels. This is discussed in the next subsection. The shot-noise-limited probability of a detection error for binary phase-shift keying given in (10.2.24) and (10.2.26) is plotted in Figure 10.13. Also plotted in that figure is the probability of a detection error for ideal binary photon counting based on direct photodetection with an equiprobable prior given in (9.5.36b). In addition, the probability of a detection error for a binary displacement receiver (cf. Section 9.5.4) is also plotted in Figure 10.13. Comparing ideal photon counting using direct photodetection with the homodyne demodulation of one signal component to a real-baseband signal using balanced photodetection, these two demodulation techniques are equivalent in the argument of an exponential function and so have comparable performance. 4 For large values of the mean number Eb of photons per bit, the expression for heterodyne demodulation given in (10.2.26) is shifted to the right by 3 dB because the argument of the exponential function given in (10.2.26) is smaller by a factor of two. The expression for a displacement 4 In systems with background noise, the performance of a photon-counting receiver degrades rapidly

compared with that of an ideal system with no background noise (cf. Figure 9.15).

10.2 Phase-Synchronous Demodulation

489

receiver is shifted to the left by 3 dB because the argument of the exponential function given in (9.5.44) is larger by a factor of two.

Joint Demodulation to a Complex-Baseband Signal A complex-baseband electrical signal can be generated by a first down-conversion to a passband electrical signal using heterodyne demodulation followed by a second down-conversion from the passband electrical signal to a complex-baseband electrical signal using homodyne demodulation of the electrical passband signal. Alternatively, a complex-baseband signal can be directly generated by the homodyne demodulation of both lightwave signal components. For either method of joint demodulation to a complex-baseband signal, additional shot noise is generated by the mixing process. Each couples a second mode into the mixed signal. For heterodyne demodulation, the additional shot noise is due to an image mode near the lightwave carrier frequency. The image mode in the second downconversion from the passband electrical signal to a complex-baseband electrical signal adds an insignificant amount of shot noise. This is because the quantum of energy for the intermediate electrical frequency is many orders of magnitude less than the quantum of energy at or near the lightwave carrier frequency. For joint homodyne demodulation of each lightwave signal component to form the complex baseband signal, the additional shot noise comes from a spatial mode in the beamsplitter. This beamsplitter is required to generate two signals that are separately homodyne demodulated to baseband. These two real signals are the in-phase and quadrature components of the complex-baseband signal. This kind of coupling is discussed in detail using quantum optics in Section 15.5.3. For either method of joint demodulation, the coupling of a second mode leads to a shot-noise term that is twice as large as the shot-noise term for the homodyne demodulation of one signal component to a real-baseband signal such as the demodulation of binary phase-shift keying. For this case, the shot noise is equivalent to half of a photon because there is not an additional mode involved in the demodulation process. When N sp is much greater than one, the shot noise can be neglected and either method of demodulation gives the same performance, with Ep N0

= NEp

sp

(10.2.27)

(cf. (8.2.14) and (8.2.16)). For this case, quantities expressed in terms of the lightwave energy are equivalent to quantities expressed in terms of the electrical energy (cf. Table 6.2).

Shot-Noise-Limited Minimum Distance

An approximate analysis of the shot-noise-limited performance of a modulation format based on a signal constellation uses the minimum distance dmin of that signal constellation (cf. Figure 10.7). As before, the effect of the shot noise depends on whether the demodulated signal is a real-baseband signal or a complex-baseband signal.

490

10 Modulation and Demodulation

Demodulation to a Real-Baseband Signal Consider the homodyne demodulation of a real-valued multilevel modulation format such as pulse-amplitude modulation characterized by dmin used on a channel that has no additive noise. Substitute the equivalent of half of a photon, which is Nshot = 1/ 2 (cf. (8.2.17)), into (10.2.13). This gives pe

≈ ≈

¸½

¹

n 2 erfc dmin /2 (shot-noise-limited homodyne) 2 2 n e−dmin /2 , √ dmin π/2

(10.2.28a)

where (2.2.20) is used for the second expression. For binary antipodal demodulation √ n = 1 and dmin = 2 Eb (cf. Figure 10.3(a)), and (10.2.28a) reduces to (10.2.24). For heterodyne demodulation, the shot-noise-limited power density spectrum is doubled to N shot = 1 because of the image mode noise. Substituting Nshot = 1 into (10.2.13) gives pe

≈ n2 erfc

¸½

¹

2 dmin /4

≈ (d /n2)√π e−d min

2 min

(shot-noise-limited heterodyne)

/4 .

For binary antipodal demodulation n (10.2.26).

(10.2.28b)

= 1 and dmin = 2√Eb, and (10.2.28b) reduces to

Joint Demodulation to a Complex-Baseband Signal Because joint demodulation to complex baseband either by first using heterodyne demodulation or by immediately using homodyne demodulation has the same shotnoise-limited performance, the approximate shot-noise-limited probability of a detection error pe for a modulation format based on a complex signal constellation is the same as the expression for heterodyne demodulation given by (10.2.28b) with dmin evaluated on the complex plane instead of the real line.

Limitations of a Semiclassical Analysis

For ideal homodyne demodulation to a real-baseband signal, the mean number of shotnoise photons N0 per mode generated by balanced photodetection is equivalent to half of a photon, which is the minimum possible quantum noise (cf. Section 6.1.2). In contrast, ideal joint demodulation of both signal components does not produce the minimum amount of noise allowed by quantum optics in each quadrature signal component, considered separately. This statement must be reconciled with the semiclassical photon-optics interpretation of photodetection. That interpretation views the lightwave signal before photodetection classically, and incorporates the discrete nature of the lightwave signal into the photonoptics signal description of “counting shot noise” based on a Poisson probability mass function. However, that semiclassical interpretation is incomplete because it cannot fully describe “mixing shot noise” when a local oscillator is used in the demodulator. The full

10.3 Dual-Polarization Signaling

491

resolution of the origin of shot noise requires the quantum-optics signal model and is discussed in Section 15.6.

10.3

Dual-Polarization Signaling A multi-input multi-output modulation format using multiple subchannels can transmit more than one waveform on the same physical channel (cf. Section 8.1.2). A modulator may spread a single user datastream over multiple subchannels to improve performance. This section discusses the modulation and demodulation of a multi-input multi-output lightwave channel for the particular case of polarization diversity. Modulation waveforms for polarization are discussed in Section 10.3.1. Dual-polarization modulation and demodulation are discussed in Section 10.3.2. These methods are viable only if the polarization state can be estimated and equalized. Such methods of estimation are discussed in Section 12.6.

10.3.1

Constellations in Four-Dimensional Signal Space

A polarization-multiplex channel modulates a separate complex-baseband waveform onto each of the two polarization modes. These two modes become subchannels that carry two separate datastreams comprising separate information channels. Instead, the two complex numbers for the signal constellations of the two subchannels can be regarded as a single four-component point in a four-dimensional euclidean space R4. The signal constellation is then a set of points in R4. The corresponding fourdimensional signal constellation is a set of L = N 2 points, where N is the number of signal points in each (two-dimensional) complex signal constellation. For example, if QPSK is used for each polarization, then N equals four and there are 4 2 or 16 possible points in the four-dimensional constellation. Each four-dimensional signal point can be described by a block signal s given by s = (sAI, sAQ, sBI , sBQ),

(10.3.1)

where the first subscript (A or B) designates the polarization component and the second subscript (I or Q) designates either the in-phase signal component or the quadrature signal component. If QPSK is used for each polarization, then the bit energy E b for each of the four quadrature components is one-quarter of the symbol energy E , with E = 4E b . The block signal s is any of the 16 elements of the finite-dimensional signal constellation and is given by

Ʊ A,± A,± A,± AÇ , ² √ √ where A = E /4 = E b log 216/4 = E b (cf. Figure 10.3). s∈

(10.3.2)

The 16 points in the set given in (10.3.2) all lie on the surface of a fourdimensional hypersphere. The points are equidistant from the origin. This suggests a four-dimensional L -ary generalization of PSK, with the L points lying on the surface of a four-dimensional hypersphere. Such a polarization-modulated waveform can have a nearly constant power. This can minimize intensity-dependent nonlinear effects.

492

10 Modulation and Demodulation

Each point of the signal constellation in (10.3.2) is equidistant from its adjacent signal points. This distance, which is the minimum distance dmin , is given by

½ √ dmin = ( E /4)(1 − (−1)) 2 = E = 2 A ,

(10.3.3)

which is the same minimum distance as both BPSK and QPSK. Indeed, the four components could be separately demodulated as four independent BPSK waveforms. In this way, this format has the same probability of error pe as BPSK. Using (10.1.20) with L = 16, which corresponds to dual-polarization QPSK modulation, the energy efficiency is

E

=

2 log L dmin 2 4E

=

E log2 16 4E

= 1.

This value is equal to the energy efficiency of BPSK because dual-polarization QPSK modulation is equivalent to four orthogonal BPSK signals. The probability of a detection error pe for the 16-ary symbol is pe

= 1 − pc = 1 − (1 − pe (BPSK))4 ,

(10.3.4)

where the probability of correct decision pc is the product of the probabilities of four independent correct decisions. For pe (BPSK) much less than one, the term (1 − x )4 can be expanded, keeping only two terms giving (1−x )4 ≈ 1−4x, so that pe ≈ 4 pe (BPSK). The probability of a symbol error for a four-dimensional orthogonal modulation format is equivalent to the probability of a bit error in a four-bit sequence for binary phase-shift keying. The signal constellation given in (10.3.2) suggests smaller four-dimensional modulation formats, such as one defined by the following set of eight signal points:

Æ Ç ∈ (± A,± A, 0, 0), (0, 0,± A,± A) , (10.3.5) ² √ √ where A = E /2 = E b log 28/2 = 3E b /2. These eight points again lie on the surface of a four-dimensional hypersphere of radius √ A = E /2. This format has the same expected energy as the format defined in (10.3.2), s²

but has only half the number of points, and half the energy per symbol. This set corresponds to a QPSK format that can choose either the Á e A polarization or the Á e B polarization for each symbol to convey one extra bit modulated onto the polarization. The use of both polarizations during the same modulation interval is not allowed with this format. The minimum distance for this format is dmin

½

= (E /2) (1 − (−1))2 =



2E

= 2A.

For a fixed amplitude A, this√is the same as before, and E b is reduced by 2/3. For fixed total energy E, this is 2 larger than the minimum distance for the constellation defined in (10.3.2) because there are half as many signal points. This modulation format is called polarization-switched QPSK. This example of a jointly orthogonal modulation format uses orthogonal in-phase and quadrature components and orthogonal polarization components. Using (10.1.20), the energy efficiency for this format is

10.3 Dual-Polarization Signaling

E

493

2

log2 8 log 2 L = 2E4E = 3/2, = dmin4E

and it is 50% more energy-efficient than polarization-multiplexed QPSK. This improved energy efficiency is because, by the choice of polarization, one additional bit is encoded with no increase in the symbol energy. In comparison with the dual-polarization signaling, polarization-switched QPSK comes at the cost of reducing the information rate by one bit per symbol because the signal constellation is half as large. Using a four-dimensional modulation format provides design flexibility to satisfy system constraints such as bandwidth efficiency and energy efficiency, and even confers some protection from nonlinearities such as nonlinear phase. An example is a variation of polarization-switched QPSK that uses differential modulation of the third bit onto the polarization, thereby changing the polarization by ±90◦ depending on the value of that bit. When combined with minimum-shift keying (MSK), this format produces a constant-amplitude waveform that is less susceptible to nonlinearities. As a second example, consider a 16-QAM signal constellation on each polarization component. This format does allow eight bits to be transmitted at each sample time, but the corner points will sometimes be transmitted on both polarizations, resulting in a large signal amplitude. To provide protection from nonlinearities, a 256-point signal constellation can be designed in four dimensions to replace these corner points and yet retain the same four-dimensional minimum distance. To this end, the 256-point four-dimensional signal constellation can be written in product form as {− 3A ,−3A , 3A , ² 3A}4 . There are 16 corner points such as the point (3A , 3A , 3A , 3A ) with amplitude 4(3A )2 = 6A. point can√be replaced by the new point (5A , A , A , A ) with amplitude ² This (5 A)2 + 3A2 = 28A, thereby reducing the effect of the nonlinearity without changing dmin .

10.3.2

Dual-Polarization Modulation and Demodulation

A block diagram of a dual-polarization lightwave modulator is shown in Figure 10.14. A single coherent lightwave source generates two carriers for the two polarization components using a polarization beamsplitter. Using a linearly polarized lightwave source oriented at 45◦ with respect to the two polarization components defined by the beamsplitter, the two polarization components used as lightwave carriers have equal power. A complex baseband waveform is modulated onto each of these two polarization components. The two modulated lightwave signals – one in each polarization mode – are then multiplexed using a polarization combiner. The polarization-multiplexed signal is coupled into a single-spatial-mode fiber that supports both polarization components, eˆ A and eˆ B . At the receiver, two polarization beamsplitters are used. Two polarization components, eˆ a and ˆeb , are defined by the two polarization beamsplitters shown in Figure 10.15. The first polarization beamsplitter projects the received noisy lightwave signal onto each of the two polarization components. The second polarization beamsplitter projects the local oscillator onto the same set of two polarization components. The individual polarization components are then demodulated, each using a balanced photodetector.

494

10 Modulation and Demodulation

sAI(t)

Beamsplitter

Linearly polarized coherent source

eˆB

cos(2 πfc t)eˆ A

eˆ A

- 90°

+

+

)

sAI(t) + isAQ(t) eˆ A

±

Polarization combiner

sAQ(t) sBI(t) cos(2 π fc t) eˆ B - 90°

+

+

±

)

sBI(t) + isBQ(t) eˆB

sBQ(t) Figure 10.14 Block diagram of a dual-polarization lightwave modulator. Noise

+

Local oscillator

Le − i2 πf IF t ê a [s(t) + n (t)]bêb Le − i2 πf IFt êb

Direct to baseband homodyne demodulation

Direct to baseband homodyne demodulation

zaQ

zbI zbQ

tnemngila noitaziraloP

Signal

[s(t) + n(t)]a ê a

Sample zaI

Receiver polarization basis

±AI z±AQ z

±BI z±BQ z

Filter rAI rAQ

rBI rBQ

Estimated transmitter polarization basis

Figure 10.15 Block diagram of a dual-polarization lightwave demodulator.

The demodulation of the lightwave signal in each received polarization component is implemented with each component using its own balanced photodetector (cf. Section 7.3.2). This results in two complex signals described by four real signals. For a lossless lightwave channel, the polarization axes at the receiver are, in general, rotated with respect to the polarization axes at the transmitter, with the rotation described by a unitary transformation on the surface of the Poincaré sphere (cf. (2.3.60)). Therefore each sample z i j of the four down-converted baseband signals is a linear combination of the transmitted in-phase and quadrature components in the two transmitted polarization components. For a channel that is dispersionless with fixed polarization modes (cf. Section 4.6) over a suitable time interval, the appropriate discrete-time channel model for the received noisy block sample z before processing is z = Hs + n,

(10.3.6)

where the 2 × 2 channel matrix H depends on space, but not time. This is called a space–time-separable channel. For a dispersionless single-spatial-mode channel with no polarization-dependent loss (cf. Section 4.6.4), the channel matrix H is a normal channel

10.3 Dual-Polarization Signaling

495

n (t) s(t) =

s 1 (t) s 2 (t)

H

z

+

H -1

Sample

Polarization alignment

Vector channel



y

r

Block±matched filter

Detection statistic

Figure 10.16 A block diagram of a system that uses dual-polarization modulation.

matrix (cf. (8.1.14) and (8.1.16)). Techniques to estimate the matrix H are discussed in Section 12.6. For a memoryless space–time separable channel, the block sample r used for detection is generated using the system shown in Figure 10.16, which combines the modulator shown in Figure 10.14 and the demodulator shown in Figure 10.15. At the receiver, a noisy block sample z is generated. A polarization-alignment procedure takes that block sample and estimates the complex sample value z² in each transmitted polarization mode, eˆ A and eˆ B , using a linear combination of the block sample values z in the polarization modes, eˆa and eˆ b, defined at the receiver. For each block sample z, the polarization transformation is the inverse H−1 of the channel matrix. For an exact H−1, the output block sample z² after the polarization alignment is given by z²

= H−1z ( ) = H−1 Hs + n = s + H−1 n. (10.3.7) For a unitary channel matrix, H−1 = H†, and the covariance matrix V of the complex noise samples after the polarization alignment is

È

»

¼ †É

V= H n H n †



Ê Ë = H† nn† H = N 0H† H = N 0 I. (10.3.8) The second line follows from applying (AB)† = B† A† . The third line is a consequence Ã Ä of the noise variance being equal for each signal component so that nn † = 2σ 2 = N0

(cf. (6.2.11)). The final expression relies on the matrix H being unitary. Given that V is a scaled identity matrix, the noise samples have the same variance, and are uncorrelated. Thus they are independent. The second part of the demodulator takes the sequence of polarization-aligned block samples {z²k } and generates the block detection statistic {rk } using a block matched filter y. This step yields a pair of complex numbers for the block detection statistic – one complex number for each polarization component in each symbol interval k. Given this statistic, the detection process may decide each component of the four-dimensional block symbol separately, or the detection process may use joint detection employing the components of the block sample based on prior knowledge of how each block symbol

496

10 Modulation and Demodulation

is encoded. Refinements to this channel model that include the effect of dispersion and polarization-dependent loss are discussed in Section 11.4.3.

10.4

Constellations in Signal Space The notion of a signal constellation on the complex plane can be generalized to the notion of a signal constellation in an infinite-dimensional signal space. While this formalism is not required for conventional signaling formats and may seem needlessly complex, it provides an elegant and unifying perspective that can accomodate modulation formats, such as frequency-shift keying or orthogonal signaling, that cannot be expressed as signal constellations on the complex plane. Moreover, the section provides a precursor in a classical setting to the elegant subject of quantum-optics detection, which is discussed in Chapter 16. Complex signal space is the set of all complex functions of finite energy. Two complex functions s1 (t ) and s2 (t ) are deemed to be equivalent in signal space if

³∞

−∞

|s1(t ) − s2(t )|2 dt = 0.

If s1 (t ) is equivalent to s2(t ) and s2 (t ) is equivalent to s3 (t ), then s1(t ) is equivalent to s3(t ) as well. Formally, the notion of equivalence partitions the set of finite-energy functions into equivalence classes. Informally, two equivalent functions are regarded as the same function. Likewise, the set of all finite-energy complex functions {s (t )} on the interval [−T /2, T / 2] is a signal space. Alternatively, the set of finite-energy functions with Fourier transform S( f ) having bandlimited support [− B /2, B /2] is also a signal space. Either is suitable for communication signaling. A signal space limited to a finite time or frequency interval is spanned by a countable basis. Either set comprises an infinite-dimensional signal space spanned by a countable set of complex basis functions { y1 (t ), y2 (t ), . . .}. A signal constellation of size L in a signal space is represented by a set of functions {s±(t )} for ± = 1, . . . L, where L is the size of the signal constellation. Each function in the signal constellation can be expressed in terms of the basis as s± ( t ) = where m ±k

=

∞ ±

m± k yk (t ),

(10.4.1a)

s± (t ) yk∗ (t )dt

(10.4.1b)

k =0

³∞ −∞

is the complex expansion coefficient. For an arbitrary basis, the ±th function s± (t ) in the signal constellation is represented by an infinite vector {m ± 1, m ±2 , . . .} of complex numbers. Possibly only a finite number of components are nonzero. For a signal constellation of size L, there is a basis for which not more than L components are nonzero. For a traditional signal constellation in the complex plane, there is a basis for which only

10.4 Constellations in Signal Space

497

one component is nonzero. The set of all such infinite complex vectors is a vector space, and the signal constellation is specified by L such infinite complex vectors.

10.4.1

General Signal Constellations

A signal constellation of size L is a finite set of L points in a signal space with a countable basis. Using vector–matrix notation, this set of points can be written as a signal-constellation matrix M , with the relationship between the set of L signal functions {s± (t )} and the infinite set of basis functions {yk (t )} given by s( t ) =

My(t ),

(10.4.2)

where s(t ) is a column vector of length L with components s± (t ), where y(t ) is an infinite column vector whose components yk (t ) are the basis functions, and where the elements m ±k of the matrix M are given by (10.4.1b). A signal constellation in a signal space is fully specified by the structure of the signal constellation matrix M . The L points of the signal constellation lie in a subspace of dimension L ² not larger than L , so the basis {y(t )} can be chosen to have L basis functions that span a subspace containing all the points of the constellation. Then the matrix M can be cropped to an L × L matrix of rank L ² . The vector y(t ) has orthogonal waveform components yk (t ) spanning an L -dimensional subspace of signal space. In contrast, an arbitrary basis may require an infinite number of basis functions to describe the subspace containing the points of the signal constellation. For hard-decision detection using a signal constellation characterized by the matrix M, the optimal set of detection functions and the associated decision rule partition the signal space into decision subspace regions. The decision subspace regions are chosen so as to minimize the probability that an incorrect hypothesis is chosen. These regions are analogous to the Voronoi regions on the complex plane, with the number of subspace regions equal to the size L of the signal constellation in signal space.

10.4.2

Constellations on the Complex Plane

A signal constellation expressed on the complex plane, such as quadrature-amplitude modulation, modulates a single-pulse waveform s (t ) = x (t ) with a complex amplitude m ± from the signal constellation. Accordingly, the signal subspace is one-dimensional. Only a single complex basis function y (t ) is required to span the one-dimensional complex subspace. This basis function can be chosen to be equal to the pulse x (t ), with s± (t ) = m ± x (t ).

(10.4.3)

For this case, the nonzero components of the matrix M reduce to a single L × 1 column vector with complex components m ± for ± = 1, . . . , L that are modulated onto the single pulse x (t ). For example, the conventional QPSK signal constellation can be written in the form of (10.4.2) as

498

10 Modulation and Demodulation

⎡ s1 ( t ) ⎤ ⎡ 1 + i ⎤ ⎢⎢ s2 (t ) ⎥⎥ = ⎢⎢ −1 + i ⎥⎥ [x (t )] , (10.4.4) ⎣ s3 ( t ) ⎦ ⎣ − 1 − i ⎦ s4 ( t ) 1−i where here M is written as a 4 × 1 matrix. To write this expression with a 4 × 4 matrix for M write ⎡ s1 (t ) ⎤ ⎡ 1 + i 0 ⎤ ⎡ x (t ) ⎤ 0 0 ⎢⎢ s2 (t ) ⎥⎥ ⎢⎢ 0 −1 + i 0 ⎢⎢ x (t ) ⎥⎥ 0 ⎥ ⎥ = (10.4.5) ⎣ s3 (t ) ⎦ ⎣ 0 0 −1 − i 0 ⎦ ⎣ x ( t ) ⎦ , s4 (t ) 0 0 0 1−i x (t ) thereby adhering to the statement that M is an L × L matrix. 10.4.3

Orthogonal Constellations

For L -ary orthogonal signaling with equal-energy symbols, there is a separate waveform s± (t ) for each point in the signal constellation, and the L orthogonal waveforms span an L -dimensional subspace of signal space. Choosing a set of basis functions { yk (t )} matched to the set {s± (t )} of orthogonal pulse waveforms gives m ± k = δ ±k , where δ ±k is the Kronecker impulse. This means that the infinite-dimensional matrix M can be cropped to an L × L square matrix M equal to the identity matrix I. Then s(t ) = Iy(t ),

(10.4.6)

where s(t ) is a column vector with components {s1(t ), s2(t ), . . . , s L (t )}. The diagonal structure of the signal-constellation matrix shows that each basis function y± (t ) is matched to the corresponding signal pulse waveform s± (t ). For this case, with additive gaussian noise, the optimal receiver is shown in Figure 10.17. It consists of a set of detection functions { yk (t )} that are equivalent to a bank of matched filters. The block r = (r1 , r2 , . . . , rL ) of sampled outputs is a sufficient detection statistic. The received noisy signal r (t ) = s± (t ) + n(t ) is projected onto each detection function as given by ± ∞

−∞

y 1∗(t ) r (t) = s ± (t ) + n(t)

± ∞

−∞

−∞

( ·)d t

Sample

r1

r2

...

± ∞

y ∗L (t )

(·)d t

...

y 2∗(t )

(·)d t

Choose largest real part

±²

rL Decision block r

Figure 10.17 The optimal receiver for an equal-energy orthogonal signal constellation.

10.4 Constellations in Signal Space

499

(10.4.1b) to obtain the sample detection block. The decision rule then asserts the hypothesis corresponding to the largest real component of the block r. For a fully modulated waveform, the functions { y±∗ (t )} must be replaced by the functions {y±∗(t − kT )} for each k, or a matched filter must be used. A variation of an orthogonal signal constellation in signal space is a biorthogonal signal constellation. The L points of the signal constellation lie in a signal space of dimension L /2 such that the signal-constellation matrix M can be partitioned as

Ì

Í

M = −II .

(10.4.7)

In this case, the preferred basis of the subspace is a set of L / 2 orthogonal signals.

10.4.4

Nonorthogonal Constellations

An L -ary nonorthogonal constellation in signal space is a signal constellation for which the L signals do span an L-dimensional subspace of signal space, but the inner product of at least one pair of points of the signal constellation is nonzero. An example of a binary nonorthogonal signal constellation {s1(t ), s2(t )} in the signal space with a set of basis functions {y1(t ), y2 (t )} is shown in Figure 10.18. Signal constellations of this form are sometimes used for classical signaling, and are common in quantum optics as a consequence of fundamental dependences within that signal model that are not fundamental in classical signaling. For this case, the nonorthogonality of the signaling waveforms produces off-diagonal elements in the signal-constellation matrix M for any choice of basis functions. The nonorthogonality of the signaling waveforms leads to a set of basis functions { yk (t )} that are not mapped one-to-one to the set {sk (t )} of signaling functions. Instead, each signal s± (t ) is a linear combination of the basis functions as expressed by the corresponding row of M. Inverting the signal-constellation matrix M expresses y(t ) as y(t ) = M −1s(t ),

(10.4.8)

y 2 (t) s2(t)

s1(t)

y 1(t)

{ ( ), s 2(t )} and a basis {y1 (t ), y2( t )}

Figure 10.18 A binary nonorthogonal signal constellation s 1 t

in signal space.

500

10 Modulation and Demodulation

describing each orthogonal basis function yk (t ) in terms of a linear combination of the nonorthogonal set {s± (t )} of signal functions. The received noisy signal r (t ) = s± (t ) + n(t ) is projected onto each basis function to obtain the sample detection block. The decision rule then asserts the hypothesis corresponding to the largest real component of the block r. This is discussed in an end-of-chapter exercise, and in detail in Chapter 16 in the context of quantum optics.

10.5

Noncoherent Demodulation Phase-asynchronous demodulation does not require that the phase of the carrier be known. Two cases are considered. They are distinguished by the rate of change of the carrier phase φ(t ) compared with the duration of a symbol. For the first case, discussed in this section, the carrier phase φ(t ) is slowly varying over the duration of a symbol, but is not known or estimated. This case is called noncoherent demodulation. It nominally consists of sampling the magnitude of the filtered pulse rather than the real part. For the second case, discussed in Section 10.6, the carrier phase φ(t ) is rapidly varying over the duration of a symbol. This case is called energy demodulation. It nominally consists of integrating the intensity in a pulse to obtain the energy. For either case, the waveform itself is called a noncoherent waveform. The two forms of demodulation differ in the output of the linear detection filter. For noncoherent demodulation, the phase is preserved in the filter output, though still unknown. The phase has no other effect on the output of a complex linear detection filter. The unknown phase is suppressed by sampling the magnitude of the filter output. The matched filter is still optimal for noncoherent single-symbol detection in additive white gaussian noise. For energy demodulation, however, the rapidly varying phase will severely diminish the amplitude of the output of a linear detection filter, and a matched filter is not appropriate. Indeed, no linear filter is appropriate. The rapidly varying phase is suppressed by integrating the squared magnitude of the unfiltered pulse, as is discussed in Section 10.6. Restricting the discussion to noncoherent demodulation, the difference between noncoherent demodulation and coherent demodulation for which the carrier phase is known can be evaluated by viewing the complex sample values for each method of demodulation as is shown in Figure 10.19. Referring to Figure 10.19(a), the complex filtered signal sample s with magnitude |s| and unknown constant phase φ is corrupted by a noise sample that is described as a circularly symmetric gaussian random variable. For coherent demodulation, the phase φ of the received lightwave signal is known, and the noise component that is orthogonal to the complex signal can be rejected. The resulting demodulated signal plus noise would then be expressed as s + (n · ´s), where ´s is a unit vector along the direction of the signal. This is shown in Figure 10.19(c). When the phase is not known, only the magnitude |s + n| can be observed, as shown in Figure 10.19(b). Because a noncoherent demodulator does not estimate the phase of the carrier, it cannot make use of a coordinate system aligned with the in-phase and quadrature

Signal + noise s+n

10.5 Noncoherent Demodulation

501

Noise n

In±phase noise n · ²s

Noise n Signal s

φ (a)

{

Signal s

Signal + noise s + n · ²s

Signal s

φ

(b)

(c)

φ plus a random noise sample n. (b) Without a phase estimate, the signal and noise add. (c) With a phase estimate, the noise orthogonal to the signal can be rejected.

Figure 10.19 (a) Signal sample s with an unknown phase

components. For the analysis, however, it is permissible and convenient to choose a coordinate system in which the in-phase component is aligned with the signal so that . s = s I = s. The difference in the euclidean distance between a synchronously demodulated coherent signal and the asynchronously demodulated noncoherent signal in a single polarization is

)2 ÅÅ ÅÅ ( ) ½( s + n − s + (n · s´) = s + n + n2 − (s + n ), Q

I

I

(10.5.1)

where the in-phase and quadrature noise components are n I and n Q, respectively. In the absence of the quadrature noise component n Q , this difference is equal to zero. For large s, a series expansion gives

½(

s + nI

)2

+n =s 2

Q

=s

Î

1+

2n I s

1+

nI s

¾

+



nI

n2I 2s2

+n s2

+

Q

n2I + n2Q 2s 2

+ ···

À

n2Q

= s + n + 2s 2 + · · · . I

This observation shows that the effect of the quadrature noise component, which cannot be rejected when using phase-asynchronous demodulation, enters only quadratically and diminishes as the signal-to-noise ratio becomes large. Then the coherent and the noncoherent demodulators have similar performance. 10.5.1

Detection of Noncoherent Orthogonal Signals

Two methods of phase-asynchronous noncoherent demodulation for noncoherent orthogonal signals are shown in Figure 8.12. Either method can be extended to the demodulation of more than two orthogonal pulses. One method, shown in Figure 8.12(a), uses direct photodetection. To this end, the orthogonal filtering of the waveform must be done by an orthogonal pair of filters in the optical domain prior to the photodetector. Because this functionality is difficult to implement optically, direct demodulation of orthogonal lightwave pulses using optically matched filters is not currently used in practice. However, the theory can be fully developed here only by positing the existence of arbitrary complex filters in the optical domain. Then lightwave pulses

502

10 Modulation and Demodulation

with orthogonal complex envelopes can be separated in the optical domain. The output of each optical filter, after individual direct photodetection, is sampled to detect which filter has the largest output. The alternative method for the noncoherent demodulation of orthogonal signals uses a pair of matched filters in the electrical domain. A balanced photodetector using phaseasynchronous heterodyne or joint homodyne down-conversion provides the electrical signal (cf. Figure 8.12(b)). The received noisy passband or complex-baseband electrical signal is then split, leading to multiple matched filters at passband or complex baseband, one matched filter for each orthogonal pulse. The magnitude of each passband or complex-baseband matched-filter output at the sampling time forms the detection statistic. Because the pulses are orthogonal, the sample at the output of a filter matched to a different input pulse contains only noise. For binary orthogonal signaling, each symbol has an energy E b . The magnitudes of the two samples of the two matched-filter outputs are r 1 and r 2. One sample corresponds to the transmitted symbol. One sample corresponds to noise only. Applying (8.2.22) for the output with the signal, say r 1, gives r1

Å Å = ÅÅ2Eb eiφ + n ÅÅ .

This random signal-plus-noise sample has a ricean probability density function given by (cf. (2.2.33)) ¸ Ar ¹ r −(r 2 + A2)/2σ 2 I0 f (r |1) = 2 e , (10.5.2) 2

σ

σ

where ( A /σ)2 = 2Eb / N 0 for a matched filter (cf. (9.4.3)). ÅÅ For the detection filter not matched to the signal, the sample value is r 2 = ÅnÅ. This noise-only sample has a Rayleigh probability density function given by (cf. (2.2.34)) r 2 2 f (r |0) = 2 e−r / 2σ , (10.5.3)

σ

where σ = N0 /2. In contrast to the conditional Rayleigh probability density function f (r | 0) for the noise-only sample, the variance of the conditional ricean probability density function f (r |1) for the signal-plus-noise sample depends both on the signal magnitude A and on the noise. The signal-dependent variance is generated by the lightwave signal mixing with the lightwave noise in the square-law photodetector. Signal-dependent shot noise and nonlinear phase noise are not included in this discussion. Optimal detection consists of choosing the largest of the two sample values, r 1 and r 2 . Equivalently, the difference r 1 − r 2 can be compared with zero. A detection error occurs when the sample value in the path not matched to the signal exceeds the sample value in the path matched to the signal. This error probability is 2

pe

=

³∞ 0

p(e|r ) f (r )dr,

(10.5.4)

where the conditional probability p(e|r ) that an error occurs is the probability that a Rayleigh random variable exceeds r . This probability is given by

10.5 Noncoherent Demodulation

p (e|r ) =

³∞ r

x − x 2/ 2σ 2 e dx 2

σ

503

= e−r /2σ . 2

2

Substituting this expression into (10.5.4) and using a ricean distribution for f (r ) (cf. (10.5.2)), the unconditioned probability of a detection error is 2 2 pe = e− A /2σ

³∞ 0

r

σ

¸ ¹ σ

2 2 Ar e −r /σ I0 dr, 2 2

/2σ )2 = e− /σ has been used. The integral can be evaluated by substiwhere (e− √ √ tuting r ² = 2r and A² = A / 2, then manipulating the resulting expression into the form ¸ A²r ² ¹ ³ 1 − A2 /4σ 2 ∞ r ² −(r ²2+ A²2 )/ 2σ 2 ² I0 pe = e 2 σ2e σ 2 dr . 0 r2

2

r2

2

The integrand is now in the form of a ricean distribution and so integrates to one. Therefore, the probability of a detection error is pe

= 21 e− A /4σ = 21 e− E /2N = 12 e−E /2, 2

2

b

0

(10.5.5)

b

where ( A /σ)2 = 2E b / N0 (cf. (9.4.3)), and (8.2.20) has been used to relate E b/ N0 for a heterodyne demodulator to the expected number of photons Eb per bit for an ideal direct-photodetection demodulator with the number of noise photons N0 equal to one, which is the minimum amount of noise for noncoherent demodulation. √ Expression (10.5.5) should be compared with pe = 21 erfc( E b /2N 0) for coherent



orthogonal modulation (cf. (10.2.1b)). Using erfc(x ) ≈ (1/ x π)e− x (cf. (2.2.20)) to approximate the erfc function in a large-signal-to-noise regime gives the approxima√ tion pe ≈ e− Eb /2 N0 / π E b /2N0 . Comparing this expression with (10.5.5), noncoherent orthogonal modulation is equivalent to coherent orthogonal modulation in the argument of the exponent. This provides support for the earlier statement that coherent modulation and noncoherent modulation have similar performance in the large-signal-to-noise regime.

10.5.2

2

Detection of Differential-Phase-Shift-Keyed Signals

Differential-phase-shift keying is a noncoherent orthogonal signaling format with the waveforms defined over an interval of duration 2T (cf. Figure 10.6). For binary differential-phase-shift keying, the energy E used to decide one bit is defined over an interval 2T . The energy E in the sample is twice the average energy per bit E b defined over the interval T . Each bit is detected by a noncoherent binary orthogonal demodulator. Substituting 2E b into (10.5.5), the probability of a detection error for differential-phase-shift keying is pe

= 21 e− E /N = 12 e−E . b

0

b

(10.5.6)

Comparing this expression with (10.5.5), a given value of E b / N0 for noncoherent demodulation of differential-phase-shift keying requires only half the symbol energy for a phase-asynchronous, equal-energy orthogonal modulation format. Compared with

504

10 Modulation and Demodulation

binary phase-shift keying, it has a larger pe for the same value of E b/ N0 as given in (10.2.1a). Moreover, errors tend to come in pairs because one noisy symbol affects the probability of a detection error for two consecutive intervals. The approximate probability of a detection error for multilevel differential-phase modulation can be derived from the corresponding expression for L-ary phase-shift keying given in (10.2.19) for independent noise samples. For this case, the noise term in (10.2.19) is doubled, with pe

»² ≈ erfc ( Eb/2N0 ) log2 L

¼

sin(π/ L )

(L > 2).

(10.5.7)

The exact probability of a detection error for L-ary differential-phase modulation also depends on the phase error in the two subintervals. An approximation models the sample as the sum of two independent identically distributed random variables. This is considered in an end-of-chapter exercise.

10.5.3

Detection of Noncoherent On–Off-Keyed Intensity Signals

On–off-keyed modulation uses a null pulse for a space and a pulse with energy E p = E 1 for a mark. Because one of the pulses is a null pulse, these two pulses are clearly orthogonal. Accordingly, ideal noncoherent on–off intensity keying has some features in common with noncoherent orthogonal signaling based on equal-energy pulses. It is different than orthogonal signaling in that the detection statistic is an intensity sample generated from a single detection filter. This sample is compared with a threshold to determine the most likely transmitted symbol. When the total lightwave amplitude in a single polarization component is the sum of a lightwave signal s (t ) and additive lightwave noise no (t ), which may be filtered, the directly photodetected baseband electrical signal r (t ), given by (9.7.1) and repeated here, is r (t ) =

|s (t ) + no (t )|2 + nshot (t ) + ne (t ). The energy in the electrical pulse r (t ) is different from the energy in the lightwave pulse s (t ), so these energies are denoted as E elec and E opt when a reminder is needed. The electrical signal r (t ) has three forms of noise: additive electrical noise n e (t ), signal-dependent shot noise n shot (t ), and mixing noise generated by the cross term of |s (t ) + no (t )|2 . The probability pe of a detection error for each of the three noise sources 1 2

was described briefly in Section 9.7. This section discusses demodulation performance in more detail for each noise source treated separately. The minimum distance dmin for intensity modulation is defined in terms of the squared magnitude instead of the amplitude. This leads to a set of L signal points defined on the nonnegative real line, representing intensity, that are uniformly separated by a distance dmin. A four-level intensity-modulated signal constellation with equally spaced intensity levels is shown in Figure 10.7(b). For an equiprobable prior with the squared magnitudes separated by the distance dmin , the mean lightwave symbol energy E opt is

10.5 Noncoherent Demodulation

E opt

= dmin L

L ± (± − 1) = L −2 1 dmin .

±=1

505

(10.5.8)

Additive Electrical-Noise Model

The additive electrical-noise model ignores additive lightwave noise and shot noise. White gaussian noise ne (t ) is added to the electrical signal after direct photodetection. Because there is no lightwave noise, there is no requirement for optical filtering before photodetection. The received signal is then given by r (t ) = 21 |s (t )|2 + ne (t ). Then the directly photodetected signal | s(t )| 2 at the receiver does not depend on the carrier phase φ(t ).5 For an additive electrical noise model, the optimally filtered sample signal-to-noise ratio γ is given by (dmin /2σ)2 (cf. (9.4.3)), where σ 2 is the variance of the additive elec√ trical noise term ne (t ). Using pe = (( L − 1)/L )erfc( γ/2) (cf. (9.6.8)) and replacing dmin with 2 E opt/( L − 1) (cf. ((10.5.8)) in the expression for γ gives pe

=

L −1 erfc L

¸





E opt

2σ( L − 1)



¹

.

(10.5.9)



For ideal on–off keying, L = 2, E b = Eopt , and N 0 = 2σ . Then (10.5.9) reduces to (9.4.1). Comparing the argument of the erfc function in (10.5.9) for L larger than two with the argument for L = 2, the mean lightwave energy Eopt required to obtain the same argument of the erfc function for arbitrary L is larger by a factor of L − 1 than it is for binary intensity modulation. Ignoring the coefficient ( L − 1)/ L outside the erfc function, this means that the energy efficiency (cf. Section 10.1.5) of L -ary intensity modulation is

/2 = 1 , (10.5.10) = ( L −dmin 1)dmin /2 L−1 where ideal on–off keying with E b = dmin /2 (cf. Figure 10.3) is used as the reference format. For L = 4, this is one-third as energy efficient, or about −4.8 dB, as binary E

on–off keying.

Signal-Dependent Shot-Noise Model

The signal-dependent shot-noise model ignores additive lightwave noise and additive electrical noise. Shot noise that is dependent only on the lightwave signal was discussed using a photon-optics model for binary signaling in Section 9.5.2 and for multilevel signaling in Section 9.6.2. The received sequence of photocounts over a time interval of duration T is modeled appropriately using a Poisson distribution that has a signaldependent variance equal to the mean signal (cf. (6.2.29)).

()

5 The timewidth of s t at the receiver does depend on

lightwave signal (cf. (4.5.7)).

φ(t ) through the linewidth σλ of the modulated

506

10 Modulation and Demodulation

Here, we discuss the approximation generated by replacing the Poisson probability mass function for signal-dependent noise by a continuous gaussian probability density function. Inspection of Figure 9.19 reveals a gaussian-like shape for large expected values of the Poisson distribution. This suggests that the detection statistic for a signal-dependent noise model can be approximated by replacing the discrete Poisson probability mass function with a continuous gaussian probability density function. This approximation must be done with caution. The gaussian distribution is a symmetric function about its mean and is a poor approximation to a skewed Poisson distribution with a small mean. Moreover, the probability of a symbol error is determined by the tails of the distribution, and a gaussian approximation is a poor fit to the tails of a skewed Poisson distribution. Recognizing the limitations of this approximation, the probability of a binary detection error for a system with signal-dependent noise is approximated using two gaussian distributions with unequal variances by setting s0 = E0 = σ 02, then setting s1 = E1 = σ12 . Substituting these values into (9.5.28) and using (9.5.27), the probability of a detection error is pe

=

1 erfc 2

¸ Q ¹ 1 ¸ 1 E − E ¹ 1 ¸√E − √E ¹ √ = 2 erfc √ √E1 + √0E = 2 erfc 1√ 0 . 2 2 2 1

0

(10.5.11)

Substituting the same values into (9.5.26), the threshold ³ is given by

³=

²

E1 E0

,

(10.5.12)

which is the geometrical mean of the two expected values for the two symbols. Figure 10.20 plots the exact Poisson probability mass function of a mark with mean E1 = 100 and a space with mean E0 = 10, along with the two gaussian probability density functions whose variances are equal to the two expected values. Examining Figure 10.20, the gaussian approximation to the Poisson probability distribution overestimates the conditional probability p0| 1 of detecting a space given that (a) 0

E0 = 10

E1 = 100

–5 ) s |m(p go L

) s |m( p go L

–5

(b) 0

–10 –15 –20

0

50

100 Counts (m)

150

200

–10 –15 –20 25

Gaussian

30

Poisson 35

40

45

50

Counts (m )

( | )

Figure 10.20 (a) Log of the conditional probability mass functions p m s ± for a mark and space

based on a Poisson probability distribution (solid lines) for E0 = 10 and E 1 = 100 as well as a gaussian approximation (dashed lines) that has the same mean and variance. (b) Detail of the intersection of the probability density functions which determines the optimal threshold ³.

10.5 Noncoherent Demodulation

507

a mark was transmitted and underestimates the conditional probability p1|0 of detecting a mark given that a space was transmitted. The threshold using the approximate gaussian probability density function is only an estimate of the optimal threshold. A gaussian probability density function for the detection statistic provides a reasonable estimate of pe for large expected signal values, but does not provide a good estimate for moderate expected signal values, nor does it yield an accurate estimate of the optimal threshold ³.

Signal–Noise-Mixing Model

The lightwave signal and the lightwave noise are mixed in the process of direct photodetection, producing signal-dependent noise, which is discussed here. Shot noise and additive electrical noise are ignored. For this noise model, an optical filter that discriminates between the signal and the noise prior to direct photodetection must be specified. The optimal filter for single-symbol detection of a pulse with slowly varying phase is a matched filter implemented in the optical domain before direct photodetection, as was discussed for orthogonal signaling in Section 10.5.1. When a space is transmitted, the sample r 0 can be expressed as the magnitude |n | of the noise only and is a Rayleigh random variable. The conditional probability density function f (r |0) is a Rayleigh probability density function (cf. (10.5.3)) with mean noise √ amplitude given as σ π/2. The variance of this conditional distribution is σ 2(2 −π/2), where σ 2 is now the variance of the lightwave noise term no (t ). The variance is proportional to σ 2 and does not depend on the signal. This conditional distribution has the same form as the case of orthogonal signaling for all detection filters not matched to the received pulse. When a pulse is transmitted, E p = E 1 . Using (8.2.23), the sample r 1 is |2E 1ei φ + n| and is a ricean random variable with a variance that does depend on the signal. The conditional probability density function for a mark f (r |1) is a ricean probability density function (cf. (10.5.2)) characterized by the magnitude A1 when a mark is transmitted. This conditional distribution has the same form as the case of orthogonal signaling for the detection filter matched to the received pulse. The two conditional distributions are plotted in Figure 10.21. Because the conditional probability density functions have different functional forms, the probability of a detection error depends both on the threshold ³ and on the prior. The solution for the optimal, prior-dependent threshold is discussed in Section 14.2.4 using a photonoptics signal model. In this section, only large values of E b / N0 are considered, so that I0 (r ) ≈ er . For this case, the conditional probability density function f (r | 1) approaches a gaussian distribution with mean A1 and variance σ 2 . Using this approximation, and an equiprobable prior, the conditional probability p0|1 of a detection error is

³³ e−(r − A ) / 2σ √1 2πσ −∞ ¸ ³ ¹ 1 ≈ 2 erfc √ . 2σ

p0| 1 ≈

1

2

2

dr (10.5.13)

508

10 Modulation and Demodulation

0.6

noitcnuF ytisneD ytilibaborP

0.5 Space (Rayleigh)

0.4

Mark (ricean)

0.3 0.2

Θ =A 1 /2

0.0

p(1|0)

p(0|1)

0.1

0

2

4

6

8

Sample Value Figure 10.21 Rayleigh and ricean distributions used to determine the error probability using a

threshold ³ = A 1/ 2. The areas correspond to p1| 0 and p0|1 .

For ideal noncoherent on–off keying with an equiprobable prior, 2E b = E 1 . This means √ that, for a matched filter, A 1/σ = 2 E b/ N0 (cf. (9.4.3)), where N0 is the power density spectrum of the lightwave noise. Using the elementary threshold ³ = A 1/2, the conditional probability p0|1 is p0|1



1 erfc 2

¾¿

Eb 2N0

À

≈ 2√π( E1 /2N ) e− E /2 N , b

b

0

0

where (2.2.20) has been used for the approximating expression. For an equiprobable prior, the threshold ³ = A 1/2 becomes a good approximation to the threshold that minimizes the probability of a detection error when the mean signal magnitude A 1 is √ much larger than the mean noise magnitude σ π/2. In a similar way, using (10.5.3), the conditional probability p1|0 with the elementary threshold ³ = A1 /2 is p1|0

1

≈ σ2

³∞

A1 /2

2 2 re−r / 2σ dr

= e− A /8σ = e− E /2 N . 2 1

2

b

0

When E b/ N 0 is much larger than one, p1|0 is much larger than p0|1 (cf. Figure 10.21). Then, for an equiprobable prior,

= 21 p1|0 + 21 p0|1 ≈ 12 p1|0 = 12 e− E /2N , (10.5.14) which is equivalent to (10.5.5). Therefore, for large values of E b/ N0 , the probability of a pe

b

0

detection error for noncoherent on–off-keyed intensity modulation with only lightwave noise is equal to the probability of a detection error for noncoherent orthogonal signaling with equal-energy pulses.

10.6 Energy Demodulation

509

Suboptimal demodulation A simple method of suboptimal demodulation for an on–off-keyed signal with signal– noise mixing that does not require the use of an optically matched filter is shown in Figure 7.21. A complex gaussian noise process is bandlimited to a bandwidth B by a rectangular bandpass filter before direct photodetection. The detection filter then integrates the directly photodetected electrical signal r (t ) over an interval of duration T to generate the sample. When a mark is transmitted, the sample can be approximated by a noncentral chisquare probability density function with 2 K degrees of freedom, where K = µT B ¶ is the number of independent, exponentially distributed subsamples used to generate the sample (cf. Section 6.5.1). When the product of the bandwidth B and the symbol interval T is less than or equal to one (cf. Figure 6.6), K is equal to one, and the sample when a mark is transmitted can be described by a noncentral chi-square probability density function with two degrees of freedom. Similarly, when a space is transmitted, the sample can be described by a central chi-square probability density function with two degrees of freedom, which is the same as an exponential probability density function. A sample of the squared magnitude is described by a noncentral chi-square probability density function with two degrees of freedom. A sample of the magnitude is described by a ricean probability density function (cf. (10.5.2)). When the integrating detection filter is replaced by a matched filter, the detection statistic is also described by a ricean probability density function, but with a different signal-to-noise ratio because the receivers are different. This difference is discussed in a problem at the end of the chapter.

10.6

Energy Demodulation A phase-asynchronous demodulator implements noncoherent demodulation whenever the carrier phase φ(t ) varies slowly compared with the duration of a symbol. In contrast, when the carrier phase φ(t ) varies rapidly during the duration of the impulse response of the detection filter, the corresponding signal waveform at the receiver cannot be filtered as such. Any filter applied to such a signal will average the amplitude towards zero at the filter output. Instead, in the limit of a completely noncoherent carrier with the coherence timewidth τc of the carrier phase φ(t ) approaching zero, the energy E is the detection statistic. The energy is obtained by integrating the squared magnitude of the signal, irrespective of the modulation format. Accordingly, this kind of demodulation is called energy demodulation. An energy demodulator can be implemented using either direct photodetection or heterodyne demodulation followed by envelope detection as is shown in Figure 10.22. The passband lightwave signal ¶ s (t ) has a bandwidth B that is the spectral width of the power density spectrum of that noncoherent lightwave (cf. Section 7.8.1). Because of the rapidly varying phase, the bandwith B of the lightwave is much larger than the bandwidth of the modulating waveform. Additive lightwave noise is not shown in Figure 10.22. Were this noise included, it would be reasonable to bandlimit to the bandwidth

510

10 Modulation and Demodulation

Optical

(a) s˜(t)

Electrical

Optical/electrical conversion

Square±law photodetector

r(t)

Integrate and sample E T

Integrate and sample

(b) s˜(t) L˜

Balanced photodetector

r˜(t)

Squared magnitude

±

T

E

Figure 10.22 (a) A direct-photodetection energy demodulator. (b) A heterodyne energy

demodulator.

B of the phase-noise-modulated signal ¶ s (t ), but not less. This bandlimiting is defined by the time-varying phase and is not equivalent to a matched filter. To motivate the distinction between noncoherent demodulation and energy demodulation, it is informative to consider the case of a partially coherent phase. For a partially coherent phase, the decrease in the sample amplitude that would be caused by filtering can be reduced if the coherence timewidth is smaller than the symbol interval. Subdivide the symbol interval of duration T into subintervals, with the duration of each subinterval equal to the coherence timewidth τc ≈ 1/ B of the random lightwave signal. Over a time interval of duration T , there are µT /τc ¶ such subintervals. Each of these subintervals has an independent random phase φ that is unknown, but nominally constant. The complexbaseband pulse s (t ) is segmented into subpulses sk (t ) each of duration τc , and a bank of matched filters yk (t ) is formed, one filter matched to each subpulse. The subpulse sk (t ) in the kth subinterval is then match-filtered over the duration of the subinterval to form a subsample r k . The squared magnitudes of all of these subsamples are added to generate the detection statistic as given by

ÅÅ2 ± ÅÅų (k +1)τ Å ² ² ² r= ÅÅ yk (t − t )sk (t )dt Å , Å k τ k =0 K −1

c

c

where yk (t ) is the detection filter for the kth subinterval. As τc decreases, the number of subintervals goes to infinity, and the detection statistic approaches the energy in the symbol. This kind of demodulator is optimal when the carrier phase φ(t ) varies rapidly, but is suboptimal when the carrier phase φ(t ) varies over a time interval comparable to the symbol interval T . This is discussed further in an end-of-chapter exercise.

10.6.1

Sample Statistic

For a partially coherent carrier, an approximation to the probability density function for the complex sample after a detection filter that includes the effect of the time-varying carrier phase can be derived. The phase-noise process is the integral of an additive

10.6 Energy Demodulation

511

white gaussian frequency-noise process with a constant power density spectrum. The carrier-phase noise process φ(t ) is modeled as a Wiener process with a time-varying variance of 2π Bt with respect to any starting time (cf. Section 7.8.1). To proceed, normalize φ(t ) before the detection filter as the random process ψ(ξ) defined by ψ(ξ) =. √1 φ(ξ T ), (10.6.1)

´ . where ξ = t / T , and ´ = 2π T B describes the variance of the phase change in radians over an interval of duration T . For simplicity, the detection filter is taken to have a rectangular impulse response of duration T , with the signal amplitude at the input to the detection filter set equal to one. With no additive noise, the random variable r for the complex sample after the detection filter is given by

³ ³1 √ 1 T i φ(t ) . r= e dt = ei ´ψ(ξ) dξ. (10.6.2) T 0 0 For a rapidly varying phase, ´ is large, and the sample amplitude at the output of the

detection filter tends to zero. In this case, the use of a linear detection filter to form the detection statistic is not appropriate. When ´ is small, the phase varies slowly compared with a symbol interval. Then the random variable r can be approximated by the first two terms of the series expansion of √ i ´ψ(ξ) e ,

³ 1»

¼ ´ψ(ξ) − 21 ´ψ 2(ξ) dξ. (10.6.3) 0 The marginal probability density function fθ (θ) for the random phase θ of the comr





1+i

plex sample r is determined by using the first two terms of the expansion given in (10.6.3) so that

θ

¸

= tan−1 Im[r ]

¹

Re[r ]

[r ] ≈ Im Re[r ] √ ³1 ≈ ´ ψ(ξ)dξ. 0

The first √ approximation uses tan−1(x ) ≈ x for x º

¸ 1. The second approximation uses Im[r ] ≈ ´ 01 ψ(ξ)dξ and Re[r ] ≈ 1 (cf. (10.6.3)). Integrating ψ(ξ) by parts with ψ ²(ξ) = fd (ξ) gives

¾ À √ ³1 √ ÅÅ1 ³ 1 ² θ = ´ ψ(ξ)dξ = ´ ξψ(ξ) 0 − ξψ (ξ)dξ 0 0 ¾ À ³ 1 √ = ´ ψ(1) − ξ fd (ξ)dξ 0 ³ 1 √ = ´ (1 − ξ) f d (ξ)dξ, 0

512

10 Modulation and Demodulation

º1

where f d (t ) is the frequency-noise process (cf. (7.8.1)), and where ψ(1) = 0 f d (ξ) dξ has been used. Modeling the frequency-noise process f d (t ) as a stationary, zero-mean, gaussian random process with unit variance, the random phase θ of the complex√sample r is now in the form of a filtered gaussian random process with a filter function ´(1 − ξ). The resulting probability density function f θ (θ) for the random phase θ is gaussian and can be written as 2 2 1 e−θ /2σθ . (10.6.4) f θ (θ) ≈ √ 2πσθ Setting the power density spectrum N f of the frequency-noise process fd (t ) equal to one and using (2.2.65b), the variance σ θ2 of the gaussian probability density function for the phase θ of the complex sample is

³1 σθ2 = ´ (1 − ξ)2 dξ 0 = 13 ´.

(10.6.5)

This expression describes the effect of a slowly varying carrier phase on the phase θ of the complex sample r when a rectangular filter is used as the detection filter. When ´ is small, there is little variation of the phase over T and the variance σθ2 of the phase of the complex sample r is small. In this regime, noncoherent demodulation based on a matched filter is appropriate. When ´ is large, the variance σθ2 of the phase of the sample is large, and the magnitude of the complex sample r will tend toward zero. In this regime, energy demodulation is appropriate.

10.7

References The general topic of modulation is covered in Carlson, Crilly, and Rutledge (2001), and in Proakis (2001). Modulation and coding for linear gaussian channels are discussed in the survey article by Forney and Ungerboeck (1998). Intensity modulation for lightwave communication systems is treated in Gagliardi and Karp (1976), in Gowar (1984), and in Einarsson (1996). The representation of signals in a signal space is covered in Wozencraft and Jacobs (1965). Coherent lightwave communications is covered in books by Jacobsen (1994), by Shimada (1995), by Betti, De Marchis, and Iannone (1995), and by Kazovsky, Benedetto, and Willner (1996), as well as in papers by Barry and Lee (1990) and by Salz (1985). Phase-modulated systems are covered in Ho (2005) and in Seimetz (2009). The effect of phase noise on a lightwave communication system is discussed in Ho (2005). The relationship between heterodyne demodulation and direct photodetection is considered in Tonguz and Kazovsky (1991). Practical issues of high-data-rate lightwave phase-modulated systems are covered in Gnauck and Winzer (2005). Other modulation formats appear in Winzer and Essiambre (2006). Detection statistics for direct photodetection are discussed in Urkowitz (1967). Four-dimensional modulation for lightwave systems using polarization is considered in Agrell and Karlsson (2009). A spectrally efficient polarization-multiplex system is discussed in Winzer, Gnauck, Doerr, Magarini, and Buhl (2010).

513

10.9 Problems

10.8

Historical Notes The analysis of shot noise was first reported by Campbell (1909a, b). The fundamental difference between homodyne and heterodyne demodulation for lightwaves was first discussed in Oliver (1961) and in Haus, Townes, and Oliver (1962). The origin of “mixing” shot noise in heterodyne demodulation from an image mode was first discussed in Personick (1971).

10.9

Problems 1 The photocharge and the electrical energy in a pulse For direct photodetection, the photocharge W in an electrical pulse p t is given by

()

³∞ . W = p(t )dt , −∞

R

and is directly proportional to the lightwave energy with the responsivity (cf. Table 6.2) as the proportionality constant. The electrical energy in the same pulse for a resistance R equal to one is given by

. E=

R

³∞

−∞

p2(t )dt .

Using = 1 and R = 1, compare the lightwave signal energy and the electrical signal energy for the pulses following: (a) p(t ) = A rect(t ) (b) p(t ) = A sinc √ (t ) 2 (c) p(t ) = ( A/ 2π)e−t /2 . 2 Signal constellations A complex signal constellation with four points at the four two-dimensional coordinates E 1 0 , E 0 5 0 5 , E 1 5 0 , and E 2 0 5 is shown below. (a) For a system with only additive white gaussian noise, sketch the four decision regions, showing the boundaries between the regions. (b) By using the minimum distance dmin , determine the approximate probability of a detection error in terms of E N 0.



(, )



(., .)



(., )



(, .)

/

Q

I

3 The effect of a constant-bias signal on the optimal threshold Consider the two three-point signal constellations shown below where the three points in each case form an equilateral triangle with sides of length d.

514

10 Modulation and Demodulation

(a) Determine the mean symbol energy E in terms of d when each of the three signal points is equidistant from the origin as in part (a) of the figure. Repeat for part (b). In this case, the three signal points do not have the same energy. (b) Partition the plane for each constellation into three optimal decision regions when the noise is additive white gaussian noise. (c) Partition the plane for the constellation shown in part (a) of the figure into three optimal decision regions for the case of zero-mean gaussian noise with a variance that is proportional to the mean signal. Compare your answer with the results of part (b) of this problem. (d) Repeat part (c) using part (b) of the figure and sketching the approximate decision regions. Compare these regions with the results for an additive white gaussian noise channel. Are the decision regions the same for both signal constellations? (a)

Q

(b)

Q

I I

4 Lightwave amplifier error rate using a gaussian probability density function

(a) For ideal intensity modulation with E0 = 0, and using an ideal lightwave amplifier with Nsp = G, show that the expression for can be written as

Q

Q = σs1 +− σs0 ≈ √2E E+1 √ K , 1 0 1

where K is the number of degrees of freedom, and E 1 is the mean number of photons at the input to the lightwave amplifier with s1 = E1 G. (b) Determine the relative increase in the expected number of photons E1 required to achieve pe = 10 −9 for K = 2. This expression models a polarizationinsensitive receiver that has twice as much noise per mode as a polarizationsensitive receiver. 5 Cascaded amplifiers including spontaneous–spontaneous-emission noise (a) Modify the expression for the parameter for a cascade of amplified fiber segments given in (9.7.3) to include the spontaneous–spontaneous-emission noise. (b) With 6, determine the relative error in pe when the spontaneous– spontaneous-emission noise is neglected compared with the case that includes the spontaneous–spontaneous-emission noise. What conclusion can you draw from this comparison?

Q

Q =

515

10.9 Problems

6 Probability of a detection error for an avalanche photodiode The probability of a detection error for a direct-photodetection demodulator that uses an avalanche photodiode (cf. Section 7.6) is determined using the probability mass function defined in (7.6.1). For this distribution, let 0 04 and G 50. (a) Plot the conditional probability density function for a mark using an expected number of primary photoelectrons k1 500. On this curve, overlay a plot of the gaussian distribution with the same mean and variance. (b) Using Figure 10.20 as an example, on the same figure as your answer to part (a), plot the conditional probability density function of a space with k0 150. Overlay on this figure a plot of a gaussian distribution with the same mean and variance. (c) For an equiprobable prior, determine the optimal threshold and the probability of a detection error using the probability mass function defined in (7.6.1). (This requires a numerical solution.) (d) For an equiprobable prior, determine the optimal threshold and the probability of a detection error using a gaussian distribution with the same mean and variance as the probability mass function defined in (7.6.1). Comment on the result.

κ= .

=

=

=

7

Q for an avalanche photodiode

Q for an avalanche photodiode can be √ √ Q = k1√− k0 ,

(a) Neglecting additive noise, show that written as

F

where k1 and k0 are, respectively, the expected number of primary photoelectrons for a mark and a space at the input to the avalanche photodiode, and F is the excess noise factor defined in (7.6.2). (b) Show that when additive noise with variance σ 2 is included, is modified to read

Q

− k0 ¼. ² k1 + σ 2 / F ¹Gº2 + k0 + σ 2/ F ¹Gº2 k1

Q = √ »² F

8 Multilevel intensity modulation A four-level Gray-coded intensity-modulated communication system is designed to achieve a probability of detection error pe . It has a mean background noise term s0 and a signal-independent noise variance 2. Using (9.6.8), determine the following. (a) The required value for . (b) The required expected signal levels s1 through s3 in terms of s0 , 2 , and pe . (c) The threshold values 1 through 3 . (d) The expected number of detected photons E per symbol. (e) The power penalty compared with an on–off-keyed intensity-modulated system operating at the same information rate.

σ

Q

³

³

σ

516

10 Modulation and Demodulation

9 The effect of the extinction ratio on the optimal threshold Consider a single-carrier system that transmits a mean power P at a symbol rate R on an optical fiber. The length of the span is L km, and the fiber has an attenuation of dB/km. The receiver is an ideal photon-counting receiver ( 1). (a) Determine the expected number of photons for a mark E1 and the expected number of photons for a space E0 in terms of the expected number of photons per bit Eb and the transmitter extinction ratio ex defined in (7.5.6). (b) Derive an expression that relates the extinction ratio ex to the error rate pe . (c) Now suppose that the dark current in the photodetector is 10% of the mean photodetected electrical signal. Determine the modified extinction ratio required to achieve the same probability of error as in part (b).

κ

η=

10 Fourier-series coefficients for a pseudorandom data sequence Random data sequences are often emulated by periodic sequences called pseudorandom sequences. An example of a 15-bit pseudorandom sequence is shown in the figure below. For this sequence, determine the following.

1

1

1

1

0

1

0

1

1

0

0

1

0

0

0

t

(a) The Fourier-series coefficients. (b) The proportion of the total power contained in all frequency components smaller than the symbol rate R = 1/ T , where T is the interval occupied by a bit. (c) The proportion of the total power contained in the frequency components that are less than the symbol rate R = 1/ T for a random binary datastream with a power density spectrum given by (2.2.62). Compare this result with your answer to part (b) and comment on using these sequences to emulate a random datastream. 11 Photon noise Let the power density spectrum for the spontaneous emission N sp expressed in terms of the expected number of photons have a value of two at a wavelength of 1550 nm.

517

10.9 Problems

(a) Determine the power density spectrum N sp from spontaneous emission in dBm/Hz and evaluate the noise power over a bandwidth of 25 GHz. (b) Compare the power density spectrum N sp with the power density spectrum for thermal noise N 0 = kT0 for a responsivity = 1 and a resistance R = 50 µ. (c) Let the expected number of signal photons Eb for a bit also have the value of two. Determine the probability of a detection error pe for both heterodyne and homodyne detection, including the effect of shot noise and spontaneous emission noise for an ideal photodetector ( η = 1). (d) Repeat part (c) neglecting photon noise, and determine the relative error in pe when photon noise is neglected.

R

12 MSK and Offset QPSK Starting with the phase continuity condition for MSK and offset QPSK given in (10.1.18), show that for the in-phase component given by cos j the following relationships hold. (a) cos j cos j −1 for any j when the data values are equal so that d j d j −1 . (b) cos j cos j − 1 whenever j is odd and the data values are unequal so that d j −1 dj . Using these results, show that the components for offset QPSK under a continuous phase constraint can change only at intervals of 2T when cos t 2T 0.

θ

θ = θ θ =− θ =−

=

(π / ) =

13 Signal constellations For each of the signal constellations shown in Figure 10.1, let dmin denote the smallest euclidean distance between any two points. Let E b E p log 2 L denote the 2 2 is the mean energy in a mean energy per bit, where E p 1 L s s ± R± I± pulse. (a) Compute E b as a function of dmin for each signal constellation. (b) Give approximate (union bound) expressions for pe as a function of E b for each signal constellation.

= = ( / )∑ ( + )

/

14 Detection probabilities Consider a constellation that consists of three equispaced signal points as shown in part (a) of the figure below. √

Q

Q



E

φ

E I

(a)



E cosφ

I

(b)

(a) Suppose that the noise is additive, with E / N0 = 5, where E is the mean symbol energy. i. Determine the optimal thresholds for detection. ii. Calculate the exact probability of a detection error. iii. Calculate the approximate probability of a detection error using the minimum distance.

518

10 Modulation and Demodulation

(b) Suppose that the demodulation occurs with a fixed phase error φ as shown in part (b) of the figure. Repeat part (a) using the same thresholds, and determine the power penalty in decibels as a function of φ. (This requires numerics.) 15 Detection error for phase-shift keying The marginal probability density function f φ of the phase (cf. (2.2.35)) for a constant plus a circularly symmetric gaussian random variable is an even periodic function with a period 2 , an angular frequency 0 2 T 1, and a cosine Fourier-series coefficient An with

(φ)

π

ω = π/ =

f φ (φ) =

1 2π

+

∞ ±

n=1

A n cos (n φ),

where the zero-frequency component is 1/ 2π . The coefficients of the cosine Fourier series are given by6 An ( F ) =

1 2

Î

Ì

F − F /2 e I(n −1)/2

¸F ¹

¸ F ¹Í

, is the modified Bessel function of order n, and where F = E / N0 = π

2

+ I(n +1)/2

2

where In (Eb / N 0)log2 L. (a) Using these expressions, show that the probability of a symbol error pe can be written as ∞ »n¼ L − 1 2π ± pe = − An ( F )sinc . L L L n =1

(b) Calculate the relative error in pe as a function of F for L = 4 when the approximate expression for pe given in (10.2.19) is used instead of the expression given in part (a). 16 Asynchronous demodulation An asynchronous demodulator generates the sample values from the squared magnitude of the received complex signal in additive noise. The noise is modeled as a circularly symmetric gaussian random variable with variance 12 for each signal component. For each interval in which a space is transmitted, the received signal is zero and only noise is received. The signal-only sample for a mark is a constant plus a circularly symmetric gaussian random variable with a variance of 22 for each independent signal component. (a) Determine the conditional probability density function for the squared magnitude of the sample value given that a space is transmitted. (b) Determine the conditional probability density function for the squared magnitude of the sample value given that a mark is transmitted, including the effect of the random mark signal. (c) Plot the two probability distributions for 12 1 and 22 5. (d) Determine an expression for the optimal threshold in terms of 12 and 22 . (e) Determine an expression for the bit error rate in terms of 12 and 22 using an equiprobable prior.

σ

σ

σ =

σ =

σ

6 See Prabhu (1969).

σ

σ

σ

519

10.9 Problems

17 Suboptimal asynchronous demodulation A simple method of suboptimal demodulation for on–off-keyed intensity modulation with signal–noise mixing is given in Figure 7.21. The optical noise is additive white gaussian noise with a constant power density N0 . This noise is bandlimited to a bandwidth B by a rectangular bandpass filter before direct photodetection. The detection filter then integrates the directly photodetected electrical signal r t over an interval of duration T to generate the sample. (a) Suppose that the received lightwave signal pulse is given by s t A rect t T . Determine the bandwidth B of the rectangular noise-suppressing filter that maximizes the sample signal-to-noise ratio . (b) Compare the sample signal-to-noise ratio derived in part (a) with the sample signal-to-noise ratio for a receiver that uses a matched filter in the optical domain before direct photodetection. (c) Suppose that T B 1. Would a different lightwave signal pulse produce a larger sample signal-to-noise ratio than the rectangular pulse? Why?

()

()=

(/ )

γ

=

γ

18 Noncoherent combining A BPSK data pulse is sent twice with no interference between the two pulses. Each copy of the pulse has a constant independent random phase i . (a) Show that the maximum-likelihood demodulator passes each pulse through a detector, then adds then output intensities to form the detection statistic. (b) Repeat if the pulse is sent L times, each with independent random phase. (c) Does this imply that a single pulse with rapidly varying phase be detected by integrating the intensity?

φ

19 Phase-synchronous and direct-photodetection demodulation with background noise This problem compares the probability of a detection error pe of a shot-noiselimited phase-synchronous homodyne demodulation and photon counting in the presence of background noise. The background noise is modeled as a constant photogeneration arrival rate . For dark current, this term is rdark e dark, where dark is a constant arrival rate and Wdark dark T is the number of background photoelectrons in an interval T for a constant dark-current arrival rate dark . (a) Using an appropriate wave-optics model for the background noise, derive an expression for the bit error rate for the phase-synchronous homodyne demodulation of binary phase-shift keying in the presence of background noise. (b) Compare this expression with the bit error rate for photon counting in the presence of a background noise term W0 given in (9.5.41) when W1 2Wb and W0 Wdark . (c) Which modulation format is more robust to the presence of background noise? Why?

= μ



μ

μ

μ

=

=

20 Comparison of phase-synchronous and phase-asynchronous modulation formats (a) Using (2.2.20), derive an approximate expression for the probability of a detection error for a binary phase-synchronous orthogonal modulation format for values of Eb N 0 much larger than one.

/

520

10 Modulation and Demodulation

(b) Using this result, show that whenever E b / N0 is large, the performance expressions for phase-synchronous demodulation and phase-asynchronous demodulation for an orthogonal binary modulation format have the same argument of the exponential function in the expression for pe . 21 Nearest neighbors for quadrature amplitude modulation The interior points, the exterior points, and the corner points of a square quadrature amplitude-modulation constellation have different numbers of nearest neighbors. Accounting for these differences, show that the average number of nearest neigh-

»

√ ¼

bors n¯ for QAM is 4 1 − 1/ L .

22 Phase-noise error floor for frequency-shift keying Consider an FSK demodulator consisting of two ideal rectangular passband filters of the form f fd 2 H f fc rect fd

( − )=

¸ ± / ¹ .

Each filter has a width f d and the two filters are separated in frequency by f d . Let the frequency spectrum associated with the phase noise be given by (7.8.7), with a full-width-half-maximum bandwidth of B . Determine the phase-noise error floor as a function of the ratio f d / B . 23 Magnitude of a lightwave signal compared with the squared magnitude Show that the probability of a detection error for ideal intensity modulation based on the magnitude of the lightwave signal is equivalent to the probability of a detection error based on the squared magnitude of the lightwave signal. 24 Differential-phase-shift keying Show that the maximum-likelihood demodulator for binary differential-phase-shift keying reduces to a comparison between r k rk +1 2 and rk rk +1 2 . How does this compare with the detection statistic rk rk −1 applied to a threshold?

( +

)

( −

)

25 Approximate forms for direct photodetection with lightwave amplifiers When a lightwave amplifier is used in a direct-photodetection system, the signaldependent variance of the conditional gaussian distribution f r 1 for a mark can be approximated by (cf. (7.7.11b))

(|)

σ12 ≈ 2E1Nsp , where E1 is the expected number of detected signal photons for a mark and N sp is the expected number of detected noise photons. (a) Show that, for typical conditions, the shot noise can be neglected in comparison with σ12 . (b) Using FNP ≈ 2Nsp / G (cf. (7.7.17)), derive an expression for the expected number of noise photons Nsp in terms of the noise figure FNP . (c) Using the expression from part (b) with ideal photodetection, derive the variance σ12 in terms of the noise figure, the gain G, and the mean number of photons E1 for a mark.

10.9 Problems

(d) Repeating the steps in part (c), derive an expression for the variance space. (e) Using the results from part (c) and (d), show that the value of is



Q = σs1 +− σs0 = E√1 −F 1

0



521

σ02 for a

Q

E0

NP

,

which is (9.7.3). 26 Basis for nonorthogonal signaling This problem refers to the set of nonorthogonal signals shown in Figure 10.18. (a) Show that the set of basis functions y1 t y2 t can be expressed in terms of a linear combination of the signal states s1 t s2 t . (b) Show that the expression derived in part (a) reduces to s j t y j t for j 1 2 when the angle between the two signals in Figure 10.18 is 2, so that the signals are orthogonal.

{ ( ), ( )} { ( ), ( )}

,

()= () π/

=

11

Interference

The unintentional redistribution of energy from a transmitted symbol to other symbols or other waveforms exists in all practical lightwave channels. It is called interference. The symbol energy may be redistributed between symbols in the same waveform, between polarization modes within the same spatial mode, or between the spatial modes of a multimode or multicore fiber. This means that the lightwave signal at the output of the fiber is corrupted by these various forms of interference as well as by noise. The detection methods derived in Chapter 9 considered only noise. Now, in this chapter, the discussion of detection is expanded to include the topic of interference. Intersymbol interference is caused by symbol dependences, either intentional or unintentional, within the datastream in a single subchannel. Interchannel interference is caused by dependences within datastreams in separate subchannels. These dependences may be linear or nonlinear. Both linear and nonlinear dependences occur for lightwave communication systems. While it may seem that a lightwave system should be designed to prevent interference, the demand for increased data rates on bandlimited channels inevitably leads to the presence of intersymbol interference that must be mitigated or accommodated in the receiver. This task is called interference equalization. Methods to do this vary both in performance and in complexity. Basic methods eliminate or suppress the interference prior to detection by means of a linear filter or by decision feedback. Advanced methods use the interference within the method of detection to improve the performance. Maximum-likelihood sequence detection accommodates interference by optimally processing a sequence of received samples as a block. The output of this technique is an estimate of the block of symbols. Another method, called maximum-posterior symbol detection, estimates each symbol individually based on the entire block of sample values marginalized to each individual symbol. This technique minimizes the symbol error rate instead of minimizing the block error rate. Minimum symbol probability-of-error and minimum block probability-of-error are not the same optimality criteria. A larger bit error rate with the bit errors clustered into block errors or error events may be preferred to a smaller bit error rate with the bit errors randomly distributed. A combination of several methods of managing interference may be used for the same channel waveform. In this case, the equalization is designed to first concentrate the widely dispersed energy from a single transmitted symbol into a smaller number of modulation intervals, as by a filter. The remaining dispersed symbol energy after this

11.1 Intersymbol Interference

523

linear equalization is then accommodated in other ways, such as maximum-likelihood sequence detection or maximum-posterior symbol detection. The term intersymbol interference refers to the spreading of energy from a channel symbol to the neighboring symbols of that waveform, such as is caused by conventional linear dispersion. A much different kind of interdependence between symbols, perhaps widely separated symbols, can be intentional. This is the basis of coded modulation, which is discussed in Chapter 13. Some algorithms used to accommodate intersymbol interference are also used for the decoding of the intentional dependences of a code. The sequence detection algorithms and their application to demodulation are discussed in this chapter. Applications of these algorithms to decoding are discussed in Chapter 13. Conventional practice is to treat the tasks of equalization and channel decoding separately. Equalization is local and decoding is global with respect to a long datastream.

11.1

Intersymbol Interference Figure 11.1(a) shows a transmitted lightwave waveform consisting of three pulses for which the energy of each symbol, as transmitted, is confined to the symbol interval T . Because of linear dispersion within the channel, the energy in each received pulse is spread to the J neighboring symbol intervals on each side of the transmitting interval. Figure 11.1(b) shows three nonnegative received electrical pulses with J = 1 and with only a small overlap. The received waveform is the sum of these three pulses. An illustration with larger J would show significant pulse overlap. A matched filter for the center pulse collects unwanted energy from the other pulses, even if that filter were truncated to the symbol interval T . Figure 11.2 shows a different instance of intersymbol interference. This is an overlapping sequence of real-baseband symbols as would result after demodulation of on–off-keyed binary intensity modulation. The individual transmitted on–off-keyed pulses are shown lightly. The superposition of these symbols given by the intensity of their sum is highlighted to show the resulting intersymbol interference. For a modulated T (a) Transmitted lightwave pulses t (b) Photodetected electrical pulses

t Figure 11.1 The transmitted and received pulses for a dispersive channel.

524

11 Interference

edutilpmA

Minimum separation

Time Figure 11.2 Intersymbol interference for on–off-keyed intensity modulation. The spacing between

the dashed lines represents the minimum separation between the minimum expected intensity for a mark and the maximum expected intensity for a space.

waveform, the intersymbol interference reduces the separation between the highest low amplitude and the lowest high amplitude, as compared with a sequence without interference. The minimum separation determines the error event that dominates the probability of a detection error. It is shown in Figure 11.2. It is not the minimum distance dmin because it includes the effect of intersymbol interference. For a phase-synchronous channel, the channel input begins as a sequence {s j } of real or complex values spaced by T representing the data. This real or complex sequence defines pulse amplitudes that are interpolated using a transmit pulse x (t ), typically real, to create the transmitted real or complex waveform (cf. (9.2.3) and Figure 9.3) s (t ) =

∞ ±

j =−∞

s j x (t

− j T ),

where s j is an element of the specified real or complex signal constellation. For phase-synchronous demodulation at the receiver, the pulse after demodulation and before filtering is given by p(t ) = x (t ) ± h(t ), where the impulse response of the lightwave channel h(t ) (cf. (8.1.4)) incorporates all scaling constants. Accordingly, the noisy received waveform r (t ) before filtering is a superposition of time-shifted and amplitude-modulated copies of the pulse p(t ) given by r (t ) =

∞ ±

j =−∞

s j p(t

− j T ) + n (t ),

(11.1.1)

with the additive real or complex noise n(t ) included. The received noisy waveform is passed through a detection filter with impulse response y(t ) to provide the waveform to be used for sampling, also called r (t ).1 The purpose of the detection filter is to mitigate the effect of the noise or the interference or both. Because convolution commutes, multiple filters can be placed in any order for a

()

1 For notational brevity, the received waveform may be denoted r t both before and after filtering.

11.1 Intersymbol Interference

525

linear time-invariant channel. This means that an equalization filter can be placed even in the transmitter to pre-equalize the transmitted pulse. Instead, or in combination, a pulse-shaping filter can be placed at the receiver to yield a suitable detection statistic. A single isolated pulse after the detection filter is given by (cf. (9.2.4)) q ( t ) = p ( t ) ± y( t )

= x (t ) ± h(t ) ± y(t ), (11.1.2) where q (t ) is usually normalized so that q (0) = 1. Accordingly, the noisy waveform in

(11.1.1) after filtering becomes a superposition of shifted and modulated copies of this filtered pulse to which the filtered noise is added, so that r (t ) =

∞ ±

j =−∞

s j q(t

− j T ) + n± (t ),

(11.1.3)

with the filtered noise n ±(t ) given by n ±(t ) = n(t ) ± y(t ). The detection filter y (t ) in (11.1.2) can be chosen so as to form at its output the target pulse q (t ) satisfying an appropriate optimization criterion. Ignoring scaling constants, the noisy sample value r k at time kT after the detection filter is rk

= r (kT ) =

∞ ±

j =−∞

s j q (( k − j )T ) + n ± (kT )

= skq 0 +

± j ²= k

s j qk− j

+ n±k ,

(11.1.4)

where qk = q(kT ) and n±k = n ±(kT ) is a real or complex sample of the filtered noise. The first term of the last line is the transmitted symbol sk at time kT. The second term, which consists of the summation over all j not equal to k, is the intersymbol interference. More specifically, it is called the linear intersymbol interference. The effect of intersymbol interference can be visualized using a so-called eye diagram. An example of an eye diagram for a binary signal constellation is shown in Figure 11.3. The superimposed trajectories of multiple waveforms over multiple modulation intervals form the eye diagram which is used to assess the system performance. The waveform does not change from the previous value, or changes from a high to a low, or changes from a low to a high. The profile of the transition is given by the pulse (a)

(b)

Symbol interval

Sample point

Symbol interval

Sample point

t Figure 11.3 (a) A noise-free eye diagram showing intersymbol interference. (b) An eye diagram

for a Nyquist pulse.

t

526

11 Interference

shape q (t ) after the detection filter. When the waveform after the filter has a large eye opening in amplitude, the system is tolerant of noise or errors in the threshold used for hard-decision detection. When the waveform has a wide eye opening in time, the system is tolerant of errors in the sampling time. Figure 11.3(a) shows an example of an eye diagram for a binary waveform for which the equalization is incomplete and so a Nyquist target pulse is not achieved. Figure 11.3(b) shows another example for which a Nyquist target pulse is achieved. In the first case, a small amount of intersymbol interference remains. The trajectories do not all cross at the sample point. The intersymbol interference “closes the eye” at the sample point by about 10%. This corresponds to about 0.5 dB effective loss in the signal-tonoise ratio γ . If the unequalized channel dispersion were to increase, this loss would become worse. The eye diagram also provides a way to visualize the effect of clock error. If the sample time is incorrect by T /4, then the eye opening of Figure 11.3(a) at that incorrect time is only about 80% as wide, which corresponds to about 1 dB loss in the signal-to-noise ratio. Although a matched filter maximizes the signal-to-noise ratio γ of an isolated pulse, it may partially close the eye of a modulated waveform, thereby increasing the probability of a detection error. Maximizing the signal-to-noise ratio and minimizing the intersymbol interference can be conflicting requirements on the design of a detection filter. This motivates the use of methods of detection based on the sequence of soft-decision detected samples instead of a single hard-decision detected sample. Methods of sequence detection are discussed in Section 11.3. For simple binary phase-shift keying, the effect of the uncompensated intersymbol interference on the probability of a detection error can be determined from the minimum spacing deye of the eye diagram at the sampling instant. Amplification at the sampler does not affect the ratio of the signal to the noise (cf. (9.4.3)), so, with this ratio held constant, the signal can be regarded as normalized such that the signal sample values for an isolated pulse are ³1. The distance between these values is two, so with no intersymbol interference dmin = 2. A negative pulse has the smallest magnitude when each neighboring pulse has a sign that causes its interfering sidelobe to be positive. This is the worst-case interference, where the noise-free sample value r k , denoted by an overbar, is written as rk

\bar{r}_k = -1 + \sum_{j \ne k} \left| s_{k-j} q_j \right| .    (11.1.5a)

Similarly, for a positive pulse with the worst-case interference,

\bar{r}_k = 1 - \sum_{j \ne k} \left| s_{k-j} q_j \right| .    (11.1.5b)

Combining these expressions, the worst-case uncompensated spacing d_eye is

d_{\mathrm{eye}} = 2 \Bigl( 1 - \sum_{j \ne k} \left| s_{k-j} q_j \right| \Bigr) ,    (11.1.6)


rather than the minimum distance of d_min = 2 when intersymbol interference is absent. This reduction in the minimum spacing, which is due to intersymbol interference when this worst case occurs, is equivalent to a reduction in energy by the interference-pattern factor \bigl(1 - \sum_{j \ne k} |s_{k-j} q_j|\bigr)^2. For a worst-case analysis, this effective loss, although intermittent, is often regarded as a permanent loss in the signal energy when calculating the probability of a detection error.
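The worst-case calculation is easy to mechanize. The short Python sketch below evaluates (11.1.6) and the corresponding energy penalty; the equalized pulse samples q_j are assumed values chosen only for illustration.

```python
import numpy as np

# Worst-case eye opening d_eye from the sampled target pulse q_j (11.1.6).
# The pulse samples below are hypothetical values chosen for illustration.
q = {-1: 0.05, 0: 1.0, 1: 0.08, 2: 0.03}           # q_0 is the desired sample

isi = sum(abs(v) for k, v in q.items() if k != 0)  # worst-case sum of |ISI|
d_eye = 2.0 * (1.0 - isi)                          # eq. (11.1.6)
loss_db = 20.0 * np.log10(d_eye / 2.0)             # effective energy penalty

print(f"d_eye = {d_eye:.3f}, effective loss = {-loss_db:.2f} dB")
```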

11.2 Equalization

The mitigation of linear intersymbol interference within a single datastream prior to detection is called equalization.² Equalization by means of a linear filter for the received signal is called linear equalization. The elementary functional blocks of a phase-synchronous discrete-time electrical channel are shown in Figure 11.4. A phase-asynchronous lightwave channel is discussed in Section 11.5.

The detection filter y(t) that achieves the desired target pulse q(t) at the sampler depends both on the transmitted pulse x(t) (cf. (11.1.2)) and on the channel impulse response h(t). Whenever the probability p_e of a detection error is to be minimized, the appropriate optimality criterion for defining y(t) is, of course, the minimum probability of a detection error itself. This criterion, however, is defined at the output of a nonlinear process that uses thresholds. Instead, one may choose to minimize the error variance in the signal at the input to the detection process or, which is perhaps better in some cases, to maximize the signal-to-noise ratio at the input to the detection process. Maximizing the signal-to-noise ratio is appropriate for phase-shift-keyed signal constellations, including BPSK and QPSK with hard-decision detection, because the detection threshold does not depend on the pulse amplitude. Minimizing the error variance is also appropriate for soft-decision detection or for a quadrature-amplitude-modulated (QAM) signal constellation.

Simple optimality criteria are based on linear methods such as adjusting the form of a linear detection filter so as to maximize the sample signal-to-noise ratio or minimize the sample error. Other optimality criteria minimize the probability of a detection error using nonlinear methods based on the symbol samples. These methods are discussed in Section 11.3. This section discusses a linear equalization filter designed to minimize the error e_k in the kth output sample.

Figure 11.4 Block diagram of a phase-synchronous electrical channel.

² Equalization was first introduced to make a spectrum flat, or equal.


With reference to (11.1.4), this error is

e_k = r_k - s_k = (q_0 - 1) s_k + \sum_{j \ne k} s_j q_{k-j} + n'_k .    (11.2.1)

The three terms have different significance. The first term of (11.2.1) is the error when the signal sample is not equal to one. For a modulation format that uses a threshold equal to zero, this term affects only the signal-to-noise ratio, but not the placement of the threshold. For a modulation format that uses a threshold not equal to zero, this term will also cause misalignment of the signal with the threshold. The second term of (11.2.1) is the error due to intersymbol interference from the neighboring symbol intervals. This term depends on the neighboring symbols and is random through the randomness of those symbols. The last term of (11.2.1) is the filtered real or complex noise sample. This term is intrinsically random. The following subsections describe various criteria for equalization based on e_k.

11.2.1 Zero-Forcing Equalization

For q_0 = 1 and q_j = 0 for j ≠ 0, there is no intersymbol interference. Then, but for the noise, there is no error because the received noiseless sample r_k is equal to s_k. Imposing this zero-forcing criterion defines the zero-forcing equalizer, which recovers a Nyquist pulse (cf. Section 8.3) from the received pulse p(t). When q(t) is a Nyquist pulse, the noisy sample is

r_k = s_k + n'_k ,    (11.2.2)

and there is no intersymbol interference in the sample r_k. Detection is then a simple symbol-by-symbol thresholding operation. In general, the noise samples n'_k are correlated. For white noise n(t), the samples are uncorrelated only when the zero-forcing detection filter y(t) also satisfies the condition that y(t) * y^*(-t) is a Nyquist pulse (cf. (9.4.12)). When this condition is not satisfied, the noise samples are not independent even though the noise before the filter is white. Although the noise samples may be correlated, symbol-by-symbol thresholding is still an acceptable detector, but it is not optimal. Moreover, although forcing the target pulse q(t) to be a Nyquist pulse eliminates intersymbol interference, it need not minimize the variance of the filtered real or complex noise sample n'_k in (11.2.2). Therefore, the ideal case is to have a detection filter that is both a zero-forcing filter and a matched filter. These two conditions cannot be simultaneously satisfied, in general, by a detection filter in the receiver. They can be simultaneously satisfied for a linear time-invariant channel by a combination of a suitable pulse-shaping filter in the transmitter and a detection filter in the receiver.
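As an illustrative sketch (not a design from the text), the following Python fragment applies the zero-forcing idea to a discrete-time pulse by inverting its spectrum; the pulse samples are assumed values.

```python
import numpy as np

# A minimal zero-forcing sketch on discrete-time samples: invert the sampled
# pulse response so that the equalized pulse becomes a single unit sample.
# The pulse samples p below are hypothetical illustrative values.
p = np.array([0.1, 1.0, 0.3, 0.1])          # sampled received pulse
n_fft = 64
P = np.fft.fft(p, n_fft)
C = 1.0 / P                                  # zero-forcing frequency response

# Check: equalizing the pulse itself returns a unit sample (with the delay
# absorbed), i.e. no residual intersymbol interference.
q = np.fft.ifft(np.fft.fft(p, n_fft) * C).real
print(np.round(q[:6], 6))                    # -> [1, 0, 0, ...] within rounding

# Note: 1/P also scales the noise power spectrum by 1/|P(f)|^2, the noise
# enhancement that motivates the minimum-error criterion of Section 11.2.3.
```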

11.2.2 Matched Filter Equalization

The matched filter, as developed in Section 9.4.2, maximizes the signal-to-noise ratio of an isolated pulse received in white noise. However, when a modulated sequence


of pulses is passed through the matched filter, the filter output will have intersymbol interference unless it is a Nyquist pulse. But a filter y(t) matched to the pulse p(t) is equal to p^*(-t), so the filter output is p(t) * p^*(-t). If one is to remove intersymbol interference, this means that the matched filter p(t) must also satisfy the condition that |P(f)|^2 is the Fourier transform of a Nyquist pulse. To maximize the signal-to-noise ratio without introducing intersymbol interference requires a shaping filter at the transmitter forming a transmitted pulse x(t) such that the target pulse q(t) = p(t) * p^*(-t) is a Nyquist pulse. This emphasizes the fact that both a detection filter y(t) at the receiver and a pulse-shaping filter at the transmitter are required in order to control the combination of intersymbol interference and noise.

As an example, the unit-interval (T = 1) Nyquist target pulse with a raised-cosine spectrum,

q(t) = \frac{\sin(\pi t)}{\pi t} \frac{\cos(\beta \pi t)}{1 - (2 \beta t)^2} ,

given in (9.2.7) for a channel with no dispersion, is obtained at the output of a detection filter matched to the transmit pulse if the Fourier transform of the transmit pulse x(t) is (cf. (9.2.8))

X(f) = \begin{cases} 1 & \text{for } |f| \le (1-\beta)/2 \\ \sqrt{\tfrac{1}{2}\bigl(1 + \cos\bigl[(\pi/\beta)\bigl(|f| - (1-\beta)/2\bigr)\bigr]\bigr)} & \text{for } (1-\beta)/2 \le |f| \le (1+\beta)/2 \\ 0 & \text{otherwise}, \end{cases}

because then Q(f) = |X(f)|^2. Then the transmitted waveform s(t) is (cf. (9.2.6))

s(t) = \sum_{j=-\infty}^{\infty} s_j x(t - jT) ,

where x(t) is the inverse Fourier transform of X(f). The transmitted pulse x(t) is not itself a Nyquist pulse, and so the s_j are not directly visible within the waveform s(t). The s_j become visible only in the sampled outputs of the detection filter. In this way, the task of equalization is distributed between the transmit pulse shape x(t) and the detection filter y(t). Any channel dispersion h(t) must be known at the transmitter and incorporated into the definition of x(t). For a channel with a transfer function H(f), the modified transmit pulse in the frequency domain that accounts for the dispersion should be X(f)/H(f). Methods to estimate h(t) from the channel output are discussed in Section 12.5. To use this estimate in the transmitter, however, requires a reverse channel from the receiver back to the transmitter.
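The even split of the raised-cosine spectrum between the transmit filter and the matched filter can be checked numerically. The following sketch (the roll-off and grid sizes are arbitrary choices) builds X(f) from (9.2.8), forms Q(f) = |X(f)|^2, and verifies that the resulting target pulse has the Nyquist zero crossings.

```python
import numpy as np

beta = 0.25                      # roll-off; an arbitrary illustrative choice
n = 4096                         # FFT size (arbitrary)
fs = 16.0                        # samples per symbol interval T = 1
f = np.fft.fftfreq(n, d=1.0/fs)  # frequency grid in cycles per T

def x_spectrum(f, beta):
    """Square-root raised-cosine transmit spectrum X(f), as in (9.2.8)."""
    a = np.abs(f)
    X = np.zeros_like(a)
    X[a <= (1 - beta)/2] = 1.0
    band = (a > (1 - beta)/2) & (a <= (1 + beta)/2)
    X[band] = np.sqrt(0.5*(1 + np.cos((np.pi/beta)*(a[band] - (1 - beta)/2))))
    return X

Q = x_spectrum(f, beta)**2                   # matched filter: Q(f) = |X(f)|^2
q = np.fft.ifft(Q).real
q = q / q[0]                                 # normalize the peak sample q(0)

# Nyquist check: q(t) sampled at t = 1, 2, 3 (multiples of T) is ~0.
idx = (np.arange(1, 4) * fs).astype(int)
print(np.round(q[idx], 8))                   # -> [0. 0. 0.] within rounding
```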

11.2.3 Minimum-Error Linear Equalizer

When the transmit pulse x (t ) is constrained in some way or the channel impulse response h (t ) is not known at the transmitter, the matched filter output q(t ) might not be a Nyquist pulse. In this case, a linear detection filter can be designed instead to simply balance the errors from intersymbol interference and the errors from noise by minimizing the variance of the sample error given in (11.2.1). A linear detection filter satisfying this criterion is called a minimum mean-squared-error equalizer. Minimizing


the minimum mean-squared error is not the same criterion as minimizing the probability of a detection error. Rather, because the detected symbols must consist of the discrete set of points of a signal constellation, the minimum probability of detection error is not obtained by a linear filter. Indeed, optimal detection involves a nonlinear sequence detector as discussed in Section 11.3. Were additive white noise the only impairment, the detection filter that produces the minimum mean-squared error would be a matched filter. Moreover, when the noise is also gaussian, the matched filter produces the minimum probability of a detection error (cf. Section 9.4.2). On the other hand, were intersymbol interference the only source of error, then the intersymbol interference would be eliminated using a zero-forcing detection filter y(t) because the equalized pulse q(t) is then a Nyquist pulse. When both intersymbol interference and noise are present, the minimum-error criterion defines an objective function which, when minimized by the choice of the detection filter, balances the two types of errors. Linear detection filters based on a different criterion may lead to a different balance between the noise and the interference, as discussed in Section 12.5.2. For a nongaussian noise source, the filter that produces the minimum mean-squared error does not produce the minimum probability of detection error even when there is no intersymbol interference. Nevertheless, minimum mean-squared-error equalizers implemented as linear filters are widely used in dispersive, noisy channels. This is because linear filters are robust. They are not dependent on the higher-order moments of the probability density function, which might not be fully known.

11.2.4 Detection Filters for Additive White Noise

The sample value r_k at time kT after the detection filter y(t) is given by (11.1.4). The data symbols {s_k} are real or complex numbers comprising the samples of a random stationary datastream with variance σ_s^2 = ⟨s^2⟩ − ⟨s⟩^2. Let J be such that the J samples of the matched-filter output both before and after the maximum sample include all samples that contain significant energy. The duration of the impulse response after the matched filter is 2J + 1.

The optimality criterion studied in this section is a linear method based on minimizing the mean-squared error using a linear detection filter. In contrast, the optimality criterion studied in Section 11.3.2 is based on minimizing the probability of error using a nonlinear maximum-likelihood demodulator. The detection filter y(t) that minimizes the mean-squared error in the presence of both additive white gaussian noise and intersymbol interference can be derived using variational calculus (cf. Section 9.4.3) to find the stationary point in function space. The impulse response y(t) of the resulting linear detection filter has the form³

y(t) = \sum_{k=-K}^{K} c_k \, p^*(kT - t) ,    (11.2.3)

³ See George (1965) and Tufts (1965).

Figure 11.5 A linear detection filter y(t) for a received pulse p(t) partitioned into a matched filter and a transversal filter.

where K is an empirical constant to be discussed below, and p^*(-t) is the matched filter for the received electrical pulse p(t) in white noise, not necessarily gaussian. The sequence of coefficients {c_k} defines a discrete-time filter called a transversal filter. This linear equalization filter minimizes the mean-squared error in the samples. It does not, in general, minimize the probability of a detection error. The overall linear detection filter is shown in Figure 11.5. The output of the matched filter is sampled synchronously with the phase and frequency of the data clock. The kth noiseless sample v_k of the matched filter output for the signal pulse p(t) is

v_k \doteq \bigl[ p(t) * p^*(-t) \bigr]_{t = kT} .    (11.2.4)

This is the input to the transversal filter. Each filter coefficient c_k is determined from the discrete-time target impulse response {q_k} of the channel shown in Figure 11.4. The kth noiseless target sample q_k of the transversal filter output is

q_k = \sum_{j=-K}^{K} c_j v_{k-j} ,    (11.2.5)

where (11.2.4) has been used, from which the c_k are computed. The received pulse p(t) is the convolution of the transmit pulse x(t) and the infinite-duration impulse response h(t) of the lightwave channel given in (8.1.4). Therefore, the pulse response v(t) at the output of the matched filter has an infinite duration. To implement the summation given in (11.2.5), the number of nonzero coefficients of the transversal filter sequence {c_k} must be truncated to the finite number of significant samples, which is denoted 2K + 1. This truncation sets the size of the transversal filter.

Suppose that the pulse response v(t) at the output of the matched filter is symmetric about the peak value of the pulse, as shown in Figure 11.6(a). The sample of the pulse response at the output of the matched filter at which the value of v_k is maximum is indexed by k = 0. The sample v_0 may be delayed with respect to the first sample of v(t) that contains energy. The part of the pulse v(t) in the time interval before the maximum value is called predetection intersymbol interference, while the part of the pulse v(t) in the time interval after the maximum value v_0 is called postdetection intersymbol interference. The transversal filter is described by the sequence {c_k} with 2K + 1 coefficients, and the length of the sequence {z_k} after the noise filter is 2J + 1, so the length of the output pulse sequence is 2(K + J) + 1. Therefore, it is not generally possible to completely compensate for intersymbol interference using a finite number of transversal filter coefficients. This error decreases as the number of coefficients K used in the transversal filter becomes large compared with the number of symbol intervals J that contain a proportion of the pulse energy after the noise filter.

Figure 11.6 (a) A symmetric pulse response v(t) for the output of the matched filter, showing the samples that contribute to predetection and postdetection interference. (b) The structure of the transversal filter.

The sequence of samples z_k at the output of the matched filter is

z_k = \sum_{j} s_j v_{k-j} + n'(kT) ,    (11.2.6)

where s_j is an element of the real or complex signal constellation, v_k is a sample at the output of the matched filter, and n'(kT) is a sample of the filtered noise after the matched filter. At the output of the transversal filter, the sequence of noisy samples r_k is

r_k = \sum_{j=-K}^{K} c_j z_{k-j} + n'_k    (11.2.7a)
    = \sum_{j=-K}^{K} s_j q_{k-j} + n'_k ,    (11.2.7b)

where q_k is the target discrete-time impulse response given in (11.2.5) and n'_k is a sample of the filtered noise with variance σ_{n'}^2 given by

\sigma_{n'}^2 = \frac{N_0}{2} \sum_{k=-K}^{K} |y_k|^2 ,    (11.2.8)

where N_0 is the equivalent constant-power-density spectrum at the input to the detection filter y(t). This expression is the discrete equivalent of (2.2.65b). The summation in (11.2.7a) is shown pictorially in Figure 11.6(b) along with the terms generated by the predetection interference and the postdetection interference. Using (11.2.7b), the sample value r_k can be expressed as

r_k = q_0 s_k + \sum_{j=-K}^{-1} q_j s_{k-j} + \sum_{j=1}^{K} q_j s_{k-j} + n'_k ,    (11.2.9)

where the terms, in order, are the desired value q_0 s_k, a postdetection interference term, a predetection interference term, and a filtered-noise term.
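A minimal numerical sketch of this criterion is given below, with an assumed symmetric pulse and noise level; it fits the transversal coefficients by least squares over a training sequence, which approaches the minimum-mean-squared-error solution as the training length grows.

```python
import numpy as np

# Sketch of a minimum mean-squared-error transversal equalizer: choose the
# 2K+1 coefficients c to minimize the mean of |r_k - s_k|^2 for the model of
# (11.2.7). The pulse samples v and noise level are illustrative values.
rng = np.random.default_rng(1)
v = np.array([0.1, 0.3, 1.0, 0.3, 0.1])      # symmetric pulse samples v_k
K, N0 = 3, 0.05                               # filter half-length, noise density

# Training data: random binary symbols through the discrete channel.
s = rng.choice([-1.0, 1.0], size=5000)
z = np.convolve(s, v, mode="same") + rng.normal(0, np.sqrt(N0/2), s.size)

# Row k of X holds the window [z_{k+K}, ..., z_{k-K}] so that X @ c gives
# r_k = sum_j c_j z_{k-j}; least squares over many rows yields the c_j.
X = np.stack([np.roll(z, j) for j in range(-K, K + 1)], axis=1)
c, *_ = np.linalg.lstsq(X[K:-K], s[K:-K], rcond=None)

r = X @ c                                     # equalized samples
print("MSE:", np.mean((r[K:-K] - s[K:-K])**2))
```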

11.2.5 Decision Feedback

A nonlinear equalization technique, called decision feedback, removes intersymbol interference by using the previously detected data values ŝ_k to estimate and cancel out the interference. This method is appropriate when the probability of a detection error is sufficiently small that the previous estimates ŝ_k of the demodulated values have a low error rate. Decision feedback is appropriate as such for intersymbol interference that follows the sample. Decision feedback may be combined with a partial zero-forcing equalizer or a minimum mean-squared-error equalizer. For a real-valued sequence with output sample r_k given in (11.2.9), subtract a term based on the previous estimated symbols ŝ_{k-j} for j = 1, …, K. The sequence of modified sample values r'_k is written as

r'_k = r_k - \sum_{j=1}^{K} q_j \hat{s}_{k-j}
     = q_0 s_k + \sum_{j=-K}^{-1} q_j s_{k-j} + \Bigl( \sum_{j=1}^{K} q_j s_{k-j} - \sum_{j=1}^{K} q_j \hat{s}_{k-j} \Bigr) + n'_k .

If the previous estimates ŝ_k contain no errors, then ŝ_k = s_k and

r'_k = q_0 s_k + \sum_{j=-K}^{-1} q_j s_{k-j} + n'_k .    (11.2.10)

For this ideal case, all of the predetection interference that occurs from detected symbols before the sampling time is canceled by the feedback, leaving only the desired value q_0 s_k along with any postdetection intersymbol interference that occurs from samples after the sampling time. Decision feedback is not appropriate unless the detected symbols ŝ_k have a low error rate. Otherwise, incorrect values of ŝ_k are sometimes fed back. A single incorrect decision could then tend to produce multiple subsequent errors, leading to error propagation. Thus, the resulting errors in the demodulated symbols ŝ_k tend to come in clusters, with the length of the cluster related to the duration of the postdetection interference. When decision feedback is used in combination with a linear equalizer, the transversal filter coefficients c_k must adapt to the presence of the feedback. Initially, when there are no previous symbol estimates, there is no need for feedback. As the estimates ŝ_k become available, the linear equalizer compensates for the postdetection intersymbol interference that occurs from samples after the sampling time. In this case, the transversal filter coefficients for predetection interference are set to zero with c_k = 0 for k > 0.
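The feedback recursion of (11.2.10) can be sketched in a few lines. In the fragment below, the interfering tail samples q_1, …, q_K and the noise level are assumed illustrative values.

```python
import numpy as np

# Sketch of decision feedback for binary symbols in {-1, +1}: cancel the
# interference from previously detected symbols, i.e. the sum over j >= 1
# that is removed in (11.2.10). Pulse tail and noise level are assumed.
rng = np.random.default_rng(2)
q_tail = np.array([0.4, 0.2, 0.1])            # q_1..q_K after the peak q_0 = 1
s = rng.choice([-1.0, 1.0], size=2000)
r = s.copy()
for j, qj in enumerate(q_tail, start=1):      # add interference from the past
    r[j:] += qj * s[:-j]
r += rng.normal(0, 0.1, s.size)

s_hat = np.zeros_like(s)
for k in range(s.size):
    fb = sum(qj * s_hat[k - j]
             for j, qj in enumerate(q_tail, start=1) if k - j >= 0)
    s_hat[k] = 1.0 if (r[k] - fb) >= 0 else -1.0   # threshold after canceling

print("symbol error rate:", np.mean(s_hat != s))
```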

11.2.6 Prefiltering and Precoding

Dispersion in the channel could be fully or partially anticipated at the transmitter. Because linear time-invariant operations can be executed in any order, a compensation filter can be placed prior to the channel dispersion without changing the overall


effect on the signal. Such a filter in the transmitter is called a prefilter. A prefilter may require the transmitter to have knowledge about changes in the impulse response h (t ) of the channel. A prefilter in the transmitter affects intersymbol interference, but does not affect noise introduced within the channel. A detection filter in the receiver affects both intersymbol interference and noise. Together, a prefilter and a detection filter can control both of these impairments. The use of a prefilter that inverts or partially inverts the impulse response of the channel is called pre-equalization. An evident example of pre-equalization is a sign-reversed quadratic phase shift applied to the transmitted signal that equalizes the quadratic phase shift caused by fiber dispersion (cf. Section 8.1). A special case of prefiltering uses a simple code at the transmitter that generates the inverse of the dependences introduced by the channel into the discrete-time samples. This is called precoding. Precoding has an advantage over decision-feedback demodulation because the symbols at the transmitter are known with certainty. One form of coded modulation that compensates for the channel response, called a partial-response code, is discussed in Sections 9.2 and 13.7.
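As a sketch of pre-equalization (with assumed, not representative, fiber parameters), the following fragment applies the sign-reversed quadratic spectral phase at the transmitter and confirms that the dispersive channel of (11.2.11) then returns the original pulse.

```python
import numpy as np

# Sketch of dispersion pre-equalization: apply the inverse of the quadratic
# spectral phase of (11.2.11) before the channel. Parameter values are
# illustrative assumptions (beta_2 ~ -20 ps^2/km, 80 km span).
beta2_L = -2.0e-26 * 80e3        # beta_2 (s^2/m) times span length L (m)
n, dt = 4096, 5e-12              # grid size and sample spacing (s)

t = (np.arange(n) - n // 2) * dt
x = np.exp(-(t / 25e-12) ** 2)   # transmitted Gaussian pulse envelope
f = np.fft.fftfreq(n, d=dt)

H = np.exp(-1j * 2 * np.pi**2 * beta2_L * f**2)     # channel, eq. (11.2.11)
x_pre = np.fft.ifft(np.fft.fft(x) / H)              # prefilter: inverse phase
y = np.fft.ifft(np.fft.fft(x_pre) * H)              # received pulse

print("max |y - x| =", np.max(np.abs(y - x)))       # ~0: dispersion canceled
```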

Equalization and Channel Coding in a Dispersive Lightwave Channel

Chapter 14 will show that the use of a code to prevent errors, as discussed in Chapter 13, is unavoidable in a high-performance communication system. Our usual practice is to regard the channel as fixed, then choose the code to satisfy performance goals. Instead, the channel can be changed to accommodate the desired code. The use of a code then impacts other aspects of the system. Using a channel code to transmit at the same information rate R while keeping the signal constellation fixed in size requires a channel symbol rate equal to R/R_c. The term R_c, the code rate, is the ratio of the length k of a dataword to the length n of a codeword (cf. Chapter 13). The transfer function of a dispersive lightwave channel, given in (8.1.3), is repeated here as

H(f) = e^{-\mathrm{i} 2 \pi^2 \beta_2 L f^2} ,    (11.2.11)

where β_2 is the group-velocity dispersion coefficient (cf. (4.4.16)), and L is the length of the fiber span. Figure 11.7 plots the input lightwave pulse power and the output pulse power for an uncoded system and two code rates.

Figure 11.7 The input (solid) and output (dashed) lightwave pulse powers for (a) an uncoded system, (b) a system with a code rate R_c = 0.75, and (c) a system with a code rate R_c = 0.5. Lower code rates produce more spreading, which must be equalized at the receiver.

The figure shows that as the coding overhead increases, the input coded pulse width must decrease to maintain the same data rate. This leads to a broadened output pulse that depends strongly on the code rate because of the quadratic frequency dependence of H(f) given in (11.2.11). This means that the task of equalization is more difficult when a low-rate channel code is used.
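The trend in Figure 11.7 can be estimated with the standard gaussian-pulse broadening factor T_out/T_in = √(1 + (β_2 L / T_0^2)^2) (not derived here). The sketch below uses assumed values of β_2 L and the uncoded pulse width.

```python
import numpy as np

# Sketch: Gaussian-pulse broadening versus code rate Rc. A code rate Rc
# shrinks the symbol interval (and the pulse width) by the factor Rc, so
# low-rate codes broaden disproportionately. Values are assumed, not fits:
# |beta_2|*L ~ 21.7 ps^2/km over 100 km, 25 ps uncoded pulse width.
beta2_L = 2.17e-21            # |beta_2| * L in s^2 (assumed value)
T0_uncoded = 25e-12           # uncoded 1/e pulse half-width (s)

for Rc in (1.0, 0.75, 0.5):
    T0 = Rc * T0_uncoded      # coded pulse must be narrower by the factor Rc
    broadening = np.sqrt(1.0 + (beta2_L / T0**2) ** 2)
    print(f"Rc = {Rc:4.2f}: T0 = {T0*1e12:5.1f} ps, "
          f"output/input width = {broadening:5.2f}")
```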

11.3 Sequence Detection

Sequence detection views an interdependent sequence of measured samples as a whole, estimating the entire underlying symbol sequence as such rather than estimating each symbol of that sequence in isolation. This is appropriate only when there is dependence between the received samples. Otherwise, sequence detection provides no improvement. Dependence between the received samples caused by intersymbol interference is discussed in this chapter. Dependence between the transmitted symbols intentionally inserted by a channel code is discussed in Chapter 13. Accordingly, the algorithms for sequence detection described in this section are relevant both for equalization in this chapter and for decoding in Chapter 13. Both maximum-likelihood sequence detection and componentwise maximum-posterior sequence detection will be discussed. Minimum-distance sequence detection is discussed first, and in detail, as a special case of maximum-likelihood sequence detection.

Maximum-likelihood sequence detection minimizes the probability of a block detection error for equally likely sequences. This error criterion is indifferent to the number of incorrect symbols in an incorrect sequence. Minimizing the probability of a block error does not necessarily minimize the probability of a symbol error. In contrast, componentwise maximum-posterior sequence detection minimizes the probability of error of each symbol using evidence gleaned from all symbols of the sequence. It does not necessarily minimize the probability of a block error. The two criteria are different and, as criteria, neither can be said to be better. Indeed, it may be difficult to say which is to be preferred.

The filtered received waveform at the sampler, as shown in Figure 9.3, is a superposition of shifted copies q(t − jT) of the target pulse, with the jth pulse of the sequence modulated by the real or complex value s_j representing the data in the jth interval (cf. (11.1.3)),

r(t) = \sum_{j=-\infty}^{\infty} s_j q(t - jT) + n'(t) .

For additive white gaussian noise at the input to the detection filter, the noise samples at the output may be dependent or may be independent, depending on the form of the detection filter. Here we consider only independent noise samples. In some cases, these noise samples could be dependent but treated as independent. The only dependence considered here is in the output symbol sequence itself.


When the detection filter y(t) shown in Figure 9.3 does not produce a Nyquist pulse, each received pulse q(t) is spread into nearby samples both before and after the peak of q(t). The number of other symbols deemed to significantly affect the sample value for a given sample is called the constraint length of the interference, and denoted ν. For a symbol from a signal constellation with L values, there are L^ν subsequences from the ν other modulation symbols that could cause interference with that symbol. Each symbol of the sequence has this many possible subsequences that can cause intersymbol interference, one of which will be present. The set of possible subsequences that could interact with a given symbol grows exponentially in the constraint length ν.

The interfering samples of the target pulse q(t) may occur both before and after the peak. For expository convenience, it is useful, and does no harm, to index the samples of the pulse starting at the first significant sample of the pulse, which may occur before the peak of the pulse. With this indexing, the pulse is causal. A causal pulse does not interfere with a pulse that comes before it. Each symbol in a demodulated sequence depends on ν transmitted symbols s_{j−1}, …, s_{j−ν}. The values are indexed with respect to the first sample of the pulse out of the detection filter that is deemed to contain energy. These nonzero signal values are represented by a channel state vector σ_j,

\sigma_j = \{ s_{j-1}, s_{j-2}, \ldots, s_{j-\nu+1}, s_{j-\nu} \} ,    (11.3.1)

where s_j is the complex amplitude of the jth transmitted pulse as seen at the receiver, and the vector σ_j is written with the most recent symbol at the left. Because of intersymbol interference, a received sample r_j is affected by s_j and all transmitted symbols in the state vector. As an example, for binary antipodal signaling with r_j = s_j + a s_{j−1} + b s_{j−2} + n_j, where the two constants (a, b) describe the interference, the constraint length ν is two and L is two. Then σ_j = {s_{j−1}, s_{j−2}}. Because L^ν = 4, there are four possible channel states, {0, 0}, {1, 0}, {0, 1}, and {1, 1}. For each index j, the state corresponds to the pair of transmitted bits at indices j − 1 and j − 2.

The most likely transmitted sequence s(t) is determined using the equiprobable prior on the set of sequences. For this prior, the entire sequence s(t) of length K can be regarded as a single symbol in a higher-dimensional signal space. Each sequence has the same likelihood, and the most-likely sequence is detected. This observation leads to an extension of symbol detection to sequence detection that is called maximum-likelihood sequence detection or maximum-likelihood sequence estimation.

A variation of maximum-likelihood sequence detection is minimum-distance sequence detection. For the case of additive white gaussian noise, minimum-distance sequence detection is the same as maximum-likelihood detection. For other forms of noise, minimum-distance sequence detection is a computationally attractive procedure, but it is not the maximum-likelihood sequence detector. Minimum-distance sequence detection will be described in Section 11.3.2. The general case of maximum-likelihood sequence detection is discussed in Section 11.3.3.

11.3.1 Trellis Diagrams

Sequence detection is best described using a kind of diagram called a trellis. A trellis consists of an array of nodes arranged as a sequence of columns called frames. For a symbol alphabet of size L and constraint length ν, each frame has L^ν nodes representing the L^ν possible states of the sequence. Each frame corresponds to one term of the transmitted sequence, and the frames are indexed in the same way that the sequence is indexed. The frame index can be regarded as discrete time. The L^ν nodes in a frame represent the L^ν distinct states for that frame of the sequence. A state consists of the values of the previous ν modulation symbols. Each node corresponds to a state of a sequence of constraint length ν. The nodes in successive frames are connected by branches. Branches between the nodes describe the possible transitions from one state to another state as the time index changes.

Interference caused by linear dispersion has the form of a convolution of real or complex sequences that can be described by means of a trellis. Because a trellis does not itself require linearity, a trellis can also describe nonlinear intersymbol interference that is described by real or complex numbers. The use of a trellis to treat nonlinear intersymbol interference is discussed in Section 11.6. Moreover, a trellis can describe a channel code that is defined by a convolution in a finite field as described in Section 13.2. The use of a trellis to represent channel codes is treated in Chapter 13.

The short trellis depicted in Figure 11.8 corresponds to binary antipodal signaling with r_j = s_j + a s_{j−1} + b s_{j−2} + n_j. Because binary signaling is used, the size L of the alphabet is two. Because two other symbols can affect the current sample, the constraint length ν is two. The number of nodes in each frame of the trellis is given by L^ν = 4. A trellis with this L and ν could be many thousands of frames long, one frame organized as a column for each transmitted symbol in a long block or message. Simply replicate the frame in the center of the trellis an arbitrary number of times to form the long trellis. The example in Figure 11.8 is a short trellis only six frames long, two of which are attached for termination and contain no data. The termination provides a guard interval in the waveform before the next block of symbols begins.

Figure 11.8 A terminated binary trellis with constraint length ν = 2. The nodes are labeled with the most recent databit at the left. The highlighted path through the trellis corresponds to the binary sequence {1, 1, 0, 1, 0, 0}.


The four nodes in each column of Figure 11.8 are labeled with the four possible states of the trellis; each state consists of the two most recent databits. These four states are 00, 10, 01, and 11, with the most recent of the two databits at the left. The initial state of the trellis is defined as the state labeled "00." This can be regarded as the initial condition that there are two virtual zero databits prior to the start. The trellis begins with two transitions from the initial state corresponding to the two possible values of the first databit of the sequence. Therefore, for j = 1, there are two nodes, corresponding to two states: state 00 corresponding to the transmitted bit s_0 = 0 and state 10 corresponding to the transmitted bit s_0 = 1. The labeling of each node is conventional, with the newer bit s_{j−1} to the left of the older bit s_{j−2}. For each of these two nodes, there are two branches that lead to two of the four nodes for the next frame. Each branch is labeled with the databit s_j corresponding to that branch, as shown in Figure 11.8. For this reason, each of the four states has two branches leaving and two branches entering. The highlighted path in Figure 11.8 corresponds to the data sequence 1, 1, 0, 1, 0, 0 read from left to right. The last two zeros are artificial, contain no data, and are used to terminate the trellis.

Each of the two branches leaving a node is labeled with a databit, either zero or one, associated with that branch, as shown by the branch labels in Figure 11.8. A sequence {r_j} of received noisy sample values is written above the trellis in Figure 11.8. For a soft-decision demodulator, which is the usual case for intersymbol interference equalization, each realization of the sample value r_j is a real (or complex) number. For a hard-decision demodulator, each sample value r_j would be an estimated bit, either a zero or a one, but this is not the usual case for equalization. A sequence estimator chooses the path through the trellis that best agrees with the sequence of sample values in the sense of the minimum distance between the sequence of samples and the sequence of branch labels. The distance measured is either the euclidean distance for soft-decision data or the Hamming distance for hard-decision data. The symbol s_j is the transmitted symbol associated with a branch. The expected value of the received noise-free symbol for soft-decision data associated with a branch is \bar{r}_j = s_j + a s_{j−1} + b s_{j−2}.

There are many paths through the trellis of Figure 11.8, and many more paths for a much longer version of this trellis, each path going from the designated start node on the left to the designated stop node on the right. A path through the trellis is a sequence of branches, organized branch by branch, each branch labeled with a value s_j for j = 1, …, K. For the binary case, the s_j may be from the binary alphabet {0, 1}, from the real bipolar alphabet {−A, A}, or from the real on–off alphabet {0, A}.
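The bookkeeping of states and branches is summarized by the short sketch below, which enumerates the trellis of this example; the interference constants a = b = 0.1 match the worked example that follows.

```python
# Sketch: enumerate the trellis of Figure 11.8 for the binary example
# r_j = s_j + a*s_{j-1} + b*s_{j-2}. Each state is the pair of previous
# bits; each branch carries the noise-free output for the new bit.
a, b = 0.1, 0.1   # interference constants of the worked example (Fig. 11.9)

states = [(s1, s2) for s1 in (0, 1) for s2 in (0, 1)]   # (s_{j-1}, s_{j-2})
for s1, s2 in states:
    for s in (0, 1):                       # new databit s_j on the branch
        nxt = (s, s1)                      # the shift register advances
        r_bar = s + a * s1 + b * s2        # noise-free branch label
        print(f"state {s1}{s2} --{s}/{r_bar:.2f}--> state {nxt[0]}{nxt[1]}")
```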

11.3.2 Minimum-Distance Sequence Detection

Every branch of a frame in Figure 11.8 is explicitly labeled with a data value s_j. Every frame is labeled the same way. Every branch also has an implicit label, not shown, corresponding to the value of the noise-free sample \bar{r}_j on that branch. This value is given by \bar{r}_j = s_j + a s_{j−1} + b s_{j−2}. The actual received sequence of noisy samples {r_j} is written above the frames. According to the received sequence, every branch is assigned a


parameter called the branch metric. The branch metric is the distance d(r_j, \bar{r}_j) between the noiseless sample \bar{r}_j for that branch and the actual noisy sample r_j that was received. When the specific channel state σ_ℓ is under consideration, the branch metric is written as d_{ℓ+1}(σ_ℓ, σ_{ℓ+1}). For minimum-distance sequence detection, the branch metric for intersymbol interference is the squared euclidean distance. In turn, for each path, the sum of all branch metrics on that path is called the path metric D_K = \sum_{j=0}^{K-1} d^2(r_j, \bar{r}_j). The goal of minimum-distance sequence detection is to pick the path from the initial node to the terminal node that is nearest, as measured by the path metric, to the received noisy sequence r.

The minimum path distance to each node of frame k + 1 can be determined iteratively by first calculating the minimum path metric to every node in frame k. Then, for each node in the new frame k + 1, one extends each path from a node in the previous frame k to that node in frame k + 1 using the branch metric. The path in this set of extended paths to that node that has the minimum distance is then selected. For the trellis shown in Figure 11.8, L = ν = 2, so there are four nodes in each frame and, for each node, two possible branches terminate on that node. Each of the two branches completes a path to that node whose path metric is obtained by augmenting the two path metrics from nodes in the previous frame. These possible paths usually have different values of the path metric. The most likely path for a node in the new frame is the path with the smallest path metric. The other path is discarded.

As an example, Figure 11.9 shows three real sample values r_0, r_1, and r_2 for the initial segment of the trellis shown in Figure 11.8. The intersymbol interference in the example is described by the noise-free sample \bar{r}_j = s_j + 0.1 s_{j−1} + 0.1 s_{j−2}, where s_j ∈ {0, 1}. The squared-euclidean-distance branch metric is (r_j − \bar{r}_j)^2 = (s_j + 0.1 s_{j−1} + 0.1 s_{j−2} − r_j)^2.

00

0 (–1.162)

1

(–0.162 )

r1 = 0.83

00 10

0 (–0.832) 1 (0 .17 2 ) 0 (– 0.7 2 3)

1

(0 .2 72 )

r2 = 0.21

00 2.035 10 1.375 01 0.559 11 0.0985

0 (–0.212 )

1 (0

00

.79 2 )

0.571 2) .11 –0 2) ( 10 0 9 1 (0.8 0 (–0 1.35 .11 2) 1( 0.8 01 2 9) 2) 0.0986 .01 0 (–0 11

1 (0.99 2)

... ... ...

...

1.08

,

Figure 11.9 The accumulated distance using the first three sample values r 0 r 1, and r 2 of the trellis shown in Figure 11.8 with the branch metric s j 0 1s j −1 0 1s j −2 r j 2. The

( + .

+ .

− )

minimum total accumulated path metric is shown below each node and is the sum along the best path to that node.

540

11 Interference

0.1s j −2 − r j )2 . The branch metrics defined between the connected nodes are shown for each possible received bit, and the summed path metric, which is the total accumulated distance, is calculated for each of the four possible states given by the two samples. Now let r2 = 0.21 and extend the path to the next frame. Examining Figure 11.9, there are two ways to arrive at node 00, either from node 00 or from node 01. The path that produces the minimum accumulated distance is highlighted in black. The other path is shown in gray. Because all the surviving paths originate from node 10 in the second frame, the first bit of the sequence must be a one, and is thus detected as such.

Viterbi Algorithm

The Viterbi algorithm is an efficient iterative procedure for finding the data sequence at the minimum distance from the received sequence. The previous paragraph motivates this algorithm. (For soft-decision sequences in additive gaussian noise, the distance is the euclidean distance. For hard-decision binary sequences in equiprobable binary noise, the distance would be the Hamming distance (cf. Section 13.2.2)). Choose an integer N larger than the constraint length ν ; a choice of N that is two or three times the constraint length ν will usually do. The best path and its path metric from the start node of the trellis to each node in the N th frame must be computed. For a suitable value of N , all of these paths will almost always agree in the first branch, so the first bit is known and shifted out of the computation. The first frame is then dropped from the trellis, thereby creating a new start node. Then compute the most likely path from the new start node to every node in the (N + 1)th frame and repeat so as to detect the second bit. Continue this process indefinitely, performing the computations iteratively. Upon computing the best path to all nodes in frame k + 1, the (k + 1 − N )th bit is detected. The iteration from the kth frame to the (k+1)th frame uses a computational step called add–compare–select. The new iteration begins knowing from the previous iteration the best path to each of the nodes in the previous frame and the path metric of that best path. There is no need to refer to other paths. Only the best path to each node needs to be remembered. These best paths and their path metrics were stored during the previous iteration. Only the best paths to the nodes in the kth frame are extended to the (k + 1)th frame. The best path to a node in the (k + 1)th frame is computed as follows. For each branch leading to that node starting from a node in the kth frame, add that branch metric to the path metric of that node in the kth frame. Then compare all such metrics, and select the smallest, discarding the others. Do this for every node in the (k + 1)th frame. Each remaining path through the trellis after this step is called a surviving path. A decision on the value of the transmitted value of the j th bit is determined when all the surviving paths in the trellis pass through the same node for an initial value of j . This is highly likely with a sufficiently large choice of N , and detection failures are rare.

Detection Errors

The probability of a detection error for minimum-euclidean-distance sequence detection is determined using the same line of reasoning as used for the probability of a detection error for symbol-by-symbol detection (cf. Section 10.2.3). For this purpose,

11.3 Sequence Detection

541

the minimum distance of a signal constellation is replaced with the minimum euclidean distance in the set of sequences. The union bound can be used to bound the probability of a detection error. However, because there are an infinite number of sequences, there are an infinite number of terms in the union bound. Most of these terms are either superfluous or quite small, but those terms are hard to recognize as such. Let dmin be the minimum euclidean distance between any two sequences. Any sequence at euclidean distance dmin from a sequence is a nearest neighbor of that sequence. The most common detection error is one that selects a nearest-neighbor sequence in place of the correct sequence. The probability of a sequence detection error is given by (10.2.13), and can be written using the union bound as pe


dmin , and many are superfluous.4 If the signal-to-noise ratio, appearing here as 2 dmin /4N 0, is sufficiently large, the latter terms can be neglected, though the inequality is then no longer a formally valid inequality. Averaging over the remaining sequences leads to the informal inequality pe

² 2n¯ erfc

´µ

2 /4N dmin 0



,

(11.3.3)

where n is the average of the number of nearest neighbors over all sequences. For any small ², this inequality will hold to within ² for a sufficiently large signal-to-noise ratio. The validity of this informal expression as a function of the signal-to-noise ratio is explored by simulation.

Minimum-Error Equalizer

Section 11.2.3 described the optimization of an equalizer based on the minimummean-squared-error criterion. This criterion leads to a linear equalizer. Therefore, small changes in the noisy received signal lead to only small changes in the equalized signal. When an equalized signal is detected using hard-decision detection, which compares the equalized sample with one or more thresholds, a small change in the noisy received signal may lead to a different detected signal point. This results in a nonlinear minimumerror equalizer, which is, itself, the detector. The minimum probability of error detection can be described as a modification of Figure 11.5, replacing the transversal filter with a Viterbi detector. 4 Although each of these neglected terms is small, there are a great many such terms for long sequences. As

the signal-to-noise ratio becomes small, the erfc function does not decrease faster than does the accumulation of the many small terms, and at some point those many small terms cannot be ignored.

542

11 Interference

11.3.3

Maximum-Likelihood Sequence Detection

The minimum-probability-of-error detector can be introduced informally with reference to Figure 11.5 simply by replacing the transversal filter with a Viterbi detector. Minimum-distance sequence detection using the euclidean distance as described in Section 11.3.2 is an embodiment of the maximum-likelihood procedure for the case of additive white gaussian noise because, in this case, the log-likelihood reduces to the euclidean distance. For other cases, minimum-distance sequence detection may still be a convenient and successful procedure, but it is no longer maximum-likelihood sequence detection. The general form of maximum-likelihood sequence detection is described in this section. To this end, the path metric D K between two sequences r and s, each of length K , is generalized as

.

D K (r, s) = − log e ³(s; r),

(11.3.4)

where ³(s; r) is the likelihood function p(s|r) (cf. (9.5.9)). For additive white gaussian noise on a real-baseband or complex-baseband channel, this path metric reduces to the euclidean distance, and so this is the case studied in Section 11.3.2. As an example, for loge ³(s; r) given in (10.2.4b), D K −1 can be written as D K −1 (r, s) = |s|2 − 2 Re[ r · s]

=

·

R

|s (t )|2 dt − 2 Re

¸· R

r (t )s ∗(t )dt

¹

,

(11.3.5)

where R is an integration region to be discussed below. The term | s|2 does not depend on r (t ). It can be suppressed for those cases for which all sequences have the same energy. Otherwise, this term must be kept. The second term of DK −1 is equivalent to a bank of matched filters for sequences that project the demodulated noisy sequence r onto each possible demodulated noise-free sequence s. Our task here is to organize the compact structure of (11.3.5) into a tractable computational structure. ∑ For a sequence s± (t ) of length K , s± (t ) = Kj =−01 s j q(t − j T ) (cf. (11.1.3)), where the s j are appropriate to that ±. The first term of (11.3.5) can be written as

·

R

|s±(t )|

2

dt

= =

where Z j −m

±±

K −1 K −1 j =0 m =0

±±

K −1 K −1 j =0 m =0

= Zm∗ − j =

s j s∗

·

m

R

q(t

− j T )q ∗ (t − mT )dt

s j sm∗ Z j −m ,

· R

q (t

− j T )q ∗(t − mT )dt .

(11.3.6)

The integration region R is chosen to include the maximum shifted target pulse response over the constraint length ν . The term Z j −m depends only on the intersymbol interference and not on the data. It depends on the difference j − m except for the ν values

11.3 Sequence Detection

543

at the end points of the sequence. When the sequence length K is much greater than the constraint length ν , the effects of terminating the convolution, as shown on the right of Figure 11.8, can be neglected. In this case, Z j −m is nonzero when the shifted target pulses q (t − j T ) and q∗ (t − mT ) overlap. This occurs when | j − m| ≤ ν , so the absolute difference of the shifted target pulses is less than the constraint length. In a similar way, the second term in (11.3.5) can be written as 2 Re

¸·

R

r (t )s ∗ (t )dt

⎡ K −1 · ⎤ ± = 2 Re ⎣ s∗j r (t )q ∗(t − j T )dt ⎦ R ⎡ Kj =−01 ⎤ ± = 2 Re ⎣ s∗j Y j ⎦ ,

¹

j =0

where Yj

=

· R

r (t )q ∗(t

− j T )dt

(11.3.7)

is the projection of the noisy demodulated waveform r (t ) onto the known target pulse response q (t ). Optimal detection is now reduced to the task of minimizing the total path metric DK

⎡ K −1 ⎤ K −1 K −1 ±± ∗ ± s j sm Z j −m = −2 Re ⎣ s∗j Y j ⎦ + j =0

(11.3.8)

j =0 m =0

over all possible data sequences {s± } of length K . To proceed, decompose the total path metric DK into the path metric DK −1 to the channel state σ K −1 and a branch metric dK (σ K −1, σ K ) from the channel state σ K −1 to the channel state σ K as follows: D K −1

⎛ ⎡ K −2 ⎤ K −2 K −2 ⎞ ± ± ± = ⎝−2 Re ⎣ s ∗j Y j ⎦ + s j sm∗ Z j −m ⎠ + d (σ − , σ ). j =0

K

j =0 m = 0

K 1

K

Using (11.3.8), the first parenthesized term on the right is the path metric DK −1 for a sequence of length K − 1. The second term on the right, dK

⎡ ⎛ K −2 ± (σ − , σ ) = Re ⎣s ∗ − ⎝−2Y j + 2 K 1

K

K 1

m =K −1−ν

sm Z K − 1−m

⎞⎤ + s − Z 0⎠⎦ , K 1

(11.3.9)

is the branch metric, the development of which is asked for as an end-of-chapter exercise. This branch metric is a function of only the previous ν symbols that range from K − 1 − ν to K − 2. Repeating the factorization, the overall path metric D K for this example may be written as DK

= Dν +

±

K −1 j =ν

d j +1(σ j , σ j +1 ),

(11.3.10)

544

11 Interference

where

⎡ν−1 ⎤ ν−1 ν−1 ± ∗ ⎦ ±± ∗ Dν = −2 Re ⎣ s j Yj + s j sm Z j −m j =0

j =0 m =0

(11.3.11)

are the initial path metrics to the channel state σ ν . This factorization is the basis of the Viterbi algorithm now used to find the maximumlikelihood path. Other systems described by a different likelihood function ³ (s; r) will have a different path metric. One such metric is discussed in Section 11.6.1. For a sequence of length K , this factorization reduces the problem of searching through an exponentially large number of possible transmitted sequences to a problem of determining which of the ν branch metrics produces the minimum accumulated distance as each noisy demodulated component r j in the sequence is demodulated. The sequence-detection computational problem is now exponential in the constraint length ν instead of exponential in the sequence length K . 11.3.4

Maximum-Posterior Sequence Detection

Given an unknown transmitted sequence s with a prior p(s) and a received sequence r with a conditional probability distribution p(r|s), three meanings of the maximumposterior detection from the received sequence r should be distinguished:

³s = max p(s| r), s ³sk = max p(sk |rk ), s ³sk = max p(sk |r), s k

k

(11.3.12a) (11.3.12b) (11.3.12c)

where p(sk | r) in (11.3.12c) is the marginalization of p (s|r) to the kth symbol. The first expression may be called the sequence maximum posterior, the second the component maximum posterior, and the third the componentwise maximum posterior for an equiprobable prior distribution. The first expression is the same as maximum-likelihood sequence detection, which is the topic of Section 11.3.3 and is not of separate interest in this section. The second expression detects each transmitted symbol individually without reference to evidence from the other received samples. The third expression detects each symbol individually, but first computes the probability of that symbol on the basis of all the evidence that can be gleaned from the other samples of the entire received sequence r, as will be described. Maximum-posterior detection has come to mean maximum-posterior componentwise detection as in (11.3.12c), which is the meaning used herein. The expression for maximum-posterior detection given in (11.3.12c) describes symbol-by-symbol demodulation. This method generates the posterior probability distribution on the alphabet of symbols {s± } for each component of a sequence. This posterior probability is based on evidence from all components of the received sequence, which differs from the criterion used for maximum-likelihood sequence detection given in (11.3.12a), and from the criterion used for symbol-by-symbol detection given in (11.3.12b).

11.3 Sequence Detection

545

The logarithm of the single-letter posterior probability p(sk |rk ) of the symbol sk is called the intrinsic evidence for symbol sk . Additional extrinsic evidence is derived from the rest of the demodulated sequence by the process of marginalization. In this chapter, extrinsic evidence results from the symbol dependences introduced by intersymbol interference. In Chapter 13, extrinsic evidence results from the intentional symbol dependences introduced by a code. The componentwise posterior probabilities generated by the marginalization process are then used to detect the symbols one-by-one. As such, the posterior probabilities can be used with a hard-decision detection process to form a componentwise estimate of each symbol. Alternatively, the posterior probabilities can be used as soft decisions on the individual symbols and sent directly to the decoder as such. Posterior probabilities were introduced in Section 9.5 to study memoryless channels for which the prior probability ps ± for each symbol s± may be different. Extending that analysis to sequences, consider the prior for a set of sequences and the prior for the component symbols within each sequence. If the sequences are equally likely, then the ratio of posterior probabilities for each sequence can be expressed as a likelihood ratio (cf. (9.5.12)), and maximum-likelihood sequence detection can be used. However, if the sequence is demodulated on a symbol-by-symbol basis, then, for a particular component k within the sequence, the assumption of equally likely symbols is not typically satisfied because of the dependent structure of the demodulated sequence. This dependent structure is caused by a combination of unintentional dependences caused by intersymbol interference and intentional dependences caused by coded modulation. For this case, there is additional intrinsic evidence contained in each symbol and additional extrinsic evidence contained in other symbols caused by the dependent structure of the sequence. The posterior probability for the kth component of a sequence can be derived by applying Bayes’ rule (cf. (2.2.9)) to the demodulated sequence (or block) of symbols r, p(s| r) =

p(s) p(r|s) . p(r)

(11.3.13)

The left side of this expression is the conditional probability that the sequence s was transmitted given that the noisy sequence r is received. This is the posterior probability of the sequence. On the right, p(r|s) is the conditional probability that the sequence r is received given that the sequence s was transmitted, p(s) is the prior probability that the sequence s was transmitted, and p (r) is the probability that the sequence r is received. To estimate the kth component sk of the transmitted sequence, we require the marginal posterior probability p(sk |r) for each possible symbol s for each component k. The number of values of s depends on the signal constellation. This marginal posterior probability is conditioned on the complete received sequence r.

Marginalization

The process of marginalization is now explained by a simple example using a data blocklength of two. The received sequence consists of three noisy samples (r1 , r2 , r3 ) generated from two binary data symbols (s1, s2 ) with a specified prior p = p(s1 , s2 )

546

11 Interference

by a duobinary partial-response waveform. The three sample values at the output are given by

= s1 + n 1 , r2 = s1 + s2 + n 2 , r3 = s2 + n 3 , r1

where ni is a sample of the noise. These noise terms are random variables whose probability distributions are known. These are regarded here to be independent gaussian random variables with variance σ 2 , but the discussion of marginalization applies in general to any probability distribution on (n1 , n 2, n3 ). The Bayes relationship now gives p(s1 , s2) p (r 1, r2, r3| s1, s2) . p ( r1 , r2 , r3 )

p (s1, s2|r 1, r 2, r 3) =

(11.3.14)

The prior p(s1 , s2 ) is known. The channel model p (r1 , r2 , r3 |s1, s2 ) is known. The received sequence (r1 , r2 , r3 ) is observed. Therefore, for each (s1 , s2 ) pair, the numerical values of all the terms on the right are known. Therefore, for each (s1, s2) pair, the probability p (s1, s2| r1, r2, r3) on the left can be computed for the received sequence. The marginalized probabilities on the first bit s1 are obtained by summing out the second bit as follows: p (s1

= 0|r1 , r2 , r3 ) = p (0, 0|r1, r2, r3) + p (0, 1|r1, r2, r3 ) , p (s1 = 1|r1 , r2 , r3 ) = p (1, 0|r1, r2, r3) + p (1, 1|r1, r2, r3 ) .

(11.3.15a) (11.3.15b)

Then the first databit s1 is demodulated as either zero or one according to which probability is larger. Because these two probabilities sum to one, only the first of the two probabilities actually needs to be computed in this way. The marginalized probabilities for the second databit s2 are computed in the same way. This simple procedure extends in a straightforward way to longer sequences and larger alphabets such as {0, . . . , L − 1}. The marginal probability p(sk |r) is determined by summing out the probabilities of all components in the sequence except for the kth component. The i th summation sign, with m = i , is over the letters in the alphabet and ∑ 1 . Thus should be understood as sLi − =0 p(sk

= d |r) =

±

m =1

···

±

±

m =k−1

m =k +1

···

±

m= K

p(s| r, sk

= d),

(11.3.16a)

which is computed for each k from 1 to K for each letter d in the alphabet of size L . This is written in compact form as p (sk

= d |r) =

±

m :∼k

p(s|r, sk

= d ),

(11.3.16b)

where the notation m :∼ k indicates that except for component sk , every component is summed out over all possible values as in (11.3.15) to form the marginal posterior probability p(sk = d| r). The computation of this posterior probability p(sk = d| r) for each component sk from the joint posterior probability p(s| r) is called marginalization

11.3 Sequence Detection

547

of the block probability to the kth component. For an alphabet of size L, there are L K −1 terms in the sum. For L = 2 and K − 1 = 20, which is a small blocklength, there are more than one million terms in that sum. This computation must be executed for each k. This marginal posterior probability for the kth component must be computed for each letter d in the channel input alphabet of size L . Because sk can take on L values, there are L marginal probabilities for each component k, one of which is trivial to compute because the marginal probabilities must sum to one. For a sequence of length K , this means that there are K ( L − 1) posterior probabilities to evaluate, each of which involves L K −1 terms to be added. For K in the thousands, this computation is regarded as impractical in general. However, an efficient algorithm, discussed at the end of this section, is available for sequences that can be described by a trellis of small constraint length. For a binary sequence, the two posterior probabilities for the kth component correspond to sk = 0 and sk = 1. To determine the maximum-posterior transmitted bit for the kth component, given that the sequence r is received, form the posterior probability ratio u (sk |r) of the posterior probability that the value sk = 0 was transmitted to the posterior probability that the value sk = 1 was transmitted. Using (11.3.13), and observing that the term p(r) is common to both terms, this ratio can be written as

$$u(s_k \mid \mathbf{r}) = \frac{p(s_k = 0 \mid \mathbf{r})}{p(s_k = 1 \mid \mathbf{r})} = \frac{\sum_{\mathbf{s}: s_k = 0} p(\mathbf{r} \mid \mathbf{s})\, p(\mathbf{s})}{\sum_{\mathbf{s}: s_k = 1} p(\mathbf{r} \mid \mathbf{s})\, p(\mathbf{s})}, \tag{11.3.17}$$
where the notation $\sum_{\mathbf{s}: s_k = d}$ indicates that the summation is over all possible transmitted sequences that have the value d for the symbol $s_k$. For binary modulation, d takes only the two values of $\{0, 1\}$, giving the simple ratio in (11.3.17). The conditional posterior probability ratio $u(s_k \mid \mathbf{r})$ is used in the detection process. For hard-decision detection, the optimal decision rule for each bit in the sequence is given by (9.5.8):
$$\text{choose } H_0 \text{ if } u(s_k \mid \mathbf{r}) \geq 1, \tag{11.3.18a}$$
$$\text{choose } H_1 \text{ if } u(s_k \mid \mathbf{r}) < 1, \tag{11.3.18b}$$

where H0 is the assertion “sk = 0” and H1 is the assertion “sk = 1.” This process is repeated for each component k of the sequence to recover each databit of the sequence, and thereby to recover the entire sequence.
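For a short sequence, this marginalization can be carried out by direct enumeration. The following sketch, which is not from the text, computes the marginal posterior probabilities $p(s_k = 0 \mid \mathbf{r})$ for a binary sequence by summing the joint posterior over all $2^K$ sequences, as in (11.3.14) and (11.3.15); the channel model (a single interfering symbol with coefficient 0.5), the noise level, and all function names are illustrative assumptions.

```python
import itertools
import numpy as np

def marginal_posteriors(r, prior, channel_lik, K):
    """Brute-force marginalization: sum the joint posterior p(s|r)
    over all binary sequences that have s_k = 0 (cf. (11.3.15))."""
    # Joint posterior up to the common factor p(r) in (11.3.14).
    joint = {s: channel_lik(r, s) * prior(s)
             for s in itertools.product((0, 1), repeat=K)}
    total = sum(joint.values())          # proportional to p(r)
    p0 = np.zeros(K)
    for s, p in joint.items():
        for k in range(K):
            if s[k] == 0:
                p0[k] += p
    return p0 / total                    # p(s_k = 0 | r) for each k

# Hypothetical channel: r_i = s_i + 0.5 s_{i-1} + n_i with gaussian noise.
sigma = 0.5
def lik(r, s):
    s_ext = (0,) + tuple(s)              # channel assumed to start in state 0
    mean = np.array([s_ext[i + 1] + 0.5 * s_ext[i] for i in range(len(s))])
    return float(np.exp(-np.sum((np.asarray(r) - mean) ** 2) / (2 * sigma ** 2)))

uniform = lambda s: 0.5 ** len(s)        # equal priors
print(marginal_posteriors([1.2, 0.1, 1.4], uniform, lik, K=3))
```

Each databit is then demodulated as a zero whenever the corresponding marginal probability exceeds one-half. The dictionary holds $L^K$ entries, which is precisely the exponential cost discussed above.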

Intrinsic Evidence and Extrinsic Evidence

It is instructive to compare maximum-posterior sequence detection with other demodulation techniques for a sequence of bits corrupted by memoryless binary noise. The joint prior probability for the sequence $p(\mathbf{s})$ can be factored into the product of the K bitwise prior probabilities,
$$p(\mathbf{s}) = \prod_{m=1}^{K} p(s_m) = p(s_k) \prod_{\substack{m=1 \\ m \neq k}}^{K} p(s_m). \tag{11.3.19}$$


Substituting this expression into (11.3.17), and factoring out the prior probability for the kth component from the numerator and the denominator, gives
$$u(s_k \mid \mathbf{r}) = \underbrace{\frac{p(s_k = 0)}{p(s_k = 1)}}_{r_k}\; \frac{\sum_{\mathbf{s}: s_k = 0} p(\mathbf{r} \mid \mathbf{s}) \prod_{m:\sim k} p(s_m)}{\sum_{\mathbf{s}: s_k = 1} p(\mathbf{r} \mid \mathbf{s}) \prod_{m:\sim k} p(s_m)}. \tag{11.3.20}$$

The first term, $r_k$, on the right is the ratio of the prior probabilities for the kth component (cf. (9.5.11)). The second term includes both the intrinsic evidence contained in the kth component of the sequence and the extrinsic evidence contained in the other components of the sequence. When there are dependences between the symbols of the sequence, (11.3.20) becomes nontrivial. Its computation can be intractable unless the dependences have a simple structure, as in the next subsection.

The simplest case is trivial: the sequence has no symbol dependences, such as intersymbol interference, the channel is memoryless, and each output depends only on the corresponding input. The block conditional probability $p(\mathbf{r} \mid \mathbf{s})$ is then a product of the conditional probabilities for the components of the sequence, with
$$p(\mathbf{r} \mid \mathbf{s}) = \prod_{k=1}^{K} p(r_k \mid s_k).$$

Substituting this expression into (11.3.20) and factoring out the conditional probability $p(r_k \mid s_k)$ for the kth component gives
$$u(s_k \mid \mathbf{r}) = \underbrace{\frac{p(s_k = 0)}{p(s_k = 1)}}_{r_k}\; \underbrace{\frac{p(r_k \mid s_k = 0)}{p(r_k \mid s_k = 1)}}_{\lambda_k}\; \underbrace{\left(\frac{\sum_{\mathbf{s}: s_k = 0} \prod_{m:\sim k} p(s_m)\, p(r_m \mid s_m)}{\sum_{\mathbf{s}: s_k = 1} \prod_{m:\sim k} p(s_m)\, p(r_m \mid s_m)}\right)}_{1} = r_k \lambda_k. \tag{11.3.21}$$

The numerator and denominator of the middle factor λk are the intrinsic evidence for sk = 0 and sk = 1, respectively. Expression (11.3.21) is the posterior probability ratio (cf. (9.5.11)) for the kth component, with λk being the likelihood ratio defined in (9.5.12). The third factor is the ratio of the extrinsic evidence for sk = 0 and sk = 1, respectively. When the sequence has no symbol dependence, there is no extrinsic evidence provided by the other components of the sequence and maximum-posterior sequence demodulation reduces to maximum-posterior symbol-by-symbol demodulation (cf. (11.3.12b)). If the priors for each symbol are equal, then rk = 1, and the posterior probability ratio u (sk |r) is equal to the likelihood ratio λ k = λ(sk ).

The Bahl Algorithm

The Bahl algorithm is an efficient method of marginalization for sequences with a small constraint length. It computes the posterior probabilities needed to form (11.3.17). The algorithm will be discussed for the binary modulation alphabet for sequences with a constraint length of ν, and is described graphically on a terminated trellis


with $2^\nu$ states. The terminated trellis shown in Figure 11.8, which can be lengthened by the insertion of more frames, will serve as a short example. A terminated trellis begins with a single designated start node and ends with a single designated stop node. For a trellis consisting of n data frames, there are $2^n$ paths through the trellis. For $n = 1000$, there are $2^{1000} \approx 10^{300}$ paths, so an algorithm organized by paths is intractable. However, there are only 1000 frames, so an algorithm organized by frames might be tractable. The Bahl algorithm is such an algorithm.

To explain the Bahl algorithm, the marginalization given in (11.3.16a) will be described in terms of a longer version of the trellis of Figure 11.8, such as a version with 1000 frames. The memoryless received noisy sequence to be marginalized is $\mathbf{r} = (r_1, r_2, \ldots, r_K)$, with the received symbol $r_i$ at the ith position. There are eight branches in frame i of the trellis connecting the four nodes at the beginning of the frame to the four nodes at the end of the frame. Each node at the beginning of frame i has two branches leaving it, and each node at the end of frame i has two branches entering it. Each branch leaving a node of frame i is labeled with either databit zero or databit one, corresponding to the transmitted symbol $s_i$. Each branch of frame i has its own expected received signal, given by $\bar r_i = s_i + a s_{i-1} + b s_{i-2}$, which depends on the trellis state at the node starting that branch and on the form of the interference. The probability of receiving $r_i$ according to that branch is $p(r_i \mid \bar r_i)$. For gaussian noise this probability is $p(r_i \mid \bar r_i) = \bigl(1/\sqrt{2\pi}\,\sigma\bigr) e^{-(r_i - \bar r_i)^2/2\sigma^2}$.

Given the received sequence, the conditional probability of a path leading from the start node to the terminal node is the product of all the conditional probabilities attached to the branches comprising that path. Every path from the start node to the terminal node has such a probability that can be computed in this way, but there are $2^{1000}$ such paths, so there are $2^{1000}$ probabilities. These $2^{1000}$ probabilities sum to one, and most are exceedingly small. The maximum-likelihood path is the path with the largest probability. The maximum-posterior value of the kth bit, including the extrinsic evidence, is based on marginalization to that bit. The marginalization to a zero bit (respectively, a one bit) in the kth frame can now be seen as the sum of the probabilities of all paths from the start node to the terminal node that go through a branch labeled with a zero bit (respectively, a one bit) in the kth frame. For the trellis shown in Figure 11.8, there are four such branches in the kth frame.

Although there are far too many paths for us to compute all the path probabilities, there are not too many nodes. There are only 1000 frames, and each has four nodes. Rather than calculate the path probabilities, compute the node probabilities in the two ways described as follows. For each node in the kth frame, compute the sum of the probabilities of all paths from the start node to that node. These are the forward node metrics $p_F(ij)$ to the kth frame, where i and j label the nodes in that frame. Also compute the sums of the probabilities of all paths from each node in the kth frame to the terminal node (or from the terminal node to that node of the kth frame). These are the backward node metrics $p_B(ij)$ to the kth frame. Figure 11.10 illustrates one frame of the Bahl algorithm.
The forward node metric $p_F(ij)$ of node ij at the left of the kth frame is the sum of the probabilities of all paths from the start node to that node. The backward node metric $p_B(ij)$ of node ij at the right of the kth frame is the sum of the probabilities of all paths from that node to the terminal node.

[Figure 11.10: one frame of a four-state trellis between the start node and the terminal node, with forward node metrics pF(00), pF(10), pF(01), pF(11) at the nodes entering frame k and backward node metrics pB(00), pB(10), pB(01), pB(11) at the nodes leaving frame k; branches labeled zero are drawn as solid lines and branches labeled one as dashed lines.]
Figure 11.10 Marginalization to bit zero using the Bahl algorithm for frame k of a trellis.

For frame k shown in Figure 11.10, the marginalization to bit zero is the sum of the probabilities of all paths from the start node to the terminal node that go through a branch labeled zero in the kth frame. These branches are shown as solid lines in Figure 11.10. The marginalization $p(s_k = 0 \mid \mathbf{r})$ to bit zero for the kth frame can be seen from Figure 11.10 to be the sum
$$p(s_k = 0 \mid \mathbf{r}) = p_F(00)\,p(0|00)\,p_B(00) + p_F(10)\,p(0|10)\,p_B(01) + p_F(01)\,p(0|01)\,p_B(00) + p_F(11)\,p(0|11)\,p_B(01), \tag{11.3.22}$$

which is the sum of the probabilities of the four paths traversing the four branches labeled zero in the kth frame. The other four branches, shown as dashed lines, are used to calculate $p(s_k = 1 \mid \mathbf{r})$.

Figure 11.10 also illustrates another aspect of the Bahl algorithm. Given $p_F(ij)$ for all nodes at frame k − 1, it is easy to calculate $p_F(ij)$ for all nodes at frame k. Indeed, the forward node metrics for all frames can first be computed and stored by moving forward from left to right, starting at the initial node of the trellis. The complexity is proportional to $2^\nu$. Likewise, given $p_B(ij)$ for all nodes of frame k + 1, it is easy to calculate $p_B(ij)$ for all nodes of frame k. Accordingly, the backward node metrics for all frames can then be computed and stored by moving backward from right to left, starting at the terminal node of the trellis. The complexity is again proportional to $2^\nu$.

To calculate the forward and backward node metrics, two temporary arrays of width $L^\nu$ and length n are created. For our example, each array is 4 by 1000. The algorithm fills the first array from the top down and the second array from the bottom up. The $L^\nu$ entries in the kth row of an array correspond to the $L^\nu$ nodes in the kth frame of the trellis. The entries in the kth row of the first array are the sums of the probabilities of all the paths reaching that node beginning at the start node. Similarly, each entry in the second array is the sum of the probabilities of all the paths leaving that node and ending at the terminal node. When both arrays are filled, they are combined frame by frame, as indicated for the example in Figure 11.10 and expressed in (11.3.22).

While the Bahl algorithm and the Viterbi algorithm both move sequentially through the same trellis, they are significantly different at the computational level because they compute different metrics. The fundamental computational step in the Viterbi algorithm is "add–compare–select," with the algorithm moving through the trellis in one direction from the start node to the stop node. The fundamental computational step in the Bahl algorithm is "sum–product," with the algorithm moving through the terminated trellis both forward and backward. Accordingly, this algorithm is also called the forward–backward algorithm.5 It is an instance of a sum–product algorithm. The Bahl algorithm has a complexity per detected bit proportional to $2^\nu$, where ν is the constraint length. This implies that the Bahl algorithm is not practical for large constraint lengths.

5 The letters BCJR are also used here to refer to the four authors who first published this algorithm. See Bahl, Cocke, Jelinek, and Raviv (1974).
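The recursions described above can be summarized in a short sketch. The two-state trellis below (constraint length ν = 1, with the hypothetical branch model $r_i = s_i + a s_{i-1} + n_i$ and gaussian noise) is an illustrative assumption rather than the trellis of Figure 11.8; the final state is left unconstrained instead of terminated, and each row is rescaled to avoid numerical underflow.

```python
import numpy as np

def bahl_marginals(r, a=0.5, sigma=0.5):
    """Forward-backward (Bahl) marginalization on a two-state trellis.
    The state is the previous bit; the trellis starts in state 0."""
    n = len(r)
    def gamma(i, state, bit):            # branch probability p(r_i | branch)
        mean = bit + a * state
        return np.exp(-(r[i] - mean) ** 2 / (2 * sigma ** 2))
    F = np.zeros((n + 1, 2)); F[0, 0] = 1.0      # forward node metrics
    for i in range(n):
        for s in (0, 1):
            for b in (0, 1):
                F[i + 1, b] += F[i, s] * gamma(i, s, b)
        F[i + 1] /= F[i + 1].sum()
    B = np.ones((n + 1, 2))                      # backward node metrics
    for i in range(n - 1, -1, -1):
        for s in (0, 1):
            B[i, s] = sum(gamma(i, s, b) * B[i + 1, b] for b in (0, 1))
        B[i] /= B[i].sum()
    post = np.zeros((n, 2))                      # combine, cf. (11.3.22)
    for i in range(n):
        for s in (0, 1):
            for b in (0, 1):
                post[i, b] += F[i, s] * gamma(i, s, b) * B[i + 1, b]
        post[i] /= post[i].sum()
    return post                                  # post[i, b] = p(s_i = b | r)

print(bahl_marginals(np.array([1.3, 0.2, 1.1])))
```

The work per frame is constant in the blocklength, so the total cost grows linearly with n rather than as $2^n$.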

11.4 Interchannel Interference

Interchannel interference occurs between subchannels defined using multiple subcarriers. In the simplest formulation, each subcarrier has a similar power density spectrum given by $S_m(f - f_m)$, where $f_m$ is the individual subcarrier frequency offset. Each subcarrier is noncoherent, with each receiver observing only its own signal degraded by the interchannel interference from the other subchannels. The waveforms in the other subchannels are not observed, as such, by the receiver, and cannot be used in the detection of the subchannel of interest.

When the subcarriers are mutually noncoherent, the power in the interchannel interference in a subchannel of interest is determined by summing the power of each interfering subchannel within the bandwidth B of the subchannel of interest. Conventional practice is to equalize the power density spectrum in each subchannel so that the power density spectrum of every subchannel differs only by the center frequency $f_m$. For this system, the additive interference power $P_L$ in the m′th subchannel can be written as
$$P_L = \sum_{m \neq m'} \int_{-B/2}^{B/2} S_m(f - f_m)\, df, \tag{11.4.1}$$

where $S_m(f - f_m)$ is a subchannel-dependent power density spectrum. When each subchannel is constrained to have the same power and the number of subchannels is large, the interference power $P_L$ in most subchannels is proportional to the signal power $P_s$ in a single subchannel, with $P_L = \alpha P_s$, where α is the proportion of the power in a single subchannel that produces interference.
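As a numerical illustration of (11.4.1), the sketch below evaluates the interference proportion α for gaussian subchannel spectra; the spectral width, subchannel spacing, detection bandwidth, and the restriction to the four nearest neighbors are all illustrative assumptions, not values from the text.

```python
import numpy as np
from scipy.integrate import quad

sigma0, df, B = 10e9, 50e9, 50e9       # made-up rms width, spacing, bandwidth

def S(f):                              # unit-power gaussian density spectrum
    return np.exp(-f**2 / (2 * sigma0**2)) / np.sqrt(2 * np.pi * sigma0**2)

# Interference falling in the band of the subchannel of interest,
# summed over the nearest interfering subcarriers (cf. (11.4.1)).
PL = sum(quad(lambda f: S(f - m * df), -B / 2, B / 2)[0]
         for m in (-2, -1, 1, 2))
Ps = quad(S, -B / 2, B / 2)[0]         # signal power in the same band
print("alpha =", PL / Ps)
```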

11.4.1 Uncompensated Linear Interchannel Interference

When the interchannel interference is not estimated, as is common, it can be regarded as a form of noise. The probability density function of the additive interchannel interference depends on the statistical properties of the interfering signals. A simple model describes the modulation format for each subchannel as a gaussian random process. For this case, the additive interchannel interference is a linear combination of multiple gaussian random processes and hence is also a gaussian random process. Then each sample of the demodulated waveform is treated as a gaussian random variable with a variance that is the sum of a signal-independent noise term $P_n$ and a linear interference term $P_L$ from the other subchannels that depends on the total power. The corresponding optical signal-to-noise-plus-interference ratio (OSNIR) in a single subchannel is
$$\text{OSNIR} = \frac{P_s}{P_n + P_L} = \frac{P_s}{P_n + \alpha P_s}. \tag{11.4.2}$$

For large signal powers or many interfering subchannels, the linear interchannel interference term $\alpha P_s$ can be larger than the additive-noise term $P_n$, resulting in an interference-limited channel. For such a case, the performance of each channel is quantified by the optical signal-to-interference ratio, defined as $\text{OSIR} = P_s/P_L = 1/\alpha$. This is a constant, independent of the increasing signal power and of the additive-noise term $P_n$.

Interchannel Interference in a Network

When M wavelength subchannels generated at one location are terminated at a common location, the aggregate received signal may be processed as an M-by-M mimo signal, so the interference among subchannels can be estimated, partially canceled, or otherwise accommodated. In some lightwave networks, however, wavelength subchannels generated at one source location of the network may not have the same destination. For such cases, a subset of wavelength subchannels may be dropped or added at different locations or nodes within the network.6 A simple network with two fiber segments and one intermediate network node is depicted in Figure 11.11. A signal can be observed only at the termination of that subchannel. Any interference generated over the first fiber segment when the M − N continuing wavelength subchannels couple into the N dropped subchannels cannot be fully compensated for at the intermediate network node using only information derived from the N dropped wavelength subchannels, because the remaining M − N wavelength subchannels that produced the interference are not seen at the intermediate node. For this case, the interference from the M − N wavelength subchannels generated over the first fiber segment is a source of noise, which is possibly nongaussian.

[Figure 11.11: two fiber segments joined at a network node drawn as a dashed box containing a demultiplexer and a multiplexer; M wavelength subchannels enter on segment 1, N wavelength subchannels are dropped and added at the node, and M subchannels continue on segment 2.]
Figure 11.11 A node of an optical network schematically depicted as a dashed box, showing several wavelength subchannels being dropped and added.

6 This type of wavelength multiplexing/demultiplexing device is called a reconfigurable add–drop multiplexer.

11.4.2 Uncompensated Nonlinear Interchannel Interference

Nonlinear interchannel interference is generated between wavelength subchannels from cross-phase modulation and four-wave mixing, as was discussed in Section 5.4. Under specific operating conditions, the probability density function of the nonlinear interference $P_{NL}$ can be taken to be a circularly symmetric gaussian random variable with a variance that depends on the strength of the nonlinearity. These conditions occur when there are a large number of symbol pulses in other wavelength subchannels that walk off and interfere with a single symbol pulse in a wavelength subchannel of interest (cf. Figure 5.1). Asserting the central limit theorem, the nonlinear interchannel interference in each wavelength subchannel can be modeled as a circularly symmetric gaussian random variable.

The strength of the nonlinear interference power $P_{NL}$ depends on the source term for the nonlinear Schrödinger equation. For a Kerr nonlinearity, the interference power is proportional to the product of the signal power in three other wavelength subchannels, which need not be distinct. When all of the wavelength subchannels have the same signal power $P_s$, the nonlinear interference power $P_{NL}$ has a $P_s^3$ dependence. Including this nonlinear interference term, the optical signal-to-noise-plus-interference ratio given in (11.4.2) is modified to read
$$\text{OSNIR} = \frac{P_s}{P_n + P_{NL} + P_L}. \tag{11.4.3}$$

For small signal powers, the linear interference term $P_L$ and the nonlinear interference term $P_{NL}$ are negligible compared with the additive-noise term $P_n$. In this regime, the OSNIR scales linearly with the signal power $P_s$. As the signal power increases, the OSNIR reaches a maximum value and then decreases because of the $P_s^3$ dependence of the nonlinear interference term $P_{NL}$ in the denominator. This means that there exists an optimal signal power that maximizes the optical signal-to-noise-plus-interference ratio. The capacity for this kind of channel is discussed in Section 14.6.4.
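This optimum can be checked numerically. In the sketch below, the coefficients $P_n$, α, and η (with $P_{NL} = \eta P_s^3$) are arbitrary illustrative values; with α = 0, maximizing (11.4.3) analytically gives the optimal power $(P_n/2\eta)^{1/3}$.

```python
import numpy as np

Pn, alpha, eta = 1e-3, 0.0, 1e-2       # made-up noise and interference terms
Ps = np.logspace(-3, 1, 400)           # candidate signal powers
osnir = Ps / (Pn + alpha * Ps + eta * Ps**3)   # cf. (11.4.3)
i = int(np.argmax(osnir))
print(f"numerical optimum: Ps = {Ps[i]:.3g}, peak OSNIR = {osnir[i]:.3g}")
print("analytic optimum (alpha = 0):", (Pn / (2 * eta)) ** (1 / 3))
```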

11.4.3 Linear Equalization of Polarization Interference

Techniques that treat the dispersal of signal energy between subchannels are similar to the techniques that treat the dispersal of the signal energy in time within a single subchannel. This section considers a receiver that aligns the two polarization components of a dual-polarization modulation format in the presence of dual-polarization additive white gaussian noise N. The two-input two-output dual-polarization channel is described as a combination of first-order frequency-dependent polarization-mode dispersion and frequency-independent polarization-dependent loss. A block diagram of such a receiver is shown in Figure 10.16. Polarization alignment as described here requires that the square channel matrix H(f) of (10.3.6) be known. Methods to estimate H(f) are discussed in Section 12.6. The form of the channel matrix H(f) depends on the polarization-dependent effects included in the analysis. When the channel model considers only polarization-independent loss, the channel matrix H(f) is a normal matrix (cf. (8.1.16)) and so can be diagonalized by a unitary matrix.

[Figure 11.12: the vector channel H(f) with additive noise N produces Z(f); a zero-forcing alignment applies H(f)⁻¹ to give Z′(f) = S(f) + H⁻¹(f)N.]
Figure 11.12 A zero-forcing polarization alignment.

When the channel model includes polarization-dependent loss, the channel matrix H(f) is not a normal matrix, and cannot be diagonalized by a unitary matrix. For the general channel model, the inverse $\mathbf{H}(f)^{-1}$ of the estimate of H(f) is used to align the received block signal Z(f) in additive noise. The dual-polarization additive noise N is independent white gaussian noise with equal power in each polarization component. A diagram of this polarization-alignment operation, first shown in Figure 10.16 in the time domain, is shown in Figure 11.12 in the frequency domain. Applying the inverse of the channel matrix yields
$$\mathbf{Z}'(f) = \mathbf{H}^{-1}(f)\mathbf{Z}(f) = \mathbf{H}^{-1}(f)\bigl(\mathbf{H}(f)\mathbf{S}(f) + \mathbf{N}\bigr) = \mathbf{S}(f) + \mathbf{H}^{-1}(f)\mathbf{N}. \tag{11.4.4}$$
After alignment, the probability distribution describing the two-input two-output channel is a multivariate gaussian distribution, with the receiver vector signal given in (11.4.4) being the mimo-channel equivalent of the zero-forcing equalizer for the single-input single-output channel discussed in Section 11.2.4. The use of a zero-forcing polarization-alignment algorithm removes the effect of interchannel interference caused by the coupling of the polarization modes. However, the noise in a subchannel may be correlated with the noise in other subchannels. This correlation occurs even when the receiver has perfect knowledge of the channel matrix, because $\mathbf{H}^{-1}$ is not a normal matrix when there is polarization-dependent loss. Accordingly, the channel given in (11.4.4) cannot be expressed as a set of independent, parallel subchannels.
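The following sketch applies the zero-forcing alignment of (11.4.4) bin by bin in the frequency domain. The channel, a fixed rotation combined with a made-up polarization-dependent loss, and the noise level are illustrative assumptions standing in for an estimated H(f).

```python
import numpy as np

rng = np.random.default_rng(1)
nf = 256                                       # frequency bins

# Hypothetical 2x2 channel per bin: a rotation combined with a made-up
# polarization-dependent loss, standing in for an estimated H(f).
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
H = np.broadcast_to(R @ np.diag([1.0, 0.8]), (nf, 2, 2))

S = rng.standard_normal((nf, 2)) + 1j * rng.standard_normal((nf, 2))
N = 0.1 * (rng.standard_normal((nf, 2)) + 1j * rng.standard_normal((nf, 2)))
Z = np.einsum('fij,fj->fi', H, S) + N          # received block signal

# Zero-forcing alignment: apply H(f)^{-1} bin by bin (cf. (11.4.4)).
Zp = np.einsum('fij,fj->fi', np.linalg.inv(H), Z)
print("mean residual:", np.abs(Zp - S).mean())  # residual is H(f)^{-1} N
```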

Using the singular-value decomposition, as given in (2.1.91), the channel matrix is written as
$$\mathbf{H}(f) = \mathbf{U}(f)\mathbf{M}(f)\mathbf{V}^\dagger(f), \tag{11.4.5}$$

where both U(f) and V†(f) are unitary matrices, which may be regarded as generalized rotations. The 2 × 2 matrix M(f) has only diagonal elements $m_k(f)$, which are the singular values of the 2 × 2 channel matrix H(f) describing the polarization interference. Using knowledge about the channel matrix H(f), apply a unitary transformation V(f) to the input block signal S(f) such that $\mathbf{S}'(f) = \mathbf{V}^\dagger(f)\mathbf{S}(f)$. Multiplying each side of this equation by V(f) on the left, and using $\mathbf{V}(f)\mathbf{V}(f)^\dagger = \mathbf{I}$ for a unitary transformation, it follows that
$$\mathbf{S}(f) = \mathbf{V}(f)\mathbf{S}'(f). \tag{11.4.6}$$
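A brief numerical check of this diagonalization is sketched below; the channel matrix for the single frequency bin is made up, and numpy's svd returns U, the singular values, and V†.

```python
import numpy as np

H0 = np.array([[0.9, 0.2], [0.1, 0.7]])    # made-up channel, one bin
U, m, Vh = np.linalg.svd(H0)               # H0 = U diag(m) Vh, cf. (11.4.5)
s = np.array([1.0 + 0.5j, -0.3 + 1.0j])    # input block signal S'(f)
z = U.conj().T @ (H0 @ (Vh.conj().T @ s))  # precode by V, channel, rotate by U-dagger
print(np.allclose(z, m * s))               # True: independent parallel channels
```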

[Figure: the vector channel in the frequency domain, with a transmitter rotation V(f) applied to the input block signal S′(f), the channel H(f) with additive noise N, and a receiver rotation U†(f) applied to the channel output Z(f) to produce Z′(f).]

[. . .]

power in each uncorrelated parallel subchannel for each frequency component f . For a dispersionless channel, (11.4.9) reduces to (10.3.6).

11.5 Equalization of Intensity Modulation

The linear equalization techniques discussed in Section 11.2 can be applied to an intensity-modulated waveform for which the electrical channel is linear in the lightwave power rather than linear in the lightwave amplitude. This section determines the appropriate conditions needed to ensure linearity in the lightwave power.

11.5.1 Intensity Interference

When the received electrical signal is proportional to the lightwave power, the interchannel and intersymbol interference depend on the coherence properties of the carrier. One form of interchannel interference occurs between the spatial modes of a multimode fiber. Intersymbol interference occurs between pulses within a mode.

Intermodal Interference

The received electrical pulse p(t) that is proportional to the intensity of a single lightwave pulse x(t) launched into a multimode fiber with M spatial modes was derived in (8.2.25) for the case of a mode-dependent group delay $\tau_m$. It is repeated here:
$$p(t) = \frac{1}{2}\left(\sum_{m=1}^{M} a_m |x(t - \tau_m)|\, e^{\mathrm{i}\phi(t - \tau_m)}\right)\left(\sum_{n=1}^{M} a_n^* |x(t - \tau_n)|\, e^{-\mathrm{i}\phi(t - \tau_n)}\right)$$
$$= \frac{1}{2}\sum_{m=1}^{M} F_m P_{in}(t - \tau_m) + \frac{1}{2}\sum_{m=1}^{M}\sum_{\substack{n=1 \\ n \neq m}}^{M} a_m a_n^*\, |x(t - \tau_m)|\,|x(t - \tau_n)|\, e^{\mathrm{i}(\phi(t - \tau_n) - \phi(t - \tau_m))}. \tag{11.5.1}$$

The second term describes the multiple copies of the pulse in different modes as they overlap at the receiver, causing intermodal interference. For a fully noncoherent carrier, the expectation $\langle e^{\mathrm{i}(\phi(\tau_n) - \phi(\tau_m))}\rangle$ of the phase term in (11.5.1) is zero for $\tau_m \neq \tau_n$. This means that the term $e^{\mathrm{i}(\phi(t - \tau_n) - \phi(t - \tau_m))}$ is rapidly varying compared with the modulation and will average to zero in any detection filter. The electrical channel is then linear in the lightwave power (cf. (8.2.26)), and the received electrical pulse is $p(t) = \sum_{m=1}^{M} F_m P_{in}(t - \tau_m)$. When a coherent carrier is used, the second term is not zero, so the electrical channel is nonlinear in the lightwave power.

Intersymbol Interference

Intersymbol interference within a modulated datastream in a single mode can also lead to an electrical channel that is nonlinear in the lightwave power. For this case, the received noise-free electrical waveform $\bar r(t)$ is the squared magnitude of the received lightwave waveform. It is given by
$$\bar r(t) = \frac{1}{2}\left|\sum_{j=-\infty}^{\infty} s_j x_{out}(t - jT)\right|^2 = \sum_{j=-\infty}^{\infty} S_j\, p(t - jT) + \frac{1}{2}\sum_{j=-\infty}^{\infty}\sum_{\substack{k=-\infty \\ k \neq j}}^{\infty} s_j s_k^*\, |x_{out}(t - jT)|\,|x_{out}(t - kT)|\, e^{\mathrm{i}(\phi(jT) - \phi(kT))}, \tag{11.5.2}$$

where $p(t) = \frac{1}{2}|x_{out}(t)|^2$ is the received electrical pulse, $x_{out}(t)$ is the received lightwave pulse, and $S_j = |s_j|^2$. The received electrical waveform in (11.5.2) has a form similar to the received electrical pulse p(t) given in (11.5.1). The modal delay $\tau_j$ is replaced by the modulation interval jT, the input lightwave pulse x(t) is replaced by the output lightwave pulse $x_{out}(t)$ to account for dispersion, and the proportion $a_j$ of the single pulse in each mode corresponds to the symbol amplitude $s_j$ in each modulation interval.

When the interfering symbols in a sequence of pulses have a common phase, the full form of (11.5.2) must be used. This occurs when a coherent carrier is used or when the modulation format uses intersymbol interference in the form of a sequence of overlapping pulses for the complex-baseband waveform, which is subsequently modulated onto the lightwave power. Because the pulses are overlapped by filtering before modulation, the overlapping pulses at baseband have a dependent phase, even when the lightwave carrier is fully noncoherent and the lightwave channel is dispersive. Therefore, the expectation does not, in general, evaluate to zero, thereby leading to an electrical channel that is nonlinear in the lightwave power.

For the electrical channel to be linear in the lightwave power, the modulating waveform itself cannot consist of overlapping pulses. Therefore the transmitted pulse x(t) is constrained so that $x(t - jT)x(t - kT) = 0$ for $j \neq k$. Any overlap that is necessary, such as to form a Nyquist pulse, must be achieved by filtering after the modulation is transferred onto the noncoherent carrier. When the conditions leading to an electrical channel that is linear in the lightwave power are satisfied, the term $\langle e^{\mathrm{i}(\phi(jT) - \phi(kT))}\rangle$ is zero for $j \neq k$, with the expected waveform given by (cf. (8.2.26))
$$\bar r(t) = \sum_{j=-\infty}^{\infty} S_j\, p(t - jT), \tag{11.5.3}$$

where p(t) is the received electrical pulse for a fully noncoherent lightwave carrier (cf. (8.2.28)). For this case, the electrical channel is linear in the lightwave power.

Figure 11.14 plots the electrical waveform generated by direct photodetection of a sequence of pulses for a coherent carrier and for a noncoherent carrier. Received gaussian pulses with different pulse widths are compared. For this example, the overlap between the pulses is caused solely by dispersion, so the waveforms using a noncoherent carrier are linear in the lightwave power.

[Figure 11.14: two panels, (a) and (b), each plotting amplitude against time for the bit pattern 1 1 0 1 0 1 1 0 0 1 0 0 0, with a thin line for a coherent carrier and a thick line for a noncoherent carrier.]
Figure 11.14 Electrical waveforms generated by direct photodetection for two received sequences of gaussian pulses for a coherent carrier (thin line) and a noncoherent carrier (thick line) for two values of the root-mean-squared pulse width σ in terms of the symbol interval T. (a) A sequence with σ = T/4. (b) A sequence with σ = 3T/4.

For the first pulse width, almost all of the received energy is confined to one symbol interval. Figure 11.14(a) shows the electrical waveform for a sequence of pulses with minimal pulse overlap at the receiver, both for a coherent carrier and for a noncoherent carrier. The electrical waveform generated using a coherent carrier, which is produced by squaring the sum of the pulse amplitudes, is nearly the same as the waveform generated using a noncoherent carrier, which is produced by summing the squares of the pulse amplitudes. For this case, the nonlinear intersymbol interference term in (11.5.2) is negligible.

For the second pulse width, the fiber dispersion spreads the received energy over adjacent symbol intervals. Figure 11.14(b) shows the electrical waveform for a sequence of received pulses that have a large amount of overlap, both for a coherent carrier and for a noncoherent carrier. For this case, the nonlinear intersymbol interference term in (11.5.2) is not negligible. Therefore, the electrical waveform generated by a coherent carrier is significantly different from the electrical waveform generated using a noncoherent carrier. Techniques to equalize a nonlinear electrical channel are discussed in Section 11.6.
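The distinction in Figure 11.14 can be reproduced with a few lines of code: for a coherent carrier the detected waveform is the squared magnitude of the summed pulse amplitudes, whereas for a noncoherent carrier it is the sum of the squared magnitudes. The bit pattern and the pulse parameters below are illustrative assumptions.

```python
import numpy as np

T, sigma = 1.0, 0.75                   # symbol interval and rms width (3T/4)
bits = np.array([1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0])
t = np.linspace(0, len(bits) * T, 2000)

# Gaussian pulse amplitude for each transmitted mark.
pulses = np.array([b * np.exp(-(t - (j + 0.5) * T) ** 2 / (2 * sigma ** 2))
                   for j, b in enumerate(bits)])
coherent = np.abs(pulses.sum(axis=0)) ** 2        # square of the summed amplitudes
noncoherent = (np.abs(pulses) ** 2).sum(axis=0)   # sum of the squared amplitudes
print("peak difference:", np.max(np.abs(coherent - noncoherent)))
```

Repeating the comparison with σ = T/4 makes the two waveforms nearly indistinguishable, reproducing the behavior of Figure 11.14(a).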

11.5.2 Intensity Equalization with Shot Noise

The functional blocks of an equalizer for an electrical channel that is linear in the lightwave power, as shown in Figure 11.15, are described in this section. Figure 11.15 is motivated, in part, by restating Figure 11.5 in terms of intensity.

[Figure 11.15: block diagram showing the transmit power pulse Pin(t), the fiber and direct photodetection with impulse response helec(t), added electrical noise ne(t), a noise filter b*(−t), a sampler producing vk, and a transversal filter ck producing the equalized output qk.]
Figure 11.15 Block diagram for an equalized electrical channel that is linear in the lightwave power.

The input is a sequence $\{S_j\}$ of real values that represents the data. The input electrical waveform $S(t) = \sum_j S_j P_{in}(t - jT)$ modulates the transmitted lightwave power, with $P_{in}(t)$ being the transmitted pulse power. The noise-free electrical waveform $\bar r(t)$ generated by direct photodetection for a sequence of nonoverlapping intensity-modulated pulses launched into M modes is given by
$$\bar r(t) = \sum_{j=-\infty}^{\infty} S_j \sum_{m=1}^{M} F_m\, p(t - \tau_m - jT), \tag{11.5.4}$$

where $S_j$ is the lightwave power in each symbol, and $p(t) = h_{elec}(t) \star P_{in}(t)$ is the output electrical pulse for a fully noncoherent lightwave carrier in each mode, including the wavelength-dependent group delay (cf. (8.2.29)). The received electrical waveform r(t) is linear in the lightwave power. It is a superposition of delayed and weighted copies of the electrical pulse p(t), with the weights depending both on the symbol amplitude $S_j$ in the jth time interval and on the proportion $F_m$ of the lightwave symbol power in the mth mode.

Setting $\sum_{m=1}^{M} F_m\, p(t - \tau_m)$ equal to the complete received electrical pulse p(t), replacing the electrical amplitude $s_j$ with the lightwave power $S_j$, and including a noise term, (11.5.4) produces the same form as (11.1.1). Because (11.1.1) is a linear model, the methods presented in Section 11.2 can be used for equalization.

The detection filter y(t) shown in Figure 11.15 consists of a noise-suppressing filter followed by a transversal filter. Referring to Section 9.4.3, the optimal noise-suppressing filter $b^*(-t)$ for a combination of additive white gaussian noise and shot noise varies from a matched filter to an integrator as the noise varies from only additive white gaussian noise to only shot noise. Accordingly, the detection filter y(t) that minimizes the mean-squared error is modified by replacing the matched-filter response p(t) with a noise-filter response b(t) designed for a combination of additive noise and shot noise. This leads to an overall equalization filter y(t) given by (cf. (11.2.3))
$$y(t) = \sum_k c_k\, b(kT - t). \tag{11.5.5}$$

Figure 11.16(a) shows the optimal noise filter b∗ (−t ) for an ideal shot-noise-limited system. This noise filter is an integration for each symbol interval that contains part of the received pulse p(t ), with the integration area proportional to the part of the received pulse in each modulation interval. Figure 11.16(b) shows the optimal noise filter when the additive noise is larger than the shot noise. For this case, the noise filter p∗ (−t ) approaches a matched filter.
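The transversal-filter taps $c_k$ in (11.5.5) can be illustrated with a least-squares zero-forcing computation on a sampled pulse; the pulse samples and tap count below are made-up values, and a practical design would instead minimize the mean-squared error with the noise statistics included.

```python
import numpy as np

# Least-squares zero-forcing taps c_k for the transversal filter in (11.5.5).
# The sampled pulse below (output of the noise filter at the symbol rate)
# is a made-up example, not a pulse from the text.
p = np.array([0.05, 0.2, 1.0, 0.3, 0.1])
L = 5                                       # number of taps
n = len(p) + L - 1
A = np.zeros((n, L))                        # convolution matrix
for j in range(L):
    A[j:j + len(p), j] = p
target = np.zeros(n); target[n // 2] = 1.0  # desired ISI-free response
c, *_ = np.linalg.lstsq(A, target, rcond=None)
print("taps:", np.round(c, 3))
print("equalized pulse:", np.round(A @ c, 3))
```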

[Figure 11.16: two panels plotting the noise filter b(t), in arbitrary units, against t/T from 0 to 4.]
Figure 11.16 (a) Optimal noise filter for a received exponential pulse for an ideal shot-noise-limited system. (b) Optimal noise filter when the additive noise is larger than the shot noise.

11.6 Interference in Nonlinear Channels

The previous sections have shown that linear interference can be accommodated using a variety of methods. Linear equalization is designed to first concentrate the widely dispersed energy from a set of subchannels, or from individual symbols within a subchannel, into a suitable set of waveforms used for detection. The remaining dispersed energy may be accommodated by maximum-likelihood sequence detection or maximum-posterior symbol detection. Techniques for linear channels are based on the formal foundation of linear system analysis. This section extends those methods to nonlinear channels.

First, methods of sequence detection for nonlinear channels are discussed. In general, maximum-likelihood sequence detection for a nonlinear channel is not equivalent to minimum-distance sequence detection, as would be the case for a linear channel with additive white gaussian noise. Instead, maximum-likelihood sequence detection for a nonlinear channel uses a state-dependent branch metric based on a conditional probability density function that may be nongaussian and may have a variance that depends on the mean.

A method of equalization for a nonlinear lightwave channel with nonlinear interchannel interference is discussed next. This method is based on treating the nonlinear lightwave channel as a cascade of multiple segments, each consisting of a nondispersive nonlinear channel followed by a linear dispersive channel. This kind of equalization may be used separately or in conjunction with sequence detection.

11.6.1 Sequence Detection for a Nonlinear Channel

For linear time-invariant channels, the impulse response of the electrical channel p(t) does not depend on the transmitted symbol $s_k$ or on the channel state $\sigma_k$ defined in (11.3.1). In contrast, the impulse response $p(t; \sigma_k)$ for a time-invariant nonlinear channel can depend on the channel state. Accounting for this dependence, the received noise-free sequence $\bar r(t)$ (cf. (11.1.1)) depends both on the transmitted symbol $s_k$ and on the channel state $\sigma_k$, so that
$$\bar r(t) = \sum_k s_k\, p(t - kT; \sigma_k). \tag{11.6.1}$$

The dependence of p(t − kT ; σ k ) on the channel state σ k means that the received noise-free waveform r¯ (t ) for a sequence of pulses cannot be expressed in terms of a linear combination of shifted and scaled responses p(t ) as given in (11.1.1).


We will develop sequence detection using an intensity-modulated linear dispersive lightwave channel with a coherent source and direct photodetection. The directly photodetected overlapping coherent pulses lead to the nonlinear electrical channel response given in (11.5.2). This nonlinear channel differs from a linear channel in two ways. The first is that the channel response depends on the channel state, as given in (11.6.1). The second is that the noise in the received electrical signal need not be additive. This occurs in direct photodetection when the lightwave signal mixes with the spontaneous emission, producing signal-dependent electrical noise (cf. (7.7.11)).

In principle, the first issue could be treated by a matched filter for each possible channel-state-dependent response. This might not be feasible because of the large number of possible state-dependent responses. Instead, for the nonlinear channel under consideration, a suboptimal detection strategy generates the detection statistic $r_k$ by integrating the electrical waveform generated by direct photodetection over a symbol interval. This detection statistic is the photodetected lightwave energy $\mathcal{E}_k$ (cf. Table 6.2). It has an expected value $E_k = \langle \mathcal{E}_k \rangle$, where
$$\mathcal{E}_k = \int_{kT}^{(k+1)T} r(t)\,dt, \tag{11.6.2}$$

where r(t) is the noisy electrical waveform generated by direct photodetection. Using an integrating detection filter, the approximate conditional probability density function of the photodetected lightwave energy E, given an expected value $E_k$ for the kth interval, is a conditional noncentral chi-square probability density function with 2K degrees of freedom (cf. (6.5.5)),
$$f(E \mid E_k) = \frac{1}{N_{sp}}\left(\frac{E}{E_k}\right)^{(K-1)/2} e^{-(E + E_k)/N_{sp}}\, I_{K-1}\!\left(\frac{2\sqrt{E_k E}}{N_{sp}}\right), \qquad E \geq 0, \tag{11.6.3}$$

where $N_{sp}$ is the expected lightwave noise. In the presence of dispersion, the expected value $E_k$ is state-dependent. For example, for binary modulation with the dispersion redistributing the pulse energy into two other time intervals, there are $2^3 = 8$ possible state-dependent values for $E_k$. These eight values depend on the transitions between the channel states.

The state-dependent transition probabilities given in (11.6.3) mean that maximum-likelihood sequence detection is no longer minimum-distance sequence detection, as was the case for an additive white gaussian noise channel. For the nonlinear channel under consideration, maximum-likelihood sequence detection uses a state-dependent branch metric based on the log-likelihood function $\log_e \Lambda(E_k; E)$. This function is defined in terms of the logarithm of the conditional probability $\log_e f(E \mid E_k)$ (cf. (9.5.9)).

Examples of the state-dependent conditional probability density functions for the specific nonlinear channel used in this section are shown in Figure 11.17 for the case of a symmetric pulse response. The state transitions are shown in Figure 11.17(a), and the conditional probabilities are shown in Figure 11.17(b). For a symmetric impulse response, the bit pattern (1, 0, 0) produces the same expected value as the bit pattern (0, 0, 1). This is also true for the bit patterns (0, 1, 1) and (1, 1, 0), so there are six possible expected values $E_k$.

[Figure 11.17: (a) a trellis section showing the allowed transitions from states (k−1, k−2) ∈ {00, 01, 10, 11} to states (k, k−1); (b) conditional probability density functions f(E|Ek) plotted against the photodetected lightwave energy, labeled by the bit patterns (0,0,0), (0,0,1)/(1,0,0), (1,0,1), (0,1,0), (0,1,1)/(1,1,0), and (1,1,1).]
Figure 11.17 A family of conditional distributions. (a) Allowed transitions from state $\sigma_k$ to $\sigma_{k+1}$. (b) State-dependent conditional probability density functions $f(E \mid E_k)$ for a symmetric pulse response.

The state-dependent variance of (11.6.3) is given by (6.5.6b) and is $\sigma_E^2 = 2E_k N_{sp} + K N_{sp}^2$. The corresponding state-dependent branch metric
$$d_k(\sigma_{k-1}, \sigma_k) = -\log_e f(E \mid E_k) \tag{11.6.4}$$
can be used in a trellis to find the most likely transmitted sequence. Examining Figure 11.17(b), the variance of the probability density function of each branch metric depends on the channel-state transitions shown in Figure 11.17(a). This state-dependent variance is in contrast to an additive white gaussian noise channel, for which the variance of each branch metric is a constant that is independent of the channel state, and for which the branch metric has the form of a euclidean distance.

Simplified Forms of the Branch Metric

The large number of channel-state-dependent branch metrics has led to the development of simplified forms for the probability density function. These can be used to define a branch metric that does not depend on the expected signal value, and thus is not state-dependent. For the nonlinear channel under consideration, one simplified form can be derived by observing that the direct-photodetection demodulator transforms an offset gaussian probability density function of the signal plus the spontaneous emission in the optical domain, which has a variance that does not depend on the mean, into a noncentral chi-square probability density function in the electrical domain, which has a variance that does depend on the mean. When inversion by a nonlinear square-root operation is applied in the electrical domain, the resulting probability density function more closely resembles the gaussian probability density function defined in the optical domain before photodetection. The independent variable of this probability density function is the square root of the photodetected lightwave energy, $\sqrt{E}$. Given that the transformed conditional probability density function is modeled as a gaussian probability density function with a constant variance, the corresponding branch metric can be expressed as a squared euclidean distance (cf. (10.2.4a)) using $\sqrt{E}$ as the variable, so that
$$d_k(\sigma_{k-1}, \sigma_k) = \Bigl(\sqrt{E} - \sqrt{\langle E_k \rangle}\Bigr)^2, \tag{11.6.5}$$

where $\langle E_k \rangle$ is the expected value for each state transition. This metric produces results that are in good agreement with simulations that use the exact probability density function and with results from experimental systems.
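The sketch below compares the exact branch metric of (11.6.4), evaluated from the noncentral chi-square density (11.6.3), with the square-root approximation (11.6.5); the values of $N_{sp}$ and K are arbitrary illustrative choices, and scipy's exponentially scaled Bessel function is used only for numerical stability.

```python
import numpy as np
from scipy.special import ive               # exponentially scaled Bessel I

def exact_metric(E, Ek, Nsp=0.05, K=2):
    """-log f(E|Ek) from (11.6.3); ive(v, x) = iv(v, x)*exp(-x) is used
    so that the exponentials can be combined stably."""
    x = 2.0 * np.sqrt(E * Ek) / Nsp
    logf = (-np.log(Nsp) + 0.5 * (K - 1) * np.log(E / Ek)
            - (E + Ek) / Nsp + np.log(ive(K - 1, x)) + x)
    return -logf

def sqrt_metric(E, Ek):
    return (np.sqrt(E) - np.sqrt(Ek)) ** 2  # cf. (11.6.5)

E = np.linspace(0.05, 2.0, 5)
for Ek in (0.2, 1.0):                       # two state-dependent expected values
    print(np.round(exact_metric(E, Ek), 2), np.round(sqrt_metric(E, Ek), 2))
```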

Sequence Estimation for a General Nonlinear Channel

Trellis-based sequence-detection methods for other nonlinear channels can be developed in a similar way. A resulting trellis will have a different number of states per frame and different state-dependent branch metrics that connect the frames. Once the trellis and the state-dependent branch metrics have been determined, the methods of Section 11.3 can be used to determine the most likely transmitted sequence.

11.6.2 Equalization of a Nonlinear Channel

This section presents a method of equalization for a lightwave channel with nonlinear phase that is based on the numerical computation method discussed in Section 5.6. When this method is run backwards on a received waveform, it is a form of equalization. For this method of equalization, the nonlinear lightwave channel is treated as the concatenation of two separate compensation steps: one for the linear dispersion and one for the nonlinear phase shift. Each step ignores the other impairment. The compensation step for the nonlinearity ignores the dispersion, observing that, in a dispersionless fiber, the effect of the nonlinearity for a single channel is an intensity-dependent phase shift, given in (5.4.11), whose strength depends on the signal power. The compensation step for the linear dispersion ignores the fiber nonlinearity and is based on the transfer function given in (8.1.3).

Three characteristic scale lengths were defined in Section 5.3.4: the nonlinear length $L_{NL}$, the effective length $L_{eff}$, and the dispersion length $L_D$. Suppose that the length L of the fiber segment is larger than the effective length $L_{eff}$. Further suppose that the dispersion length $L_D$ is larger than the effective length and that the nonlinear length $L_{NL}$ is smaller than the effective length. Then the signal propagation characteristics over the segment of fiber can be divided into two regions. For $z \lesssim L_{eff}$, the lightwave channel subsegment is approximately a nondispersive nonlinear channel of length $L_{eff}$. For $z \gtrsim L_{eff}$, the lightwave channel subsegment is approximately a linear dispersive channel of length $L - L_{eff}$.

For a single polarization, the dispersion for the subsegment defined by $z \gtrsim L_{eff}$ over the length $L - L_{eff}$ can be equalized using the inverse of the complex transfer function given in (8.1.3). With time referenced to the overall group delay τ of the lightwave signal, the transfer function H(f) for the equalization filter has the same form as (8.1.3),


with a sign change for the phase. When the attenuation in each fiber segment is compensated by lightwave amplification, then, suppressing the constant phase of the carrier frequency, we can write
$$H(f) \approx e^{\mathrm{i}2\pi^2 \beta_2 f^2 L_{comp}}, \tag{11.6.6}$$
where the optimal compensation length $L_{comp}$ need not be equal to the length of the subsegment $L - L_{eff}$ because of residual coupling between the linear dispersion and the nonlinear phase shift.

For $z \lesssim L_{eff}$, the nonlinear intensity-dependent and frequency-independent phase shift over a subsegment of length $L = L_{eff}$ is given by (5.4.11),
$$\phi_{NL} = \gamma\, |a(0, \tau)|^2 L_{eff}, \tag{11.6.7}$$

NL

where |a(0, τ)|2 is the root-mean-squared power of the complex signal envelope at the input to the subsegment. The nonlinear phase shift can be compensated when the nonlinear phase noise introduced by the lightwave amplification is negligible. Then the nonlinear phase shift can be equalized by multiplying the lightwave signal by eiφNL with an intensity-dependent phase φNL that is opposite in sign to the term given in (5.4.9). When the nonlinear phase error and the linear dispersion are large, the iterative method called back propagation may still be used. This method alternates between equalization of the nonlinear phase error and equalization of the linear dispersion. The concatenation of two separate equalization blocks, one for the linear dispersion and one for the nonlinear phase error, defines the equalization block of one segment of fiber and is shown in Figure 11.18(a). This method is the reverse of the split-step Fourier method used to numerically solve the nonlinear Schrödinger equation in Section 5.6. The inherent coupling of the nonlinearity and the dispersion results in a residual error after the two equalization blocks have been applied to the received signal. Given that shorter pulses redistribute energy more rapidly in a dispersive fiber (cf. (4.5.6)), Nonlinear Equalization Block

(a) H( f )

ei φNL

Linear equalization

Nonlinear power-dependent phase equalization

a(L, τ )

(b) a(L, τ )

Block 1

a(L ±

ΔL, τ )

...

a( ΔL, τ )

a(0, τ )

Block N

a(0, τ )

Cascaded equalization blocks Figure 11.18 (a) The basic processing block of a nonlinear back-propagation equalizer consists of

the concatenation of a linear equalizer with a transfer function H ( f ) followed by an intensity-dependent nonlinear phase shift φNL . (b) Block diagram of a processing block consisting of subblocks, each with different parameters.

11.8 Historical Notes

565

the number of subsegments required to decouple the linear and nonlinear impairments increases as the symbol rate increases. Now consider a span of K fiber segments in which the gain balances the attenuation in each segment. When a single equalization block suffices in each segment, the equalizer for the complete span consists of a concatenation of K equalization blocks as is shown in Figure 11.18(b). The equalization block for the linear dispersion in each segment of the span is typically the same as the single-segment case. However, for a span of multiple amplified segments, the effect of the accumulated nonlinear phase noise cannot usually be neglected. This means that the nonlinear phase equalization block for the kth segment can depend on the previous k − 1 segments (cf. (7.7.19)).
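A minimal single-span sketch of this procedure follows, alternating the inverse dispersion filter of (11.6.6) with removal of the nonlinear phase of (11.6.7) over a number of subsegments; the fiber parameters, the sampling rate, and the sign conventions are assumptions chosen for illustration, and nonlinear phase noise is neglected.

```python
import numpy as np

def back_propagate(a, n_blocks=10, beta2=-21e-27, gamma=1.3e-3,
                   L=100e3, fs=50e9):
    """Single-span back-propagation sketch (cf. Figure 11.18): alternate
    removal of the linear dispersion, using the filter of (11.6.6), and
    removal of the nonlinear phase of (11.6.7), over n_blocks subsegments."""
    dz = L / n_blocks
    f = np.fft.fftfreq(len(a), d=1.0 / fs)
    Hinv = np.exp(1j * 2.0 * np.pi**2 * beta2 * f**2 * dz)
    for _ in range(n_blocks):
        a = np.fft.ifft(Hinv * np.fft.fft(a))            # linear step
        a = a * np.exp(-1j * gamma * np.abs(a)**2 * dz)  # nonlinear step
    return a

# Example call on a made-up complex envelope.
equalized = back_propagate(np.exp(-np.linspace(-2, 2, 1024)**2).astype(complex))
```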

11.7 References

Equalization of lightwave systems is discussed in Messerschmitt (1978) and in Einarsson (1996). Equalization of additive-noise channels is discussed in Blahut (2010), Haykin (1996), and Sayed (2008). Maximum-likelihood sequence detection for additive white gaussian noise channels was introduced in Forney (1972), and is discussed in Blahut (1988) and Blahut (1990). Sequence detection for nonlinear intensity-modulated systems is discussed in Alić, Papen, Saperstein, Milstein, and Fainman (2005) and Agazzi, Hueda, Carrer, and Crivelli (2005). Simplified forms of branch metrics are discussed in Hueda, Crivelli, Carrer, and Agazzi (2007) and Bosco, Poggiolini, and Visintin (2008), with experimental validation discussed in Poggiolini, Bosco, Benlachtar, Savory, Bayvel, Killey, and Prat (2008). Compensation of nonlinear phase noise is discussed in Liu, Wei, Slusher, and McKinstrie (2002) and Ho (2005). Nonlinear equalization based on digital back propagation is discussed in Killey (2005) and Ip and Kahn (2008). An experimental demonstration of nonlinear compensation is presented in Temprana, Myslivets, Kuo, Liu, Ataie, Alić, and Radić (2015). Cross-polarization is discussed in Karlsson and Sunnerud (2006), with approximate statistical probability density functions derived in Winter, Bunge, Setti, and Petermann (2009) and Winter, Setti, and Petermann (2010).

11.8 Historical Notes

Early work on the linear equalization of intersymbol interference in gaussian noise using the mean-squared-error criterion appears in Tufts (1965) and in George (1965), with later contributions by Berger and Tufts (1967). The concepts were applied to shot-noise-limited systems in Messerschmitt (1978). Nonlinear equalization using the minimum-probability-of-error criterion was developed by Ungerboeck (1971) and Forney (1972). Decision feedback equalization was first reported in Austin (1967) and in Monsen (1971). The Viterbi algorithm was introduced to explain the maximum-likelihood decoding of convolutional codes in Viterbi (1967). The practicality of the Viterbi algorithm and its use for sequence detection was recognized in Forney (1972, 1973).


In particular, Forney (1972) showed that the minimum-probability-of-error criterion for a sequence is satisfied by a nonlinear detection process consisting of a whitened matched filter followed by a Viterbi detector. The Bahl algorithm was published by Bahl, Cocke, Jelinek, and Raviv (1974). A similar algorithm had been used by Baum and Welch in a different context to extract hidden Markov models from a data set, but was not published. The general form of the algorithm is called the forward–backward algorithm. It appears that the first application of sequence detection to lightwave communications was proposed in Winters and Kasturia (1991), with the first presentation of the state-dependent signal statistics for intensity modulation given in Alić, Papen, Saperstein, Milstein, and Fainman (2005).

11.9 Problems

1 Minimum distance for coherent and noncoherent carriers
(a) On a sketch or copy of Figure 11.14(b), draw a line indicating the maximum amplitude for a space using a noncoherent carrier.
(b) Repeat for the maximum amplitude for a space using a coherent carrier.
(c) Repeat for the minimum amplitude for a mark using a coherent carrier.
(d) Repeat for the minimum amplitude for a mark using a noncoherent carrier. (This is the same value as for part (c).)
(e) Using these values, determine which system has the largest minimum separation.

2 Eye diagrams
(a) Describe and sketch the eye diagram for the 4-ary pulse-amplitude signal constellation shown in Figure 10.1(a) using a gaussian pulse with σ = 3T/4 (cf. Figure 11.14(b)).
(b) Repeat for the Nyquist pulse given by
$$q(t) = \frac{\sin(\pi t)}{\pi t}\,\frac{\cos(\pi t)}{1 - (2t)^2}.$$

3 Uncompensated intersymbol interference for intensity modulation
For simple on-off-keyed intensity modulation, the effect of intersymbol interference is to reduce the minimum sample value for a mark and increase the maximum sample value for a space, thereby reducing the minimum separation $d_{\text{eye}}$ compared with the minimum distance $d_{\min}$ in the absence of the interference.
(a) The minimum high sample $s_1'$ without noise occurs for an isolated mark because the neighboring spaces do not add to the value. Show that this worst-case value is
$$s_1' = s_1 - \varepsilon\,\Delta s, \tag{11.9.1a}$$
where $d_{10} = \Delta s = s_1 - s_0$ is the minimum distance in the absence of intersymbol interference, and

$$\varepsilon = 1 - \frac{1}{S}\int_0^T s(t)\,dt$$

is the part of the sample value for a mark that is lost because the pulse has spread to other symbol intervals, where $S = \int_0^\infty s(t)\,dt$ is the total signal in one pulse.
(b) Show that the maximum value $s_0'$ for a space is
$$s_0' = s_0 + \varepsilon\,\Delta s. \tag{11.9.1b}$$
(c) Show that the minimum separation $d_{\text{eye}} = s_1' - s_0'$ in the presence of intersymbol interference is
$$d_{\text{eye}} = d_{10}(1 - 2\varepsilon). \tag{11.9.2}$$

(d) Using $d_{10} = 2$, compare the minimum separation for intensity modulation given in (11.9.2) with the minimum separation for binary phase-shift keying, $d_{\text{eye}} = 2 - 2\sum_{j \neq k} |s_{k-j}|\,|q_j|$, given in (11.1.6). Comment on the result. In what way is the interference similar? In what way is the interference dissimilar?

4 Component variability
Consider a random offset in the center frequency of a wavelength-multiplexed system. Each wavelength subchannel has a signal with a power density spectrum S(f) given by a gaussian function with a spectral mean-squared value of $\sigma_0^2$. The carrier of the nth subchannel is offset by $nf_0$ from the common carrier $f_c$. Define the relative interchannel interference for the central wavelength channel as

$$\frac{\text{interference power in band}}{\text{expected signal power in band}} = \sum_{n \neq 0} \frac{\int_{-f_0/2}^{f_0/2} S(f - nf_0)\,df}{\int_{-f_0/2}^{f_0/2} S(f)\,df},$$

where $S(f - nf_0)$ is the power density spectrum for subcarrier n, which includes the effect of the random frequency offset of the filter used to detect that channel. Let the random frequency offset $f_{\text{off}}$ be a zero-mean gaussian random variable with variance $\sigma_{\text{off}}^2$.
(a) Determine the ratio $f_0/\sigma_0$ that is required for the relative interchannel interference to be less than −40 dB in the absence of a random frequency offset when the two wavelength subchannels adjacent to the channel of interest generate the most significant contribution to the interchannel interference.
(b) Derive an expression for the interchannel interference power including the effect of the random offset for the two interfering subchannels.
(c) For the value of $f_0/\sigma_0$ derived in part (a), determine the maximum value of the root-mean-squared width $\sigma_{\text{off}}$ of the random offset, expressed in terms of the channel spacing $f_0$, that produces less than −20 dB of mean interchannel interference from two interfering subchannels.


5 Wavelength intersymbol interference
This problem compares the functional form of the intersymbol interference parameter ε defined in Problem 3 with the linear interchannel interference in a wavelength-division-multiplexed system.
(a) Let S(f) be the power density spectrum of a single wavelength subchannel. The total power in the signal is given by $P = \int_{-\infty}^{\infty} S(f)\,df$. The interference from other wavelength subchannels is reduced by using a filter with a transfer function $H(f) = \mathrm{rect}(f/\Delta f)$, where $\Delta f$ is the frequency spacing between the wavelength subchannels. Derive an expression for the proportion of the signal power that lies outside a subchannel with a bandwidth B in terms of S(f).
(b) Show that this expression is the spectral equivalent of the temporal intersymbol interference parameter ε defined in Problem 3.

6 Worst-case intersymbol interference
Consider the situation depicted below, for which the probability density functions for a mark and a space can be resolved into four separate probability density functions. Each of the four probability density functions is a gaussian distribution with the same variance $\sigma^2$. The probability density functions with means $r_1$ and $r_0$ represent symbols of the sequence that are not affected by intersymbol interference. The two other probability density functions, with means $r_1 - x$ and $r_0 + x$, represent symbols of the sequence that have significant intersymbol interference. The threshold is set at $(r_1 + r_0)/2$.

[Diagram: four gaussian probability density functions centered at r0, r0 + x, r1 − x, and r1 along the r axis, with the threshold midway between r0 and r1.]

(a) Determine the relationship between x and the intersymbol interference parameter ε defined in Problem 3.
(b) Determine the conditional error probability $p_{1|0}$ using only the probability density function of a mark with mean $r_1$ and the probability density function of a space with mean $r_0 + x$. Repeat for the conditional probability $p_{0|1}$ using $r_0$ and $r_1$.
(c) Repeat part (b) using the probability density function of a mark with mean $r_1 - x$ and the probability density function of a space with mean $r_0$.
(d) Using the two conditional probability density functions from part (b) and the two conditional probability density functions from part (c), determine the probability of a detection error $p_e$ when the priors are equal.
(e) Let $r_1 - r_0 = 10$ and σ = 1. Find the total probability of a detection error when x = 1 and determine the relative contribution from each of the four error terms, two from part (b) and two from part (c). Which term has the largest contribution? Why?


(f) Using the results of part (e), compare this result with the probability of a detection error derived using (11.9.2). Comment on the conditions for which the minimum separation can be used to accurately determine the effect of the intersymbol interference.

7 Effect of group-velocity dispersion and laser linewidth on the intersymbol interference
A system transmits R bits/second over a span of L km. A mark is represented by a gaussian pulse with a root-mean-squared timewidth $T_{rms}$. The modulated lightwave pulse has a root-mean-squared spectral width $\sigma_\lambda$. The intensity modulator has an extinction ratio $e_x$ (cf. (7.5.6)). The expected number of photons per bit is $E_b$. There are no other noise sources. Derive an expression for $p_e$ in terms of $E_b$, $e_x$, and the intersymbol interference parameter ε defined in Problem 3.

8 Minimum separation for coherent and noncoherent carriers
Referring to Figure 11.14, let $p(t) = e^{-t^2/2T_{rms}^2}$.
(a) Considering only the adjacent symbols' contribution to the interference, determine the electrical amplitude of an isolated mark surrounded by spaces using the following methods.
i. Adding the amplitudes of the isolated pulses to produce a waveform, then squaring the resulting waveform. This method is appropriate for a coherent carrier.
ii. Squaring the amplitude of each isolated pulse, then adding the squared pulses together to produce a waveform. This method is appropriate for a noncoherent carrier.
(b) Repeat part (a) for an isolated space surrounded by marks.
(c) Using the results of the previous two parts, derive an expression for the minimum separation $d_{\text{eye}}$ both for a coherent carrier and for a noncoherent carrier. Comment on the result with respect to Figure 11.14.
(d) Form the difference between the minimum separation derived using a coherent carrier and the minimum separation derived using a noncoherent carrier. Plot this difference over the range $T/8 \le T_{rms} \le 7T/8$. Comment on the result.
(e) Describe how you could estimate the minimum separation $d_{\text{eye}}$ for a partially coherent carrier. What additional information is needed?

12 Effect of code rate on the output pulse width
Referring to expression (11.2.11), let the input coded pulse be a gaussian pulse of the form $s(t) = e^{-t^2/2(T_{rms}R_c)^2}$, where $R_c$ is the code rate and $T_{rms}$ is the root-mean-squared timewidth (cf. (2.1.30)). Derive an expression for the root-mean-squared timewidth of the magnitude of the output pulse in terms of $R_c$, $T_{rms}$, and the term $\beta_2 L$. (Note: the inverse Fourier transform of $e^{-(a+ib)\omega^2/2}$ is $\frac{1}{\sqrt{2\pi(a+ib)}}\,e^{-t^2/2(a+ib)}$, where $\omega$ is the angular frequency and $a$ and $b$ are constants.)

13 Optimal filters
Let $p(t) = e^{-t/T}\big(u(t) - u(t-T)\big)$ be the electrical pulse generated by direct photodetection, where $u(t)$ is the unit-step function. This pulse is filtered by a detection filter with an impulse response $y(t)$.
(a) Determine the optimal form of the filter and the form of the noisy filtered signal $r(t)$ for a system dominated by signal-independent noise.
(b) Determine the optimal form of the filter and the form of the filtered signal $r(t)$ for a system dominated by shot noise.
(c) Determine the worst-case shot-noise variance when the electrical waveform consists of an infinite series of marks with a functional form $e^{-t/T}$. (Note: the pulses are not truncated at $t = T$ as in the previous parts of this problem, where there was no intersymbol interference.) Compare this worst-case shot-noise variance with the intersymbol interference parameter $\varepsilon$ defined in Problem 3.


14 Nonlinear interference noise
Referring to expression (11.4.3), suppose that the linear interference term is insignificant and further suppose that the nonlinear interference term is proportional to the cube of the signal power so that $P_{NL} = \beta P_s^3$.
(a) Using these expressions, rewrite (cf. (11.4.3))
$$\text{OSNIR} = \frac{P_s}{P_n + P_{NL}}$$
in terms of the optical signal-to-noise ratio $\text{OSNR} = P_s/P_n$ and the parameter $C = \beta P_n^2$.
(b) Show that the maximum value of the OSNIR is achieved when the OSNR is equal to $(2C)^{-1/3}$.

15 Multiplicity factor in sequence error rate
Consider an additive-noise-limited system with intersymbol interference consisting of $K$ possible received sequences separated by a minimum separation $d_{eye}$. Let $L$ be the number of sequences in another set of sequences separated by a slightly larger distance $d_{eye} + \Delta d$, where $\Delta d$ is small. Determine an expression for the ratio $L/K$ in terms of $d_{eye}$ and $\Delta d$ when the second set of sequences at the slightly larger separation produces the same probability of a sequence error as the first set of sequences separated by $d_{eye}$. (Note that an expansion of $\mathrm{erfc}(d_{eye} + \Delta d)$ is $\mathrm{erfc}(d_{eye}) - \frac{2\Delta d}{\sqrt{\pi}}\,e^{-d_{eye}^2}$.)

12 Channel Estimation

The parameters describing a lightwave fiber channel are never fully known at the transmitter. In particular, the carrier phase, symbol timing, and polarization of the received signal are not known at the transmitter, nor are they known a priori at the receiver. These parameters must be inferred at the receiver from the received signal itself. Many of the parameters change slowly in comparison with the symbol rate, and so they can be slowly estimated at the receiver. The estimate is then used in the demodulation process. Many sophisticated techniques have been developed for this purpose. These techniques are organized and explained within the broad formal theory of estimation. Other factors, such as dispersion, noise levels, interference, or mode coupling, may be partially known at the transmitter, but this partial knowledge may be insufficient to achieve the desired performance. A receiver may partially estimate some of these unknown parameters, but then must tolerate any remaining uncertainty. This chapter discusses the estimation of unknown channel parameters. These parameters include the phase and the polarization state of the carrier at the receiver, the channel impulse response for each polarization state, the time at which each symbol interval begins, and, for a frame-based modulation format, the time at which a dataframe begins. Some channel parameters may vary too rapidly for one to form an accurate estimate. These are treated as uncontrollable impairments that are subsumed into the channel model. For example, the carrier phase, when slowly varying, can be estimated and corrected at the receiver, but when the phase is rapidly varying compared with the symbol rate, the channel is deemed to be noncoherent and is treated as such using appropriate modulation methods. If, however, the phase is varying at a rate comparable to the modulation rate, it can be both difficult to estimate and difficult to ignore. Then other methods, such as differential methods, may be appropriate.

12.1 Channel Parameters

Several fundamental parameters are affected by the lightwave channel. Let $r(t)$ be the noisy demodulated complex-baseband waveform for a sequence of $K$ symbols for one subchannel in a single polarization mode. The effect of the channel is to introduce several unknown parameters into the received waveform as follows:
$$r(t) = \sum_{j=0}^{K-1} s_j\, p\big(t - jT - mT - \alpha(t)\big)\, e^{i\phi(t)} + n(t). \qquad (12.1.1)$$


The $s_j$ are real or complex points of the signal constellation that represent the data, $p(t)$ is the individual complex-baseband electrical pulse at the receiver before filtering, and $n(t)$ is an additive circularly symmetric white gaussian noise process. The polarization, when used, is also an unknown parameter. The problem of the estimation and separation of the two received polarization modes is discussed in Section 12.6. An even more difficult problem is the separation of the spatial eigenmodes of a multimode fiber, which is treated in Section 12.7. The parameter $\phi(t)$ is the time-varying carrier-phase offset, which may be caused by the slowly varying channel delay or by the phase noise due to the lightwave carrier and the local oscillator. The estimation of the slowly varying carrier phase is called phase synchronization. The received sequence is delayed by $\tau = mT + \alpha$. The term $mT$ is the unknown delay to the beginning of a dataframe, where the integer $m$ varies over the set $\{0, \ldots, K-1\}$. The parameter $\alpha = \tau \pmod T$ is the unknown incremental delay within one symbol interval. The parameter $\alpha$ or $\alpha(t)$ is the timing offset of a sample from the local clock. It lies in the interval $[0, T]$. The estimation of $\alpha$, which determines the sampling instant for a pulse, is called symbol synchronization or clock-phase synchronization. Clock-phase synchronization aligns the local clock with the actual arrival instants of the symbols. It does not determine the beginning of a dataframe in the received sequence. This is a separate task of alignment, called frame synchronization or block synchronization. Maximum-likelihood estimation of waveform parameters from a fully modulated waveform can be quite difficult. Estimation techniques based on both unmodulated and modulated waveforms are used in communication systems. A channel parameter may be estimated without prior knowledge of that parameter. There may be an initial training interval for an initial estimate, followed by an indefinite maintenance procedure during which all parameter variations are tracked. Each channel parameter, treated as an unknown constant, is first estimated by using one or more known data-free signals. In other cases, or during maintenance, the channel parameters may be treated as unknown constants and estimated directly from a data-modulated waveform on the basis of the statistical properties of the random modulation, or by a technique that renders the random modulation invisible to the estimation process. Given a received waveform, the minimum probability of a block-detection error would be achieved by forming a single grand maximum-likelihood estimate for a complete block that includes all of the unknown parameters as well as the unknown data. However, forming the grand maximum-likelihood estimate for a large block of modulated symbols with multiple unknown parameters is regarded as overwhelmingly complex, although it need not be exponentially complex and may be tractable given sufficient computational resources. Practical channel estimation techniques in common use estimate each parameter separately, ignoring the others or treating the others as random nuisance parameters whose effect is suppressed or replaced by forming an expectation. The estimation of the channel parameters is essential for demodulation, yet estimation of these parameters from a data-modulated waveform becomes increasingly difficult for coded-modulation formats with information rates that are close to the channel capacity.
This is because, for transmission on a channel corrupted by additive white gaussian
noise, the optimal transmitted signal constellation should have a prior probability density function that approximates an independent circularly symmetric gaussian random process, as shown in Chapter 14, and a pulse shape that fills the available spectrum. The optimum waveform emulates white gaussian noise, but cannot be demodulated without knowing the underlying phase reference and clock reference. Yet, the optimum information waveform retains no feature that can be exploited to estimate the carrier or clock phase. As a consequence, channel estimation for advanced phase-synchronous modulation formats may require a periodic data-free segment called a training sequence, or a supplementary data-free side signal called a pilot signal.
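The received-waveform model (12.1.1) is simple to exercise numerically. The sketch below is a minimal illustration, not a simulation method prescribed by the text: the gaussian pulse shape, the random-walk carrier phase, and all parameter values are assumptions chosen only to make the roles of $m$, $\alpha$, $\phi(t)$, and $n(t)$ concrete.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed example parameters (not from the text):
K, T, ns = 64, 1.0, 8          # symbols, symbol interval, samples per symbol
m, alpha = 5, 0.3 * 1.0        # frame offset (symbols) and timing offset in [0, T)
N0 = 0.05                      # noise power density (N0/2 per quadrature)

t = np.arange(K * ns) * (T / ns)
s = rng.choice([-1.0, 1.0], size=K)            # BPSK constellation points s_j

def pulse(t):                                   # assumed pulse shape for the sketch
    return np.exp(-0.5 * (t / (0.35 * T))**2)   # gaussian pulse with unit peak

# Slowly varying carrier phase phi(t), modeled here as a random walk.
phi = np.cumsum(rng.normal(0.0, 0.01, size=t.size))

# Received complex-baseband waveform assembled per (12.1.1).
r = np.zeros(t.size, dtype=complex)
for j in range(K):
    r += s[j] * pulse(t - j * T - m * T - alpha)
r *= np.exp(1j * phi)
r += rng.normal(0, np.sqrt(N0 / 2), t.size) + 1j * rng.normal(0, np.sqrt(N0 / 2), t.size)
```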

12.2 Carrier-Phase Estimation

The estimation of the carrier phase of a waveform that is fully modulated is complicated by the presence of the random data modulation. There are several methods that can be used to account for the data modulating the waveform. The data can be temporarily omitted at the transmitter, or it can be removed at the receiver, or it can be treated as a nuisance parameter, or it can be jointly estimated with the carrier phase. The simplest method is to send a predetermined data-free signal, called a training sequence, during a fixed prearranged time, such as at initialization. Using a training sequence reduces the information rate, so it is used only infrequently. Once the phase has been estimated, subsequent phase changes can be tracked in other ways, such as by feeding back the datastream after demodulation to strip the data from the received signal prior to the phase estimation. For a phase-synchronous modulation format, phase offset can be corrected in the optical domain, or can be corrected in the electrical domain either at passband or at complex baseband. For example, heterodyne demodulation (cf. Section 8.2.1) produces a demodulated passband electrical waveform centered at the intermediate frequency $f_{IF}$. The phase of the lightwave signal is preserved by the balanced photodetection process, so the unknown phase offset of the lightwave carrier can be estimated and removed from the noisy electrical signal $r(t)$. Alternatively, for homodyne demodulation to complex baseband, the in-phase and quadrature signals are demodulated separately as is shown in Figure 8.11(a). For this case, the carrier phase, and possibly the residual frequency offset, can be estimated from the complex-baseband electrical signal or even from the discrete samples of those in-phase and quadrature signal components.

12.2.1 Maximum-Likelihood Phase Estimation

A constant-amplitude complex-baseband signal with an unknown phase received in additive gaussian noise is given by
$$r(t) = Ae^{i\phi} + n(t), \qquad (12.2.1a)$$
which means
$$r_I(t) + i r_Q(t) = A\cos\phi + iA\sin\phi + n_I(t) + i n_Q(t), \qquad (12.2.1b)$$


where $A$ is a constant amplitude, $\phi$ is the unknown phase to be estimated, and $n(t)$ is a circularly symmetric white gaussian noise process with power density spectrum $N_0 = 2\sigma^2$. The signal $r(t)$ is observed over an interval of duration $T_s$, and the task is to estimate the phase $\phi$ from the received complex-baseband signal $r(t)$. An evident estimate is
$$\hat\phi(T_s) = \tan^{-1}\!\left[\frac{\int_0^{T_s} r_Q(t)\,dt}{\int_0^{T_s} r_I(t)\,dt}\right], \qquad (12.2.2)$$

where the caret above $\phi$ indicates that this is a computed estimate of $\phi$ over the interval $T_s$. This estimate is, indeed, the optimal maximum-likelihood estimate for the phase of a sinusoid in additive white gaussian noise, as is shown next. Let $\mathbf{r}$ be the signal-space representation of the received waveform observed over the interval $T_s$, and let $\{\mathbf{s}_\phi\}$ be the set of possible transmitted complex signal values $Ae^{i\phi}$ differing only by the unknown phase $\phi$. Given $\mathbf{r}$, the maximum-likelihood phase estimate consists of choosing the most likely transmitted signal $\mathbf{s}_{\hat\phi}$ that produced $\mathbf{r}$. This problem is formally similar to the maximum-likelihood detection of the transmitted signal in additive white gaussian noise (cf. Section 9.5). Accordingly, the maximum-likelihood estimate is the phase of the complex signal $\mathbf{s}_{\hat\phi}$ that minimizes the squared euclidean distance between the received noisy waveform $\mathbf{r}$ and any possible transmitted sample waveform $\mathbf{s}_{\hat\phi}$. Using (10.1.3), the squared euclidean distance between $\mathbf{r}$ and $\mathbf{s}_{\hat\phi}$ is
$$d^2(\mathbf{r}, \mathbf{s}_{\hat\phi}) = \big|\mathbf{r} - \mathbf{s}_{\hat\phi}\big|^2 = |\mathbf{r}|^2 + \big|\mathbf{s}_{\hat\phi}\big|^2 - 2\,\mathrm{Re}\big[\mathbf{r}\cdot\mathbf{s}_{\hat\phi}\big]. \qquad (12.2.3)$$

The first term of (12.2.3) is the energy in $r(t)$ and is independent of $\hat\phi$. The second term is the energy in $\mathbf{s}_{\hat\phi}$ and is also independent of $\hat\phi$ because the phase does not affect the signal energy. Therefore, minimizing the distance between $\mathbf{r}$ and $\mathbf{s}_{\hat\phi}$ requires maximizing the third term. Using (2.1.65), the maximum value is the solution to
$$\frac{d}{d\hat\phi}\,\mathrm{Re}\!\left[\int_0^{T_s} r(t)\,s^*(t)\,e^{-i\hat\phi}\,dt\right] = 0, \qquad (12.2.4)$$
where, in this section, $s^*(t) = A$. The estimate $\hat\phi$ is then determined using the following set of equations:
$$\mathrm{Re}\!\left[A\int_0^{T_s}\big(r_I(t) + i r_Q(t)\big)\,\frac{de^{-i\hat\phi}}{d\hat\phi}\,dt\right] = 0,$$
$$-A\sin\hat\phi\int_0^{T_s} r_I(t)\,dt + A\cos\hat\phi\int_0^{T_s} r_Q(t)\,dt = 0, \qquad (12.2.5)$$
with the solution for $\hat\phi$ given as
$$\hat\phi = \pm\tan^{-1}\!\left[\frac{\int_0^{T_s} r_Q(t)\,dt}{\int_0^{T_s} r_I(t)\,dt}\right], \qquad (12.2.6)$$



Figure 12.1 Estimation of a time-varying phase.

where the positive sign must be chosen to achieve the maximization (cf. (12.2.2)). Longer integration times improve the accuracy of the estimate because of the longer averaging time for the noise. Shorter integration times may be required when the phase is time-varying so that the changes in phase may be tracked. The estimate in (12.2.6) is derived for a constant phase. Slowly time-varying phase can be tracked by replacing the integral over the interval $T_s$ with a lowpass filter to produce a continuous estimate. When the phase change over the timewidth of the filter is small, this is nearly the maximum-likelihood estimate for that time interval. The resulting structure is shown in Figure 12.1. The incident carrier with a time-varying phase $\phi(t)$ is homodyne-demodulated and filtered, thereby producing $r_I(t) = (A/2)\cos\hat\phi(t)$ and $r_Q(t) = -(A/2)\sin\hat\phi(t)$. These components are used to produce a time-varying phase estimate $\hat\phi(t)$. This estimate is a measurement of the time-varying rotation of the signal constellation in the complex plane at the receiver as compared with the signal constellation at the transmitter. Applying the inverse of this rotation to the demodulated signal yields estimates of the noisy components $\hat r_I(t)$ and $\hat r_Q(t)$ of the transmitted signal, as given by
$$\begin{bmatrix} \hat r_I(t) \\ \hat r_Q(t) \end{bmatrix} = \begin{bmatrix} \cos\hat\phi(t) & -\sin\hat\phi(t) \\ \sin\hat\phi(t) & \cos\hat\phi(t) \end{bmatrix} \begin{bmatrix} r_I(t) \\ r_Q(t) \end{bmatrix}, \qquad (12.2.7)$$
where the choice of sign gives a clockwise rotation to compensate for a phase error in the anticlockwise direction.
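A direct numerical transcription of this estimator is sketched below: the integrals in (12.2.2) become sums over samples, a moving average stands in for the lowpass filter of Figure 12.1 to produce a running estimate, and the rotation of (12.2.7) is applied as a complex exponential. The filter length and signal values are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(2)
A, phi_true, N0, n = 1.0, 0.7, 0.1, 2000
noise = rng.normal(0, np.sqrt(N0/2), n) + 1j * rng.normal(0, np.sqrt(N0/2), n)
r = A * np.exp(1j * phi_true) + noise          # samples of (12.2.1a)

# Block maximum-likelihood estimate, the sampled form of (12.2.2)/(12.2.6);
# arctan2 resolves the quadrant, which fixes the sign choice in (12.2.6).
phi_hat = np.arctan2(np.sum(r.imag), np.sum(r.real))

# Tracking version: replace the integral by a sliding average (lowpass filter),
# then derotate as in (12.2.7); multiplying by exp(-i*phi_hat) applies the
# inverse rotation written there as a 2x2 matrix.
L = 100
kernel = np.ones(L) / L
phi_run = np.arctan2(np.convolve(r.imag, kernel, "same"),
                     np.convolve(r.real, kernel, "same"))
r_corrected = r * np.exp(-1j * phi_run)

print(phi_hat, phi_true)   # the two values should be close
```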

12.2.2 Phase-Locked Loops

A phase-locked loop is a standard variation of a phase estimator that uses feedback in place of a feedforward filter. It provides a continuous estimate $\hat\phi(t)$ of the received carrier phase. This estimator is a form of a feedback control system that adjusts the frequency and the phase of a local oscillator in response to an error signal generated by the mixing of the local oscillator signal and the incident signal. A schematic diagram is shown in Figure 12.2. The loop consists of three components. The phase comparator, which can be implemented as a mixer, generates a signal related to the phase difference of the two passband input signals. One input is the incident signal. The other input is a signal generated by a controlled local oscillator that produces a constant-amplitude sinusoidal signal $\tilde v(t)$ with a rate of change of phase at the output proportional to the control signal.


Figure 12.2 Schematic of a phase-locked loop.

control signal. The signal x (t ) out of the phase comparator is filtered by a loop filter, which controls both the loop impulse response and the noise-filtering characteristics of the phase-locked loop. This filtered signal is the feedback control signal to the controlled local oscillator. There are usually two steps in the operation of a phase-locked loop. The acquisition step initially estimates and sets (or locks) the frequency and phase of the controlled local oscillator to the incident signal. The tracking step maintains phase lock by removing small offsets or changes in phase that may arise after the phase-locked loop has initially locked. The operation of a phase-locked loop is illustrated by an example that uses two frequency down-conversions. The noisy passband received lightwave signal with no modulation is

$$\tilde r(t) = A\cos\big(2\pi f_c t + \phi_c(t)\big) + \tilde n_o(t),$$
with $\phi_c(t)$ being the unknown phase of the carrier and $\tilde n_o(t)$ being a passband additive-noise process. Rather than phase locking in the lightwave domain using a phase-controllable lightwave oscillator such as a laser, this example phase locks in the electrical domain using a phase-controllable electrical oscillator. The lightwave is first heterodyned to the intermediate electrical frequency $f_{IF}$. The data-free passband waveform at frequency $f_{IF}$ is (cf. (8.2.4c))
$$\tilde r(t) = A\cos\big(2\pi f_{IF} t + \phi(t)\big) + \tilde n_{IF}(t), \qquad (12.2.8)$$
where $\phi(t) = \phi_c(t) - \phi_{LO}(t)$ is the unknown phase difference between the lightwave carrier and the lightwave local oscillator, and $\tilde n_{IF}(t)$ is a passband noise process at the intermediate frequency with a power density spectrum $N_0$ given by (8.2.13). The second local oscillator is a controlled local oscillator in the electrical domain that is to be phase-locked. The electrical passband signal $\tilde r(t)$ is one input to the phase comparator. The other input $\tilde v(t)$ to the phase comparator is the output of the controlled local oscillator, which can be written as
$$\tilde v(t) = \sin\!\big(2\pi f_{IF} t + \hat\phi(t)\big),$$
where $\hat\phi(t) = \phi(t) - \phi_e(t)$ is the estimated total phase, with $\phi_e(t)$ being the error in the estimate.


The phase comparator consists of a mixer forming the product $\tilde r(t)\tilde v(t)$ followed by an image-rejecting filter that removes the sum-frequency terms at frequency $2f_{IF}$ and otherwise has little effect. This is because the bandwidth of this filter is chosen large enough that it does not affect the statistics of the additive noise or the phase noise, and does not affect the dynamics of the phase-locked loop. The output $x(t)$ of the phase comparator is then
$$x(t) = \Big(A\cos\big(2\pi f_{IF} t + \phi(t)\big) + \tilde n_{IF}(t)\Big)\times A_g \sin\!\big(2\pi f_{IF} t + \hat\phi(t)\big).$$
The mixing gain $A_g$ of the phase comparator is herein set to one for brevity. Expanding $x(t)$ results in two terms. The first term, which is the signal term, has sum- and difference-frequency components. The sum-frequency components are removed by the image-rejecting filter, leaving as the signal $-(A/2)\sin\phi_e(t)$. The second term, which is the demodulated passband noise, is
$$n(t) = \Big(n_I(t)\cos(2\pi f_{IF} t) - n_Q(t)\sin(2\pi f_{IF} t)\Big)\sin\!\big(2\pi f_{IF} t + \hat\phi(t)\big), \qquad (12.2.9)$$
where $\tilde n_{IF}(t)$ is expressed in the form given in (2.2.72). The resulting noise process $n(t)$ is a real-baseband gaussian noise process with a (two-sided) power density spectrum $N_0/2$. Combining these two conclusions gives
$$x(t) = -\frac{A}{2}\sin\phi_e(t) + n(t). \qquad (12.2.10)$$

The feedback signal $z(t)$ is a filtered version of $x(t)$. The loop filter is characterized by an impulse response $y(t)$ with an output given by
$$z(t) = y(t) \ast x(t). \qquad (12.2.11)$$

In the case for which there is no loop filter, $y(t)$ becomes the Dirac impulse $\delta(t)$, and the feedback signal $z(t)$ is simply $x(t)$. This loop is referred to as a first-order phase-locked loop. The rate of change of the phase of the controlled local oscillator is proportional to the input $z(t)$,
$$\frac{d\hat\phi(t)}{dt} = C z(t), \qquad (12.2.12)$$
for some constant $C$. Substituting $\hat\phi(t) = \phi(t) - \phi_e(t)$ gives
$$\frac{d\phi_e(t)}{dt} = \frac{d\phi(t)}{dt} - C z(t).$$
Substituting $z(t)$ from (12.2.11) with $x(t)$ as given by (12.2.10) yields
$$\frac{d\phi_e(t)}{dt} + \frac{AC}{2}\, y(t) \ast \sin\phi_e(t) = \frac{d\phi(t)}{dt} - C\, y(t) \ast n(t). \qquad (12.2.13)$$

Equation (12.2.13) is a nonlinear differential equation for the phase error φe (t ), with the nonlinear term being sin φe (t ). The first term on the right accounts for the total time-varying phase of the lightwave carrier. The second term on the right is the filtered additive noise.


The corresponding first-order homogeneous differential equation with loop filter $y(t) = \delta(t)$ is of the generic form
$$\frac{dx}{dt} + \frac{AC}{2}\sin x = 0.$$
For small $\phi_e(t)$, the approximation $\sin\phi_e(t) \approx \phi_e(t)$ leads to a solution of the form
$$\phi_e(t) = \phi_e(0)\,e^{-ACt/2}.$$
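The loop dynamics can be checked with a discrete-time integration of (12.2.13) for $y(t) = \delta(t)$ and a constant carrier phase. The sketch below uses a forward-Euler step; the gain, step size, and initial error are assumptions for illustration, and the exponential decay of the small-signal solution derived above is visible in the output.

```python
import numpy as np

A, C = 1.0, 4.0         # assumed carrier amplitude and loop gain
dt, n = 1e-3, 5000      # integration step and number of steps
N0 = 0.0                # noise off: check the homogeneous solution first

rng = np.random.default_rng(3)
phi_e = 0.5             # initial phase error, radians
trace = np.empty(n)
for k in range(n):
    noise = C * rng.normal(0, np.sqrt(N0 / (2 * dt))) if N0 else 0.0
    # Forward-Euler step of d(phi_e)/dt = -(A*C/2) sin(phi_e) - noise,
    # the first-order loop of (12.2.13) with constant carrier phase.
    phi_e += dt * (-(A * C / 2) * np.sin(phi_e) - noise)
    trace[k] = phi_e

# Small-signal prediction: phi_e(t) = phi_e(0) * exp(-A*C*t/2).
print(trace[-1], 0.5 * np.exp(-A * C / 2 * n * dt))
```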

Noise Analysis

The error φe (t ) in the phase estimate is caused by the frequency noise in the sinusoidal signals and by additive gaussian noise. Here we are interested in the effect of these two noise sources on φe (t ).

Frequency Noise
The frequency noise comes both from the frequency noise in the incident lightwave carrier and from the frequency noise in the frequency reference. The combined frequency noise is modeled as a stationary zero-mean white gaussian random process with a variance $K = B/2\pi$ (cf. (7.8.1)), where $B$ is the bandwidth of the power density spectrum. The two-sided power density spectrum $S_{d\phi}(f)$ for the derivative of the phase noise $d\phi(t)/dt$ is proportional to $B$, with
$$S_{d\phi}(f) = 2\pi B. \qquad (12.2.14)$$

Section 10.2.4 discusses the effect of the phase estimation error due to phase noise on the probability of a detection error.

Additive Gaussian Noise
The additive noise $n(t)$ is the demodulated spontaneous emission noise. This noise source is modeled as an independent additive white gaussian noise process with a two-sided power density spectrum $N_0/2$, which is the sum of the demodulated lightwave noise (cf. (8.2.3)) and electrical thermal noise.

Linearized First-Order Analysis
A phase-locked loop in the locked state has a small phase error due to noise, so $\sin\phi_e(t) \approx \phi_e(t)$. Then (12.2.13) can be treated as a linear system driven by noise. When $y(t) = \delta(t)$, the left side of (12.2.13) reduces to a first-order linear system with the time constant of the loop response given by $2/AC$. The noise term on the right is the sum of two independent gaussian random processes with a constant two-sided power density spectrum $N$ given by
$$N = 2\pi B + \tfrac{1}{2}C^2 N_0, \qquad (12.2.15)$$
where (12.2.14) has been used and the power density spectrum of the baseband white gaussian noise source $Cn(t)$ is $C^2 N_0/2$. This noise is filtered by the system transfer function $H(f)$ defined by the left side of (12.2.13),
$$H(f) = -\frac{1}{AC/2 + i2\pi f}.$$


The output of this first-order phase-locked loop is the error $\phi_e$ in the phase estimate. The corresponding power density spectrum of the error in the phase estimate after the controlled oscillator is
$$S_{\phi_e}(f) = N\,|H(f)|^2 = \frac{N}{(AC/2)^2 + (2\pi f)^2}.$$
Using (2.2.65a), the variance of the phase error $\sigma^2_{\phi_e}(C)$ as a function of the loop gain $C$ is
$$\sigma^2_{\phi_e}(C) = N\int_{-\infty}^{\infty} |H(f)|^2\,df = \frac{2\pi B + C^2 N_0/2}{AC}. \qquad (12.2.16)$$
The phase-locked loop balances two conflicting terms by the choice of the loop gain $C$. To reduce phase variations, the first term should be made small, which requires a large value of $C$. To reduce the effect of additive noise, the second term should be made small, which requires a small value of $C$. The variance of the phase error is minimized by setting $C = \sqrt{4\pi B/N_0}$. Substituting this value into (12.2.16) gives this minimum variance as
$$\sigma^2_{\phi_e} = \sqrt{\frac{2\pi B N_0}{P}} = \sqrt{\frac{2\pi T B}{E/N_0}}, \qquad (12.2.17)$$

where P = A 2 /2 is the constant power in the demodulated carrier in the absence of data modulation, and where E = P T is the energy in the constant-amplitude carrier over an interval T . Expression (12.2.17) states that the variance of the phase error is inversely proportional to the square root of E / N0 . A first-order loop will track a frequency offset. Because a frequency offset is a time-varying phase, a constant feedback term is required. This means that a steady-state nonzero phase offset is required so as to produce the feedback signal needed to track the frequency offset.
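As a numerical illustration of this trade, the snippet below evaluates the variance expression (12.2.16) over a range of loop gains and compares the minimum against the closed form (12.2.17). The linewidth and noise values are assumptions chosen only for the example.

```python
import numpy as np

B = 1e6          # assumed combined linewidth bandwidth (Hz)
N0 = 1e-9        # assumed noise power density
A = 1.0          # carrier amplitude, so P = A**2 / 2

C = np.logspace(6, 10, 4000)                      # candidate loop gains
var = (2 * np.pi * B + C**2 * N0 / 2) / (A * C)   # (12.2.16)

C_opt = np.sqrt(4 * np.pi * B / N0)               # gain minimizing (12.2.16)
P = A**2 / 2
var_min = np.sqrt(2 * np.pi * B * N0 / P)         # (12.2.17)

print(C[np.argmin(var)], C_opt)    # numerical and analytic optimal gains
print(var.min(), var_min)          # should agree to grid accuracy
```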

Linearized Second-Order Analysis
The performance of a phase-locked loop can be improved by using a loop filter with a first-order transfer function. This produces a second-order phase-locked loop. No steady-state phase offset is required to correct for a constant frequency offset. Following the same line of analysis as for a first-order phase-locked loop, the minimum variance of a second-order loop can be expressed as¹
$$\sigma^2_{\phi_e} = \sqrt{\frac{2\pi T B}{E/N_0}}\left(1 + \frac{1}{4\zeta^2}\right), \qquad (12.2.18)$$
where $\zeta$ is the damping coefficient of the second-order system with a natural frequency $f_n$. In the limit as the damping coefficient approaches infinity, the phase-error variance
¹ See Kazovsky, Benedetto, and Willner (1996), Section 4.9.3.1.


of a first-order phase-locked loop given in (12.2.17) is recovered. The damping parameter can be adjusted to balance the time taken to initially acquire the phase estimate with the ability to track the fluctuating phase noise in steady state. A typical value of $\zeta$ is $1/\sqrt{2}$.

12.2.3 Phase Estimation of a Data-Modulated Waveform

An elementary phase-locked loop requires a data-free segment of the received passband waveform. But data-free segments should be rare, perhaps occurring only during initialization. In normal operation, phase locking must be maintained using only the data-modulated carrier. Phase locking to a modulated carrier is more complicated. The data must be removed or suppressed from the phase-locking signal. One method that removes the data by feeding back estimated data after detection is known as decision-directed phase estimation. Decision-directed estimation of the carrier phase will be described for a received waveform $r(t) = s(t)e^{i\phi} + n(t)$ with only a single modulated component of the carrier. The generalization to a waveform that uses both modulation components is posed as a problem at the end of the chapter. Let $s_j p(t)$ be the real pulse amplitude of the $j$th symbol, with $s_j$ being one of $L$ possible amplitude values defined by the real signal constellation and with $p(t)$ being the received pulse before filtering. Methods to estimate the channel impulse response and so the received pulse $p(t)$ are discussed in Section 12.5. Suppose that an estimate $\hat s_j$ of the received pulse amplitude is obtained for the $j$th pulse. Given a sequence $\{\hat s_j\}$ of $J$ symbol estimates over an interval of duration $T_s = JT$, the estimated real-baseband waveform $\hat s(t)$ can be reconstructed as
$$\hat s(t) = \sum_{j=0}^{J-1} \hat s_j\, p(t - jT).$$

The estimated waveform $\hat s(t)$ is equal to the actual signal waveform $s(t)$, provided that all of the estimated symbols $\hat s_j$ are correct. In this case, $\hat s(t)s(t)$ is nonnegative even for a pulse $p(t)$ that extends over multiple symbol intervals. To estimate the phase, substitute the expression $\hat s(t)$ into (12.2.4) to give
$$\frac{d}{d\hat\phi}\,\mathrm{Re}\!\left[\int_0^{T_s} r(t)\,\hat s(t)\,e^{-i\hat\phi}\,dt\right] = 0, \qquad (12.2.19)$$

where $r(t) = s(t)e^{i\phi} + n(t)$ is the received complex waveform. Setting the derivative equal to zero shows that the estimate of the carrier phase is modified by the knowledge of the detected waveform so that
$$\hat\phi = \tan^{-1}\!\left[\frac{\int_0^{T_s} \hat s(t)\,r_Q(t)\,dt}{\int_0^{T_s} \hat s(t)\,r_I(t)\,dt}\right]. \qquad (12.2.20)$$
This reduces to the expression for the unmodulated carrier given in (12.2.6) on setting $\hat s(t) = A$.
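A sampled version of this decision-directed estimate is sketched below for Nyquist pulses, for which the integrals in (12.2.20) reduce to sums over the matched-filter samples. The hard decisions $\hat s_j$ are taken from the in-phase samples, so the sketch assumes the phase error is small enough for most decisions to be correct; all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
J, A, phi, N0 = 500, 1.0, 0.2, 0.1
s = A * rng.choice([-1.0, 1.0], size=J)                     # transmitted amplitudes
r = s * np.exp(1j * phi) + (rng.normal(0, np.sqrt(N0/2), J)
                            + 1j * rng.normal(0, np.sqrt(N0/2), J))

s_hat = np.where(r.real > 0, A, -A)       # hard decisions, cf. (12.2.21)
# Discrete form of (12.2.20): project the reconstructed waveform onto r.
phi_hat = np.arctan2(np.sum(s_hat * r.imag), np.sum(s_hat * r.real))
print(phi_hat, phi)
```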



Figure 12.3 Decision-directed phase estimation for binary phase-shift keying. Removing the hardlimiting operation results in a Costas loop.

For the case of hard-decision detection of binary phase-shift-keyed data, the estimated data values $\hat s_j$ used to construct the estimated waveform $\hat s(t)$ can be obtained for each $j$ using a hardlimiter, which is represented by the sign function defined in (2.1.6). For the $j$th symbol, this function gives
$$\hat s_j = \begin{cases} 1 & \text{if } r_I(jT) > 0 \\ -1 & \text{if } r_I(jT) < 0, \end{cases} \qquad (12.2.21)$$

where $r_I(jT)$ is the demodulated in-phase baseband signal for the $j$th symbol interval. This hardlimiting operation is shown, simplified, in the lower path of Figure 12.3. The $j$th sample in the upper path of Figure 12.3 can be written as $s_j \sin\phi_e(j)$, where $\phi_e(j)$ is the phase error for the $j$th sample. The error signal $e_j$ at the output of the multiplier before the loop filter is then
$$e_j = s_j\,\hat s_j \sin\phi_e(j), \qquad (12.2.22)$$

with noise, not shown, entering through the quadrature path. Therefore, the multiplication of the sample for the quadrature signal component by the hardlimited output of the in-phase signal component generates a feedback signal that is suitable for phase locking. The sign of the feedback is correct if $\hat s_j = s_j$, as is usually the case, but is incorrect if $\hat s_j = -s_j$. A few incorrect detected bits will rarely break the loop lock, provided that the loop time constant is large compared with a bit duration. Decision-directed methods are most appropriate when the signal-to-noise ratio is large. Then the detected data values are accurate when the phase is accurately known, and the phase estimate is accurate when the data values are accurately known. For this case, the phase error $\phi_e(j)$ can be updated at the symbol rate. In a small-signal-to-noise-ratio regime, the nonlinear hardlimiting operation is generally not used because the decision errors do more harm than the noise. In this regime, the data modulation for binary phase-shift keying can be removed by the same feedback but without using the hardlimiting operation. The resulting structure is called a Costas loop. Examining Figure 12.3, one can see that when the hardlimiter is removed, the error signal $e_j$ at the output of the multiplier before the loop filter is (cf. (12.2.22))
$$e_j = s_j\sin\phi_e(j) \times s_j\cos\phi_e(j) = \tfrac{1}{2}s_j^2\sin\big(2\phi_e(j)\big), \qquad (12.2.23)$$


with noise, not shown, entering both through the in-phase path and through the quadrature path. This expression has the same form as the error signal given in (12.2.10). The signal after filtering is the input to a controlled oscillator that generates the in-phase signal and the quadrature signal used for demodulation. While the squaring operation in (12.2.23) is suitable for removing the data, it cannot resolve the sign of the carrier. This inherent 180◦ phase ambiguity means that all data zeros and all data ones would be interchanged. This phase ambiguity is easily resolved by various methods during phase initialization, and is not an issue during lock maintenance.
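The two feedback signals can be compared directly in discrete time, as in the sketch below. For a sample pair $(r_I, r_Q)$, the decision-directed loop uses $\mathrm{sgn}(r_I)\,r_Q$, matching (12.2.22) when the decision is correct, while the Costas loop uses $r_I\,r_Q$, whose mean follows the $\sin(2\phi_e)$ form of (12.2.23). The signal and noise values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
A, N0, n = 1.0, 0.2, 20000
phi_e = np.linspace(-np.pi / 2, np.pi / 2, 41)

dd, costas = [], []
for pe in phi_e:
    s = A * rng.choice([-1.0, 1.0], size=n)
    rI = s * np.cos(pe) + rng.normal(0, np.sqrt(N0 / 2), n)
    rQ = s * np.sin(pe) + rng.normal(0, np.sqrt(N0 / 2), n)
    dd.append(np.mean(np.sign(rI) * rQ))      # decision-directed, cf. (12.2.22)
    costas.append(np.mean(rI * rQ))           # Costas loop, cf. (12.2.23)

# dd tracks A*sin(phi_e) (degraded by decision errors near +-pi/2);
# costas tracks (A**2/2)*sin(2*phi_e), with a 180-degree ambiguity.
```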

12.2.4 Generalized Likelihood Function

When given a received block $\mathbf{r}$ of unknown noisy data with an unknown carrier phase $\phi$, the grand maximum-likelihood estimate simultaneously recovers both the carrier phase and the data. Referring to (12.2.3) and generalizing $\mathbf{s}_\phi$ to account for all possible data blocks at all phase levels, maximizing the likelihood function of the carrier phase in additive white gaussian noise is equivalent to minimizing the euclidean distance $|\mathbf{r} - \mathbf{s}_\phi|^2$ between the received sequence $\mathbf{r}$ and the set of all possible transmitted sequences $\{\mathbf{s}_\phi\}$ parameterized both by the unknown phase $\phi$ and by the unknown sequence of data. Minimizing the resulting distance would require demodulating the data for each member of a suitably spaced discrete set of phase values, then choosing the phase that produces the smallest total euclidean distance. With the phase coarsely sampled at every 10°, this would require 36 demodulators, providing 36 demodulated data sequences, followed by the selection of the demodulated data sequence with the largest likelihood. This demodulator is regarded as impracticable. To reduce complexity, the effect of the data is suppressed before the phase-estimation process. Methods such as a decision-directed loop and a Costas loop, discussed in the previous section, are modifications of an estimator based on the generalized likelihood principle, which forms an estimate that is based on maximizing the expected likelihood function,² where the expectation is taken with respect to the random data. Maximization of the generalized likelihood function is a useful principle, but one that is not validated by a compelling optimality statement. For a constellation of equal-energy signals in additive white gaussian noise, the generalized likelihood function $\Lambda(\phi)$ is expressed in terms of an expectation of the squared distance $d^2(\mathbf{r}, \mathbf{s}_\phi)$ or, equivalently, of the log-likelihood function $L(\phi) = \log_e \Lambda(\mathbf{s}_\phi; \mathbf{r})$ (cf. (10.2.4a)). Forming the expectation over all sequences, the generalized likelihood function is the expectation
$$\Lambda(\phi) \doteq \big\langle \Lambda(\mathbf{s}_\phi; \mathbf{r}) \big\rangle = \big\langle e^{d^2(\mathbf{r},\mathbf{s}_\phi)} \big\rangle = \big\langle e^{-2\,\mathrm{Re}[\mathbf{r}\cdot\mathbf{s}_\phi]} \big\rangle, \qquad (12.2.24)$$
with $d^2(\mathbf{r}, \mathbf{s}_\phi)$ replaced by $-2\,\mathrm{Re}[\mathbf{r}\cdot\mathbf{s}_\phi]$ (cf. (10.2.4b)) when the other terms have been suppressed, as is appropriate for an equal-energy modulation format.
² See Falconer and Salz (1977), and Blahut (2010), Chapter 8.


For the case of binary phase-shift keying in additive white gaussian noise, the bit values $s_j$ are $\pm A$. The received noisy complex signal is given by
$$r(t) = \sum_j s_j\, p(t - jT)\,e^{i\phi} + n(t) = r_I(t) + i r_Q(t), \qquad (12.2.25)$$

where $p(t)$ is the real pulse at the receiver, and $n(t)$ is a circularly symmetric white gaussian noise process. Before treating the case of a fully modulated binary waveform, the simpler case of only one modulated pulse $p(t)$ in complex-baseband noise will be developed. Then $r(t) = \pm Ap(t)e^{i\phi} + n(t)$. Now apply (12.2.24). For this case, the two corresponding log-likelihood functions $L(\phi, \pm A)$ are (cf. (10.2.4b))
$$L(\phi, \pm A) = \pm\frac{2A}{N_0}\,\mathrm{Re}\!\left[\int_{-\infty}^{\infty} r^*(t)\,p(t)\,e^{i\phi}\,dt\right]. \qquad (12.2.26)$$

For an equiprobable prior on the data, the expectation of the likelihood is
$$\Lambda(\phi) = \tfrac{1}{2}\Lambda(\phi, A) + \tfrac{1}{2}\Lambda(\phi, -A) = \tfrac{1}{2}e^{L(\phi, A)} + \tfrac{1}{2}e^{L(\phi, -A)}$$
$$= \tfrac{1}{2}e^{(2A/N_0)\,\mathrm{Re}\left[\int_{-\infty}^{\infty} r^*(t)p(t)e^{i\phi}dt\right]} + \tfrac{1}{2}e^{-(2A/N_0)\,\mathrm{Re}\left[\int_{-\infty}^{\infty} r^*(t)p(t)e^{i\phi}dt\right]}$$
$$= \cosh\!\left((2A/N_0)\left(\cos\phi\int_{-\infty}^{\infty} r_I(t)\,p(t)\,dt + \sin\phi\int_{-\infty}^{\infty} r_Q(t)\,p(t)\,dt\right)\right)$$
$$= \cosh\!\left(\frac{R_I(0)\cos\phi + R_Q(0)\sin\phi}{N_0/2A}\right), \qquad (12.2.27)$$

where
$$R_I(0) \doteq \int_{-\infty}^{\infty} r_I(t)\,p(t)\,dt, \qquad (12.2.28a)$$
$$R_Q(0) \doteq \int_{-\infty}^{\infty} r_Q(t)\,p(t)\,dt \qquad (12.2.28b)$$

are the projections of the noisy in-phase component $r_I(t)$ and the noisy quadrature component $r_Q(t)$ onto the known real pulse $p(t)$. The logarithm of each side gives the squared distance $d^2(\phi) = L(\phi)$ as the log-likelihood $\log_e \Lambda(\phi)$,
$$d^2(\phi) = \log_e \cosh\!\left(\frac{R_I(0)\cos\phi + R_Q(0)\sin\phi}{N_0/2A}\right). \qquad (12.2.29)$$
To find the value of $\phi$ that minimizes this expression, set the derivative with respect to $\phi$ equal to zero, which gives
$$\left(\frac{R_Q(0)\cos\phi - R_I(0)\sin\phi}{N_0/2A}\right)\tanh\!\left(\frac{R_I(0)\cos\phi + R_Q(0)\sin\phi}{N_0/2A}\right) = 0. \qquad (12.2.30)$$
This expression is satisfied if either of the two terms is zero. Of these two, the minimum occurs when the first term is zero, which is posed as an end-of-chapter problem. Setting the first term to zero gives the estimated phase $\hat\phi$ as
$$\hat\phi = \tan^{-1}\!\left(\frac{R_Q(0)}{R_I(0)}\right). \qquad (12.2.31)$$

When $p(t)$ is a rect pulse, (12.2.30) reduces to the maximum-likelihood phase estimate for an unmodulated carrier (cf. (12.2.2)). Expression (12.2.30) extends to a block of length $K$ of data-modulated pulses in white gaussian noise given in (12.2.25), noting that for a Nyquist pulse the terms corresponding to the individual bits separate into a summation. Therefore, the squared distance $d^2(\phi)$ given in (12.2.29) is modified to read
$$d^2(\phi) = \sum_{j=0}^{K-1} \log_e \cosh\!\left(\frac{R_I(jT)\cos\phi + R_Q(jT)\sin\phi}{N_0/2A}\right), \qquad (12.2.32)$$
where
$$R_I(t) = \int_{-\infty}^{\infty} r_I(\tau)\,p(\tau - t)\,d\tau, \qquad (12.2.33a)$$
$$R_Q(t) = \int_{-\infty}^{\infty} r_Q(\tau)\,p(\tau - t)\,d\tau \qquad (12.2.33b)$$

are the components of the output of the matched filter $p(-t)$, which is the time-reversed copy of the real pulse $p(t)$. When $t = 0$, (12.2.33) reduces to (12.2.28). The justification of (12.2.32) is asked for in an end-of-chapter problem. For a single pulse, $d^2(\phi)$ reduces to (12.2.29). Setting the derivative of (12.2.32) equal to zero gives
$$\sum_{j=0}^{K-1}\left(\frac{R_Q(jT)\cos\phi - R_I(jT)\sin\phi}{N_0/2A}\right)\tanh\!\left(\frac{R_Q(jT)\sin\phi + R_I(jT)\cos\phi}{N_0/2A}\right) = 0. \qquad (12.2.34)$$
The value of $\phi$ that maximizes (12.2.32) can be determined using an iterative search over $\phi$ for the full $K$-term expression given in (12.2.34), which uses all $K$ samples in a batch. To this end, the phase $\phi_{\ell+1}$ for the $(\ell+1)$th iterate is expressed recursively in terms of $\phi_\ell$ for the $\ell$th iterate and an update $\Delta\phi_\ell$ as given by
$$\phi_{\ell+1} = \phi_\ell + \Delta\phi_\ell = \phi_\ell + \frac{C}{K}\sum_{j=0}^{K-1}\left(\frac{R_1(jT)}{N_0/2A}\right)\tanh\!\left(\frac{R_2(jT)}{N_0/2A}\right), \qquad (12.2.35)$$
where
$$R_1(jT) \doteq R_Q(jT)\cos\phi_\ell - R_I(jT)\sin\phi_\ell, \qquad (12.2.36a)$$
$$R_2(jT) \doteq R_Q(jT)\sin\phi_\ell + R_I(jT)\cos\phi_\ell, \qquad (12.2.36b)$$
with $C$ being a constant that controls the rate of convergence of the search. This is a batch process in which the update $\Delta\phi_\ell$ is determined by iterating on the entire saved block of $K$ symbols using the value of $\phi_\ell$ from the most recent iteration to compute $\phi_{\ell+1}$.
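The batch recursion (12.2.35) is easy to state in code. The sketch below assumes Nyquist pulses so that $R_I(jT)$ and $R_Q(jT)$ are simply the matched-filter samples, and uses assumed values for the convergence constant $C$ and the number of iterations.

```python
import numpy as np

rng = np.random.default_rng(6)
K, A, N0, phi_true = 200, 1.0, 0.2, 0.6
s = A * rng.choice([-1.0, 1.0], size=K)
r = s * np.exp(1j * phi_true) + (rng.normal(0, np.sqrt(N0/2), K)
                                 + 1j * rng.normal(0, np.sqrt(N0/2), K))
RI, RQ = r.real, r.imag              # matched-filter samples R_I(jT), R_Q(jT)

phi, C = 0.0, 0.05                   # initial phase and convergence constant
for _ in range(100):                 # batch iterations of (12.2.35)
    R1 = RQ * np.cos(phi) - RI * np.sin(phi)          # (12.2.36a)
    R2 = RQ * np.sin(phi) + RI * np.cos(phi)          # (12.2.36b)
    phi += (C / K) * np.sum((R1 / (N0 / (2 * A))) * np.tanh(R2 / (N0 / (2 * A))))
print(phi, phi_true)
```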



Figure 12.4 Recursive phase estimator for binary phase shift keying based on the generalized likelihood function in the form of a control loop.

This batch iteration can be recast in the spirit of the phase-locked loop by reducing the batch process to only a single term of the sum given in (12.2.35) for which $j$ is equal to $k$. The iteration using this single term is reduced to
$$\phi_{k+1} = \phi_k + \frac{C}{K}\left(\frac{R_1(kT)}{N_0/2A}\right)\tanh\!\left(\frac{R_2(kT)}{N_0/2A}\right). \qquad (12.2.37)$$
An estimate of the update $\Delta\hat\phi$ over a time interval $\Delta t$ can be taken to be proportional to the difference $\phi_{k+1} - \phi_k$, which is the summand in (12.2.37). Taking the limit as $\Delta t$ goes to zero suggests a differential equation with $d\hat\phi/dt$ given by
$$\frac{d\hat\phi}{dt} = C\left(\frac{R_Q(t)\cos\hat\phi - R_I(t)\sin\hat\phi}{N_0/2A}\right)\tanh\!\left(\frac{R_Q(t)\sin\hat\phi + R_I(t)\cos\hat\phi}{N_0/2A}\right). \qquad (12.2.38)$$

A block diagram of this recursive phase estimator at complex baseband is shown in Figure 12.4. This phase estimator is the same as in Figure 12.3 except that the hardlimiter is replaced by the hyperbolic tangent function, with the argument of that function weighted by the signal-to-noise ratio. For large signal-to-noise ratios, tanh(x ) ≈ sgn(x ) (cf. (2.1.6)), which implements hard-decision detection for binary phase-shift keying. This case corresponds to decision-directed phase estimation. For small signal-to-noise ratios, tanh(x ) ≈ x, the term 2/ N0 is absorbed into the loop gain, and the estimated symbol value is not used in the feedback. This case corresponds to a Costas loop. For intermediate values of the signal-to-noise ratio, other approximations to the hyperbolic tangent, described as partial hardlimiting, may be appropriate.

12.3 Clock-Phase Estimation

The time at which the received filtered waveform is sampled is governed by a local symbol clock. The alignment of the phase of this local clock to the demodulated baseband waveform is called clock-phase synchronization. Symbols are transmitted at times $jT$ and received at local times $jT + \tau(t)$ for $\tau(t) = mT + \alpha(t)$, where $m$ is an integer offset, and $\alpha(t)$ is the incremental timing offset $\alpha(t) = \tau(t) \pmod T$ from


the local clock, which varies over $[0, T]$. Methods of estimation of the clock phase, including maximum-likelihood estimation, are as described for the estimation of the carrier phase in Section 12.2. The noisy demodulated waveform before the detection filter is $p(t - \alpha) + n(t)$, where $\alpha$ is an unknown delay in the interval $[0, T]$ that is to be estimated. The corresponding clock phase is $\theta = 2\pi\alpha/T$. The study of clock-phase estimation begins with the case of a known pulse $p(t)$ received in circularly symmetric white gaussian noise $n(t)$ with a power density spectrum $N_0/2 = \sigma^2$. Minimizing the estimation error requires maximizing $\int_0^T p^*(t)\,p(t-\alpha)\,dt$ with respect to $\alpha$, so that
$$\frac{d}{d\alpha}\,\mathrm{Re}\!\left[\int_0^T p^*(t)\,p(t-\alpha)\,dt\right] = 0. \qquad (12.3.1)$$
The integral is the real part of the matched filter output at time $\alpha$. Finding the value of $\alpha$ at which the derivative is equal to zero is equivalent to finding the time at which the matched-filter output is maximum. The output waveform is sampled at that maximum to form the statistic for data detection. Various methods can be used to determine the peak value of the matched filter output. The decision-directed delay-locked loop for binary phase-shift keying, shown in Figure 12.5, is the equivalent of the decision-directed phase-locked loop shown in Figure 12.3. The controlled local oscillator used in the phase-locked loop is replaced by a controlled local clock in the delay-locked loop. This clock can be modeled as a sequence of sampling impulses or as a square-wave clock waveform for which the samples are generated at each clock transition. The time of the transition is incrementally advanced or delayed according to a discrete delay-lock error signal. This error signal is generated by the product of a sample of the derivative of the matched-filter output, which is shown in the upper path in Figure 12.5, and the estimated binary data value after a hard-decision detection process implemented by a hardlimiter, which is generated in the lower path. The derivative can be approximated by a first difference between an early sample and a late sample surrounding the tentative maximum. At the peak value of the


Figure 12.5 Decision-directed clock estimation for binary phase-shift keying using a hardlimiter on the output of the matched filter. If the signal-to-noise ratio is small, the hardlimiting operation may be removed.


output of the matched filter without noise, the value of the first difference is zero, the error signal is zero, and the loop is locked. If the value of the first difference is not zero, then the error signal adjusts the phase of the clock, reducing the error signal to zero. Noise, when present, will lead to an error in the phase of the clock. As for a phase-locked loop, when the signal-to-noise ratio is small, the decision errors made by the hardlimiting operation in the feedback are more harmful than the original noise. For this case, the hardlimiting is removed and the error signal $e_k$ for the controlled clock can be written as
$$e_k = C\, r(kT + \alpha)\left(\frac{dr(kT + \alpha)}{d\alpha}\right) = \frac{C}{2}\left.\frac{dr^2(t)}{dt}\right|_{t = kT + \alpha}, \qquad (12.3.2)$$

where r (t ) is the output of the matched filter, and C is the loop gain. The expected value of this error signal is zero at the value of α for which the output of the matched filter is maximum. As for a phase-locked loop, the hardlimiting operation can be replaced by a hyperbolic tangent for arbitrary values of the signal-to-noise ratio. The justification for this statement is sought in a problem at the end of the chapter. As before, for large values of the signal-to-noise ratio, the hyperbolic tangent becomes a hardlimiter. For small values of the signal-to-noise ratio, the hyperbolic tangent becomes a linear function.
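The first-difference approximation to the derivative in (12.3.2) is the classical early-late gate. The sketch below computes the error signal from early and late samples of a noiseless matched-filter output and iterates the clock update; the pulse shape, gate spacing, and loop gain are assumptions for illustration.

```python
import numpy as np

T, C, delta = 1.0, 1.0, 0.05            # symbol interval, gain, gate spacing

def mf_out(t):                          # assumed matched-filter output shape:
    return np.exp(-0.5 * (t / (0.3 * T))**2)   # peaked at t = 0

def error_signal(alpha_hat, alpha_true):
    """Early-late approximation to e_k = C*r*(dr/dt) at the trial sampling time."""
    t = alpha_hat - alpha_true                    # offset from the true peak
    r = mf_out(t)
    dr = (mf_out(t + delta) - mf_out(t - delta)) / (2 * delta)
    return C * r * dr                             # cf. (12.3.2)

# Iterate the clock update: the error is zero at the peak, and its sign
# drives the trial sampling phase toward the true timing offset.
alpha_hat, alpha_true = 0.3, 0.12
for _ in range(200):
    alpha_hat += 0.05 * error_signal(alpha_hat, alpha_true)
print(alpha_hat, alpha_true)
```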

Joint Estimation

As developed herein, carrier-phase synchronization and clock-phase synchronization were each derived without consideration of the other. This means that, for estimation based on feedback control loops, there are two interacting estimation procedures. The two control loops can be merged to provide a better estimate of both quantities than can be obtained by estimating the parameters separately. Combining the carrier-phase estimation process shown in Figure 12.3 with the clock estimation process shown in Figure 12.5 leads to a joint decision-directed estimate such as that shown in Figure 12.6, where the carrier-phase estimation uses a first-order phase-locked loop. Such an arrangement locks both on the carrier phase and on the clock phase, provided that both change slowly, but has complex dynamics when the lock is lost. Similar joint estimation techniques can be implemented for other modulation formats or channel parameters.

12.4 Frame Synchronization

The received waveform given in (12.1.1) is delayed by $\tau = mT + \alpha$. To synchronize frame time, the parameter $m$, which is the number of symbol intervals to the beginning of the next dataframe, must be estimated. A common form of frame synchronization uses a specific block of symbols, called a synchronization marker, synchronization preamble, or syncword. The synchronization marker may appear intermittently. A frame


Figure 12.6 Decision-directed joint estimation of both the symbol clock and the carrier phase for binary phase-shift keying.


Figure 12.7 Frame synchronization based on using markers. An arbitrary block of bits corresponding to the length of a frame may or may not be aligned with the beginning of the frame.

synchronization protocol will determine how to respond to false or missed marker detections. A frame synchronization protocol is informed by block-detecting one or more synchronization markers to determine the frame boundaries. A datastream with a periodic synchronization marker is shown in Figure 12.7. A dataframe of $N$ bits using a synchronization marker with a binary phase-shift-keyed waveform is given by $\{b_0, b_1, b_2, \ldots, b_{n-1}, d_0, d_1, d_2, \ldots, d_{N-1-n}\}$, where the fixed sequence $\{b_0, b_1, b_2, \ldots, b_{n-1}\}$ is the synchronization marker, and the sequence $\{d_0, d_1, d_2, \ldots, d_{N-1-n}\}$ is the random encoded data. A synchronization marker can be detected using hard-decision-detected binary data values. The detection criterion may require that the $n$-bit marker be observed exactly, or it may allow $h$ bits to be incorrect. A false marker detection occurs when a segment of the random binary data sequence $\{d_i\}$, possibly overlapping part of the marker, matches the synchronization marker according to the specified detection criterion. A well-designed marker will rarely be detected in a block that only partially overlaps the marker, as is shown on the right side of Figure 12.7. The probability that such an overlapping block is detected as a valid marker is less than the probability that a block of random data is detected as a valid marker, which is given by (12.4.1). For a fully random block of databits, the probability of a frame synchronization error at a given incorrect position is $p_e = 2^{-n}$ when it is required that the entire marker be


detected without any error bits. There are fewer than $n$ such incorrect positions in a frame. Otherwise, if $h$ bit errors are allowed within a detected marker, then
$$p_e = 2^{-n}\sum_{k=0}^{h}\binom{n}{k}, \qquad (12.4.1)$$
where the binomial coefficient $\binom{n}{k} = n!/k!(n-k)!$ is the number of patterns of $k$ bit errors within the marker of length $n$. The probability of correctly detecting a frame synchronization marker with not more than $h$ bit errors is
$$p_c = \sum_{\ell=0}^{h}\binom{n}{\ell}(1-\rho)^{n-\ell}\rho^{\ell}, \qquad (12.4.2)$$
where $\rho$ is the probability that a single bit is in error. For $h = 0$, only one pattern will be asserted as a valid marker, with probability $p_c = (1-\rho)^n$. For $h \neq 0$, a marker that has up to $h$ bits in error will be asserted as a valid marker.
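Both probabilities are one-line computations, sketched below for an assumed marker length and bit-error probability. The numbers illustrate the trade: allowing more marker errors raises the probability of correct detection (12.4.2), but also raises the false-detection probability (12.4.1).

```python
from math import comb

n, rho = 24, 1e-3        # assumed marker length and channel bit-error probability

for h in range(4):
    p_false = sum(comb(n, k) for k in range(h + 1)) / 2**n           # (12.4.1)
    p_correct = sum(comb(n, l) * (1 - rho)**(n - l) * rho**l
                    for l in range(h + 1))                           # (12.4.2)
    print(h, p_false, p_correct)
```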

12.5 Channel-State Estimation

The impulse response $h(t)$ of a linear, time-invariant information channel must be known or estimated in order to equalize or accommodate the effect of the channel dispersion on the received signal. The dispersion can then be suppressed by adjusting the detection filter or, if feedback from the receiver to the transmitter is available, perhaps by changing the prefilter or the parameters of the coded-modulation format. This estimation process, called channel-state estimation, is discussed in this section. An estimate of the channel impulse response is also required for a receiver that uses the methods of sequence detection discussed in Chapter 11. This is because sequence detection requires an estimate of the noiseless received pulse $p(t) = x(t) \ast h(t)$, where $x(t)$ is the transmitted pulse and $h(t)$ is the impulse response of the channel. A channel whose linear dispersion varies slowly compared with the symbol rate can be regarded as locally time-invariant, yet with a slowly changing impulse response. The channel impulse response is a random function of time that is largely unchanged over a duration characterized by the channel coherence timewidth. An estimate of the channel impulse response needs to be updated only at a rate faster than the reciprocal of the channel coherence timewidth. In principle, the grand maximum-likelihood estimator would estimate the functional form of the impulse response jointly with the estimation of other channel parameters. Because this joint estimation is not practical, the channel impulse response must be estimated separately. Two channel estimation approaches are considered herein. The first approach estimates the channel by transmitting a known data-free training sequence, then estimating the channel impulse response from the channel output. This training sequence, inserted into the datastream intermittently, can also be used for frame synchronization and for carrier-phase and clock-phase synchronization.


The second approach does not estimate the channel response. Instead, it directly estimates a detection filter satisfying some optimality criterion, based either on using a stationary data-modulated waveform with known statistics or on using a training sequence. In this case, one widely used design goal for the detection filter is to minimize the mean-squared error between the sequence of received values at the output of the filter and the sequence of transmitted values. For an additive white gaussian noise channel, this estimation criterion leads to an equalization filter called a Wiener filter. In general, the criterion of minimization of the mean-squared error at the filter output is not the same as choosing the filter to minimize the probability of a detection error. The estimators to be derived in this section are based on the correlation functions for random sequences. The autocorrelation of the arbitrary random sequence $\{a_j\}$ is denoted $R_{aa}(\ell)$. The cross-correlation function of the sequence $\{a_j\}$ with the sequence $\{b_j\}$ is denoted $R_{ab}(\ell)$. For a discrete-time stationary random process, the autocorrelation function is the expectation (cf. (2.2.53))
$$R_{aa}(\ell) \doteq \langle a^*_{m+\ell}\, a_m \rangle, \qquad (12.5.1)$$
where $\ell$ indexes the temporal offset between the two random variables generated from the same stationary random process. The cross-correlation function $R_{ab}(\ell)$ of sequences $\{a_j\}$ and $\{b_j\}$ is the expectation
$$R_{ab}(\ell) = \langle a^*_{m+\ell}\, b_m \rangle, \qquad (12.5.2)$$
which reduces to the autocorrelation $R_{aa}(\ell)$ when $\{a_j\}$ is equal to $\{b_j\}$. The channel-state estimation is based on the received random datastream. Therefore, the true correlation functions are not known, but may be replaced by sample correlation functions computed from realizations of the random processes. To this end, the sequence $\{a_j\}$ may be replaced by the sequence $\{r_j\}$ of noisy samples, and the sequence $\{b_j\}$ may be replaced by the sequence $\{s_j\}$ of transmitted data values, either known or estimated. When the channel-state estimation is based on a known training sequence, the autocorrelation function is modified by replacing the statistical expectation with a time average. For this case, the sequence $\{a_j\}$ is the training sequence $\{t_j\}$ and the sequence $\{b_j\}$ is a realization $\{r_j\}$ of the random sequence of noisy samples.

12.5.1 Impulse Response Estimation

The impulse response of a linear channel can be estimated from the response of the channel to a training sequence. A binary training sequence {t j } of total length K may consist of L + M known symbols consisting of M prescribed symbols to be used for training followed by a guard interval of L unused symbol intervals appended to avoid overlapping the response to the training sequence with the response to the data. Although the duration of the impulse response of a lightwave channel is formally infinite (cf. (8.1.4)), that impulse response is eventually negligible. The chosen value of L is deemed to contain nearly all the energy in a received pulse. The discrete convolution of the training sequence {t j } and the sampled channel impulse response {h j } is written here as a vector–matrix product. Together with the additive noise, the channel output is


$$\mathbf{r} = \mathbf{T}\mathbf{h} + \mathbf{n}. \qquad (12.5.3)$$

The channel impulse response $\mathbf{h}$ is an $L \times 1$ column vector that is to be estimated from the channel output. The channel output $\mathbf{r} = \{r_k,\ k = 0, \ldots, K-1\}$ is a sequence of length $K$. The additive noise $\mathbf{n}$ is a $K \times 1$ column vector. The noise vector $\mathbf{n}$ of length $K$ has components $n_k$ that are independent, circularly symmetric gaussian random variables, each with zero mean and variance $\sigma^2 = N_0/2$. The term $\mathbf{T}$ is a $K \times L$ matrix given by

$$\mathbf{T} = \begin{bmatrix}
t_0 & 0 & \cdots & 0 \\
t_1 & t_0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
t_{L-1} & t_{L-2} & \cdots & t_0 \\
\vdots & \vdots & & \vdots \\
t_{K-1} & t_{K-2} & \cdots & t_{K-L}
\end{bmatrix}. \qquad (12.5.4)$$

The task is trivial when $K$ is equal to $L$ and there is no noise. Then, unless $t_0$ is zero, the lower triangular $L \times L$ matrix is invertible and $\mathbf{h} = \mathbf{T}^{-1}\mathbf{r}$. Otherwise, $K$ is larger than $L$ and the matrix $\mathbf{T}$ is not square. It is overdetermined, which can be used to reduce the effect of the noise. An estimate $\hat{\mathbf{h}}$ of the channel impulse response $\mathbf{h}$ is to be computed from a realization of $\mathbf{r}$. Because $\mathbf{r}$ is random, an inference criterion must be adopted to form the estimate. The noise $\mathbf{n}$ is gaussian, so a minimum-variance unbiased estimator will be obtained. This estimate will only be derived indirectly. First, the maximum-likelihood estimate $\hat{\mathbf{h}}$ is computed. This is straightforward because the noise $\mathbf{n}$ is gaussian. Then that maximum-likelihood estimate will be asserted as the minimum-variance unbiased estimate. Given the received vector $\mathbf{r}$ of noisy samples satisfying (12.5.3) and the known training sequence matrix $\mathbf{T}$, the maximum-likelihood estimate $\hat{\mathbf{h}}$ of length $L$ in additive white gaussian noise is given by the value of $\mathbf{h}$ that maximizes the logarithm of the conditional probability density function $f(\mathbf{r}|\mathbf{h})$. For additive white gaussian noise, this is a squared euclidean distance given by

$$\log f(\mathbf{r}|\mathbf{h}) = -\log\!\left(\sqrt{(2\pi\sigma^2)^K}\right) - \frac{1}{2\sigma^2}\big(\mathbf{r} - \mathbf{T}\mathbf{h}\big)^\dagger\big(\mathbf{r} - \mathbf{T}\mathbf{h}\big). \qquad (12.5.5)$$

The maximum of (12.5.5) is obtained by taking the partial derivative with respect to the vector $\mathbf{h}$, understood to be a componentwise partial derivative, and setting the result to zero. Now regard the complex vector $\mathbf{h}$ as though it were a real vector and regard its complex conjugate as though it were a constant real vector. Then the derivative is³
$$\frac{\partial}{\partial\mathbf{h}}\log f(\mathbf{r}|\mathbf{h}) = \frac{1}{2\sigma^2}\big(\mathbf{r} - \mathbf{T}\mathbf{h}\big)^\dagger\mathbf{T}, \qquad (12.5.6)$$
³ See Kreutz-Delgado (2009).


which, when set to zero, gives the estimator
$$\hat{\mathbf{h}} = \big(\mathbf{T}^\dagger\mathbf{T}\big)^{-1}\mathbf{T}^\dagger\mathbf{r}, \qquad (12.5.7)$$
where the term $\mathbf{T}^\dagger\mathbf{T}$ is an $L \times L$ matrix and the term $\mathbf{T}^\dagger\mathbf{r}$ is an $L \times 1$ column vector. Thus, for gaussian noise, the maximum-likelihood estimator is a linear estimator. It remains to be shown that this estimator is the minimum-variance unbiased estimator. To see that it is unbiased, substitute (12.5.3) into (12.5.7) to give

ĥ = (T†T)⁻¹ T†(Th + n) = h + (T†T)⁻¹ T†n.    (12.5.8)

Clearly ⟨ĥ⟩ = h because ⟨n⟩ = 0, so the estimate is unbiased. Moreover, the estimate of (12.5.7) satisfies the well-known Cramér–Rao bound⁴ on minimum-variance unbiased estimates, not discussed here, which means that (12.5.7) is the desired minimum-variance unbiased estimate. The covariance matrix C = ⟨(ĥ − h)(ĥ − h)†⟩ of the estimate now easily follows from (12.5.8). It is

C = N_0 (T†T)⁻¹,    (12.5.9)

⁴ See Kay (1993).

where N_0 = 2σ² (cf. (6.2.11)). The estimate in (12.5.7) is to be computed from the received sequence r in response to the training sequence, but this can be a formidable computation. The remainder of this section develops simplifying approximations. The term T†r in (12.5.7) is a vector whose ℓth component is

[\mathbf{T}^\dagger\mathbf{r}]_\ell = \sum_{k=0}^{K-1-\ell} t^*_{k+\ell}\, r_k \approx R_{tr}(\ell),    (12.5.10)

where the approximant R_tr(ℓ) is the ℓth term of the cross-correlation function between the training sequence and the received sequence (cf. (12.5.2)) computed from a realization r. The L × L matrix T†T and its inverse [T†T]⁻¹ depend only on the known training sequence and can be precomputed. However, the matrix has L² elements, which is large when L is large. Computing (12.5.7) after receiving r then requires L² multiplications. To simplify this computation, the matrix can be well approximated by an autocorrelation matrix as follows. Because T†T is an L × L matrix, we can write

[\mathbf{T}^\dagger\mathbf{T}]_{ij} = \sum_{k=0}^{K-1} t^*_{k-i}\, t_{k-j},\qquad i = 0, 1, \ldots, L-1;\; j = 0, 1, \ldots, L-1.


For large K and L , most elements of this matrix depend only on the absolute difference |i − j |, so ignoring some boundary terms gives the simplifying approximation

[\mathbf{T}^\dagger\mathbf{T}]_{ij} \approx \sum_{k=0}^{K-1-|i-j|} t^*_{k+|i-j|}\, t_k = R_{tt}(\ell),    (12.5.11)

where ℓ = |i − j| for large K. The expression R_tt(ℓ) is the ℓth term of the autocorrelation of the training sequence (cf. (12.5.1)). The term R_tt(0) = Σ_i |t_i|² = E is the total energy in the training sequence. The approximation in expression (12.5.11) can be justified by letting K approach infinity and noting that t_i = 0 both for i smaller than 0 and for i larger than K − 1. Under this approximation, the elements of the symmetric matrix T†T are written using the terms of the sample autocorrelation function R_tt(ℓ) as follows:

\mathbf{T}^\dagger\mathbf{T} \doteq \mathbf{R} = \begin{bmatrix}
R_{tt}(0) & R_{tt}(1) & \cdots & R_{tt}(L-1) \\
R_{tt}(1) & R_{tt}(0) & \cdots & R_{tt}(L-2) \\
\vdots & \vdots & \ddots & \vdots \\
R_{tt}(L-1) & R_{tt}(L-2) & \cdots & R_{tt}(0)
\end{bmatrix}.    (12.5.12)

Were this matrix a diagonal matrix, the inverse would be easy to state and (12.5.7) would be easy to compute. However, a training sequence that has an autocorrelation function R_tt(ℓ) with an impulse-like structure such that T†T is exactly diagonal does not exist. Instead, sequences known as pseudorandom sequences are used as the training sequences. These have an autocorrelation structure that is a good approximation to an impulse-like structure. Within this impulse-like approximation, the autocorrelation function R_tt(ℓ) is

R_{tt}(\ell) = \begin{cases} E & \text{for } \ell = 0 \\ 0 & \text{for } \ell \neq 0. \end{cases}    (12.5.13)
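As a quick numerical illustration of this impulse-like property (a minimal sketch, not from the text; it assumes numpy and uses a randomly drawn ±1 sequence as a stand-in for a purpose-built pseudorandom sequence):

    import numpy as np

    rng = np.random.default_rng(1)
    M = 127                               # training-sequence length (assumed)
    t = rng.choice([-1.0, 1.0], size=M)   # random +/-1 stand-in for a pseudorandom sequence

    E = np.sum(np.abs(t)**2)              # total energy; R_tt(0) = E = M here
    R_tt = np.correlate(t, t, mode="full")[M - 1:]   # R_tt(l) for l = 0, ..., M-1

    print(R_tt[0])                   # equals E
    print(np.max(np.abs(R_tt[1:])))  # much smaller than E, so R is nearly diagonal

For a random sequence the off-peak terms are of order √M, small compared with E = M; purpose-built pseudorandom sequences behave even better.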

The matrix R in (12.5.12) is thereby approximated by a diagonal matrix with the diagonal elements equal to E, and the inverse matrix R⁻¹ reduces to a diagonal matrix with the diagonal elements equal to 1/E. Substituting [R⁻¹]_jj = 1/E into (12.5.7) and using (12.5.10), each estimated component ĥ_ℓ of the impulse response h of the channel is written as

\hat{h}_\ell = \frac{R_{tr}(\ell)}{E} = \frac{R_{tr}(\ell)}{R_{tt}(0)},\qquad \ell = 0, 1, \ldots, L-1,    (12.5.14)

where (12.5.13) has been used. Thus, provided the training sequence is long enough to justify the approximation, the minimum-variance estimate of the channel impulse response in additive white gaussian noise is the sample cross-correlation between the noisy received sequence and the known training sequence, normalized by the total energy E of the training sequence.

Substituting the approximation (12.5.13) into (12.5.9) gives the variance σ_h² of each estimated component of the impulse response as

σ_h² = C_{jj} = N_0 [\mathbf{R}^{-1}]_{jj} = (E/N_0)^{-1}.    (12.5.15)

The variance σ_h² decreases inversely with the ratio of the total energy E in the training sequence to the noise power density spectrum N_0.
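A minimal sketch (not from the text; the channel taps, noise level, and sequence lengths are invented for illustration, and the training sequence here occupies the full block) comparing the exact least-squares estimate of (12.5.7) with the normalized cross-correlation estimate of (12.5.14):

    import numpy as np

    rng = np.random.default_rng(7)
    L, K = 8, 256
    t = rng.choice([-1.0, 1.0], size=K)               # training sequence
    h = rng.normal(size=L) + 1j*rng.normal(size=L)    # hypothetical channel taps

    # K x L convolution matrix of (12.5.4): T[k, j] = t[k-j], zero outside the sequence
    T = np.array([[t[k - j] if k - j >= 0 else 0.0 for j in range(L)]
                  for k in range(K)])

    sigma = 0.1
    noise = sigma*(rng.normal(size=K) + 1j*rng.normal(size=K))
    r = T @ h + noise                                 # channel output (12.5.3)

    # Exact maximum-likelihood (least-squares) estimate (12.5.7)
    h_ls = np.linalg.solve(T.conj().T @ T, T.conj().T @ r)

    # Cross-correlation estimate (12.5.14), normalized by the training energy
    E = np.sum(np.abs(t)**2)
    h_corr = np.array([np.sum(np.conj(t[:K - l])*r[l:]) for l in range(L)])/E

    print(np.max(np.abs(h_ls - h)), np.max(np.abs(h_corr - h)))

Both estimates recover the taps to within the noise; the correlation form avoids forming and inverting T†T.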

12.5.2 Detection-Filter Estimation

A discrete-time detection filter {y_k, k = 0, ..., L − 1} based on the minimum mean-squared error at the filter output can be estimated directly from the samples {r_k} of the received signal without first estimating the channel impulse response. This approach computes the form of the detection filter directly from the received sequence in response to a training sequence or to random data according to statistical knowledge about the data. Such a procedure results in a different filter than the equalization filter discussed in Section 11.2.4. The design of that equalization filter, described as a matched filter followed by a transversal filter, requires knowledge of the received pulse p(t) obtained from an estimate of the channel impulse response. In this section, the filter is estimated directly without first estimating the channel impulse response. With the problem now cast as an estimation problem, the sampled output component f_k for k = L + 1, ..., K − 1 is written in terms of the required finite-length detection filter y as

f_k = \sum_{m=0}^{L-1} y_m r_{k-m},\qquad k = L+1, \ldots, K-1,    (12.5.16)

where the sequence r = {r_k, k = 0, ..., K − 1} consists of the received signal samples in additive white gaussian noise. The sequence {y_k, k = 0, ..., L − 1} specifies the detection filter to be estimated. The realization r of the sequence of K noisy samples is known. The filter y of length L is to be computed to satisfy the chosen inference principle on the quality of the output sequence f, which is taken here as the minimum mean-squared error (cf. Section 11.2.3). For each k, the summation in (12.5.16) can be written as an inner product of the desired filter of length L and a shortened vector r_k of length L composed, for each k, of L consecutive noisy samples from the sequence {r_k} of K samples. Then, for k running from L + 1 to K − 1, (12.5.16) becomes the inner product (cf. (2.1.65))

f_k = \mathbf{y}^T\mathbf{r}_k = \mathbf{r}_k^T\mathbf{y},    (12.5.17)

where y is an L × 1 column vector whose components y_k are the desired filter coefficients, and r_k = [r_{k−L+1}, ..., r_k]^T is a column vector of noisy samples of length L taken from the full sequence r of length K with components {r_0, ..., r_{K−1}}.


The filter coefficients y_k are chosen to minimize the mean-squared error. Specifically, the objective function J(k, y), defined as

J(k, \mathbf{y}) = \langle e_k e_k^*\rangle,    (12.5.18)

is to be minimized for each k, where the error e_k in the kth component is

e_k = s_k − f_k,\qquad k = 0, \ldots, L-1,    (12.5.19)

and where the target vector s = {s_k, k = 0, ..., K − 1} is the desired filter output.
The filter y that minimizes J(k, y) is determined by equating the gradient of (12.5.18) to zero, where the gradient consists of the vector of all relevant partial derivatives. Because e_k is complex, a complex gradient is used for conciseness in preference to using the combination of a separate scalar gradient on each component of e_k. The complex gradient is defined either as⁵

\nabla_{\mathbf{y}} = [\partial/\partial y_1, \partial/\partial y_2, \ldots, \partial/\partial y_L]^T    (12.5.20a)

or as

\nabla_{\mathbf{y}^*} = [\partial/\partial y_1^*, \partial/\partial y_2^*, \ldots, \partial/\partial y_L^*]^T,    (12.5.20b)

where, for y_j = u_j + i v_j, the partial derivatives are

\partial/\partial y_j \doteq \tfrac{1}{2}(\partial/\partial u_j − i\,\partial/\partial v_j),    (12.5.21a)
\partial/\partial y_j^* \doteq \tfrac{1}{2}(\partial/\partial u_j + i\,\partial/\partial v_j).    (12.5.21b)

⁵ For details regarding the complex gradient, see Brandwood (1983).

These two forms of the complex gradient are complementary. Either form can be used as is convenient. The complex gradient has the useful simplifying property that

\nabla_{\mathbf{y}^*}\Bigl(\sum_{k=1}^{M} a_k y_k\Bigr) = 0    (12.5.22)

for any complex vector (a1 , . . . , aM ). The proof of this statement is asked for in an end-of-chapter exercise. It is convenient here to choose ∇y∗ as the form of the complex gradient that is used to derive the desired filter. The complex gradient of the objective function given in (12.5.18) is evaluated by interchanging the expectation and the differentiation. Then the complex gradient of J (k , y) becomes

\nabla_{\mathbf{y}^*} J(k, \mathbf{y}) = \nabla_{\mathbf{y}^*}\langle e_k e_k^*\rangle = \bigl\langle (\nabla_{\mathbf{y}^*} e_k)\, e_k^* + e_k (\nabla_{\mathbf{y}^*} e_k^*)\bigr\rangle.

Because the expression e_k = s_k − f_k = s_k − \mathbf{y}^T\mathbf{r}_k (cf. (12.5.19) and (12.5.17)) is linear in each y_j, the term \nabla_{\mathbf{y}^*} e_k is equal to zero by (12.5.22). The second term is evaluated by noting that e_k^* = s_k^* − \mathbf{y}^\dagger\mathbf{r}_k^*, and writing

\nabla_{\mathbf{y}^*} e_k^* = \nabla_{\mathbf{y}^*}\bigl(s_k^* − \mathbf{y}^\dagger\mathbf{r}_k^*\bigr) = −\mathbf{r}_k^*    (12.5.23)

because \nabla_{\mathbf{y}^*}(\mathbf{y}^\dagger\mathbf{r}_k^*) = \mathbf{r}_k^* and s_k^* does not depend on \mathbf{y}. Therefore,

\nabla_{\mathbf{y}^*} J(k, \mathbf{y}) = −\langle e_k \mathbf{r}_k^*\rangle.    (12.5.24)

Setting the complex gradient equal to zero results in the vector condition

\langle e_k \mathbf{r}_k^*\rangle = \mathbf{0},\qquad k = 0, \ldots, L-1,    (12.5.25)

which leads to the minimum mean-squared error. This condition defines the optimal filter y_opt. The right side of (12.5.25) is the zero vector of length L. The individual components of (12.5.25) are ⟨e_k r*_{k−m}⟩ = 0 for m = 0, ..., L − 1. It remains to determine the filter y_opt for which (12.5.25) is satisfied. Taking the complex conjugate of each side of (12.5.25) and then multiplying by the vector y_opt^T gives ⟨e*_k y_opt^T r_k⟩ = 0, where the vector–vector product has the form of (12.5.16). Writing f_opt = y_opt^T r_k (cf. (12.5.17)), this becomes

\langle e_k^* f_{\text{opt}}\rangle = 0.    (12.5.26)

This condition can be described geometrically on the complex plane as an orthogonality condition between the minimum error e*_min and the optimal estimated data value f_opt for each k, as is shown in Figure 12.8. Now use (12.5.19) and (12.5.17) to write the condition given in (12.5.25) in terms of the components of the sequences:

\langle e_k r_{k-\ell}^*\rangle = \Bigl\langle\Bigl(s_k − \sum_{m=0}^{L-1} y_m r_{k-m}\Bigr) r_{k-\ell}^*\Bigr\rangle = 0\qquad \text{for } \ell = 0, 1, \ldots, L-1.

Setting the expectation equal to zero and expanding gives

\sum_{m=0}^{L-1} y_m R_{rr}(m-\ell) = R_{rs}(-\ell)\qquad \text{for } \ell = 0, 1, \ldots, L-1,    (12.5.27)

where R_rr(m − ℓ) = ⟨r_{k−m} r*_{k−ℓ}⟩ is the autocorrelation function of the received sequence (cf. (12.5.1)), and R_rs(−ℓ) = ⟨r*_{k−ℓ} s_k⟩ is the cross-correlation of the received sequence and the data sequence (cf. (12.5.2)).

Figure 12.8 (a) The error e*_k for the kth component is the vector difference on the complex plane between the value s_k and the estimated value f_k. (b) If the error is minimized, then the error is orthogonal to the estimated value f_k and is orthogonal to the kth observed value r_k for all k.


For maintenance, the data sequence used for the cross-correlation may be the estimated data sequence ŝ_k at the output of the filter. For initialization, a training sequence with the same statistical properties as the data may be used. Expression (12.5.27) is valid for each value of ℓ. The resulting set of equations expressed in vector–matrix form is

\mathbf{R}\mathbf{y} = \mathbf{w}_{rs},    (12.5.28)

where R is the matrix form of the autocorrelation function of the received sequence (cf. (12.5.12)), with R_{mℓ} = R_rr(m − ℓ), and

\mathbf{w}_{rs} = \langle \mathbf{r}^*_k s_k\rangle = [R_{rs}(0), R_{rs}(-1), \ldots, R_{rs}(1-L)]^T    (12.5.29)

is the vector form of the sample cross-correlation function. The vector

\mathbf{y} = [y_0, y_1, \ldots, y_{L-1}]^T    (12.5.30)

is the vector of length L of the filter coefficients to be computed. The solution to (12.5.28) is given by

\mathbf{y}_{\text{opt}} = \mathbf{R}^{-1}\mathbf{w}_{rs},    (12.5.31)

where the inverse R⁻¹ exists when R is a positive-definite matrix, which is almost always the case for an autocorrelation function. The causal filter described by y_opt is called a finite-impulse-response Wiener filter. The equation specified by (12.5.28) is known as the Wiener–Hopf equation. The error for each estimated component can be determined by using (12.5.17) to rewrite the objective function given in (12.5.18) in terms of matrices. Then

J(k, \mathbf{y}) = \sigma_s^2 − \mathbf{y}^\dagger\mathbf{w}_{rs} − \mathbf{w}_{rs}^\dagger\mathbf{y} + \mathbf{y}^\dagger\mathbf{R}\mathbf{y},    (12.5.32)

where (12.5.29) is used to write w†_rs = ⟨s*_k r^T⟩. Provided that R⁻¹ exists, J(k, y) can be written as follows as the sum of a quadratic term and a residual error term:

J(k, \mathbf{y}) = (\mathbf{y} − \mathbf{R}^{-1}\mathbf{w}_{rs})^\dagger \mathbf{R}\, (\mathbf{y} − \mathbf{R}^{-1}\mathbf{w}_{rs}) + \sigma_s^2 − \mathbf{w}_{rs}^\dagger\mathbf{R}^{-1}\mathbf{w}_{rs},    (12.5.33)

showing that the objective function is an L-dimensional quadratic surface with a unique minimum attained at y_opt = R⁻¹w_rs. The error at this minimum is

\sigma_{\text{err}}^2 = \sigma_s^2 − \mathbf{w}_{rs}^\dagger\mathbf{R}^{-1}\mathbf{w}_{rs} = \sigma_s^2 − \mathbf{w}_{rs}^\dagger\mathbf{y}_{\text{opt}},    (12.5.34)

where the second expression follows from the first expression using y_opt = R⁻¹w_rs.
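A minimal sketch (not from the text; the function name is an assumption, and sample averages stand in for the true expectations) of computing the finite-impulse-response Wiener filter of (12.5.31) from data:

    import numpy as np

    def wiener_filter(r, s, L):
        # Estimate the length-L FIR Wiener filter y_opt = R^{-1} w_rs of (12.5.31)
        # from a received sequence r and an aligned target (training) sequence s.
        K = len(r)
        # Sample autocorrelation R_rr(d) = <r_{k-d} r*_k>, d = 0, ..., L-1
        R_rr = np.array([np.mean(r[:K - d]*np.conj(r[d:])) for d in range(L)])
        # Hermitian Toeplitz matrix with R[m, l] = R_rr(m - l) (cf. (12.5.12))
        R = np.empty((L, L), dtype=complex)
        for m in range(L):
            for l in range(L):
                d = m - l
                R[m, l] = R_rr[d] if d >= 0 else np.conj(R_rr[-d])
        # Cross-correlation vector with components R_rs(-l) = <r*_{k-l} s_k>
        w = np.array([np.mean(np.conj(r[:K - l])*s[l:]) for l in range(L)])
        return np.linalg.solve(R, w)

The returned filter is then applied as f_k = Σ_m y_m r_{k−m}; in practice R is often regularized slightly before the solve.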

12.5.3 Constant-Modulus Objective Function

An alternative linear technique for the task of detection-filter estimation introduced in the previous section starts again with the statement of the detection filter given in (12.5.17), but with a different objective function. The alternative objective function is called the constant-modulus objective function. Recall that {s k } is the target sequence of data values, that { f k } is the sequence of estimated data values, and that {r k } is the


sequence of noisy samples from which the estimated data values are to be formed. The criterion of minimizing the mean-squared error ⟨|s_k − f_k|²⟩ (cf. (12.5.18)) was used in Section 12.5.2. In this section, the constant-modulus method for estimation of the filter coefficients uses instead the alternative criterion that the squared magnitudes |f_k|² of the detection-filter outputs are equal to a constant for all k. This criterion is suitable for the constant-magnitude phase-shift-keying signal constellation, for which the s_k have this property. The advantage of this criterion is that the resulting method does not require a sample cross-correlation function R_rs(ℓ) based on the transmitted data {s_k}. The objective function that enforces the constant-modulus constraint is⁶

J(k, \mathbf{y}) = \bigl\langle\bigl(|f_k|^2 − 1\bigr)^2\bigr\rangle,    (12.5.35)

where the squared magnitude of each symbol is normalized to one. The objective function J(k, y) is minimized when the complex gradient ∇_y* J(k, y) is equal to zero. Using (12.5.17) to write f*_k f_k as y†r*_k r^T_k y, the complex gradient with respect to the filter y* (cf. (12.5.20b)) is

\nabla_{\mathbf{y}^*} J(k, \mathbf{y}) = 2\bigl\langle(|f_k|^2 − 1)\,\nabla_{\mathbf{y}^*}(\mathbf{y}^\dagger\mathbf{r}_k^*\mathbf{r}_k^T\mathbf{y})\bigr\rangle
 = 2\bigl\langle(|f_k|^2 − 1)(\mathbf{r}_k^*\mathbf{r}_k^T\mathbf{y})\bigr\rangle
 = 2\bigl\langle(|f_k|^2 − 1)\,\mathbf{r}_k^* f_k\bigr\rangle
 = 2\langle e_k \mathbf{r}_k^*\rangle,    (12.5.36)

where e_k = (|f_k|² − 1) f_k is the error in the estimate and where the property ∇_y*(y†By) = By of the complex gradient has been used with B = r*_k r^T_k. The error is minimized when e_k is orthogonal to r*_k. The constant-modulus objective function

does not invoke an L-dimensional quadratic surface as was the case for the minimum mean-squared error. Therefore, there may be local minima at which the gradient is zero. Nevertheless, for small errors, the optimal filter defined by the constant-modulus objective function is a scaled form of the Wiener filter defined by the minimum-mean-squared-error objective function (cf. (12.5.28)).⁷

⁶ Other forms of the objective function replace the value one by a ratio of the moments of the statistical distribution of the datastream.
⁷ See Treichler and Agee (1983).
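As a sanity check on the complex-gradient identity ∇_y*(y†By) = By used above, the following minimal numerical sketch (not from the text) evaluates the Wirtinger derivative of (12.5.21b) by finite differences:

    import numpy as np

    rng = np.random.default_rng(3)
    L = 4
    y = rng.normal(size=L) + 1j*rng.normal(size=L)
    r = rng.normal(size=L) + 1j*rng.normal(size=L)
    B = np.outer(np.conj(r), r)             # B = r* r^T, a Hermitian matrix

    def q(y):                               # the real-valued quadratic y^dagger B y
        return (np.conj(y) @ B @ y).real

    eps = 1e-6
    grad = np.zeros(L, dtype=complex)
    for j in range(L):
        du = np.zeros(L); du[j] = eps       # step in the real part u_j
        dv = np.zeros(L); dv[j] = eps       # step in the imaginary part v_j
        d_u = (q(y + du) - q(y - du))/(2*eps)
        d_v = (q(y + 1j*dv) - q(y - 1j*dv))/(2*eps)
        grad[j] = 0.5*(d_u + 1j*d_v)        # Wirtinger derivative (12.5.21b)

    print(np.max(np.abs(grad - B @ y)))     # prints a value near zero

The numerical gradient agrees with By to the accuracy of the finite-difference step, confirming the identity used in (12.5.36).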

12.5.4 Adaptive Estimation

Examining (12.5.31), the calculation of the Wiener filter requires the cross-correlation w_rs, which depends on the joint statistics of the received sequence and the data sequence. An initial estimate can be obtained by using a training sequence. The minimum of the objective function can be maintained in an adaptive form using the modulated data sequence. To do so, write the vector of filter coefficients y(k + 1) for the (k + 1)th step in terms of the filter coefficients y(k) for the kth step and the gradient of the objective function evaluated at k so that

\mathbf{y}(k+1) = \mathbf{y}(k) − \mu\,\nabla_{\mathbf{y}^*} J(k, \mathbf{y}),    (12.5.37)

where μ is a gain parameter that balances the rate of convergence with the resulting accuracy. Using (12.5.24) and replacing the statistical expectation with the instantaneous value gives

\mathbf{y}(k+1) = \mathbf{y}(k) + \mu\, e_k \mathbf{r}_k^*.    (12.5.38)

This replacement leads to a gradient that is random. This estimation method is called the stochastic gradient-descent method. Accordingly, the algorithm will execute a random excursion about the optimal solution for the filter yopt with an excursion size that depends on the noise. The constant-modulus algorithm can also be cast into an adaptive form following the same steps that were used to develop an adaptive form of the Wiener filter. Start with (12.5.36) and replace the expectation by its instantaneous value. Incorporating the factor of two into the definition of µ leads to

\mathbf{y}(k+1) = \mathbf{y}(k) − \mu(|f_k|^2 − 1)\, f_k \mathbf{r}_k^* = \mathbf{y}(k) − \mu\, e_k \mathbf{r}_k^*,    (12.5.39)

where e_k ≐ (|f_k|² − 1) f_k is the update error.
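The two update rules can be stated compactly in code. A minimal sketch (not from the text; the function names, the sample-vector ordering, and the step size are illustrative assumptions):

    import numpy as np

    def lms_step(y, r_k, s_k, mu):
        # One step of the stochastic-gradient (LMS) update (12.5.38).
        # r_k holds the L most recent samples ordered so that f_k = y . r_k
        # realizes the convolution of (12.5.16); s_k is the known data value.
        f_k = np.dot(y, r_k)
        e_k = s_k - f_k                       # error of (12.5.19)
        return y + mu*e_k*np.conj(r_k)        # (12.5.38)

    def cma_step(y, r_k, mu):
        # One step of the adaptive constant-modulus update (12.5.39);
        # no knowledge of the transmitted data is required.
        f_k = np.dot(y, r_k)
        e_k = (np.abs(f_k)**2 - 1.0)*f_k      # update error of (12.5.39)
        return y - mu*e_k*np.conj(r_k)        # (12.5.39)

Each call costs O(L) operations, which is why these updates are preferred over re-solving (12.5.28) at every step.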

12.6 Polarization-State Estimation

A system that is responsive to the polarization – separately modulating both polarization components – must separate the two polarization components before demodulation. This means that the appropriate polarization axes must be identified. This section presents an estimation method based on the constant-modulus objective function of Section 12.5.3 that, for some modulation formats, is suitable for polarization-state estimation. The constant-modulus objective function uses only the squared magnitude of each component of the polarized signal, which does not depend on the carrier phase. Accordingly, this objective function decouples the estimation of the polarization alignment from the estimation of the carrier phase. This is one reason why the constant-modulus criterion is attractive for polarization-state estimation. A separate estimator is required for the carrier phase and the linear dispersion of each polarization component. To apply the constant-modulus method to polarization-state estimation, define

\mathbf{r}_k = \begin{bmatrix} r_{ak} \\ r_{bk} \end{bmatrix}    (12.6.1)

as the kth noisy vector sample. The two complex components r_ak = (r_aI, r_aQ)_k and r_bk = (r_bI, r_bQ)_k are the complex-baseband samples for the two received polarization states for sample interval k, as shown in Figure 10.15. Define the vector f_k = [f_ak, f_bk]^T as the estimate of the transmitted polarization state, where the estimated transmitted polarization components (a, b) are defined in Figure 10.15. Replacing the detection filter y of Section 12.5.2 with the transformation U used to form the estimated polarization state gives (cf. (12.5.17))

\mathbf{f}_k = \mathbf{U}^\dagger\mathbf{r}_k,    (12.6.2)

where U is a unitary 2 × 2 matrix so that U† = U⁻¹. The matrix U describes the change in the polarization state from the transmitter to the receiver for the kth sample (cf. (8.1.14)). This change is described by a transition on the surface of the Poincaré sphere (cf. (2.3.59)). A suitable objective function J(k, U) for polarization-state estimation is the sum resulting from the separate application of the constant-modulus criterion to each estimated polarization component. Therefore,

J(k, \mathbf{U}) = \bigl\langle(|f_{ak}|^2 − 1)^2 + (|f_{bk}|^2 − 1)^2\bigr\rangle
 = \bigl\langle(|U_{11} r_{ak} + U_{12} r_{bk}|^2 − 1)^2 + (|U_{21} r_{ak} + U_{22} r_{bk}|^2 − 1)^2\bigr\rangle.    (12.6.3)

The complex gradient is determined by taking the complex partial derivative ∂/∂U*_ij for i = 1, 2 and j = 1, 2 (cf. (12.5.21b)). This leads to a scalar expression for each of the four components U*_ij. Setting each expression equal to zero leads to four scalar equations, which can be written compactly as a single matrix equation

\langle \mathbf{e}_k \mathbf{r}^\dagger\rangle = \mathbf{0},    (12.6.4)

where the term e_k r† is a 2 × 2 matrix generated from the outer product of an error vector e_k = [e_ak, e_bk]^T, with components e_ik = (|f_ik|² − 1) f_ik for i ∈ {a, b}, and the vector r = [r_ak, r_bk]^T of received samples. As was done for the detection filter in Section 12.5.2, this method can be cast into an adaptive algorithm for estimating the polarization. Using (12.6.4) and generalizing (12.5.39), the update equation for the transformation Û used for polarization-state estimation is

\hat{\mathbf{U}}(k+1) = \hat{\mathbf{U}}(k) − \mu\,\mathbf{e}_k \mathbf{r}^\dagger,    (12.6.5)

where the scalar μ is a gain parameter that controls the rate of convergence, and where Û(k) is constrained to be a unitary matrix for a lossless transformation.⁸

⁸ See Kikuchi (2008).
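A minimal sketch (not from the text) of one update of (12.6.5). The SVD-based re-unitarization used here to enforce the unitary constraint is an assumed implementation choice, not a prescription from the text:

    import numpy as np

    def cma_polarization_step(U, r, mu):
        # r = [r_ak, r_bk]: one received polarization sample pair; U: 2x2 estimate
        f = U.conj().T @ r                      # estimated polarization state (12.6.2)
        e = (np.abs(f)**2 - 1.0)*f              # error vector, e_ik = (|f_ik|^2 - 1) f_ik
        U = U - mu*np.outer(e, np.conj(r))      # constant-modulus update (12.6.5)
        # Project back to the nearest unitary matrix (assumed method: SVD)
        W, _, Vh = np.linalg.svd(U)
        return W @ Vh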

12.7 Estimation of Spatial Modes

A communication system that uses multiple spatial modes – separately conveying a waveform on each of two or more spatial modes – must separate and demodulate the spatial modes at the receiver. This is feasible for the small number of modes of a few-mode fiber. Spatial multiplexing at the transmitter and the estimation of the corresponding spatial modes at the receiver is a difficult task. At the transmitter, this difficulty stems


from practical issues associated with generating, modulating, and multiplexing multiple spatial modes into one fiber. Similar issues arise in demultiplexing and demodulating multiple spatial modes at the receiver. Moreover, the channel can strongly couple the spatial modes in a time-varying manner, requiring mimo processing at the receiver. Consequently, spatial multiplexing and spatial-mode estimation become intractable for a large number of modes. For a small number of spatial modes, let N subchannels be defined by an N × N channel matrix that maps N input spatial modes into N output spatial modes. Generalizing the methods described for a single-input single-output channel, the channel matrix for a mimo channel can be estimated using a set of identical training sequences applied to each input subchannel. Alternatively, a matrix detection filter that will filter and separate the subchannels can, in principle, be directly estimated using an appropriate objective function without estimating the channel matrix. These two approaches are described in the following subsections.

12.7.1 Channel Matrix Estimation for Multiple Spatial Modes

For a discrete-time finite-impulse-response mimo channel, the nth received subchannel is related to the mth transmitted subchannel by a scalar impulse response h_nm(ℓ) of length L. The set of impulse responses {h_11(ℓ), ..., h_NN(ℓ)} defines the elements of the discrete-time channel matrix impulse response function h(ℓ) (cf. (8.1.18)). Then h can be regarded as a three-dimensional array with indices n, m, and ℓ. Write the nth row of the channel matrix h(ℓ) as a column vector of length NL given by

\mathbf{h}_n = [\mathbf{h}_{n1}, \ldots, \mathbf{h}_{nN}]^T,    (12.7.1)

where the matrix element h_nm = [h_nm(0), ..., h_nm(L − 1)] is itself a row vector of length L describing the scalar impulse response h_nm(ℓ). An estimate of h_n can be obtained by using a training sequence {t_j} of length K with

a matrix representation T (cf. (12.5.4)) that is the input to the nth transmitter subchannel for each of the N such subchannels. The response r_n for the nth received subchannel is written as a column vector of length K,

\mathbf{r}_n = [r_n(0), \ldots, r_n(K-1)]^T,    (12.7.2)

where r_n(ℓ) is the ℓth noisy sample for the nth received subchannel. The vector r_n is related to h_n by (cf. (12.5.3))

\mathbf{r}_n = \mathbf{S}\mathbf{h}_n + \mathbf{n},    (12.7.3)

where S is a K × NL matrix generated by concatenating the K × L training matrix T given in (12.5.4) N times. The column vector n of length K has components that are independent, circularly symmetric gaussian random variables. For the nth received subchannel, (12.7.3) has the same form as (12.5.3). Accordingly, the estimated impulse response vector ĥ_n, which consists of N scalar impulse responses, can be determined by modifying (12.5.7) so that

\hat{\mathbf{h}}_n = (\mathbf{S}^\dagger\mathbf{S})^{-1}\mathbf{S}^\dagger\mathbf{r}_n,    (12.7.4)

where a separate estimation is required for each received subchannel n. These estimates may be done in parallel, leading to the set {ĥ_11(ℓ), ..., ĥ_NN(ℓ)} of scalar impulse responses that characterize the channel matrix h of the N × N spatial-mode mimo channel.
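A minimal sketch (not from the text; the function name is an assumption, and the least-squares solve presumes a stacked training matrix S of full column rank, for example one built from time-offset training sequences) of the per-subchannel estimate (12.7.4):

    import numpy as np

    def estimate_mimo_row(S, r_n, N, L):
        # h_n-hat = (S^dagger S)^{-1} S^dagger r_n of (12.7.4); lstsq computes a
        # (pseudo)inverse solution and tolerates a poorly conditioned S.
        h_n, *_ = np.linalg.lstsq(S, r_n, rcond=None)
        return h_n.reshape(N, L)   # row n of the channel matrix: N impulse responses

The N estimates, one per received subchannel, are independent least-squares problems and may be computed in parallel.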

12.7.2 Modal Detection-Filter Estimation

The same formalism can be used to compute a detection filter y_n that is applied to the nth received subchannel r_nk in order to estimate the data f_nk in that subchannel (cf. (12.5.17)). This is

f_{nk} = \mathbf{y}_n^T\mathbf{r}_{nk} = \mathbf{r}_{nk}^T\mathbf{y}_n,    (12.7.5)

(12.7.5)

where n indexes the subchannel, and where k indexes the samples of the nth vector used to form the estimate. For each received subchannel n, all of the methods discussed in Section 12.5.2 can be directly applied to a linear mimo channel by replacing the singleinput single-output channel training matrix T by the multi-input multi-output channel training matrix S. For example, the stochastic-gradient-descent method used to update the detection filter yn for the nth output subchannel is (cf. (12.5.38)) yn (k + 1) = yn (k) + µenk r∗nk ,

(12.7.6)

where r_nk = [r_{n,(k−L+1)}, ..., r_{nk}]^T is an L × 1 column vector of noisy samples for the nth output subchannel used to form the estimate, and e_nk is the error in that estimate. For the mean-squared-error objective function, e_nk = s_nk − f_nk. For the constant-modulus objective function, e_nk = −(|f_nk|² − 1) f_nk.

12.8 References

The general topic of estimation is described in Kay (1993). The analysis of phase-locked loops is presented in Viterbi (1966) and in Lindsey and Simon (1973). Synchronization is covered in Blahut (2010). The application of the minimum-mean-squared-error technique to multi-input multi-output channels is covered in Al-Dhahir and Sayed (2000) and in Tse and Viswanath (2005), with the application to lightwave systems presented in Han and Li (2005) and in Ip and Kahn (2007). Properties of the complex gradient are discussed in Brandwood (1983). Properties of differentiation with respect to a complex vector are discussed in Kreutz-Delgado (2009). Equalization of a dispersive multi-input multi-output channel is presented in Ahmed, Ratnarajah, Sellathurai, and Cowan (2008). The constant-modulus algorithm is reviewed in Johnson, Schniter, Endres, Behm, Brown, and Casas (1998). The constant-modulus receiver and the Wiener receiver are compared in Zeng, Tong, and Johnson (1998) and in Bellanger (2004). Polarization alignment in lightwave systems based on a constant-modulus constraint is discussed in Roudas, Vgenis, Petrou, Toumpakaris, Hurley, Sauer, Downie, Mauro,


and Raghavan (2010) and in Kikuchi (2008). Estimating the channel matrix of a mode-division multiplex channel is discussed in Ryf, Randel, Gnauck, Bolle, Sierra, Mumtaz, Esmaeelpour, Burrows, Essiambre, Winzer, Peckham, McCurdy, and Lingle (2012).

12.9 Historical Notes

Historical notes on aspects of equalization and estimation are given in Haykin (1996) and in the bibliographic notes in Sayed (2008). Minimum-mean-square estimation for random processes was studied in Kolmogorov (1939) and in Wiener (1949). The Wiener–Hopf equations were derived in Wiener and Hopf (1931). Stochastic-gradient-descent methods were first discussed in Widrow and Hoff (1960). Adaptive zero-forcing equalization was first discussed in Lucky (1965). The constant-modulus algorithm was developed in Sato (1975) and in Godard (1980). It was first applied to polarization-state estimation for nonlightwave systems in Treichler and Larimore (1985).

12.10 Problems

1 Second-order phase-locked loops
A loop filter used in a second-order phase-locked loop has the transfer function

H_L(f) = 1 + \frac{a}{i2\pi f},

where a is a constant. The phase-locked loop response under a linear approximation is described by

Z(f) = C_1 H_L(f)\bigl(\Phi(f) − \hat{\Phi}(f)\bigr),

where Φ(f) is the Fourier transform of the phase and Φ̂(f) is the Fourier transform of the output of the controlled oscillator.
(a) Starting with (12.2.12), and using the properties of the Fourier transform, show that

\hat{\Phi}(f) = \frac{C_2}{i2\pi f}\, Z(f).

(b) Define H(f) = Φ̂(f)/Φ(f) as the ratio of the phase estimate Φ̂(f) to the input phase Φ(f). Show that H(f) can be written in the form

H(f) = \frac{2i\zeta(f/f_n) + 1}{−(f/f_n)^2 + 2i\zeta(f/f_n) + 1}.

Express the natural frequency f_n and the damping parameter ζ in terms of the parameters a, C_1, and C_2.
(c) The error transfer function H_e(f) is defined as the difference between the ideal transfer function H(f) = 1 and the actual transfer function H_L(f) so that H_e(f) = 1 − H_L(f). Derive H_e(f) and show, in the absence of noise, that φ̂ is equal to φ in steady state.


2 Nonlinear analysis of a phase-locked loop (requires numerics)
A phase-locked loop that is not well-locked does not satisfy sin θ_e ≈ θ_e. The loop response is nonlinear, and the phase-noise probability density function is no longer gaussian. For a first-order phase-locked loop, the phase-noise probability density function is⁹

f(\theta_e) = \frac{e^{\alpha\cos\theta_e}}{2\pi I_0(\alpha)},    (12.10.1)

where α = 1/σ²_{θe} and I_0(x) is the modified Bessel function of the first kind and order zero. Suppose that this probability density function is approximated by a zero-mean gaussian distribution f(θ_e) characterized by a root-mean-squared phase error σ_{θe}. Determine σ_{θe} such that the squared error

\int_{-\pi}^{\pi} \bigl| f_{\text{exact}}(\theta_e) − f_{\text{gauss}}(\theta_e)\bigr|^2\, d\theta_e

is less than 5%. Determine the approximate range of the validity of using a gaussian distribution for the probability density function of the phase noise.

⁹ See Tikhonov (1960).

3 Correction of carrier-phase offset using equalization
A passband sampler incorrectly samples with respect to an offset carrier frequency f_c′ instead of with respect to the correct carrier frequency f_c. Show that a sufficiently good equalization procedure will automatically correct for small offset errors in the carrier frequency.

4 Decision-directed phase estimation
A received waveform is

r(t) = \sum_{j=0}^{J-1} s_j\, p(t − jT)\, e^{i\phi} + n(t),

where the s_j are each an element of a complex signal constellation, φ is an unknown phase, and n(t) is circularly symmetric gaussian noise. Derive the decision-directed phase estimator. Sketch a block diagram for quadrature phase-shift keying.

5 Generalized likelihood ratio for a block
Starting with the expression for a received block of symbols given in (12.2.25), derive the form of the squared distance d²(φ) given in (12.2.32) for a block of symbols that do not have the same energy.

6 Generalized likelihood ratio for clock-phase estimation
Derive a replacement for the hardlimiter of a decision-directed phase estimator that involves the hyperbolic tangent function. How should the hyperbolic tangent function be approximated for small signal-to-noise ratios and for large signal-to-noise ratios?


7 Comparing polarization demodulation with I/Q demodulation
Suppose that the estimated polarization basis is misaligned so that the block sample value r after the misalignment is related to the block input s at the transmitter by

r = Ts,

where T is the polarization transformation given with χ = 0 so that the misalignment is described solely by the angle ξ (cf. (2.3.61)). Compare the functional form of this kind of misalignment with the effect of a constant phase error θ_e in the estimate of the I–Q axes for the demodulation of the two quadrature signal components.

φ( )

φ()

9 Noise analysis of a delay-locked loop Following the noise analysis used for a phase-locked loop, determine the effect of E N0 on the accuracy of a delay-locked loop.

/

10 Polarization-state estimation Consider a system that transmits on a single linear polarization mode, and receives on a single linear polarization mode. The receiver must determine the angle between the transmitted polarization basis et1 et2 and the reference basis er 1 er 2 defined at the receiver. This polarization transformation is given by (2.3.61). Let be the random error in this estimated angle. Suppose that, for small , the probability density function f is a zero-mean gaussian distribution with 2 variance ³ξ . (a) Determine an expression for the probability density function of each of the two 2 . polarization components in terms of ³ξ (b) Compare this expression with the expression for one signal component in the presence of a phase error. In what ways are the expressions similar? In what ways are they different?

ξ {² ,² }

{² , ² }

³ξ

³ξ

(³ξ)

σ

σ

11 Complex gradient Let f y1 y2 y M be a function of the M complex variables y1 y2 Prove (12.5.22), namely that the complex gradient y∗ satisfies

( , ,..., )

∇y∗

ɱ M k =1

Ê

ak yk



, ,..., y

M

.

=0

for any complex vector (a1, . . . , aM ).

12 Polarization control
Let the received lightwave with bit energy E_b be linearly polarized along a direction defined by the unit vector p̂_c. Let the local oscillator be linearly polarized along a direction defined by the unit vector p̂_LO.


(a) Derive an expression for the demodulated bit energy E_b(θ) in terms of the angle θ between p̂_c and p̂_LO.
(b) Suppose that a polarization estimator can track the angle θ so that the probability density function of θ after estimation is a zero-mean gaussian random variable with variance σ_θ². Determine the maximum variance allowed for the estimator to limit the power penalty in the received signal to less than 1 dB for 99% of the cases.

13 Channel Codes

As symbols pass through a communication channel, they become contaminated by noise, distortion, and interference. These impairments are hidden from the user by a data-transmission code which is designed to correct errors, to control errors, or to prevent errors, as may be appropriate. In addition to the various impairments, the symbols transmitted through a channel may be subject to constraints that forbid certain symbol subsequences so as to control the transmitted spectrum or to avoid troublesome patterns. These sequence constraints are accommodated by the use of a data-modulation code. Data-transmission codes and data-modulation codes are studied in this chapter. Data-transmission codes and data-modulation codes are two kinds of channel code, each of which intentionally introduces symbol dependences into a transmitted sequence of symbols, but for different purposes. These dependences provide redundancy that can be used in conjunction with the modulation format to reduce the effect of channel impairments as described by the number of errors or by various forms of noise. Errors may occur randomly and independently or may occur as a burst of correlated errors invalidating an entire string of symbols. A channel code creates symbol dependences within the encoded sequence, which can be spread across time in a single-input single-output channel or across time and space in a multi-input multi-output channel. A channel code inserts redundancy to create diversity that spreads the user data across the available degrees of freedom in space, time, and polarization. The redundant information prevents or corrects errors, enabling reliable communication whenever individual subchannels are noisy or are otherwise corrupted. A data-transmission code, or error-control code, prevents or corrects errors by using knowledge about the anticipated types of noise and other channel impairments to design appropriate dependences into the codewords. Early examples of error-control codes were designed to correct errors in blocks of hard-detected symbols wherein the demodulator first determines each symbol, but sometimes a demodulated symbol is in error. Modern channel decoders more commonly work directly with soft-detected symbols, and the notion of a symbol error prior to the decoder output is not relevant. A soft decision may be a quantized replica of the received sample. A data-transmission code for a well-modeled channel with soft-decision samples will usually perform better than a data-transmission code for that channel modified to use hard-decision output samples but otherwise remaining the same. A soft-decision decoder requires a probabilistic channel model, and so is sensitive to imprecision in the channel

model. A hard-decision decoder does not require a detailed probabilistic channel model, and so is more robust.
A data-modulation code has a different purpose and structure. It uses knowledge about how specific patterns of transmitted symbols cause problems because of certain propagation characteristics of the channel. A data-modulation code then prevents such undesired patterns. A data-modulation code may also be used to control the spectral characteristics of the modulated waveform by producing a null in the transmitted power density spectrum, or may be used to prohibit problematic symbol patterns such as those that are most likely to produce errors.
A data-transmission code and a data-modulation code may be used in conjunction. A typical ordering of these codes is shown in Figure 13.1. Referring to that figure, a dataword is first encoded by means of a data-transmission code that produces discrete codeword symbols as are represented by bits. These discrete symbols may be further encoded using a data-modulation code, if one is used, with the output being a sequence of symbols that is then used by the modulator to generate a sequence of continuous pulses forming the transmitted waveform. The modulated waveform propagates through a channel where impairments such as distortion, noise, and interference are introduced. The received signal is first demodulated, producing hard-detected or soft-detected samples that are the input to a data-modulation decoder, if used, then to a data-transmission decoder. Figure 13.1 shows the data-modulation code placed closer to the channel and hence called the inner code, whereas the data-transmission code is placed further from the channel and hence called the outer code. The data-transmission code and the data-modulation code will be studied separately herein, without regard to possible interactions.

Figure 13.1 Block diagram of the encoding and decoding process: dataword in → data-transmission encoder → data-modulation encoder → modulator → channel → demodulator → data-modulation decoder → data-transmission decoder → dataword out.

13.1 Code Structure and Code Rate

The redundancy within a channel block code for data transmission is generated by an encoding process that maps a dataword into a codeword. A dataword d is a sequence of length k of symbols from a specified alphabet such as the binary alphabet {0, 1} or the q-ary alphabet {0, 1, ..., q − 1}. A codeword c is a sequence of length n of symbols from the same alphabet, where n is larger than k. The set of all codewords comprises the codebook. The code rate is defined as the ratio R_c ≐ k/n. The code rate has units of data symbols per code symbol (or databits per codebit).


A binary code for data transmission maps binary datawords of length k into binary codewords of length n, where n is larger than k. Of the 2^n possible binary words of length n, only 2^k are codewords. Thus, for n = 7 and k = 4, of the 128 possible seven-bit words, only 16 are codewords. The remaining 112 words are not codewords, and cannot be transmitted. The encoder maps each of the 16 four-bit datawords into one of the 16 seven-bit codewords, and sends that codeword to the channel. The redundancy is used to prevent errors in the decoder output to the user. This example with k = 4 and n = 7 is a very small code. A modern communication system may use a code with k and n in the tens of thousands.

13.1.1 Decoding

Reliable transmission through an information channel requires a code and a corresponding encoder and decoder. Consider a memoryless unconstrained information channel as the exemplar case. This is an information channel for which the channel input symbols can follow each other in any order and the probability of each channel output symbol depends only on the corresponding channel input symbol, not on earlier or later input or output symbols. For a discrete information channel, the channel input symbol s is a discrete random variable that can take on L values from the channel input alphabet. The detected channel output symbol r is a random variable, conditioned on the input symbol, that can take on M values from the channel output alphabet. The channel output alphabet need not be the same alphabet as the channel input alphabet. For a block of symbols, the corresponding block random variables at the input and the output are s and r. Each symbol at the decoder input depends on the modulation format and on the method used to detect that symbol. For techniques based on photon optics, the detected output value corresponds to the discrete energy in the received symbol. For techniques based on wave optics, the detected output value corresponds to the continuous complex value of that symbol. Information theory, studied in Chapter 14, states that every reasonable channel has a capacity C , defined as the largest value of the code rate R c for which good codes exist. Reliable transmission is not possible at code rates larger than the channel capacity C . Propagation through a noisy and impaired channel and the subsequent demodulation and detection create a received sequence of noisy symbol values called the noisy sensed codeword or the senseword. Two types of sensewords produced by two different kinds of detection will be distinguished. In hard-decision detection, each received sample is separately detected to produce a logical value that is passed to a hard-decision decoder. In soft-decision detection, each received sample is separately quantized. These quantized values, which may be regarded as equivalent to the continuous samples, are passed to a soft-decision decoder. In either case, the decoder must recover the user data nearly error-free from the block of sensed symbols. The performance of a channel code is characterized by the probability pe of block decoding error or symbol decoding error. The performance on some channels is also characterized by the coding gain, which is defined as the reduction in the energy needed per coded dataword symbol as compared with the energy needed per uncoded dataword symbol to achieve the same probability of a block error. The encoding of k databits into n


codebits ensures a reliable output, but requires an adjustment of the bandwidth, the data rate, or the signal constellation of the channel. The use of a larger signal constellation is the basis of trellis-coded modulation, which is discussed in Section 13.6.

13.1.2 Classes of Codes

Modern coding theory recognizes three broad classes of codes. These three classes of codes are herein called algebraic block codes, convolutional codes, and composite codes. Each class of code is suitable for a certain class of decoder. These three corresponding classes of decoders are referred to herein as spherical decoders, sequential decoders, and iterative decoders. Each code class together with its corresponding decoder class seems to be most suitable for a particular range of code rates. To this end, it is pedagogically convenient to partition the range of the code rates by three constants known as the critical rate Rcrit , the cutoff rate R0 , and the capacity C of the channel. These will be described later. For each information channel, these constants define three intervals of the real line given as (0, R crit ), ( Rcrit , R 0), and ( R 0, C ). The nature of the coding problem for asymptotically large codes is different, in principle, for code rates in each of the three intervals, changing the nature of the decoding algorithms and the suitable codes. For our pedagogical purposes, the three classes of encoders and decoders will be described against the backdrop of these three intervals, each class of codes being most appropriate to one of the three rate intervals. Spherical decoding based on minimum distance is normally most appropriate for code rates Rc smaller than the critical rate R crit . In this regime, algebraic block codes based on large minimum distance are most suitable, although the best such algebraic codes of large blocklength remain unknown. Above Rcrit , maximum-likelihood block decoding with a suitable code may be superior. For this purpose, sequential decoding is suitable for code rates Rc smaller than the cutoff rate R0 , but becomes computationally intractable for code rates Rc larger than the cutoff rate R 0, though for large codes such decoders may be impractical even for code rates somewhat smaller than R0 . Convolutional codes are amenable to sequential decoding, and so convolutional codes may be attractive for code rates Rc smaller than the cutoff rate R 0. Finally, iterative decoding based on componentwise maximum-posterior decoding is practical for code rates R c smaller than the channel capacity C. In the regime above R0 , composite codes such as turbo codes (Berrou codes) or low-density parity-check codes (Gallager codes) are suitable. For these codes, however, satisfactory analytic performance guarantees are not known. For code rates larger than the channel capacity C, every code is bad. For a channel with capacity C , reliable communication is not possible at a code rate larger than C . Accordingly, this chapter studies these three classes of codes – algebraic codes, convolutional codes, and composite codes – and the corresponding three classes of suitable decoders – spherical decoders, sequential decoders, and iterative decoders. 13.1.3

Nesting of Codes

Algebraic block codes have an elaborate mathematical structure that can be used to enable powerful calculations in the encoder and the decoder. Algebraic codes are useful


primarily for low code rates such as code rates smaller than R_crit. For this reason, two layers of coding are sometimes used, as shown in Figure 13.2. The inner code may be a composite code, a convolutional code, or an algebraic code suitable for the information channel. Then the inner encoder and inner decoder, as seen only from their input and output by the outer encoder and outer decoder, form a new information channel, perhaps suitable for a large algebraic code. The outer encoder sees this new information channel, possibly in the multibit alphabet obtained by converting the binary datastream into an m-bit-wide parallel datastream. Therefore, an algebraic block code in a large alphabet may be a suitable outer code. The structure of Figure 13.2 then follows. From an operational point of view, this nested structure can be understood as follows. The inner decoder reduces the error rate. The outer decoder then deals with the infrequent, but difficult, error patterns that are incorrectly decoded by the inner decoder.

Figure 13.2 Block diagram of an inner and outer encoder for data transmission: user data → outer encoder → inner encoder → to the information channel.

13.2 Algebraic Block Codes

An algebraic block code is a code constructed using the tools of algebra, especially linear algebra. Every algebraic block code is, or is based on, a linear subspace of a vector space. The real or complex number system is not used for such codes because of precision considerations in the arithmetic. Instead, the symbols of an algebraic code are finite in number and represented by a finite mathematical system. The symbols are to be added and multiplied, but not using the usual rules of arithmetic in the real or complex number system. For this reason, an alternative arithmetic system is used.

13.2.1 Galois Fields

An arithmetic system used for algebraic codes is known as a finite field¹ or a Galois field. Each Galois field consists of a finite set of "numbers" called elements of the field, and has its own definitions of the elementary arithmetic operations of addition, subtraction, multiplication, and division. While the definitions of these operations of a finite field may be unfamiliar, the familiar methods of algebra, including the methods of linear algebra and the Fourier transform, remain valid in the arithmetic of a finite field. In particular, algebraic equations can be manipulated in the familiar way, even though the underlying arithmetic might not be familiar.

¹ The word "field" as so used in mathematics has a completely different meaning than the use of the word in electromagnetics.


The finite field with q elements exists if and only if q is a prime or a power of a prime. The finite field with q elements is called GF(q) or F_q. The finite field with two elements, called GF(2) or F_2, will be described in Section 13.2.4 as part of a discussion of binary block codes. A finite field with 2^m elements, denoted GF(2^m) or F_{2^m}, is called a field of characteristic two. It will be described as part of a discussion of nonbinary block codes in Section 13.2.5. A vector v of blocklength n over the field GF(q) is an ordered set of n elements of GF(q) called vector components. A vector may contain the same element of the field as a component more than once. The set of all vectors over GF(q) of blocklength n is a vector space called GF(q)^n. A vector v is multiplied by a scalar a by multiplying every component of v by that scalar. A linear combination of the vectors v_1, ..., v_k is the vector a_1v_1 + ··· + a_kv_k, where a_1, ..., a_k are scalars of GF(q). A set of k vectors in GF(q)^n is a linearly independent set if no linear combination of those k vectors is equal to zero. A set of n linearly independent vectors in GF(q)^n is called a basis of GF(q)^n. Two vectors v and v′ are orthogonal if Σ_k v_k v′_k = 0. In contrast to the real field R, a vector in a finite field can be orthogonal to itself. Moreover, in contrast to the real field R, a set of n mutually orthogonal vectors in a finite field need not be a linearly independent set. Any set of k linearly independent vectors of GF(q)^n is a basis for a k-dimensional linear subspace of GF(q)^n.

Linear Codes

A linear algebraic block code is a linear subspace of GF (q)n of dimension k. This means that codewords in the field GF(q ) can be described as vectors over GF(q )n , so they can be added componentwise using the addition operation of GF(q) on each component. The properties of a block code in any finite field are described by the following terms. ● A linear block code is a block code for which the componentwise linear combination ac1 bc2 of any two codewords, c1 and c2, is equal to another codeword. ● The Hamming weight c of codeword c is the number of nonzero components of c. ● The minimum Hamming weight min of a code is the smallest Hamming weight of

+

● ● ● ●

13.2.3

w( )

w

any nonzero codeword of the code. The Hamming distance d between two codewords is the number of components at which the two codewords differ. The minimum Hamming distance dmin is the minimum of the Hamming distance over all pairs of distinct codewords. A maximum-distance code is a code that satisfies dmin = n − k + 1. A code is cyclic if, whenever c = {c0 , c 1, . . . , cn −1} is a codeword, c± = {cn−1 , c0 , . . . , cn −2} is also a codeword.

Matrix Description of Linear Codes

The codewords of a linear code over the q-ary alphabet are generated from the datawords by a set of linear equations using the addition and multiplication operations of the finite


field GF(q). The arithmetic operations of a binary code are those of the finite field GF(2). They are described in Section 13.2.4. The arithmetic operations of a nonbinary code are those of a finite field GF(q) for q ≠ 2. They are described in Section 13.2.5. This section discusses the algebraic structure of linear codes based on matrices. This algebraic structure is valid for any finite field. The encoding operation for a code over the finite field GF(q) can be expressed as a matrix multiplication

c = dG,    (13.2.1)

where d is the dataword² of blocklength k, c is the codeword of blocklength n, and G is a k × n matrix with k linearly independent rows called the generator matrix. Each of these k rows is a basis vector for the encoding process. Expression (13.2.1) is in the familiar form of a matrix equation, but all of the components are elements of the finite field GF(q) and the arithmetic operations are those of GF(q). Every codeword c satisfies

cH^T = 0,    (13.2.2)

² In this chapter, all vectors are row vectors.

where 0 is the vector with all zeros, and H is an (n − k ) × n matrix called the check matrix. The check matrix is so named because it performs n − k checks on a senseword to verify that it is a codeword. These checks comprise a set of equations that express the dependences between the symbols of the codeword. Every row of the generator matrix G is a codeword, and so every row satisfies the check condition. Therefore, the check matrix is related to the generator matrix G by

GH^T = O,    (13.2.3)

where O is the k ×(n − k) zero matrix. The k rows of the generator matrix G are linearly independent. The n − k rows of the check matrix H are linearly independent as well, and are constrained by (13.2.3). A linear code is completely specified by the generator matrix G or check matrix H, and is described summarily as an (n , k ) code or as an (n, k, dmin ) code. However, the k rows of G and the n − k rows of H taken together need not comprise a set of n linearly independent vectors in the vector space GF(q)n . The two matrices may have some rows in common. The rows of G span a k-dimensional subspace of the vector space GF (q )n , which is the code, and the rows of H span the dual space, which is also called the dual code. Of course, for the code defined by the generator matrix G, the check matrix H is not unique. Any basis of the dual space can be used for the rows of a check matrix for that code. One way to compute a check matrix from a generator matrix is as follows. Enlarge the matrix G to an n × n full-rank matrix by adjoining any n − k additional linearly independent rows in the form of another (n − k ) × n matrix A. Then

\mathbf{M} = \begin{bmatrix} \mathbf{G} \\ \mathbf{A} \end{bmatrix},    (13.2.4)


615

so that M is a full-rank n × n matrix with linearly independent rows, and hence is invertible. Accordingly, the statement MM−1 = I can be partitioned as

\mathbf{M}\mathbf{M}^{-1} = \begin{bmatrix} \mathbf{G} \\ \mathbf{A} \end{bmatrix}\begin{bmatrix} \mathbf{G}^{-1} & \mathbf{A}^{-1} \end{bmatrix} = \begin{bmatrix} \mathbf{I} & \mathbf{O} \\ \mathbf{O} & \mathbf{I} \end{bmatrix},    (13.2.5)

where M⁻¹ is written as [G⁻¹ A⁻¹]. The new submatrices are appropriately denoted G⁻¹ and A⁻¹ because the matrix G⁻¹ is an n × k matrix³ satisfying GG⁻¹ = I, and the matrix A⁻¹ is an n × (n − k) matrix satisfying AA⁻¹ = I. Because the upper right submatrix of the identity on the right side of (13.2.5) is equal to the zero submatrix O, the last n − k columns of the inverse M⁻¹ satisfy GA⁻¹ = O, and so the matrix A⁻¹ is actually the transpose of a check matrix H for the code (cf. (13.2.2)). Because the only requirements on A are that its rows are linearly independent and are linearly independent of the rows of G, this shows that, given either G or H, there are many ways to satisfy (13.2.3).
The concepts of a generator matrix and a check matrix are illustrated by describing two small single-error-correcting binary block codes.⁴ The first code is a binary repetition code. This code simply repeats the value of a single databit n times to create the codeword. There are only two codewords for this code – the all-zero codeword and the all-one codeword. This code of blocklength n has minimum Hamming distance d_min equal to n and a code rate R_c = 1/n. It is described as an (n, 1, n) code, where the third number specifies the minimum Hamming distance. For n = 3, the generator matrix for the repetition code is

\mathbf{G} = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix},    (13.2.6)

and one check matrix for this code is

\mathbf{H} = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix},    (13.2.7)

where the binary "arithmetic" operations given in (13.2.11) must be used to evaluate (13.2.3). This (3,1,3) code has rate R_c = 1/3 and a minimum Hamming distance d_min = 3. A senseword with one error is at Hamming distance 1 from the transmitted codeword, but at Hamming distance 2 from the only other codeword.
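A minimal sketch (not from the text; the function name and test patterns are illustrative) of minimum-Hamming-distance decoding for this (3,1,3) code:

    codebook = [(0, 0, 0), (1, 1, 1)]

    def decode(senseword):
        # Choose the codeword at the smallest Hamming distance (here, majority vote)
        return min(codebook,
                   key=lambda c: sum(a != b for a, b in zip(c, senseword)))

    for w in [(0, 0, 1), (0, 1, 0), (1, 0, 1)]:
        print(w, "->", decode(w))   # every single-error pattern decodes correctly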

⎡1 ⎢0 G=⎢ ⎣

0 1 0 0

0 0

0 0 1 0

0 0 0 1

0 1 1 1

1 0 1 1

1 1 0 1

⎤ ⎥⎥ ⎦.

(13.2.8)

³ Because G and A are not square matrices, G⁻¹ and A⁻¹ are called pseudoinverses. They are not unique.
⁴ Modern binary block codes can have blocklengths of several thousand.


This generator matrix is called systematic because the first four columns of G form an identity matrix. The term systematic means that the k databits appear explicitly as a subblock of the codeword. A check matrix for the (7,4,3) Hamming code is

H = [ 0 1 1 1 1 0 0 ]
    [ 1 0 1 1 0 1 0 ]
    [ 1 1 0 1 0 0 1 ].          (13.2.9)

The three rows of the check matrix H give the three GF(2) check equations for this code, which are

c2 + c3 + c4 + c5 = 0,          (13.2.10a)
c1 + c3 + c4 + c6 = 0,          (13.2.10b)
c1 + c2 + c4 + c7 = 0.          (13.2.10c)

The seven columns of H are distinct and nonzero. Therefore, the GF(2) bitwise sum of any two columns cannot equal zero. However, the bitwise sum of columns one, two, and three is zero. Thus there is at least one set of three columns that is linearly dependent but no such set of two columns. Therefore, no codeword with only two ones can satisfy (13.2.2). The minimum Hamming distance dmin is thus equal to 3. The last integer in the (7,4,3) designation of the code is the minimum distance. The (3,1,3) repetition code and the (7,4,3) Hamming code can correct the same number of errors per codeword because both codes have the same minimum Hamming distance, but the blocklengths are different. The code rate for the (7,4,3) Hamming code is 4/7, whereas the code rate for the (3,1,3) repetition code is 1/3.
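As a concrete check of these statements, the following short sketch (in Python with NumPy, an illustrative choice rather than anything from the text) encodes all 16 datawords with the generator matrix of (13.2.8), verifies the check condition cH^T = 0 of (13.2.2) against the check matrix of (13.2.9), and confirms that the smallest nonzero codeword weight, and hence dmin, is 3.

    # Encode every dataword of the (7,4,3) Hamming code and verify the
    # check equations and the minimum Hamming distance numerically.
    import itertools
    import numpy as np

    G = np.array([[1,0,0,0,0,1,1],
                  [0,1,0,0,1,0,1],
                  [0,0,1,0,1,1,0],
                  [0,0,0,1,1,1,1]])   # generator matrix (13.2.8)
    H = np.array([[0,1,1,1,1,0,0],
                  [1,0,1,1,0,1,0],
                  [1,1,0,1,0,0,1]])   # check matrix (13.2.9)

    codewords = []
    for d in itertools.product([0, 1], repeat=4):
        c = np.dot(d, G) % 2                     # encode: c = dG over GF(2)
        assert not np.any(np.dot(c, H.T) % 2)    # check:  cH^T = 0
        codewords.append(c)

    # For a linear code, dmin equals the minimum nonzero codeword weight.
    print(min(int(c.sum()) for c in codewords if c.any()))   # prints 3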

Minimum Hamming Distance

The error-correcting capability of a linear block code is directly related to its minimum Hamming distance dmin. The minimum Hamming distance of a linear code can be determined directly from a property of the check matrix. The heft of any matrix M with at least as many columns as rows is defined as the largest value of r such that every subset of r columns of M is linearly independent. In contrast, the rank of a matrix M is the largest value of r such that some subset of r columns of M is linearly independent. Accordingly, because cH^T = 0 for every codeword c, but not for noncodewords, there is no nonzero codeword with weight at most the heft of H, but there is a codeword with weight one larger than the heft of H. This means that the minimum Hamming distance dmin of a code is 1 plus the heft of any check matrix H for that code. In particular, the check matrix in (13.2.9) has heft 2, so it defines a code with minimum distance 3. Clearly the heft of H cannot be larger than the rank of H, which itself cannot be larger than n − k because H is an (n − k) × n matrix. Therefore, dmin − 1 ≤ n − k for every linear code. This inequality is simple to state, but is not very informative for binary codes. The check matrix in (13.2.9) has rank 3, but heft 2, so this inequality is not satisfied with equality for that code. For large codes, it is known that, for any n and k, the minimum


distance of an (n, k) binary code is much smaller than the right side of the inequality. However, the largest possible minimum distance of a large (n, k ) binary block code is not known, in general, even for rather small values of n and k.
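For a small check matrix, the heft can be computed by brute force. The sketch below is an illustration, not an algorithm from the text; it examines column subsets of increasing size, and over GF(2), once all smaller subsets are known to be independent, a set of r columns is dependent exactly when all r columns sum to zero modulo two. Applied to the check matrix of (13.2.9), it returns a heft of 2, confirming dmin = 3.

    # Brute-force heft: the largest r such that every subset of r columns
    # of H is linearly independent over GF(2).
    import itertools
    import numpy as np

    H = np.array([[0,1,1,1,1,0,0],
                  [1,0,1,1,0,1,0],
                  [1,1,0,1,0,0,1]])

    def heft(H):
        n = H.shape[1]
        for r in range(1, n + 1):
            for cols in itertools.combinations(range(n), r):
                # All smaller subsets were already verified independent, so
                # this r-set is dependent iff its columns sum to zero mod 2.
                if not np.any(H[:, cols].sum(axis=1) % 2):
                    return r - 1
        return n

    print(heft(H))   # prints 2, so dmin = heft + 1 = 3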

13.2.4 Binary Block Codes

A binary block code of blocklength n is a set of 2^k binary words of length n. An (n, k) binary block code is used to map the k databits of a dataword d into the n codebits of a codeword c. A linear binary block code is described by a generator matrix G or a check matrix H over GF(2), so all of the elements are zeros and ones. The elements of H that are equal to one describe the dependences in the code. A k × n generator matrix with a k × k identity matrix embedded within it is called a systematic generator matrix (cf. (13.2.8)). Accordingly, a systematic encoder is an encoder for which the pattern of k databits appears explicitly in the codeword, usually at either the start or the end of the codeword. The other codebits in a systematic code, called check bits, are generated as linear combinations of the databits over GF(2). The systematic property is not important in many applications. A regular check matrix is an enlarged check matrix H that contains the same number of ones per column and the same number of ones per row. To make this condition possible, the check matrix may need to be augmented with extra dependent rows that are otherwise unnecessary. A regular check matrix is defined such that every bit of a codeword participates in the same number of check equations and every check equation involves the same number of bits. The finite field GF(2) is the set {0, 1} together with an operation of addition and an operation of multiplication. The addition of two bits in GF(2) is defined as modulo-two addition. Therefore, the componentwise modulo-two addition of two binary words is the same as the componentwise boolean XOR operation. The multiplication of two bits in GF(2) is defined as modulo-two multiplication. The componentwise modulo-two multiplication of two binary words is the same as the componentwise boolean AND operation. These two “arithmetic” operations of binary symbols are given by

+ | 0 1          × | 0 1
0 | 0 1          0 | 0 0
1 | 1 0          1 | 0 1          (13.2.11)

With these two operations, the two-element set {0, 1} forms a consistent arithmetic system that satisfies the rules of linear algebra. It is the finite field with only two elements, called GF(2) or F2. Because 1 + 1 = 0 in GF(2), −1 = 1. Because 1 × 1 = 1 in GF(2), 1^−1 = 1. As usual, subtraction of zero is trivial and division by zero is not defined. Accordingly, subtraction and division in GF(2), though trivial, are defined. The n-dimensional vector space GF(2)^n consists of all vectors v = (v1, v2, . . . , vn) of length n of elements of GF(2). Two vectors v = (v1, v2, . . . , vn) and u = (u1, u2, . . . , un) are added componentwise, with addition defined in GF(2). Multiplication of a vector by a scalar multiplies each element of the vector by that scalar. This is trivial because the


only scalars in GF(2) are the elements zero and one. A linear binary code of dimension k is a k-dimensional vector subspace of the vector space GF(2)^n.
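Because componentwise GF(2) addition is XOR and componentwise GF(2) multiplication is AND, vectors over GF(2) can be manipulated directly as machine words, as this small illustrative sketch shows.

    # Componentwise GF(2) arithmetic on binary words of length 7,
    # stored as the bits of ordinary integers.
    u, v = 0b1011001, 0b0111010
    print(format(u ^ v, '07b'))   # GF(2) sum (XOR):      1100011
    print(format(u & v, '07b'))   # GF(2) product (AND):  0011000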

13.2.5 Nonbinary Block Codes

A nonbinary block code is a code whose symbols are in a finite alphabet of a size q larger than two, usually a size of the form q = 2^m. The methods of linear algebra discussed in Section 13.2.3 can be used for any arithmetic system satisfying the axioms of an algebraic field. An algebraic field with q elements, denoted GF(q), exists if and only if q is a power of a prime. For m = 8, for example, there are 256 elements in the algebraic field GF(2^m), corresponding to eight-bit bytes. A nonbinary code sees the elements of the code alphabet as the fundamental arithmetic objects. The algebra does not look inside an element of GF(q), such as a byte, to decompose it into smaller units. A datablock is a sequence of k data symbols, each symbol an element from the code alphabet GF(q). A codeword is a block of n code symbols, each symbol an element from that code alphabet. All of the discussion of Section 13.2.3 holds without change for nonbinary codes in the field GF(q). It is necessary only to define valid operations of arithmetic for the finite field GF(q) that respect the rules of algebra. Because the finite field GF(q) is unconventional, definitions of addition and multiplication are needed. These definitions are given next.

Construction of Galois Fields

The operations of linear algebra are powerful and can be used to describe encoders and decoders for nonbinary codes in an alphabet of size 2^m. To do so requires appropriate definitions of arithmetic operations on that alphabet. Such an arithmetic system with 2^m elements is called a Galois field or a finite field and is denoted GF(2^m). To this end, an addition and a multiplication operation must be defined such that the usual rules of linear algebra apply. Then the matrix theory defining codes given in Section 13.2.3 applies without change. The construction of the Galois field GF(2^m) from the binary Galois field GF(2) mimics the construction of the complex field C from the real field R. The construction of the complex field can be described as the factoring of the polynomial x^2 + 1, which cannot be factored in the real field R. So that the polynomial x^2 + 1 can be factored, one defines an “imaginary number” i = √−1 satisfying i^2 = −1, and defines the complex field C as the set {a + bi}, where a and b are real numbers. Then x^2 + 1 factors as (x + i)(x − i). With this definition, the consistent arithmetic system called the complex field is created. In a similar way, one may try to use the polynomial x^2 + 1 to enlarge GF(2). However, using the binary addition (cf. (13.2.11)) of GF(2), it follows that (x + 1)^2 = x^2 + (1 + 1)x + 1 = x^2 + 1. Therefore x^2 + 1 factors as (x + 1)(x + 1) and so, because −1 = 1, one cannot define √−1 as a new element to enlarge GF(2). Not to be discouraged, one notes that the polynomial x^2 + x + 1 does not factor over GF(2). This polynomial x^2 + x + 1 does factor over a “complex-like” field GF(4) if one defines α^2 = α + 1, where the “imaginary number” α for the finite field GF(4) is analogous to i for the complex field.


The four “numbers” in GF(4) are the elements of the set {a + bα}, where a and b are elements of GF(2). Thus, GF(4) = {0, 1, α, α + 1}. Addition in GF(4) is componentwise addition mimicking complex addition. Multiplication mimics complex multiplication, but with α^2 = α + 1 instead of i^2 = −1. Subtraction is defined using −1 = 1, so −(a + bα) = (a + bα). Division is defined implicitly, by proving that when a + bα is nonzero, the expression (a + bα)(a + bα)^−1 = 1 always has a solution for (a + bα)^−1. Thus, (α + 1)α = 1 in GF(4), which means that (α + 1)^−1 = α and α^−1 = α + 1. Then α/(α + 1) = α · α = α + 1. Larger fields are constructed as extensions of GF(2) in a similar way. The reason why the real field cannot be extended beyond the complex field in this way is that, as is well known, there are no irreducible polynomials of degree three or more over the real field R. However, the same statement is not true for the arithmetic system GF(2). There are irreducible polynomials of every degree over GF(2) and thus, with appropriate definitions of arithmetic operations, a Galois field GF(2^m) that conforms to the linear algebraic operations given in Section 13.2.3 can be defined. For example, to construct GF(256), an irreducible polynomial over GF(2) of degree eight is needed. The polynomial x^8 + x^4 + x^3 + x^2 + 1 will do. It cannot be factored over GF(2). Accordingly, redefine α so that α^8 + α^4 + α^3 + α^2 + 1 = 0. This means that α^8 = α^4 + α^3 + α^2 + 1 in GF(256), because α^8 = −(α^4 + α^3 + α^2 + 1) and −1 = 1, which is a consequence of constructing GF(256) from GF(2). This reduction and the arithmetic of GF(2) define GF(256). The 256 field elements are the 256 polynomials in α of the form aα^7 + bα^6 + cα^5 + dα^4 + eα^3 + fα^2 + gα + h, where all the coefficients are elements of GF(2). Addition in GF(256) is polynomial addition over GF(2). The element α is analogous to i for the complex field, and is defined differently for GF(256) than for GF(4). Multiplication in GF(256) is defined as multiplication of polynomials in α with componentwise multiplication in GF(2), but with the reduction α^8 = α^4 + α^3 + α^2 + 1 instead of i^2 = −1. To reduce α^ℓ when ℓ is larger than eight, note that α^ℓ = α^8 α^(ℓ−8) = (α^4 + α^3 + α^2 + 1)α^(ℓ−8), which now has degree ℓ − 4. This reduction is repeated as long as the degree is at least eight. The n-dimensional vector space GF(q)^n consists of all blocks v = (v1, v2, . . . , vn) of length n of elements of GF(q). Two vectors v = (v1, v2, . . . , vn) and u = (u1, u2, . . . , un) are added componentwise, with addition over GF(q). Multiplication of a vector by a scalar multiplies every component of that vector by that scalar. A linear code over GF(q) of dimension k is a k-dimensional subspace of the vector space GF(q)^n.
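The reduction α^8 = α^4 + α^3 + α^2 + 1 leads directly to a bitwise multiplication routine. In the minimal sketch below (an illustration, not from the text), a field element is an eight-bit integer whose bit k holds the coefficient of α^k, so the reduction polynomial appears as the constant 0x1D.

    # Multiplication in GF(256): polynomial multiplication over GF(2)
    # followed by the reduction alpha^8 = alpha^4 + alpha^3 + alpha^2 + 1.
    def gf256_mul(a, b):
        p = 0
        for _ in range(8):
            if b & 1:
                p ^= a               # add a copy of a (GF(2) addition is XOR)
            b >>= 1
            carry = a & 0x80         # the coefficient of alpha^7 before doubling
            a = (a << 1) & 0xFF      # multiply a by alpha
            if carry:
                a ^= 0x1D            # reduce alpha^8 to alpha^4+alpha^3+alpha^2+1
        return p

    alpha = 0x02                     # the element alpha = x
    print(hex(gf256_mul(alpha, alpha)))   # alpha^2 -> 0x4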

Reed–Solomon Codes

By far the most common algebraic code that is used for forward error correction in lightwave communication systems is a Reed–Solomon code.^5 It is a linear nonbinary (n, k, n − k + 1) code. Because dmin = n − k + 1, and no code can have a larger dmin, a Reed–Solomon code is called a maximum-distance code. Every check matrix H for a Reed–Solomon code satisfies heft H = n − k and rank H = n − k.

5 A Reed–Solomon code used in the International Telecommunication Union (ITU) G.975 standard for optical transport networks is the (255, 239) code. This code has 256^239 codewords and can correct up to (n − k)/2 = (255 − 239)/2 = 8 byte errors in a block of 255 bytes. There are approximately 256^255, or approximately 10^614, possible sensewords! Those sensewords with more than eight byte errors can almost always be flagged as uncorrectable, but occasionally are decoded in error.

The symbols of a Reed–Solomon code are in an alphabet of size q. Usually q = 2^m, such as m = 8. This alphabet is the Galois field GF(2^m) and the code is a linear subspace of the vector space GF(q)^n. Each codeword consists of n code symbols representing k data symbols. The blocklength n is at most 2^m + 1, and is usually 2^m − 1 or 2^m. For a primitive Reed–Solomon code, each codeword has a blocklength n equal to 2^m − 1. For m = 8, the symbols of the Reed–Solomon code are the elements of GF(256) and the primitive blocklength is 255. Let ω be an element of GF(2^m) of order n, meaning that n is the smallest integer that satisfies ω^n = 1. All smaller powers of ω must be distinct because if ω^i = ω^j, then ω^(i−j) = 1, leading to a contradiction. Moreover, n cannot be larger than 2^m − 1 because there are only 2^m − 1 nonzero elements in GF(2^m). An element ω can have order n only if n divides 2^m − 1. Any element ω of order 2^m − 1 is called a primitive element of GF(2^m). Several primitive elements always exist within GF(2^m). Several equivalent ways of describing a Reed–Solomon code are popular. The Reed–Solomon code will be described first in the language of polynomials, then in the language of the discrete Fourier transform. A polynomial C(x) over GF(2^m) of degree at most k − 1 is described by k coefficients, some possibly zero. The set of such polynomials corresponds to a vector space of dimension k. The Reed–Solomon codeword c corresponding to polynomial C(x) is the vector with components ci = C(ω^i), and is given by

c = (C(ω^0), C(ω^1), C(ω^2), . . . , C(ω^(n−1))).          (13.2.12)

The set {c} of all such codewords corresponding to the set {C(x) : deg C(x) ≤ k − 1} of all such polynomials of degree at most k − 1 defines an (n, k) Reed–Solomon code over GF(2^m). The set is closed under addition, so this is a linear code. Because a nonzero polynomial C(x) of degree at most k − 1 can have at most k − 1 zeros, any nonzero codeword c must have at least n − k + 1 nonzero components. Therefore the minimum Hamming weight of the code is at least n − k + 1. Because it is a linear code, and recalling that the minimum distance cannot be larger than n − k + 1, one concludes that the minimum Hamming distance is dmin = n − k + 1. A Reed–Solomon code can also be described in the language of the Fourier transform, which allows properties of the Fourier transform to be used in the computations of the encoder and decoder. Write a polynomial of degree n − 1 as C(x) = Σ_{j=0}^{n−1} Cj x^j. Then

ci = Σ_{j=0}^{n−1} Cj ω^(ij),

which has the form of an inverse discrete Fourier transform, but with the operations executed in the arithmetic of GF(q) and with ω replacing e^(i2π/n). All properties of the discrete Fourier transform hold in GF(q). However, a Fourier transform over GF(q) of blocklength n exists only if GF(q) has an element ω of order n. Only such ω whose order divides q − 1 exist. For GF(2^8), this means n must equal 3, 5, 15, 17, 51, 85, or 255 for a Fourier transform to exist, so Reed–Solomon codes over GF(256) exist only for these blocklengths. Shortened Reed–Solomon codes for other n are found by constraining some components to always be zero.
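The evaluation map of (13.2.12) translates directly into a sketch of an encoder. The routine below is illustrative only; it reuses gf256_mul from the earlier GF(256) sketch and assumes that α = x is a primitive element of the field as constructed above — which holds for the polynomial x^8 + x^4 + x^3 + x^2 + 1 — so that ω = α has order n = 255.

    # Evaluation-map Reed-Solomon encoder over GF(256) per (13.2.12):
    # the codeword components are c_i = C(omega^i) for i = 0, ..., n-1.
    def rs_encode(data, n=255):
        def C(z):                                 # Horner evaluation of C(z)
            acc = 0
            for coeff in reversed(data):          # data holds C_0, ..., C_{k-1}
                acc = gf256_mul(acc, z) ^ coeff   # GF(256) addition is XOR
            return acc

        codeword, omega_i = [], 1                 # omega^0 = 1
        for _ in range(n):
            codeword.append(C(omega_i))
            omega_i = gf256_mul(omega_i, 0x02)    # step to the next power of omega
        return codeword

    c = rs_encode(list(range(239)))               # a (255, 239) codeword
    print(len(c))                                 # 255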

13.2.6 Spherical Decoding

A transmitted codeword c over GF(2^m) may be corrupted by one or more channel errors, thereby producing a hard-detected senseword r equal to c + e, where e is the error pattern r − c in the arithmetic of GF(2^m). The number of symbol errors in the symbol error pattern is equal to the Hamming distance between c and r. Even though the Hamming distance is not the same as the euclidean distance, it is intuitive to think geometrically. The set of all error patterns within at most Hamming distance t from codeword c is described as a Hamming sphere of radius t around the codeword c. The standard choice for the radius t is

t = ⌊(dmin − 1)/2⌋.          (13.2.13)

Spheres of larger radius would intersect (cf. Figure 13.3). A code is a set of points in the n-dimensional vector space GF(2^m)^n, usually a linear subspace of that vector space. Any two codewords are separated by at least the minimum Hamming distance dmin of that code. Figure 13.3 depicts, instead, points in the euclidean plane R^2 using the euclidean distance, which conveys some of the geometrical ideas, but in a different field. An error pattern can affect a set of codewords in several ways, as is shown schematically in Figure 13.3. For a linear code, the performance does not depend on which codeword is transmitted, so it is convenient to analyze errors for the case in which the all-zero codeword is transmitted. The Hamming distance between the transmitted all-zero codeword and the corrupted hard-detected senseword is equal to the Hamming weight w(e) of the error pattern. A similar statement holds for any codeword. A necessary condition so that every error pattern with Hamming weight at most t can be corrected is that the Hamming spheres of radius t centered about the codewords in the discrete n-dimensional space GF(2^m)^n do not intersect.

Figure 13.3 (a) A set of codewords (colored in black) showing symbolically a disjoint set of Hamming spheres of radius t around codewords along with some possible sensewords. The sensewords are colored dark gray for the central Hamming sphere and are colored white for the other spheres. (b) A senseword r of a codeword c 2 that lies within the Hamming sphere of codeword c1 . This senseword produces a decoding error. (c) Two codewords at the minimum Hamming distance along with a senseword r that is not in any Hamming sphere. This senseword results in a decoding failure.

The decoder can then, in principle, correct all error patterns lying within the Hamming sphere of the transmitted codeword. This decoder is called a spherical decoder or a bounded-distance decoder. For the central codeword shown in Figure 13.3(a), a spherical decoder can correct a senseword with any combination of errors that lies within the Hamming sphere centered on the central codeword. Several such sensewords are shown in dark gray in Figure 13.3. An error pattern that changes the codeword into a senseword lying in a different Hamming sphere, as shown in Figure 13.3(b), causes a decoding error, and the decoder output is an incorrect codeword. A senseword that lies in the interstitial region between the Hamming spheres, as shown in Figure 13.3(c), has an error pattern e that cannot be corrected by a spherical decoder and the senseword is flagged as uncorrectable. This is called a decoding failure. A spherical decoder is an instance of an incomplete decoder because it does not decode every senseword into a codeword. Only sensewords within a Hamming sphere are decoded. Even for codes of moderate size, complete decoders are computationally intractable, and hence they are not used. The probability of correctly decoding the senseword with a spherical decoder is the probability that the senseword lies within the Hamming decoding sphere of the transmitted codeword. The probability of incorrectly decoding the senseword and producing a decoding error is the probability that the senseword lies within an incorrect Hamming sphere centered about a different codeword. The probability of a decoding failure is the probability that the senseword lies within the interstitial region between the Hamming spheres, shown as white in Figure 13.3. In general, exact expressions for the probabilities of decoding error and decoding failure are known only for Reed–Solomon codes. Sometimes, both decoding errors and decoding failures are loosely referred to as decoding errors. A senseword r that is halfway between two codewords c1 and c2 separated by Hamming distance dmin will produce a decoding failure because it does not lie within the Hamming sphere of either codeword. This can happen only for even dmin. Correcting an error pattern with a Hamming weight t is possible with a spherical decoder only if t < dmin/2 or, equivalently, 2t + 1 ≤ dmin. This means that, for a spherical decoder, the error-correction capability of a block code is the radius t of the decoding spheres given by t = ⌊(dmin − 1)/2⌋ (cf. (13.2.13)). For a Reed–Solomon code, dmin = n − k + 1, so the largest number of errors t that can be corrected is half of n − k if that number is even. This is half the number of check symbols. A linear algebraic code admits a simple function of the senseword, called the syndrome S, which depends only on the error pattern, and not on the codeword. The syndrome S is the projection of the senseword r onto the dual space of the code. It is a sufficient statistic. The syndrome equals zero if there are no errors. Using r = c + e, where c is a codeword and e is an error pattern, and because cH^T = 0, it follows that rH^T = eH^T. The left side of this equation is easily computed from the senseword r. The right side of this equation, which is a vector of length n − k, depends only on the error pattern. It is the syndrome of the error pattern, given by

S = eH^T,          (13.2.14)

where S has length n − k, and the error pattern e has length n and Hamming weight at most t. All error patterns e inside a decoding sphere must have a unique syndrome, so


an error pattern e of weight at most t can be recovered from the syndrome S by inverting (13.2.14) to give the only solution that has Hamming weight at most t. This inversion is nontrivial because H is not square. Once e has been recovered, the codeword c is simply r − e. As an example, suppose that a (7,4,3) Hamming codeword is transmitted and that there is a single bit error. Applying (13.2.14), each error pattern e that has a single error produces a unique syndrome S that is equal to the corresponding row of H^T. Thus, for the single error pattern e = (0100000), expression (13.2.14) produces the syndrome S = (101), which is the second row of H^T, the transpose of (13.2.9). This means that there is an error in the second bit of the senseword because every column of H is distinct. This error can be corrected by inverting the second bit in the senseword. Now suppose that there are two errors. The form of (13.2.14) means that the syndrome S is now generated from the sum of two rows of H^T. For example, the double error pattern e = (1001000) produces the syndrome S = (100), which is the fifth row of H^T. This is the same syndrome that would be generated by a single bit error in the fifth component of the senseword, and would be miscorrected as such. Therefore, for this small code, more than one error will always be miscorrected. Although S is a linear function of e given by (13.2.14), the inverse calculation of e from S is not linear because H^T is a nonsquare n × (n − k) matrix. The inverse has many solutions, but only the solution with the smallest Hamming weight is required. For large codes, this computation can be intractable. One reason for the popularity of Reed–Solomon codes is that their unique algebraic structure is amenable to tractable algorithms that can solve (13.2.14) to determine the lowest-Hamming-weight error pattern e from S even though H^T is large and nonsquare. This lowest-weight error pattern has at most ⌊(dmin − 1)/2⌋ errors.
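For the (7,4,3) Hamming code, this inversion is trivial because each correctable (single-error) pattern is located by matching the syndrome against the columns of H. The following sketch (illustrative, not from the text) implements that matching.

    # Single-error syndrome decoding of the (7,4,3) Hamming code with the
    # check matrix of (13.2.9): compute S = rH^T; if S is nonzero, flip the
    # unique bit whose column of H equals S.
    import numpy as np

    H = np.array([[0,1,1,1,1,0,0],
                  [1,0,1,1,0,1,0],
                  [1,1,0,1,0,0,1]])

    def decode(r):
        S = np.dot(r, H.T) % 2
        if S.any():
            # The columns of H are the seven distinct nonzero triples, so S
            # matches exactly one column, which locates the single error.
            i = next(i for i in range(7) if np.array_equal(H[:, i], S))
            r = r.copy()
            r[i] ^= 1
        return r

    r = np.zeros(7, dtype=int); r[1] = 1   # all-zero codeword with one error
    print(decode(r))                        # recovers [0 0 0 0 0 0 0]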

13.2.7 Performance Analysis

For an uncoded block of binary data, bit errors produce a block error. For independent bit errors generated during hard-decision detection occurring with a probability ρ in an uncoded block of length n, the probability of a block error is

pe = 1 − pc = 1 − (1 − ρ)^n          (13.2.15a)
   ≈ nρ.          (13.2.15b)

The approximation in (13.2.15b) holds whenever ρ is much smaller than one.

The probability pe of a decoding error depends on the modulation format and on the method of detection. For binary phase-shift keying in additive white gaussian noise and hard-decision detection, the bit error probability satisfies ρ = (1/2)erfc(√(Ec/N0)), where Ec is the energy in a codebit. Substituting this expression into (13.2.15b) approximates the probability of an error in an uncoded block of binary data with a databit energy Eb as

pe ≈ (n/2) erfc(√(Eb/N0)),          (13.2.16)


with the right side simply n times the probability of an error in a single bit when that probability of bit error is small. For a coded block, every senseword that contains up to t errors can be corrected, in principle, by a spherical decoder. For a symbol error probability ρ, the probability that a block of length n has t or fewer errors, and hence is correctly decoded, is

pc = Σ_{ℓ=0}^{t} C(n, ℓ) ρ^ℓ (1 − ρ)^(n−ℓ),

because the number of error patterns with ℓ errors within a block of length n is given by the binomial coefficient C(n, ℓ) = n!/ℓ!(n − ℓ)!. This statement holds both for binary codewords and for codewords in a larger alphabet. A block error or block failure occurs whenever the senseword contains more than t errors. For a spherical decoder with hard-decision data, the probability of not decoding correctly is

pe = 1 − pc = Σ_{ℓ=t+1}^{n} C(n, ℓ) ρ^ℓ (1 − ρ)^(n−ℓ),          (13.2.17)

where ρ is the probability of a symbol error. This expression includes both decoding failures, for which the senseword lies between the Hamming spheres around codewords, and decoding errors, for which the senseword lies in an incorrect Hamming sphere. For a large Reed–Solomon code, pe , though quite small, will be strongly dominated by decoding failures. Decoding errors will be quite rare in comparison. The decoder may denote a decoding failure by an alert flag with the output of the decoder then simply the unaltered senseword. The application will then determine how to proceed. Insight into (13.2.17) is aided by a series of coarse approximations. A spherical decoder corrects up to t errors. The uncorrectable error patterns that are most likely are the lowest-Hamming-weight error patterns that contain t + 1 errors. A simple approximation to the probability of a block error for small pe uses only the most significant term in the summation given in (13.2.17), which is the first term. Then

pe ≈ C(n, t + 1) ρ^(t+1) (1 − ρ)^(n−t−1) = n_{t+1} ρ^(t+1) (1 − ρ)^(n−t−1),          (13.2.18)

where n_{t+1} = C(n, t + 1) is the number of error patterns with t + 1 errors. For the case of binary phase-shift keying (BPSK), substitute the probability of error ρ = (1/2)erfc(√(Ec/N0)) into (13.2.18) and substitute Ec = Rc Eb to give the further approximation

pe ≈ (n_{t+1}/2) [erfc(√(Rc Eb/N0))]^(t+1),

where Eb is the energy per databit and the term in (1 − ρ) has been neglected. For systems with a low error rate, Rc Eb/N0 is large. Using the approximation erfc(x) ≈ e^(−x²) for large x (cf. (2.2.20)) gives

pe ≈ (n_{t+1}/2) e^(−(Eb/N0) Rc dmin/2),          (13.2.19)

where (t + 1) ≈ dmin/2. Temporarily ignoring the scaling term of n_{t+1}/2, inspection of the argument of the exponential function suggests that doubling the term Rc dmin by the use of a code has much the same effect on pe as doubling the energy per bit. The term Rc dmin/2 that multiplies Eb/N0 in the exponent is the asymptotic coding gain for a binary linear, hard-decision block code using BPSK modulation. For an information channel that uses soft-decision detection, an upper bound on the probability of block error can be determined by replacing the minimum Hamming distance with the minimum euclidean distance dmin (cf. (10.2.13)). To this end, the binary alphabet {0, 1} is replaced by the bipolar alphabet {−1, 1}. For BPSK, each codebit in the codeword is mapped to one of two antipodal values ±√Ec, which is the bipolar alphabet {−1, 1} scaled by √Ec. For each component at which the two codewords differ, the corresponding components of the BPSK signal are separated by the squared euclidean distance 4Ec = 4Rc Eb (cf. (10.1.3)). Multiplying the minimum Hamming distance dmin by the single-letter squared euclidean distance gives the minimum squared euclidean distance d²min for the codeword as

d²min = 4 dmin Ec = 4 dmin Rc Eb.          (13.2.20)

Using (13.2.20) and erfc(x) ≈ e^(−x²), the probability of a detection error, which is given by (1/2)erfc(√(d²min/4N0)) (cf. (9.4.14b)), can be further approximated by

pe ≈ e^(−(Eb/N0) Rc dmin),          (13.2.21)

where the term outside the argument of the exponential function has been ignored. The term R c dmin inside the argument of the exponential function is the asymptotic coding gain for a binary linear block code that uses soft-decision decoding and BPSK modulation. The missing factor of two in the exponent of approximation (13.2.21) as compared with approximation (13.2.19) represents the rule-of-thumb that, for additive gaussian noise, soft-decision decoding requires about 3 dB less E b / N0 than does hard-decision decoding to achieve the same probability of error when that error is small. For other binary modulation formats, the general form of (13.2.20) is the same, but the specific expression for the euclidean distance may differ. For nonbinary formats, the relationship between the minimum euclidean distance dmin and the minimum Hamming distance dmin may differ further.
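The expressions of this section are easily evaluated numerically. The sketch below (an illustration with an arbitrarily chosen ρ) compares the uncoded block error probability of (13.2.15a) with the coded probability of (13.2.17) for the (7,4,3) Hamming code, for which t = 1.

    # Block error probabilities: uncoded block of length 7 versus the
    # (7,4,3) Hamming code with a spherical decoder (t = 1).
    from math import comb

    rho, n, t = 1e-3, 7, 1
    p_uncoded = 1 - (1 - rho)**n                               # (13.2.15a)
    p_coded = sum(comb(n, l) * rho**l * (1 - rho)**(n - l)     # (13.2.17)
                  for l in range(t + 1, n + 1))
    print(p_uncoded)   # about 7.0e-3, close to n*rho as in (13.2.15b)
    print(p_coded)     # about 2.1e-5, close to the first term as in (13.2.18)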

13.2.8 Descriptions of Linear Codes as Graphs

Linear codes are defined algebraically by matrix equations, but they can be depicted graphically in several ways. These graphs can be powerful aids for understanding and designing encoders and decoders for large codes. The two graphical models that will be described here are the trellis graph and the Tanner graph. These graphs will be illustrated for the code known as the (8,4,4) extended Hamming code.


Figure 13.4 The (8,4,4) extended Hamming code on a minimal trellis. The highlighted path is 01100110 and is one of 16 codewords.

This code, which is the Hamming (7,4,3) code with an additional check bit, has a minimum Hamming distance equal to four. A trellis for the (8,4,4) extended Hamming code is shown in Figure 13.4. This trellis for a block code is a variation of the trellis that was defined in Section 11.3.1 for a recurring sequence such as a convolutional code. The trellis in Figure 13.4 corresponds to a representation of the (8,4,4) Hamming code under a permutation of components that is chosen because it leads to a clean trellis structure. Each branch of the trellis is labeled with either a zero or a one. There are 16 paths through the trellis from the left node to the right node. The sequence of labels on each such path specifies a codeword. Every four-bit dataword is represented by a unique path by any convenient method of assigning the 16 four-bit datawords to the 16 paths. The extended (8,4,4) Hamming code has minimum distance four. There are 256 eight-bit words in total, of which the 16 words described by the trellis are codewords, 128 words are at distance one from a unique codeword, and the remaining 112 words are each at Hamming distance two from two of the codewords. A hard-decision decoder sees the trellis labeled as shown in Figure 13.4 and, when given a senseword, in effect, finds the path that agrees with the senseword in all but at most one place. A soft-decision decoder sees the trellis labels as ±A instead of 0, 1, and finds the path that is closest to the senseword in total squared euclidean distance. Such a search for the best codeword could be carried out by the methods of sequential decoding, such as the Viterbi algorithm, which is introduced in Chapter 11 and revisited for a convolutional code in Section 13.3 of this chapter. Convolutional codes are described in Section 13.3.1 on a trellis that is a variation of the trellis shown in Figure 13.4. A Tanner graph is a bipartite graph that is useful for describing some iterative algorithms. As an example, consider the extended (8,4,4) Hamming code using a check matrix under a special permutation of columns given by

H = [ 1 1 1 0 0 1 0 0 ]
    [ 1 0 1 1 0 0 0 1 ]
    [ 1 0 0 0 1 1 0 1 ]
    [ 0 0 1 0 0 1 1 1 ],          (13.2.22)


Figure 13.5 Tanner graph for an (8,4,4) code.

where the permutation is chosen to give the check matrix a kind of symmetry. Because the check matrix has a symmetric structure with four ones in every row, it is described by a Tanner graph with a tidy structure as shown in Figure 13.5. The Tanner graph displays connections rather than paths. It consists of two rows of nodes connected by lines called graph edges. The row of nodes depicted by circles on the bottom corresponds to the eight bits ci of a codeword. These are called codebit nodes or bit nodes. The bit nodes are labeled ri and are initially identified with the senseword components. The nodes in the row on the top are called check nodes, with those nodes depicted as squares. Each check node represents one row of the check matrix H. These check nodes are labeled fj. Each check node is connected to the four bit nodes that have ones in the corresponding row of H. A check node fj is connected to bit node ri if H_{ji} = 1. Were each bit node to contain a zero or a one, the eight bit nodes would contain a codeword if and only if every check node were connected to an even number of bit nodes containing ones. The Tanner graph and the check matrix are equivalent in that one can be developed from the other. However, the Tanner graph makes visible the loops that are present in the dependences but are less clearly evident in the check matrix. Therefore, the Tanner graph is useful for understanding the structure of the dependences in a code. The girth of a Tanner graph is the length of the shortest closed path in the graph. The Tanner graph shown in Figure 13.5 has a girth of four. The girth of a Tanner graph cannot be smaller than four. Section 13.5 studies the use of the Tanner graph to describe iterative decoding algorithms. To this end, observe that evidence about bit node r4 of the senseword is not directly informative about decoding the correct value of bit c3. That evidence reaches c3 only indirectly through the bit nodes r0 and r7, and even more indirectly through other paths. This metaphor of evidence propagating in a graph is useful for describing iterative decoding, and for other kinds of analysis.

13.2.9 Limits of Spherical Decoding

For a hard-decision senseword with not more than t errors with a code whose minimum Hamming distance is 2 t + 1, a spherical decoder can always find the correct codeword. Efficient computational algorithms to do this are known for some codes, primarily the


Reed–Solomon codes, but not for all such codes. When there are t + 1 or more errors, a spherical decoder fails to decode even when the closest codeword is unique. A spherical decoder does not decode any senseword outside of a decoding sphere. But, for codes of large blocklength, the interstitial region between decoding spheres is large, and most sensewords that are outside the decoding sphere do have a unique closest codeword. A minimum-distance decoder would correct these sensewords, were such a decoder computationally tractable. These minimum-distance decoding regions in GF(q)^n would not be spheres, and indeed can be much larger than spheres. A spherical decoder approximates this minimum-distance decoding region by the largest included sphere. This limitation of a spherical decoder may be acceptable for low code rates, but becomes a major consideration for code rates above Rcrit. Above Rcrit, a great number of hard-decision sensewords lying in these deep interstitial holes are far from any decoding sphere. These sensewords could be uniquely decoded by a minimum-distance decoder, but cannot be decoded by a spherical decoder. Moreover, near or above Rcrit, choosing a code to maximize the minimum distance dmin need not be the best way to minimize the probability of a decoding error. A more suitable objective may be to use a weight profile. The weight profile is a vector of length n for which the ℓth component is the number of codewords of weight ℓ. Choosing a good weight profile may allow one to accept a smaller minimum distance so that the total number of low-weight codewords is reduced. However, little is known about designing codes using this alternative optimality criterion. To summarize, for a large block code, a minimum-distance decoder is computationally intractable, and a code with the maximum value of the minimum distance need not be the best code. This motivates the discussion of sequential decoders and convolutional codes. A convolutional code has a structure that can be decoded one symbol at a time, a method called sequential decoding.
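For a code small enough to enumerate, the weight profile is computed directly. The illustrative sketch below enumerates the (7,4,3) Hamming code generated by (13.2.8); the ℓth entry counts the codewords of weight ℓ.

    # Weight profile of the (7,4,3) Hamming code by direct enumeration.
    import itertools
    import numpy as np

    G = np.array([[1,0,0,0,0,1,1],
                  [0,1,0,0,1,0,1],
                  [0,0,1,0,1,1,0],
                  [0,0,0,1,1,1,1]])

    profile = [0] * 8
    for d in itertools.product([0, 1], repeat=4):
        c = np.dot(d, G) % 2
        profile[int(c.sum())] += 1
    print(profile)   # [1, 0, 0, 7, 7, 0, 0, 1]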

13.3 Convolutional Codes

The fixed blocklength n of a block code must be large enough to ensure that there is a small probability of a decoding error. Worst-case error patterns occur only with some very small probability of error. Yet, to correct these worst-case error patterns, a large blocklength is necessary. The large blocklength is unnecessary, however, for typical error patterns. Although a large blocklength is needed only for the worst-case error patterns, the encoder cannot know when these worst-case error patterns will occur. This suggests that a code of unbounded length should be used with a decoder that uses a variable length of the senseword to decode, using only as much of the senseword as is needed at the time. Such a decoder estimates one data symbol at a time, starting at the beginning of the senseword. The length of the senseword that is examined to decode the first data symbol depends on the actual error pattern and not on the worst-case error pattern. After decoding the first data symbol, the second data symbol is decoded in the same way, and so on. Codes of unbounded length can be called stream codes. Convolutional codes constitute a linear class of stream codes that are suitable for this form of


maximum-likelihood decoding, called sequential decoding, but only for code rates smaller than the cutoff rate R0 of the channel. A small convolutional code may be suitable for a simple version of sequential decoding called Viterbi decoding. The Viterbi decoder is (nearly) a complete decoder based on the Viterbi algorithm, as described in Section 11.3.2. Both the special case of Viterbi decoding and the general case of sequential decoding are described in Section 13.3.3. But, first, the structure of a convolutional code is described.

13.3.1 Convolutional Encoders

A binary convolutional code is a binary stream code. A binary convolutional code generates a sequence of codebits by modulo-two operations on a stream of databits. A binary convolutional code is defined by two or more convolutions in the finite field GF(2). A nonbinary convolutional code is defined by two or more convolutions in the finite field GF(q). In contrast to a block code, a convolutional code has no predetermined length and the codeword length could be regarded as infinite. It can be terminated at any length. A common instance of a binary convolutional code encodes a binary data sequence d = {d0, d1, d2, . . .} into two binary code sequences c1 = {c10, c11, c12, . . .} and c2 = {c20, c21, c22, . . .} that are then interleaved bitwise to form one output code sequence. Because a codeword of this code has one data sequence and two code sequences, it is called a (2, 1) binary convolutional code. The code sequences are generated by two GF(2) convolutions given by

c_{1ℓ} = Σ_{k=0}^{ν} d_k g_{1(ℓ−k)},   ℓ = 0, 1, 2, . . . ,          (13.3.1a)
c_{2ℓ} = Σ_{k=0}^{ν} d_k g_{2(ℓ−k)},   ℓ = 0, 1, 2, . . . ,          (13.3.1b)

where the coefficients g_{1k} and g_{2k} are either zero or one, provided that at least one lead coefficient g_{1ν} or g_{2ν} is not equal to zero. The addition and multiplication operations are the operations of GF(2) (cf. (13.2.11)). The integer ν is called the constraint length of the convolutional code. The constraint length describes the number of past databits that affect the current codebit. This can be restated in terms of the number of subsequent codebits that the current databit influences. The convolutions in (13.3.1) are expressed concisely in the language of the polynomial product. A binary sequence of arbitrary length can be represented symbolically by the polynomial

d(x) = Σ_{ℓ=0}^{∞} d_ℓ x^ℓ,          (13.3.2)

where each databit d_ℓ is either zero or one, and where the upper limit is set to infinity to indicate that the data polynomial has an unspecified length. The symbol x has no meaning other than that of a placeholder. The polynomials g1(x), g2(x), c1(x), and c2(x) are defined similarly. The polynomial notation is a convenient way of representing a sequence, especially because a discrete convolution has the form of a polynomial product. Using this notation, (13.3.1) can be written as

c1(x) = d(x)g1(x),          (13.3.3a)
c2(x) = d(x)g2(x),          (13.3.3b)

where d(x ) is a polynomial representing the data sequence, where c1 (x ) and c2 (x ) are polynomials representing the two code sequences, and where g1( x ) and g2 (x ) are generator polynomials that describe the convolutional structure of the code with the larger of their polynomial degrees as the constraint length ν of the convolutional code. The final codeword is the interleave of the bits represented by c1 (x ) and c2 (x ), and can be expressed as the polynomial equation

c(x) = c1(x²) + x c2(x²),          (13.3.4)

where the interleaving is described by using x² to spread the polynomial coefficients, then multiplying the second polynomial by x to offset the second polynomial from the first. A binary (n, k) convolutional code, in general, is constructed by partitioning the input stream of databits into k-bit blocks called dataframes. The encoder stores the m most recent dataframes consisting of km bits. As each new dataframe enters the encoder, the encoder uses the k(m + 1) bits of the new dataframe and the m stored dataframes to compute a single codeframe of length n. This means that the encoder generates each codeframe of length n for the corresponding dataframe of length k using the current dataframe and the past m dataframes. Because one codeframe of length n is generated for each dataframe of length k, a convolutional code has a rate of Rc = k/n. The description of an (n, k) convolutional code requires kn generator polynomials, the longest of which has degree m. The constraint length, defined below, is not larger than km. The code can be compactly represented by defining a k × n generator matrix G(x) with each matrix element g_{ij}(x) being a generator polynomial. Then, the row vector d(x) of length k, with ith component di(x), is the polynomial representation of the k sequences of databits at the input to the encoder. Similarly, the polynomial row vector c(x) of length n, with jth component cj(x), is the polynomial vector representation of the n sequences of codebits at the output of the encoder. Using these expressions, the encoding operation can be written as

c(x) = d(x)G(x).          (13.3.5)

Using (13.3.2), the polynomial representation cj(x) of each output code sequence can be written as

cj(x) = Σ_{i=1}^{k} di(x) g_{ij}(x).          (13.3.6)

The product of the polynomials di(x)g_{ij}(x) is another polynomial p_{ij}(x) = Σ_{ℓ=0}^{∞} p_{ijℓ} x^ℓ, with the coefficients p_{ijℓ} given by

p_{ijℓ} = Σ_{k=0}^{n} d_{i(ℓ−k)} g_{ijk},

where g_{ijk} is the kth coefficient of the generator polynomial g_{ij}(x), and d_{i(ℓ−k)} is the (ℓ − k)th coefficient of the data polynomial di(x). Substituting p_{ij}(x) into (13.3.6) gives the polynomial representation cj(x) for the jth codebit sequence,

cj(x) = Σ_{i=1}^{k} Σ_{ℓ=0}^{∞} Σ_{k=0}^{n} d_{i(ℓ−k)} g_{ijk} x^ℓ.          (13.3.7)

For k = 1 and n = 2, (13.3.7) reduces to (13.3.1). Figure 13.6 shows an example of an encoder for a (3, 2) convolutional code that has the generator matrix

G(x) = [ x   x²   1      ]
       [ 1   0    x + x² ].          (13.3.8)

Because three codebits are generated for every two databits, the clock rate for the serialized output codebits is 50% faster than the clock rate for the serialized input databits. The convolutional nature of the encoding process means that each codebit depends on the recent databits. This is specified by a constraint length ν. Define ν_i as the maximum degree of the n polynomials g_{ij}(x) in the ith row of generator matrix G(x). The constraint length is

ν = Σ_{i=1}^{k} ν_i,          (13.3.9)

which is at most km. A minimal encoder for a binary convolutional code of constraint length ν has ν bits of memory. A minimal trellis for this code has 2^ν channel states. The resulting code is also called an (n, k, ν) convolutional code. As an example, for the convolutional encoder shown in Figure 13.6, there are four bits of memory and thus 16 channel states.


Figure 13.6 Block diagram of an encoder for a rate 2/3 convolutional code.
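The convolutions of (13.3.1) and the interleave of (13.3.4) are compact enough to state directly in code. The sketch below is illustrative; the generator polynomials g1(x) = 1 + x + x² and g2(x) = 1 + x² are an assumption, chosen to be consistent with the branch labels of the trellis in Figure 13.7.

    # A (2, 1) convolutional encoder: two GF(2) polynomial products followed
    # by a bitwise interleave of the two code sequences.
    def conv_gf2(d, g):
        # c_l = sum_k d_k g_(l-k) over GF(2), as in (13.3.1)
        c = [0] * (len(d) + len(g) - 1)
        for l in range(len(c)):
            for k in range(len(d)):
                if 0 <= l - k < len(g):
                    c[l] ^= d[k] & g[l - k]
        return c

    g1, g2 = [1, 1, 1], [1, 0, 1]       # coefficients of g1(x) and g2(x)
    d = [1, 0, 1, 1]                    # a short data sequence
    c1, c2 = conv_gf2(d, g1), conv_gf2(d, g2)
    codeword = [b for pair in zip(c1, c2) for b in pair]   # c(x) = c1(x^2) + x c2(x^2)
    print(codeword)   # [1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1]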


13.3.2 Decoding on a Trellis

The codewords of a convolutional code of constraint length ν over GF(q) are described on an infinitely long trellis with q^ν states. A trellis for a convolutional code describes the encoding of a sequence of elements of GF(q) into another sequence of elements of GF(q) at a code rate of k/n. This is the ratio of the number of bits needed to identify the branches leaving a node to the number of bits labeling each branch. There are 2^k branches leaving a node, each of which is labeled with n bits. In comparison, a trellis describing intersymbol interference, discussed in Section 11.3.3, may be viewed as “encoding” a sequence of real or complex numbers into another sequence of real or complex numbers at rate one. Section 11.3.3 showed that the most likely transmitted sequence for a linear dispersive channel can be determined by searching a trellis using the Viterbi algorithm with a branch metric for intersymbol interference based on the euclidean distance defined in R or C. Similarly, a hard-detected or soft-detected convolutional code of modest constraint length can be decoded by searching a trellis using the Viterbi algorithm with a branch metric based on either the Hamming distance defined in GF(q) or the euclidean distance defined in R or C. The computational complexity is exponential in the constraint length. An example of a trellis for a (2, 1, 2) convolutional code is shown in Figure 13.7. For this code, a single databit is encoded into a codeframe of length two, with four possible codeframe values, which are shown labeling the individual nodes in a column of nodes. The action of the encoder can be described by a path from left to right, reading the branch labels along the path. A databit zero at time ℓ specifies the upper branch out of the current node at time ℓ. A databit one at time ℓ specifies the lower branch out of the current node at time ℓ. A datastream specifies a unique path through the trellis. The codeword is the sequence of the two-bit labels on the sequence of branches specified by the datastream. The action of the decoder when given the senseword is to search the trellis to find the most likely path using the Hamming distance, euclidean distance, or likelihood, as appropriate. The first step of this process is to determine the most likely initial codeframe. This will require a search through the trellis. The most likely initial codeframe is almost always eventually found, then the second codeframe, and so on. In this regard, a


Figure 13.7 The trellis of a rate-one-half convolutional code with a constraint length of two. Each state value equals the previous two input databits. The solid or dashed arrows indicate the next state for an input databit zero or one, respectively. The two output codebits label the branches.


hard-decision Viterbi decoder is a complete decoder if unbounded delay is allowed. It is nearly a complete decoder if decoding is limited to a fixed decoding delay that is a small multiple of the constraint length. Whenever the accumulated distance metric from two or more codewords remains the same, the decoder declares a decoding failure. The Viterbi decoder may be regarded as a simple instance of a sequential decoder. Because the decoding complexity is proportional to 2^ν, the Viterbi algorithm is not suitable for codes of a large constraint length. On the other hand, convolutional codes with small constraint length are not powerful.
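A minimal hard-decision Viterbi decoder for the code of Figure 13.7 can be written in a few lines. The sketch below is illustrative; it assumes the generators 1 + x + x² and 1 + x² (consistent with the figure), uses the state (d_{ℓ−1}, d_{ℓ−2}), and scores each branch by the Hamming distance between its two-bit label and the received frame.

    # Hard-decision Viterbi decoding on the four-state trellis of Figure 13.7.
    def outputs(state, bit):
        s0, s1 = state & 1, state >> 1           # d_(l-1) and d_(l-2)
        return (bit ^ s0 ^ s1, bit ^ s1)         # taps of 1+x+x^2 and 1+x^2

    def viterbi(frames):
        metric, paths = {0: 0}, {0: []}          # start in the all-zero state
        for r in frames:
            new_metric, new_paths = {}, {}
            for state, m in metric.items():
                for bit in (0, 1):
                    nxt = ((state << 1) | bit) & 0b11
                    o = outputs(state, bit)
                    bm = (o[0] ^ r[0]) + (o[1] ^ r[1])   # Hamming branch metric
                    if nxt not in new_metric or m + bm < new_metric[nxt]:
                        new_metric[nxt] = m + bm
                        new_paths[nxt] = paths[state] + [bit]
            metric, paths = new_metric, new_paths
        return paths[min(metric, key=metric.get)]

    # The frames encode the databits 1, 0, 1, 1 (see the encoder sketch above).
    print(viterbi([(1, 1), (1, 0), (0, 0), (0, 1)]))   # [1, 0, 1, 1]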

13.3.3 Sequential Decoding

Sequential decoding is motivated by the topic of sequential hypothesis testing. A binary sequential hypothesis tester at each decision step has three options: decide hypothesis H0, decide hypothesis H1, or collect more data. The additional data depends on the correct hypothesis, and so informs the next decision step. In this way, the amount of data collected to make a decision depends on the actual observed data rather than on the worst possible case of data. Therefore, the time needed to make a decision is variable. To emulate sequential hypothesis testing, a sequential decoder recovers only one dataframe at a time from the senseword. For a binary convolutional code with k = 1, the decoder has three options for the first databit: it is a zero, it is a one, or collect more data by looking at more senseword bits. This may be repeated multiple times. Once a decision has been made, the first dataframe and its effect are stripped from the senseword, and the process is repeated to decode the next dataframe. This means that a sequential decoder determines one dataframe at a time, using only as much of the senseword as needed to determine the value of that dataframe. In effect, instead of processing a fixed senseword length based on the worst-case error pattern, the number of senseword frames ℓ needed to decode one dataframe is variable and depends on the actual error pattern. Unlike sequential hypothesis testing, however, the nℓ senseword bits appearing in the ℓ frames are conditioned on the kℓ subsequent databits, which have not yet been detected, and so are unknown at this time. These unknown databits are treated as conditioning variables, taking on 2^(kℓ) values. The likelihood for each possible pattern of the kℓ unknown databits must be considered. If, for every subsequent databit pattern covering ℓ frames, of which there are 2^(kℓ), a zero or one in the first databit is the more likely value of the first databit, then that first databit is determined. Otherwise increase ℓ and repeat the process. Sequential decoding can be understood as a search on a trellis, or, for a large constraint length, on a tree. The trellis must be searched to a sufficient depth at which the best path to every node at that depth starts with the same branch. The Viterbi algorithm does this. However, the complexity of the Viterbi algorithm is proportional to 2^ν, where ν is the constraint length. For even a moderate constraint length this is intractable. Then a partial search of the trellis must be used. The truncated search strategies are of two kinds. One such strategy walks a single path of the trellis with frequent backtracks to explore other branches before eventually


deciding on the first frame, then the second frame, and so forth, as the walk moves deeper into the trellis. At a code rate approaching the cutoff rate R0, this back-and-forth exploration of the branches becomes computationally intensive, leading to a high probability of timeout as predicted by the Pareto probability distribution. The second search strategy is to simultaneously explore only a fixed number of paths, say 2^b paths, which is much fewer than 2^ν. At every iteration, the 2^b paths are extended one frame, resulting in 2·2^b longer paths, of which the best 2^b paths are rank-ordered and saved. The others are discarded. At code rates approaching the cutoff rate, this memory-intensive approach can have an unacceptable probability of dropping the correct path before that path is seen to be likely. For a convolutional code that is appropriate to the channel statistics, and except for error patterns that have a vanishingly small probability of occurring, the sequential decoding process will eventually halt. However, the waiting time to complete the decoding process is an exponentially distributed random variable, and the computational work grows exponentially with the waiting time. Accordingly, the amount of computational work is described by a Pareto random variable, which is a heavy-tailed probability density function (cf. (2.2.6)), possibly with infinite second-order or first-order moments. The expected waiting time to decode a frame is not finite for code rates larger than the cutoff rate R0. For code rates larger than the cutoff rate, decoder resources such as the buffer memory or the allotted decoding time must be exceeded occasionally, regardless of their size. Many practical ways to define and organize the computations of sequential decoding have been developed, but, because of the exponential growth of the computational work as the code rate approaches the cutoff rate R0, none of these algorithms is immune from the statement regarding intractability for code rates above the cutoff rate.
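The second (breadth-first) strategy is sketched below under the same assumptions as the Viterbi sketch above, whose outputs function it reuses: at each frame, every surviving path is extended by one databit, the extended paths are rank-ordered by accumulated Hamming metric, and only the best B survive.

    # A breadth-first (beam) sequential decoder keeping B surviving paths.
    def beam_decode(frames, B=4):
        paths = [([], 0, 0)]                     # (databits, state, metric)
        for r in frames:
            extended = []
            for bits, state, m in paths:
                for bit in (0, 1):
                    o = outputs(state, bit)      # from the Viterbi sketch
                    bm = (o[0] ^ r[0]) + (o[1] ^ r[1])
                    nxt = ((state << 1) | bit) & 0b11
                    extended.append((bits + [bit], nxt, m + bm))
            paths = sorted(extended, key=lambda p: p[2])[:B]   # keep the best B
        return paths[0][0]

    print(beam_decode([(1, 1), (1, 0), (0, 0), (0, 1)]))   # [1, 0, 1, 1]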

13.3.4 Performance Analysis

A convolutional code can be used to correct hard-decision detection errors, as for a binary symmetric channel, or to reduce the required signal power needed for soft-decision detection, as for a continuous information channel such as a bandlimited additive gaussian noise channel. The error-correcting capability of a convolutional code on a discrete memoryless information channel depends primarily on the minimum Hamming distance of the code. For a continuous memoryless gaussian information channel, the performance depends primarily on the minimum euclidean distance. Invoking the linearity of a convolutional code, the minimum Hamming weight is the primary indicator of performance. Let d_ℓ be the minimum Hamming distance between the all-zero codeword and any initial codeword segment ℓ frames in length that is nonzero in the first dataframe. These distances are shown in Figure 13.8 as a function of ℓ. The minimum distance dmin is equal to d_ℓ at the value of ℓ for which the minimum nonzero path returns to the all-zero path. As the length of the codeword segment becomes large, the minimum distance between any two codeword segments reaches the constant value dmin, and thereafter no longer increases. Therefore, the initial dependence of the minimum distance on the segment length becomes inconsequential.

Figure 13.8 The minimum distance d_ℓ as a function of the length ℓ of the codeword segment for the trellis defined in Figure 13.7. The trellis shown gives d_1 = 2, d_2 = 3, and d_3 = 5.

Tables of “good” convolutional codes are widely available. These tabulated codes are selected on the basis of the largest possible minimum distance for the given n, k, and constraint length ν. However, for large codes, maximizing the minimum distance need not produce a code with the best performance with a maximum-likelihood decoder. A good weight profile, analogous to the weight profile for an algebraic block code, may give better performance, but this advantage is hard to quantify analytically, and is rarely considered. When a binary convolutional code is used on a binary phase-shift-keyed channel with additive white gaussian noise, the minimum squared euclidean distance d²min is 4 dmin Ec (cf. (13.2.20)), where dmin is the minimum Hamming distance. Substituting this expression into ρ = (1/2)erfc(√(d²min/4N0)) (cf. (9.4.14a)) and multiplying by the mean number of codewords n̄ at the minimum distance gives a coarse approximation to the probability of a decoding error as

pe ≈ (n̄/2) erfc(√(dmin Ec/N0)).          (13.3.10)

Using erfc(x ) ≈ e− x (cf. (2.2.20)), E c = Rc E b , and, on ignoring the scaling terms outside the exponential function, (13.3.10) reduces to a form of (13.2.21). The main purpose of (13.3.10) is to see that, within the limits of the approximation, increasing dmin R c is equivalent to increasing E b . The approximation used to develop (13.3.10) neglects the other terms in the union bound because those terms are exponentially small. But there are exponentially many such exponentially small terms, which makes this approximation suspect. Empirically, simulation can validate this approximation for code rates that are not close to R0 . The approximation breaks down as the code rate approaches R 0. 2
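As a numerical illustration of the coarse approximation (13.3.10), the following Python sketch evaluates the estimate for assumed, purely illustrative values of $d_{\min}$, the code rate $R_c$, and the number of minimum-distance codewords.

```python
# Coarse decoding-error estimate of (13.3.10):
#   p_e ~ (n/2) erfc(sqrt(d_min * Ec / N0)),  with Ec = Rc * Eb.
import numpy as np
from scipy.special import erfc

def pe_estimate(dmin, EbN0_dB, Rc, n_nearest):
    """Union-bound-style estimate of the decoding error probability."""
    EbN0 = 10 ** (EbN0_dB / 10)
    EcN0 = Rc * EbN0                       # energy per codebit over N0
    return 0.5 * n_nearest * erfc(np.sqrt(dmin * EcN0))

# Illustrative values only: dmin = 10, rate-1/2 code, one codeword at dmin.
print(pe_estimate(dmin=10, EbN0_dB=4.0, Rc=0.5, n_nearest=1))
```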

13.4 Cutoff Rate and Critical Rate

The critical rate $R_{\rm crit}$ of an information channel might be regarded as the maximum code rate for which the minimum distance is a primary predictor of code performance. The critical rate will be regarded as a practical upper limit for large algebraic block codes with spherical decoding. In turn, the cutoff rate $R_0$ of an information channel will be regarded as a practical upper limit on the code rate for which sequential decoding is tractable. This means that $R_0$ is the upper limit on the code rate for maximum-likelihood decoding. For code rates larger than the cutoff rate $R_0$, maximum-likelihood decoding is exponentially expensive in the code blocklength. This is because the required waiting time for a sequential decoder to decode one symbol is governed by a Pareto probability density function (cf. (2.2.6)), which has infinite moments for a code rate larger than $R_0$, leading to an unbounded distribution of the waiting time for decoding.

The critical rate $R_{\rm crit}$ and the cutoff rate $R_0$ are expressed using a conditional probability distribution called the channel transition matrix in the context of coding, and denoted here as $Q$ with elements $Q(b_k|a_j)$, abbreviated $Q_{k|j}$. These elements are denoted $p(k|j)$ in earlier chapters. For a discrete information channel defined by $Q$, the cutoff rate $R_0$ is⁶

$$R_0 = \max_{\mathbf p}\left[-\log \sum_{k=0}^{K-1}\left(\sum_{j=0}^{J-1} p_j\, Q^{1/2}_{k|j}\right)^{2}\,\right], \qquad (13.4.1)$$

and the critical rate $R_{\rm crit}$ is

$$R_{\rm crit} = \max_{\mathbf p} \sum_{j=0}^{J-1}\sum_{k=0}^{K-1} p_j\, Q^*_{k|j}\, \log\left(\frac{Q^*_{k|j}}{\sum_i p_i\, Q^*_{k|i}}\right), \qquad (13.4.2)$$

where $Q^*_{k|j} = Q^{1/2}_{k|j}\big/\sum_{k'} Q^{1/2}_{k'|j}$.

Rates for a Symmetric Channel

For a binary symmetric channel (cf. Figure 9.9), the channel transition probabilities are $Q_{0|1} = Q_{1|0} = \rho$, which is the crossover probability, and $Q_{1|1} = Q_{0|0} = 1-\rho$. Both the cutoff rate $R_0$ and the critical rate $R_{\rm crit}$ are maximized for an equiprobable prior with $p_0 = p_1 = 1/2$. The cutoff rate for a binary symmetric channel is given by

$$R_0 = -\log_2\left[\tfrac{1}{2}\left(\sqrt{\rho} + \sqrt{1-\rho}\,\right)^{2}\right] = 1 - \log_2\left(1 + 2\sqrt{\rho(1-\rho)}\,\right) \qquad (13.4.3)$$

in bits per bit. The critical rate for a binary symmetric channel is

$$R_{\rm crit} = 1 - H_b\!\left(\frac{\sqrt{\rho}}{\sqrt{\rho} + \sqrt{1-\rho}}\right) \qquad (13.4.4)$$

in bits per bit, where

$$H_b(\rho) = -\rho\log_2\rho - (1-\rho)\log_2(1-\rho)$$

is called the binary entropy function (cf. (14.1.8)). Figure 13.9 plots the cutoff rate, the critical rate, and the capacity for a binary symmetric channel as a function of the crossover probability $\rho$.

⁶ See Blahut (2020).

Figure 13.9 The critical rate $R_{\rm crit}$, cutoff rate $R_0$, and capacity $C$ in bits per binary symbol for a binary symmetric channel, plotted as functions of the probability of bit error $\rho$.

Using a block code, a code rate up to $R_{\rm crit}$ may be possible using a spherical decoder. The gap between the curve for the critical rate $R_{\rm crit}$ and the curve for the cutoff rate $R_0$ is the theoretical rate improvement that can be obtained by using a large convolutional code with sequential decoding rather than a large block code with spherical decoding. Using sequential decoding, it is possible in principle to achieve a rate up to the cutoff rate $R_0$. The gap between the curve for the cutoff rate $R_0$ and the curve for the channel capacity $C$ is the potential improvement in using a composite code with iterative maximum-posterior bitwise decoding, described in Section 13.5, rather than a convolutional code with sequential decoding. These asymptotic assertions, however, make no statement regarding the practical complexity of the code or the decoder at a practical blocklength. Figure 13.9 also suggests the range of bit error rates for which each kind of code might be suitable, such as the range for a code rate of one-half. The three rate parameters converge to a value of one bit for a low probability of detected bit error, indicating that a simple block code may be sufficient. The three rate parameters all converge to a value of zero bits as the probability of a detected bit error approaches one-half. This is because no code, no matter how complex, can correct for a channel that randomly and equiprobably flips the transmitted bits.
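The three curves of Figure 13.9 can be reproduced directly from (13.4.3), (13.4.4), and the binary-symmetric-channel capacity $C = 1 - H_b(\rho)$. A minimal Python sketch:

```python
# Cutoff rate (13.4.3), critical rate (13.4.4), and capacity of the binary
# symmetric channel, reproducing the curves of Figure 13.9.
import numpy as np

def Hb(p):
    """Binary entropy function (cf. (14.1.8))."""
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def bsc_rates(rho):
    R0 = 1 - np.log2(1 + 2 * np.sqrt(rho * (1 - rho)))                # (13.4.3)
    Rcrit = 1 - Hb(np.sqrt(rho) / (np.sqrt(rho) + np.sqrt(1 - rho)))  # (13.4.4)
    C = 1 - Hb(rho)                                                   # capacity
    return R0, Rcrit, C

for rho in (1e-3, 1e-2, 1e-1):
    print(rho, bsc_rates(rho))   # always Rcrit <= R0 <= C, as in Figure 13.9
```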

Rates for Phase-Shift Keying in Additive White Noise

The cutoff rate for phase-shift keying in additive white gaussian noise can be derived in closed form because every point of the signal constellation sees the same set of distances to the other points of that constellation. In this case, the cutoff rate is maximized by an equiprobable prior and is given by⁷

$$R_0 = -\log_2\left[\frac{1}{L}\sum_{\ell=0}^{L-1} e^{-d^2_\ell/4N_0}\right], \qquad (13.4.5)$$

where $d^2_\ell$ for $\ell = 0, 1, \ldots, L-1$ are the squared euclidean distances from any point of the constellation to all other points of the constellation. For $L$-ary phase-shift keying, $d^2_\ell = 4E\sin^2(\ell\pi/L)$, where $E$ is the symbol energy (cf. (10.2.17)). Using $\sin^2 x = (1-\cos 2x)/2$, the cutoff rate can be written as

$$R_0 = -\log_2\left[\frac{1}{L}\,e^{-E/2N_0}\sum_{\ell=0}^{L-1} e^{(E/2N_0)\cos(2\pi\ell/L)}\right]. \qquad (13.4.6)$$

As the number of points $L$ in the phase-shift-keyed signal constellation approaches infinity, the term $(1/L)\sum_{\ell=0}^{L-1} e^{(E/2N_0)\cos(2\pi\ell/L)}$ approaches $I_0(E/2N_0)$ and

$$R_0 = -\log_2\left[e^{-E/2N_0}\, I_0(E/2N_0)\right], \qquad (13.4.7)$$

where $I_0(x)$ is the modified Bessel function of the first kind of order zero. But $E = E_b R$, so, by setting $R$ equal to $R_0$, this equation can be rewritten as

$$R_0 = -\log_2\left[e^{-E_b R_0/2N_0}\, I_0\big(E_b R_0/2N_0\big)\right], \qquad (13.4.8)$$

which expresses $R_0$ implicitly as a function of $E_b/N_0$.

Figure 13.10 plots the cutoff rate $R_0$ as given in (13.4.7), as well as the capacity $C_{\rm PSK}$ for phase-shift keying (cf. (14.3.13)), as functions of $E/N_0$, and the Shannon bound $C$ for an unconstrained finite-energy modulation format (cf. (14.3.4b)). These curves show that, for a large value of $E/N_0$ and any fixed rate, the best possible code with a sequential decoder for a phase-shift-keyed constellation requires 1.68 dB more energy per bit than does an optimal decoder that achieves the capacity. The derivation of this statement is asked for as an end-of-chapter exercise. This difference is the potential improvement in using a composite code with an iterative decoder. Known codes, however, fall short of the theoretical performance shown in Figure 13.10. The capacity and the cutoff rate are asymptotic existence statements. They do not consider the complexity of the codes.

Figure 13.10 The cutoff rate $R_0$ and capacity $C$ in bits per symbol for phase-shift keying (PSK) in a channel with additive white gaussian noise. Also shown is the Shannon bound for an unconstrained format.

The relative merits of sequential decoding and iterative decoding depend strongly on the application and implementation. A notional guide to this comparison is shown in Figure 13.11. For sequential decoding, the complexity becomes unbounded as the rate approaches the cutoff rate $R_0$. Iterative decoding of composite codes can achieve rates greater than the cutoff rate, but the complexity of these codes as the rate approaches the channel capacity depends on the details of the specific code and the implementation.

Figure 13.11 Notional comparison of the complexities of sequential decoding and iterative decoding.

⁷ See Wilson (1996), Section 4.3.2.
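The PSK cutoff-rate curves of Figure 13.10 can be generated numerically from (13.4.6) and (13.4.7), and the implicit relation (13.4.8) can be solved by simple fixed-point iteration. The following Python sketch assumes scipy is available; the exponentially scaled Bessel function `i0e` is used to avoid overflow.

```python
# Cutoff rate of L-ary PSK in additive white gaussian noise, from (13.4.6)
# and the large-L Bessel form (13.4.7), with a fixed-point solution of the
# implicit relation (13.4.8) for R0 as a function of Eb/N0.
import numpy as np
from scipy.special import i0e   # i0e(x) = exp(-x) * I0(x), avoids overflow

def R0_psk(EsN0, L):
    """Cutoff rate (13.4.6) for L-ary PSK at linear symbol SNR E/N0."""
    ell = np.arange(L)
    terms = np.exp(-(EsN0 / 2) * (1 - np.cos(2 * np.pi * ell / L)))
    return -np.log2(terms.mean())

def R0_large_L(EsN0):
    """Large-L limit (13.4.7): R0 = -log2( exp(-E/2N0) I0(E/2N0) )."""
    return -np.log2(i0e(EsN0 / 2))

def R0_from_EbN0(EbN0_dB, n_iter=200):
    """Solve (13.4.8) by fixed-point iteration, using E = Eb * R0."""
    EbN0 = 10 ** (EbN0_dB / 10)
    R0 = 1.0
    for _ in range(n_iter):
        R0 = R0_large_L(R0 * EbN0)
    return R0

print(R0_psk(10.0, L=8))     # E/N0 = 10 (linear), 8-ary PSK
print(R0_from_EbN0(5.0))     # R0 at Eb/N0 = 5 dB, large-L limit
```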

13.5 Composite Codes

At code rates larger than the cutoff rate $R_0$ of an information channel, maximum-likelihood decoding becomes computationally intractable for that channel. Maximum-likelihood decoding is then abandoned in favor of componentwise maximum-posterior decoding. Good codes for maximum-posterior decoding have long codewords with symbol dependences that extend over many symbol intervals. Convolutional codes have direct dependences that extend only over the constraint length. To obtain widely distributed dependences, a local structure is combined with a global structure. Such codes are called composite codes. This composite structure may be described informally as “highly structured randomness.” The widely distributed dependences lead to better performance when using maximum-posterior decoding. This is realized by the process of marginalization (cf. Section 11.3.4), which localizes to each individual bit the relevant decoding evidence that is spread throughout the senseword.


Two examples of composite codes are described in the following sections. These are called the Berrou codes and the Gallager codes. A Berrou code combines two copies of a convolutional code, each with a small constraint length. Each convolutional code encodes a different permutation of the same dataword. This is the local structure, consisting of the two convolutional codes as subcodes. The global structure is the permutation that links the two subcodes. A Gallager code embeds each databit in a small local structure of check equations. The global structure then arises because each check equation involves different and widespread bits. In each case, the purpose of the composite local/global structure is to reduce an intractable global componentwise marginalization into many tractable local componentwise marginalizations while maintaining widespread dependences.

13.5.1 Componentwise Marginalization

The marginalization required for componentwise maximum-posterior decoding is introduced by a simple example of a (3, 2) binary block code. The senseword is

$$\mathbf r = \mathbf d G + \mathbf e, \qquad (13.5.1)$$

where $G$ is a $2\times 3$ generator matrix and $\mathbf e$ is the error pattern. The matrix equation written componentwise is

$$\begin{aligned}
r_1 &= d_1 G_{11} + d_2 G_{21} + e_1,\\
r_2 &= d_1 G_{12} + d_2 G_{22} + e_2,\\
r_3 &= d_1 G_{13} + d_2 G_{23} + e_3,
\end{aligned} \qquad (13.5.2)$$

where $d_1$ and $d_2$ are the two databits. For hard-decision detection, these three equations are equations in GF(2). For soft-decision detection, the first addition sign in each equation denotes addition in GF(2) and the second addition sign denotes addition of real numbers, with the understanding that the elements of GF(2) are first replaced by the bipolar reals $\pm A$. In either case, the probability distribution $p(r_1, r_2, r_3|d_1, d_2)$ for a specific channel can be derived and computed for each $d_1$ and $d_2$. The Bayes relationship then gives

$$p(d_1, d_2|r_1, r_2, r_3) = \frac{p(d_1, d_2)\,p(r_1, r_2, r_3|d_1, d_2)}{p(r_1, r_2, r_3)}. \qquad (13.5.3)$$

Because the senseword $\mathbf r = (r_1, r_2, r_3)$ is known at the decoder, the numerical value $p(r_1, r_2, r_3)$ is known, as is $p(r_1, r_2, r_3|d_1, d_2)$ for each $d_1$ and $d_2$. Therefore, the numerical value of each term on the right is known. So, after a direct calculation, the numerical value of the term on the left is known for each $(d_1, d_2)$ pair as well. The marginalized distributions on $d_1$ are computed from $p(d_1, d_2|r_1, r_2, r_3)$ as

$$p(d_1 = 0|r_1, r_2, r_3) = p(0, 0|r_1, r_2, r_3) + p(0, 1|r_1, r_2, r_3), \qquad (13.5.4a)$$
$$p(d_1 = 1|r_1, r_2, r_3) = p(1, 0|r_1, r_2, r_3) + p(1, 1|r_1, r_2, r_3). \qquad (13.5.4b)$$

Because the two probabilities on the left sum to one, only the first needs to be computed in this way. Databit d1 is then decoded as either zero or one according to which of the


two probabilities on the left of (13.5.4) is larger. The marginalized distribution for $d_2$ is computed in the same way, and $d_2$ is then decoded. This concludes the explanation of the principle of bitwise maximum-posterior decoding for a binary codeword with only two databits.

This marginalization process extends to larger blocklengths. The marginals of the first databit are

$$p(d_1 = 0|\mathbf r) = \sum_{d_2}\sum_{d_3}\cdots\sum_{d_k} p(0, d_2, \ldots, d_k|\mathbf r), \qquad (13.5.5a)$$
$$p(d_1 = 1|\mathbf r) = \sum_{d_2}\sum_{d_3}\cdots\sum_{d_k} p(1, d_2, \ldots, d_k|\mathbf r), \qquad (13.5.5b)$$

where $\mathbf r = (r_1, r_2, \ldots, r_n)$. An equation like (13.5.5) holds for every databit, resulting in $k$ marginalizations. For the $m$th bit position, this is written concisely as

$$p(d_m = 0|\mathbf r) = \sum_{\mathbf d\,:\,d_m = 0} p(\mathbf d|\mathbf r), \qquad (13.5.6a)$$
$$p(d_m = 1|\mathbf r) = \sum_{\mathbf d\,:\,d_m = 1} p(\mathbf d|\mathbf r). \qquad (13.5.6b)$$

For a blocklength of 1000, there are $2^{999}$ terms in each sum on the right side and there are 1000 such marginalizations of this form – one for each bit – so this computation is impractical as written. Nevertheless, at code rates larger than the cutoff rate $R_0$, the need for such marginalizations to implement a composite code by localizing the decoding evidence appears to be unavoidable. Accordingly, it is necessary that some method of simplifying the computation be used. One approach is to use a convolutional code so that the summations on the right side of the marginalization can be organized and executed on a trellis. This simplifies the computation by using the Bahl algorithm (cf. Section 11.3.4). However, a convolutional code is a weak code unless the constraint length is large, and for a large constraint length the required computations, even using the Bahl algorithm, are again intractable.

To overcome this difficulty, a code that combines a local structure and a global structure is created. Such a code is called a composite code. The local structure enables efficient local marginalizations. The global structure accounts for widely distributed symbol dependences, thereby enabling an iterative decoding process. These iterative techniques for marginalization can decode powerful codes approaching the Shannon bound for a memoryless channel in additive white gaussian noise. Two classes of composite codes suitable for iterative decoding are the Berrou codes (or turbo codes) and the Gallager codes (or low-density parity-check codes). Decoding for each of these codes relies on a combination of intrinsic evidence and extrinsic evidence. Intrinsic evidence is evidence generated locally for each symbol on the basis of the received value of that symbol. Extrinsic evidence is evidence from the other received values that is based on widely distributed symbol dependences. Extrinsic evidence is passed between the local and global code structures in an iterative manner until convergence.
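For a code small enough to enumerate, the brute-force computation of (13.5.3) through (13.5.6) can be written out directly. The Python sketch below assumes binary phase-shift keying in additive white gaussian noise and an illustrative $2\times 3$ generator matrix; the point is the structure of the marginalization, not its efficiency.

```python
# Exact componentwise maximum-posterior decoding of a small (n, k) binary
# block code by brute-force marginalization, as in (13.5.3)-(13.5.6).
# BPSK mapping 0 -> +A, 1 -> -A in additive gaussian noise is assumed.
import itertools
import numpy as np

G = np.array([[1, 0, 1],
              [0, 1, 1]])              # an illustrative 2 x 3 generator matrix

def bitwise_map_decode(r, G, sigma, A=1.0):
    k, n = G.shape
    post = {}
    for d in itertools.product((0, 1), repeat=k):
        c = np.mod(np.array(d) @ G, 2)          # codeword dG in GF(2)
        s = A * (1 - 2 * c.astype(float))       # bipolar mapping
        # p(r | d), up to a common factor that cancels in the comparison
        post[d] = np.exp(-np.sum((r - s) ** 2) / (2 * sigma ** 2))
    decoded = []
    for m in range(k):                          # marginalize each databit
        p0 = sum(p for d, p in post.items() if d[m] == 0)
        p1 = sum(p for d, p in post.items() if d[m] == 1)
        decoded.append(0 if p0 >= p1 else 1)
    return decoded

r = np.array([0.9, -1.1, -0.8])                 # illustrative noisy senseword
print(bitwise_map_decode(r, G, sigma=1.0))
```

For the (3, 2) example above, each sum in (13.5.4) contains only two terms; the same loop applies unchanged to any small $k$.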


13.5.2 Berrou Codes

A Berrou code is a composite code constructed in such a way that a form of iterative decoding, called turbo decoding, can be used. Each codeword of a Berrou code consists of two constituent convolutional codewords – each codeword from the same convolutional code with constraint length $\nu$ – encoded in parallel from the same dataword so as to circumvent the weakness of a single convolutional code. Only bits that are separated by at most the constraint length $\nu$ are directly connected within a single constituent convolutional code. Increasing the constraint length improves the performance of the code by dispersing extrinsic evidence more quickly throughout the codeword, but increasing the constraint length increases the computational burden, which is proportional to $2^\nu$. Instead, the bits of a dataword $\mathbf d$ in a Berrou code are indirectly connected by encoding both $\mathbf d$ and a permuted dataword $\mathbf d' = T(\mathbf d)$ using the same convolutional code, where $T$ is a nontrivial permutation. Because $T$ is a permutation, it is invertible. The permutation creates different symbol dependences encoded from $\mathbf d'$ compared with the symbol dependences encoded from $\mathbf d$.

The dataword $\mathbf d$ is systematically encoded once as a convolutional code with the output codeword described by $d(x)G(x)$ for a generator matrix $G(x)$ (cf. (13.3.5)). The codeword consists of the original dataword $\mathbf d$ and a checkword $\mathbf c$ containing the check bits. The permuted dataword $\mathbf d'$ is systematically encoded to produce a different checkword $\mathbf c'$. This is shown in Figure 13.12. The transmitted codeword is the composite interleave $(\mathbf d, \mathbf c, \mathbf c')$ of the dataword $\mathbf d$ along with the two checkwords $\mathbf c$ and $\mathbf c'$. This form of composite coding, called parallel concatenation, produces direct dependences derived from the two constituent convolutional codewords over the constraint length $\nu$ and a more tangled and stronger web of dependences over the entire composite codeword generated by encoding the dataword twice, the second time with permuted databits. The fixed permutation $T$ generates the second dataword $\mathbf d'$ from $\mathbf d$, so $\mathbf d'$ is not transmitted.

It is conventional to regard the codeword as two separate rate-one-half systematic convolutional codewords, $(\mathbf d, \mathbf c)$ and $(\mathbf d', \mathbf c') = (T(\mathbf d), \mathbf c')$. This means that the composite interleave $(\mathbf d, \mathbf c, \mathbf c')$ is a rate-one-third codeword. The two convolutional codewords have different sets of bit dependences because of the permutation used to generate $\mathbf d'$ from $\mathbf d$. Each of the two rate-one-half constituent convolutional codes has a modest constraint length. Depending on how the trellis describing the constituent convolutional code is terminated (cf. Figure 11.8), the code rate of this Berrou code is either one-third or slightly less than one-third. The code rate can be increased to one-half by simply deleting every second bit in each of the two checkwords. This ad hoc technique is called puncturing.⁸

Figure 13.12 The structure of an encoder for a Berrou code. The dataword $\mathbf d$ is encoded by the first encoder to produce the checkword $\mathbf c$, and the permuted dataword $\mathbf d' = T(\mathbf d)$ is encoded by the second encoder to produce the checkword $\mathbf c'$. The multiplexed codeword $(\mathbf d, \mathbf c, \mathbf c')$ passes through a memoryless noisy channel and is demultiplexed as $(\mathbf x, \mathbf y, \mathbf y')$.

⁸ See Richardson and Urbanke (2008).
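The parallel-concatenation structure of Figure 13.12 can be sketched in a few lines of Python. The constituent encoder below, a feedforward systematic encoder producing the check bit $c_i = d_i \oplus d_{i-1} \oplus d_{i-2}$, is only an illustrative stand-in; practical Berrou codes use recursive systematic constituent encoders.

```python
# Structural sketch of a Berrou (parallel-concatenated) encoder, Figure 13.12.
# The constituent encoder is an illustrative toy, not a recommended code.
import numpy as np

rng = np.random.default_rng(1)

def constituent_check(d):
    """Checkword of a toy systematic convolutional encoder (nu = 2)."""
    d = np.asarray(d)
    padded = np.concatenate(([0, 0], d))
    return (padded[2:] + padded[1:-1] + padded[:-2]) % 2

def berrou_encode(d, perm):
    c = constituent_check(d)               # checkword from d
    c_prime = constituent_check(d[perm])   # checkword from d' = T(d)
    return d, c, c_prime                   # rate-one-third codeword (d, c, c')

d = rng.integers(0, 2, size=16)
perm = rng.permutation(16)                 # the fixed permutation T
print(berrou_encode(d, perm))
```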

13.5.3 Turbo Decoding

A transmitted Berrou codeword $(\mathbf d, \mathbf c, \mathbf c')$ is received as the noisy senseword $(\mathbf x, \mathbf y, \mathbf y')$, where $\mathbf x$ is the noisy dataword, $\mathbf y$ is the noisy checkword generated from the dataword $\mathbf d$, and $\mathbf y'$ is the noisy checkword generated from the dataword $\mathbf d'$. The databits of $\mathbf d$ have an equiprobable prior. The goal is to compute the individual marginalized databit posteriors from $(\mathbf x, \mathbf y, \mathbf y')$ (cf. (13.5.6)). Direct marginalization is an intractable computation as such, but would be tractable using the Bahl algorithm on a trellis were the senseword only one of the two constituent sensewords, either $(\mathbf x, \mathbf y)$ or $(\mathbf x, \mathbf y')$. But then the posterior probabilities that would be computed from either set of partial data ignore a large part of the received data, so each computation, as such, would not give the correct componentwise posteriors defined by the full data set. Nevertheless, one may expect that either computed componentwise posterior is an improvement on the componentwise prior. Accordingly, the computed output posterior of one constituent decoder, regarded as a product distribution, can be used as a prior for the other constituent decoder. Then the new posterior is returned as a new prior to the first decoder. This cross-coupling of computed posteriors that are used, in turn, as new priors for two separate computations forms the iterative structure shown in Figure 13.13, alternating in this way until a convergent is obtained. The convergent components are then hard-decision detected to produce the output databits.

= 0|r ) = 1|r ) ∑ p(x, y, y±| d) p(d) = ∑d:d =0 p(x, y, y±|d) p(d) . d:d =1

u(dm |r) =

p(dm p(dm

m

(13.5.7)

m

Figure 13.13 The structure of a turbo decoder.

The marginalization indicated by the notation $\mathbf d : d_m = 0$ denotes the set of all datawords $\mathbf d$ for which the $m$th component equals zero. Thus, the summation in the numerator is over the probabilities of all datawords whose $m$th component is zero, with each conditional probability in the summand weighted by the prior $p(\mathbf d)$ of that dataword. The summation in the denominator is over the probabilities of all datawords whose $m$th component is one.

Alternating Marginalization

The posterior probability ratio in (13.5.7) is exact, but it is not tractable to evaluate for large codes. Instead, the turbo decoding process uses a separate decoder for each of the two constituent codewords $(\mathbf d, \mathbf c)$ and $(\mathbf d', \mathbf c')$. Replacing $(\mathbf x, \mathbf y, \mathbf y')$ with $(\mathbf x, \mathbf y)$ yields the posterior probability ratio used for the first constituent decoder,

$$u(d_m|\mathbf x, \mathbf y) = \frac{\sum_{\mathbf d:d_m=0}\, p(\mathbf x, \mathbf y|\mathbf d)\,p(\mathbf d)}{\sum_{\mathbf d:d_m=1}\, p(\mathbf x, \mathbf y|\mathbf d)\,p(\mathbf d)}. \qquad (13.5.8a)$$

Similarly, for the second constituent decoder, use $(T(\mathbf x), \mathbf y') = (\mathbf x', \mathbf y')$, where $\mathbf x' = T(\mathbf x)$ using the permutation $T$. The posterior probability ratio is

$$u'(d_m|\mathbf x', \mathbf y') = \frac{\sum_{\mathbf d:d_m=0}\, p(\mathbf x', \mathbf y'|\mathbf d)\,p(\mathbf d)}{\sum_{\mathbf d:d_m=1}\, p(\mathbf x', \mathbf y'|\mathbf d)\,p(\mathbf d)}. \qquad (13.5.8b)$$

Each constituent code is a convolutional code, so its componentwise posterior probabilities can be computed by marginalization on a trellis using the Bahl algorithm (cf. Section 11.3.4). Each of these two constituent decoders is shown as a box in Figure 13.13. Because the checkwords and the sensewords for each constituent decoder are different, each constituent decoder would generate a different ratio of posterior probabilities even were the prior probability $p(\mathbf d)$ the same in both decoders. The two decoders alternate, and each decoder passes its computed block of componentwise posteriors to the other decoder where, multiplied together, that block is used as the product block prior for the next iteration. This iterative procedure is shown in Figure 13.13. As the number of iterations increases, the posterior probability ratios $u_m$ and $u'_m$ defined for each $m$ in (13.5.8) usually converge to the same value, thereby reaching a state of consensus. When either a consensus or a timeout is reached, a hard-decision rule in the form of (11.3.18) is applied to either $u_m$ or $u'_m$ to detect the $m$th letter in the data sequence.

The joint conditional probability distribution $p(\mathbf x, \mathbf y|\mathbf d)$ is a product distribution given by

$$p(\mathbf x, \mathbf y|\mathbf d) = p(\mathbf x|\mathbf d)\,p(\mathbf y|\mathbf d),$$

and $p(\mathbf x, \mathbf y, \mathbf d) = p(\mathbf x|\mathbf d)\,p(\mathbf y|\mathbf d)\,p(\mathbf d)$. The individual conditional probability distribution $p(\mathbf x|\mathbf d)$ is also a product distribution,

$$p(\mathbf x|\mathbf d) = \prod_{j=1}^{k} p(x_j|\mathbf d), \qquad (13.5.9)$$


where $p(x_j|\mathbf d)$ is the conditional probability for the $j$th component of the sequence. Because of the structure of the code, $p(x_j|\mathbf d)$ depends on all of $\mathbf d$, in general. The term $p(\mathbf y|\mathbf d)$ is a similar product distribution given by $p(\mathbf y|\mathbf d) = \prod_{j=1}^{k} p(y_j|\mathbf d)$. Because the databits are treated as independent, the updated prior is also a product distribution with $p(\mathbf d) = \prod_{j=1}^{k} p(d_j)$. Combining these three statements, the ratio of the posterior probabilities for the first constituent decoder given in (13.5.8a) is expanded as

$$u(d_m|\mathbf x, \mathbf y) = \frac{\sum_{\mathbf d:d_m=0} \prod_{j=1}^{k} p(d_j)\,p(x_j|\mathbf d)\,p(y_j|\mathbf d)}{\sum_{\mathbf d:d_m=1} \prod_{j=1}^{k} p(d_j)\,p(x_j|\mathbf d)\,p(y_j|\mathbf d)}. \qquad (13.5.10)$$

The $m$th terms $p(d_m)$ and $p(x_m|d_m)$ for $d_m = 0$ are the same for all terms of the sum in the numerator and can be factored out. The same statement holds for the $m$th term of the sum in the denominator. Separating the terms for the $m$th component from the rest of the terms rewrites the expression as

$$u(d_m|\mathbf x, \mathbf y) = \underbrace{\frac{p(d_m = 0)}{p(d_m = 1)}}_{r_m} \times \underbrace{\frac{p(x_m|d_m = 0)}{p(x_m|d_m = 1)}}_{\lambda_m} \times \underbrace{\frac{\sum_{\mathbf d:d_m=0} \prod_{j\sim m} p(d_j)\,p(x_j|\mathbf d)\,p(y_j|\mathbf d)}{\sum_{\mathbf d:d_m=1} \prod_{j\sim m} p(d_j)\,p(x_j|\mathbf d)\,p(y_j|\mathbf d)}}_{q_m} = r_m \lambda_m q_m, \qquad (13.5.11)$$

where the notation $j \sim m$ denotes that the term for which $j$ equals $m$ is excluded.

The first two terms are the intrinsic evidence for the $m$th databit, and the remaining term is the extrinsic evidence for the $m$th databit. The first term, $r_m$, on the right is the ratio of the prior probabilities for the $m$th component of the codeword (cf. (9.5.11)). The middle term, $\lambda_m$, is the likelihood ratio defined in (9.5.12). It contains the intrinsic evidence based on the value of the $m$th component that is being decoded. The last term, $q_m$, is the extrinsic evidence for $d_m$ gathered from all the other components in the codeword. Expressing $u(d_m|\mathbf x, \mathbf y)$ in terms of log-likelihood ratios gives

$$\log_e\big(u(d_m|\mathbf x, \mathbf y)\big) = \underbrace{L(r_m)}_{\text{prior}} + \underbrace{L(\lambda_m)}_{\text{intrinsic}} + \underbrace{L(q_m)}_{\text{extrinsic}}. \qquad (13.5.12)$$

The second term is the intrinsic log-likelihood ratio. Therefore it is natural to call the third term the extrinsic log-likelihood ratio. The corresponding logarithm for the second constituent decoder is written similarly as

$$\log_e\big(u'(d_m|\mathbf x, \mathbf y')\big) = L(r'_m) + L(\lambda'_m) + L(q'_m). \qquad (13.5.13)$$

Each term for the second constituent decoder has the same form as the corresponding term for the first constituent decoder, with $\mathbf y$ replaced by $\mathbf y'$. When $q_m = 1$ or $L(q_m) = 0$, there is no extrinsic evidence and (13.5.12) reduces to the memoryless case given in (9.5.13).
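The alternating exchange of extrinsic evidence implied by (13.5.12) and (13.5.13) can be expressed structurally as follows. In this Python sketch, `constituent_llr` is a hypothetical placeholder for a trellis-based constituent decoder (the Bahl algorithm of Section 11.3.4) that returns the total componentwise log posterior ratios; only the bookkeeping that separates intrinsic and extrinsic terms is shown.

```python
# Structural sketch of the extrinsic-evidence exchange in a turbo decoder.
# `constituent_llr(prior_llrs, checkword_samples)` is a hypothetical
# placeholder for a full trellis-based constituent decoder.
import numpy as np

def turbo_exchange(L_intrinsic, perm, constituent_llr, y, y_prime, n_iter=10):
    inv_perm = np.argsort(perm)
    L_ext2 = np.zeros(len(L_intrinsic))   # extrinsic term from second decoder
    for _ in range(n_iter):
        # First decoder: subtracting its inputs from its total output LLR
        # leaves the extrinsic evidence it generated.
        total1 = constituent_llr(L_intrinsic + L_ext2, y)
        L_ext1 = total1 - L_intrinsic - L_ext2
        # Second decoder works in the permuted order, with L_ext1 as prior.
        total2 = constituent_llr((L_intrinsic + L_ext1)[perm], y_prime)
        L_ext2 = total2[inv_perm] - L_intrinsic - L_ext1
    return L_intrinsic + L_ext1 + L_ext2  # combined evidence for hard decision
```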

The Gaussian Noise Channel

The expressions in (13.5.8) are general. The probabilities therein must be evaluated for each specific instance of a channel. These probabilities can be expressed as the branch metrics on the trellis used by the Bahl algorithm. For an additive white gaussian noise channel, the noise samples are independent, and the product probability expressions can be specialized for this case. For signal-independent memoryless gaussian noise, the branch metrics can be reduced to euclidean distances.

The bit error rate on an additive gaussian noise channel for a (4096, 2048) Berrou code as a function of the number of iterations is shown in Figure 13.14. The performance of uncoded binary phase-shift keying is shown as well. The remarkable performance of these codes, approaching the Shannon bound after only a few iterations, is evident.

Figure 13.14 (a) The bit error rate of a (4096, 2048) Berrou code as a function of the number of iterations. (b) Illustrating the error-rate floor for a different turbo code.

The limiting performance of a Berrou code depends on the minimum Hamming distance $d_{\min}$ of the constituent convolutional code. The performance curve will exhibit an error-rate floor for low error rates, as is shown in Figure 13.14(b).⁹ For small block error rates, the error-rate floor can be estimated as (cf. (13.3.10))

$$p_e \approx K\,\mathrm{erfc}\!\left(\sqrt{\frac{d_{\min} E_c}{N_0}}\right), \qquad (13.5.14)$$

where the constant $K$ depends on the sum of the Hamming weights of all codewords at minimum distance $d_{\min}$, and $E_c = E_b R_c$ is the energy in a codebit for a code with rate $R_c$. A comparison of this bound with simulated results is shown in Figure 13.14(b).

⁹ Adapted from Garello, Pierleoni, and Benedetto (2001).

13.5.4 Gallager Codes

A linear binary block code that has at least one regular check matrix $H$ with a low density of ones is called a Gallager code. The requirement refers to the check matrix; the codewords themselves do not have a low density of ones. When represented by a low-density check matrix, a Gallager code is also called a low-density parity-check code. Recall that a regular check matrix has the same number of ones in every column and the same number of ones in every row. Other check matrices for that Gallager code are not of interest here. A block code having a check matrix with nearly the same number of ones


per column and nearly the same number of ones per row is called an irregular Gallager code. By contrast, a code with a regular check matrix is called a regular Gallager code. Absent a statement defining when the density of ones is low, the class of Gallager codes is not precise. The requirement for a low density of ones in the check matrix induces a local structure within the code. Accordingly, a Gallager code has a natural association with a Tanner graph as defined in Section 13.2.8. Decoding a Gallager code is best described as the passing of local estimates of probability ratios along the edges of the Tanner graph, as will be discussed in the next subsection. This decoding is based on each bit seeing only a local structure of check equations and each check equation seeing only a local structure of bits. With this structure, an intractable global marginalization (13.5.6) can be replaced by many local marginalizations with multiple iterations.

A practical Gallager code may have a blocklength of 1000 or more and a check matrix $H$ with as few as six ones in each column. Such a large matrix need not be written down. It is embedded in the structure of a computational algorithm or a logic circuit. Examples given here, however, must be small so that the check matrix can be written down. A modest example of a check matrix $H$ for a Gallager code is the matrix with five rows and 10 columns obtained by simply writing every binary five-tuple with two ones as a column of the matrix. This gives

$$H = \begin{bmatrix}
1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0\\
1 & 0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 1 & 0\\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 1\\
0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 1 & 1
\end{bmatrix}. \qquad (13.5.15)$$

The matrix H has two ones in every column and four ones in every row. The corresponding Tanner graph has girth six. It is an exaggeration to call this small matrix a sparse matrix because, out of the 50 bits in the matrix, 20 bits are ones. Nevertheless, it serves as a simple example. The rows sum to zero, so they are not linearly independent, but any set of four rows is linearly independent. Therefore, n − k = 4 and, because n = 10, this oversized check matrix defines a (10, 6) linear binary code. Inspection shows that the heft of H equals 2, so the minimum distance of the code is 3, and the code is further described as a (10,6,3) code. A smaller Gallager code is obtained by striking out the last row and every column that contains a one in that last row. Then

$$H = \begin{bmatrix}
1 & 1 & 1 & 0 & 0 & 0\\
1 & 0 & 0 & 1 & 1 & 0\\
0 & 1 & 0 & 1 & 0 & 1\\
0 & 0 & 1 & 0 & 1 & 1
\end{bmatrix}. \qquad (13.5.16)$$

(13.5.16)

This check matrix defines a (6,3,3) Gallager code. It is smaller than the (7,4,3) Hamming code, but it is in the class of codes that are deemed to be more suitable for messagepassing decoders because of the structure of the code. The Tanner graph for this small example is shown in Figure 13.15. Each check node in the top row corresponds to one of the check equations. Each check node is connected to three bits, and these connections

648

13 Channel Codes

f1

r0

f2

r1

f3

r2

r3

f4

r4

r5

Figure 13.15 (a) A Tanner graph of a (6,3,3) Gallager code.

together can be regarded as a local structure. Notice that each data node of this example has two edges leaving it and each check node is reached by three edges, as is evident from inspection of the rows of the check matrix. This Tanner graph also has girth 6. A closed path of length 6 is highlighted. No closed path through this Tanner graph has a length smaller than 6. A girth of 6 is often considered satisfactory. Because of the central role of the Tanner graph in the message-passing decoder, the iterations will recover the codeword rather than the dataword. This means that that dataword must be recovered in a subsequent step. To this end, the encoder uses a systematic generator matrix G. Then the k-bit dataword is recovered as k consecutive bits of the decoded systematic codeword. Because the Gallager code is defined by the low-density check matrix H, the generator matrix G must be computed from H so as to satisfy GHT = O. For a moderately large Gallager code such as a binary (2048, 1024) code, this one-time computation of the generator matrix G is a tractable but nontrivial task.
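The one-time computation of $G$ from $H$ can be carried out by Gaussian elimination over GF(2). The following Python sketch computes a basis of the null space of $H$, so that $GH^{\mathrm T} = \mathbf 0$; reordering columns to put $G$ into systematic form is a separate, routine step.

```python
# One-time computation of a generator matrix G from a check matrix H over
# GF(2): the rows of G form a basis of the null space of H.
import numpy as np

def generator_from_check(H):
    H = H.copy() % 2
    m, n = H.shape
    pivots, row = [], 0
    for col in range(n):                        # reduce H over GF(2)
        nz = np.nonzero(H[row:, col])[0]
        if len(nz) == 0:
            continue
        H[[row, row + nz[0]]] = H[[row + nz[0], row]]   # bring pivot up
        for r in range(m):                      # clear the column elsewhere
            if r != row and H[r, col]:
                H[r] = (H[r] + H[row]) % 2
        pivots.append(col)
        row += 1
        if row == m:
            break
    free = [c for c in range(n) if c not in pivots]
    G = np.zeros((len(free), n), dtype=int)     # one basis codeword per free column
    for i, f in enumerate(free):
        G[i, f] = 1
        for r, c in enumerate(pivots):
            G[i, c] = H[r, f]                   # back-substitute the pivot bits
    return G

H = np.array([[1, 1, 1, 0, 0, 0],
              [1, 0, 0, 1, 1, 0],
              [0, 1, 0, 1, 0, 1],
              [0, 0, 1, 0, 1, 1]])              # the (6,3,3) check matrix (13.5.16)
G = generator_from_check(H)
assert not np.any((G @ H.T) % 2)                # verifies G H^T = 0
```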

13.5.5 Message-Passing Decoders

A maximum-posterior decoder, as defined, marginalizes the block posterior to each data component, then decodes that component. However, it is computationally intractable to explicitly marginalize a large codeword to each component. For this reason, an iterative message-passing algorithm is used instead. Our discussion of message passing is restricted to binary block codes. To describe an iterative decoding strategy, the dependences between the components are expressed using a Tanner graph such as that shown in Figure 13.15 for the (6,3,3) Gallager code. In brief, messages are passed along the edges between the nodes of the Tanner graph to iteratively compute local marginalizations until a steady state is reached, as will be described.

The simplest example is message passing of binary data. For hard-decision binary data, an iterative message-passing decoder is a simple bit-flipping procedure. This method flips bits, as appropriate, until the Tanner graph constraints have been satisfied. To flip a bit, a bit-flipping decision threshold parameter $\delta$ is required. The binary syndrome vector $\mathbf S = \mathbf r H^{\mathrm T}$ is computed for the hard-decision senseword $\mathbf r$, which in GF(2) is equal to $\mathbf c + \mathbf e$, where $\mathbf e$ is the error vector and $\mathbf c$ is a codeword. The syndrome has a length equal to the number of columns of $H^{\mathrm T}$. Halt if $\mathbf S = \mathbf 0$ and assert that $\mathbf c = \mathbf r$. Otherwise, call each component $S_i$ of the syndrome $\mathbf S$ a symptom. Flip each bit $r_i$ that participates in more than $\delta$ nonzero symptoms. Repeat the process until convergence or timeout.


The bit-flipping process is easy to describe on the Tanner graph. Each bit node sends its initial sensebit value to all check nodes to which that bit node is connected. The check node determines whether the check equation is satisfied by calculating the modulo-two sum of all its bit inputs, then passes this message to each connected bit node. At each bit node, these messages containing the modulo-two sums are received from all check nodes to which that bit node is connected. If the number of unsatisfied check equations at a bit node exceeds δ , then that codebit is flipped. The process is repeated until all of the check equations are satisfied or until a timeout occurs. This bit-flipping algorithm is an elementary example of a message-passing algorithm. The remainder of this section generalizes the bit-flipping algorithm to the decoding of soft-decision binary data, again by message passing. The method is essentially the same, but for soft decisions the messages are based on posterior probabilities, computed within each node using only evidence available at that iteration step to that bit node from neighboring check nodes.
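A minimal Python sketch of this hard-decision bit-flipping decoder, using the check matrix as a 0/1 integer array:

```python
# Hard-decision bit-flipping decoder as described above. Each iteration
# computes the syndrome, counts the unsatisfied check equations ("symptoms")
# touching each bit, and flips every bit in more than `delta` of them.
import numpy as np

def bit_flip_decode(r, H, delta=1, max_iter=50):
    r = r.copy() % 2
    for _ in range(max_iter):
        syndrome = (H @ r) % 2            # one symptom per check equation
        if not syndrome.any():
            return r, True                # all check equations satisfied
        counts = H.T @ syndrome           # unsatisfied checks touching each bit
        r = (r + (counts > delta)) % 2    # flip the offending bits
    return r, False                       # timeout
```

With $H$ given by (13.5.16) and a senseword containing a single error, this procedure typically converges in one iteration; the threshold $\delta$ is a design parameter.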

Extrinsic Evidence

Marginalization of the block posterior probability to an individual bit uses both intrinsic evidence for that bit and extrinsic evidence for that bit. Although this partition between intrinsic and extrinsic evidence is different for every bit, all evidence is eventually shared with multiple bits. The decoding strategy is to iteratively pass fragments of extrinsic evidence along the edges of a Tanner graph so as to eventually recover each bit of the codeword. For each bit node, extrinsic evidence is evidence that the bit node does not already have. Evidence that the bit node already has is called intrinsic evidence. The extrinsic evidence for each bit is derived from the interbit dependences and is expressed as a combination of the posterior probability ratio generated from the noisy senseword $\mathbf r$ and the probability that the check equation at check node $f_j$ is satisfied.

Because of the central role of the Tanner graph, the decoding task is formulated appropriately with respect to that graph. To this end, the marginalized posterior ratio given in (13.5.7) is now reformulated to suit the problem at hand. For the message-passing algorithm, codebit posteriors are computed rather than databit posteriors. The ratio of each exact posterior probability marginalized to component $c_i$ is

$$u(c_i|(\mathbf r, C)) = \frac{p(c_i = 0|(\mathbf r, C))}{p(c_i = 1|(\mathbf r, C))}, \qquad (13.5.17)$$

where $C$ denotes the set of check equations describing all dependences on other bits. The posterior probability of the value zero or the value one for codebit $c_i$ is conditioned both on the detected noisy senseword $\mathbf r$ and on the complete set of check equations $C$. For a large code, the marginalizations required to form this ratio are impractical to compute (cf. (13.5.6)). Instead, the task is localized. The posterior probability ratio for $c_i$ is replaced by the localized probability ratio

$$u(c_i|(\mathbf r, C_i)) = \frac{p(c_i = 0|(\mathbf r, C_i))}{p(c_i = 1|(\mathbf r, C_i))}, \qquad (13.5.18)$$


where now $C_i$ refers to only the local check equations that directly involve codebit $c_i$ as depicted by the Tanner graph. These check equations are the rows of the check matrix $H$ for which there is a one in the $i$th column. This simplification means that extrinsic evidence not in $C_i$, although in the complete set of check equations $C$, does not immediately affect bit node $r_i$. For codebit $c_i$, to make use of the widely distributed extrinsic evidence, subsequent iterations are necessary. In this way, extrinsic evidence propagates through the branches and nodes of the Tanner graph, iteratively updating the value of the iterate $r_i$ attached to each bit node. The iterate $r_i$ is initialized with the noisy sample $r_i$ and converges to the noise-free bit $c_i$.

An iteration consists of each bit node sending a message to every check node to which that bit node is connected, followed by each check node sending a message to every bit node to which that check node is connected. Each message sent by a node consists of an individual conditional probability distribution $(p_0, p_1)$ appropriate to the destination node according to local evidence. For binary modulation, it suffices to send only $p_0$ because $p_1 = 1 - p_0$. When sent by a check node to a bit node, $p_0$ is the locally computed probability that the bit is a zero using the evidence that is available to that check node. When sent by a bit node to a check node, $p_0$ is the locally computed probability that the check equation is satisfied using the evidence that is available to that bit node. To avoid instability, a transmitted conditional probability to a destination node is computed without using the evidence that has come directly from that destination node.

The bit node tells each check node to which it is connected its locally computed conditional probability that the check equation for that check node is satisfied on the basis of the localized evidence available to that bit node. This conditional probability is computed using the received sample $r$ at that bit node and all evidence from all check nodes to which the bit node is connected other than the destination check node. The bit node must compute and send a different message to every check node to which it is connected. The check node then tells every bit node to which it is connected its locally computed conditional probability that the value for that bit node is a zero on the basis of the localized evidence available to that check node. This conditional probability is computed using the messages from all the bit nodes to which that check node is connected other than the destination node. The check node must compute and send a different message to every bit node to which it is connected.

Rather than compute the conditional probabilities as such, however, it may be more convenient to compute the conditional probability ratios, because certain common terms cancel out. Furthermore, rather than compute the ratios themselves, it is more convenient and numerically more stable to compute the logarithm of these ratios so that multiplications are replaced by additions. Separate the expression of the localized probability ratio given in (13.5.18) into a step for messages $L(q_{i\to j})$ passed from bit nodes $r_i$ to check nodes $f_j$ and a step for messages $L(t_{j\to i})$ passed from check nodes $f_j$ to bit nodes $r_i$.


The intrinsic evidence used for initialization is determined from the Bayes rule. Because

$$p(c_i = a_\ell|r_i)\,p(r_i) = p(r_i|c_i = a_\ell)\,p(c_i = a_\ell)$$

with $a_\ell \in \{0, 1\}$, the intrinsic evidence can be expressed as

$$L(c_i|C_i) = \log_e\left(\frac{p(c_i = 0|r_i, C_i)}{p(c_i = 1|r_i, C_i)}\right) = \log_e\left(\frac{p(r_i|c_i = 0, C_i)}{p(r_i|c_i = 1, C_i)}\right) + \log_e\left(\frac{p(c_i = 0)}{p(c_i = 1)}\right). \qquad (13.5.19)$$

The second term, denoted $L(c_i)$, is the log-likelihood ratio of the prior on codebit $c_i$ for each $i$. At initialization, the codebits are equiprobable, so $p(c_i = 0) = p(c_i = 1)$. In that case, the second term $L(c_i)$ is zero. The noisy bits $r_i$ are shown as the inputs to the

bit nodes at the bottom of Figure 13.15. These noisy bits provide the intrinsic evidence available at each bit node. The message $L(q_{i\to j})$ passed from bit node $i$ to check node $j$ contains the extrinsic evidence expressed as the logarithm of the ratio of the posterior probabilities marginalized to the codebit $c_i$,

$$L(q_{i\to j}) = \log_e\left(\frac{q_{i\to j}(0)}{q_{i\to j}(1)}\right), \qquad (13.5.20)$$

where

$$q_{i\to j}(a_\ell) = p(c_i = a_\ell|r_i, C_{i\sim j}), \qquad a_\ell \in \{0, 1\}, \qquad (13.5.21)$$

is the posterior probability that the transmitted codebit equals $a_\ell$. This probability is conditioned on the received noisy sample $r_i$ and on the probability that the check equation is satisfied at every connected check node except the check node with check equation $C_j$. The condition is denoted as $C_{i\sim j}$. The message $L(t_{j\to i})$ passed from check node $f_j$ to bit node $r_i$ is expressed as

$$L(t_{j\to i}) = \log_e\left(\frac{t_{j\to i}(0)}{t_{j\to i}(1)}\right), \qquad (13.5.22)$$

where $t_{j\to i}(a_\ell)$ is the probability that the check equation $C_j$ for check node $f_j$ is satisfied conditioned on the bit value $c_i$ for every connected bit node except the bit node $r_i$. Expressions for $L(t_{j\to i})$ and $L(q_{i\to j})$ for the case of gaussian noise are derived in the next subsection.

An example of extrinsic evidence propagating through a graph is shown in Figure 13.16, which depicts two subgraphs of the Tanner graph shown in Figure 13.15. The message $L(t_{2\to 0})$ from the check node $f_2$ to the bit node $r_0$ is shown in Figure 13.16(a). This message uses the extrinsic evidence available at check node $f_2$ from the messages $L(q_{3\to 2})$ and $L(q_{4\to 2})$ generated by bit nodes $r_3$ and $r_4$. These messages are generated in the previous message-passing step. This extrinsic evidence excludes the message $L(q_{0\to 2})$ from bit node $r_0$ because that message is from the destination node. Similarly, Figure 13.16(b) shows the message $L(q_{3\to 2})$ from bit node $r_3$ to check node $f_2$ generated using the extrinsic evidence available from the message $L(t_{3\to 3})$ as well as the intrinsic evidence available at $r_3$, which is given by $L(c_3)$.

Figure 13.16 (a) A subgraph of the Tanner graph shown in Figure 13.15. The message $L(t_{2\to 0})$ from check node $f_2$ to bit node $r_0$ uses extrinsic evidence available at $f_2$ from a message $L(q_{3\to 2})$ originating from bit node $r_3$ and a message $L(q_{4\to 2})$ originating from $r_4$. (b) The message $L(q_{3\to 2})$ from bit node $r_3$ to check node $f_2$ uses extrinsic evidence available at $r_3$ from the message $L(t_{3\to 3})$ and the intrinsic evidence $L(c_3)$.

Message Structure for White Gaussian Noise

The expressions for $L(q_{i\to j})$ in (13.5.20) and $L(t_{j\to i})$ in (13.5.22) involve specific probability distributions appropriate to the information channel of interest. When written explicitly for some common channels, these expressions take on a simple form. For binary phase-shift keying in additive white gaussian noise, let the codebit $c_i = 0$ be mapped into the symbol value $s = 1$ and let the codebit $c_i = 1$ be mapped into the symbol value $s = -1$. The conditional probability density function of additive gaussian noise then has the form $p(r_i|s) = (1/\sqrt{2\pi\sigma^2})\exp(-(r_i - s)^2/2\sigma^2)$. The single-symbol log-likelihood ratio for binary phase-shift keying with an equiprobable prior is

$$L(c_i) = \log_e\left(\frac{p(c_i = 0|r_i)}{p(c_i = 1|r_i)}\right) = \log_e\left(\frac{e^{-(r_i - 1)^2/2\sigma^2}}{e^{-(r_i + 1)^2/2\sigma^2}}\right) = \frac{2r_i}{\sigma^2}, \qquad (13.5.23)$$

where $r_i$ is a noisy gaussian-distributed sample for the $i$th bit with variance $\sigma^2$. The log-likelihood ratio defined in (13.5.23) can be inverted to recover the conditional probabilities from the log-likelihood ratios as

$$p(c_i = 0|r_i) = \frac{1}{1 + e^{-L(c_i)}}, \qquad (13.5.24a)$$
$$p(c_i = 1|r_i) = \frac{1}{1 + e^{L(c_i)}}. \qquad (13.5.24b)$$


Message from Check Nodes

To determine the form of the message $L(t_{j\to i})$ from a check node in terms of the message $L(q_{i\to j})$ from a bit node, note that a check node corresponds to a bitwise sum of at least two codebits that must sum to zero modulo two for every codeword. Therefore, every check node is connected to a set of bit nodes for which the codeword contains an even number of ones. For each check node specified by a row of the check matrix $H$, these bit nodes correspond to the location of the ones for that row of $H$. Each noisy sample $r_i$ is independent of the other noisy samples in the senseword, and has the value one with probability $p_i = p(c_i = 1)$. Given that the $i$th bit of a set of $M$ independent random bits is a one with probability $p_i$, the probability that the set of $M$ independent bits has an even number of ones, denoted $p_M(\text{even})$, is

$$p_M(\text{even}) = \frac{1}{2} + \frac{1}{2}\prod_{i=1}^{M}(1 - 2p_i). \qquad (13.5.25)$$

The proof of (13.5.25) is by induction. It is trivial to show that it holds for $M = 1$. Moreover, a simple calculation shows that if it holds for $M - 1$, then it holds also for $M$. This proof is asked for as an end-of-chapter exercise.

Now suppose that the $i$th bit $c_i$ is 0. For the check equation $C_j$ to be satisfied, the remaining codebits with $i'$ not equal to $i$ must contain an even number of ones so that their binary sum equals zero. This means that the probability $t_{j\to i}(0)$ that the check equation is satisfied given that $c_i = 0$ is equal to $p_M(\text{even})$ given in (13.5.25). For the $j$th check node with $M + 1$ input branches, $t_{j\to i}(0)$ is equal to $p_M(\text{even})$ and can be written as (cf. (13.5.21))

$$t_{j\to i}(0) = \frac{1}{2} + \frac{1}{2}\prod_{i':\sim i}^{M}\big(1 - 2q_{i'\to j}(1)\big), \qquad (13.5.26)$$

where the correspondence of $p_i$ to $q_{i\to j}(1)$ has been used, with $q_{i\to j}(1)$ derived in the previous iteration step, and the notation $i':\sim i$ indicates that the product is taken over all of the codebits $i'$ used in the check equation for the $j$th check node except for the codebit $i' = i$, which is the destination bit node $r_i$. Using $t_{j\to i}(1) = 1 - t_{j\to i}(0)$, rewrite (13.5.26) as

$$1 - 2t_{j\to i}(1) = \prod_{i':\sim i}^{M}\big(1 - 2q_{i'\to j}(1)\big). \qquad (13.5.27)$$

Use the identity $\tanh\big(\tfrac{1}{2}\log_e x\big) = (x - 1)/(x + 1)$ to determine the corresponding log-probability ratio given in (13.5.22). Let $x = p_0/p_1$ be the ratio of the probabilities of a binary event such that $p_0 = 1 - p_1$. Using this ratio, write the identity as follows:

$$1 - 2p_1 = \tanh\big(\tfrac{1}{2}\log_e(p_0/p_1)\big).$$

Apply this identity to both sides of (13.5.27). On the left, set $x = t_{j\to i}(0)/t_{j\to i}(1)$. On the right, set $x = q_{i\to j}(0)/q_{i\to j}(1)$. Making these substitutions yields


$$\tanh\Bigg(\frac{1}{2}\underbrace{\log_e\left(\frac{t_{j\to i}(0)}{t_{j\to i}(1)}\right)}_{L(t_{j\to i})}\Bigg) = \prod_{i':\sim i}^{M}\tanh\Bigg(\frac{1}{2}\underbrace{\log_e\left(\frac{q_{i'\to j}(0)}{q_{i'\to j}(1)}\right)}_{L(q_{i'\to j})}\Bigg).$$

The log-probability ratio $L(t_{j\to i})$ in (13.5.22) then becomes

$$L(t_{j\to i}) = 2\tanh^{-1}\left[\prod_{i':\sim i}^{M}\tanh\big(L(q_{i'\to j})/2\big)\right], \qquad (13.5.28)$$

which provides the check-node update expression. This equation expresses the message L (t j →i ) from a check node in terms of L (qi → j ). For the first iteration, these ratios are initialized by replacing L (qi → j ) with the log-likelihood ratio L (ci ) given in (13.5.23).

Message from Bit Nodes

To determine the form of the message $L(q_{i\to j})$ from a bit node in terms of $L(t_{j\to i})$, equally weight each log-probability ratio $L(t_{j\to i})$ as well as the initial log-likelihood ratio $L(c_i)$ at each bit node. Equally weighting the log-probability ratios from the other check nodes, including the initial log-likelihood ratio $L(c_i)$ derived from the noisy received sample $r_i$, and excluding $j' = j$ gives

$$L(q_{i\to j}) = L(c_i) + \sum_{j':\sim j} L(t_{j'\to i}), \qquad (13.5.29)$$

which provides the variable-node update expression. The output of (13.5.29) for each pair of connected nodes is used as the input to (13.5.28). Similarly, the output of (13.5.28) is used as the input to (13.5.29). Together, equations (13.5.28) and (13.5.29) define the basic iteration step. After each iteration, a componentwise hard decision may be applied to derive an estimate $\hat{\mathbf c}$ of the codeword. The log-probability ratio $L(i)$ at each bit node used for this hard-decision process is

$$L(i) = L(c_i) + \sum_{j} L(t_{j\to i}), \qquad (13.5.30)$$

which includes both the intrinsic and the extrinsic evidence. Depending on the mapping of the letters in the codeword into the transmitted symbols given in (13.5.23), $\hat c_i = 0$ is asserted if $L(i) \geq 0$, and $\hat c_i = 1$ is asserted if $L(i) < 0$. For each iteration, the extrinsic evidence available to each bit node changes, so that $L(i)$ evolves as a function of the number of iterations. The iteration is stopped when the estimated codeword $\hat{\mathbf c}$ satisfies $\hat{\mathbf c} H^{\mathrm T} = \mathbf 0$ (cf. (13.2.2)) or when a halting criterion is satisfied.

As an example, the performance in gaussian noise of a rate-one-half binary Gallager code for a dataword with a length $k = 1024$ as a function of the number of iterations is shown in Figure 13.17.¹⁰ For this specific code, the relative improvement per iteration decreases and stagnates beyond approximately 50 iterations. The performance of other codes using longer codewords can saturate at a higher number of iterations, and they can perform within a fraction of a decibel of the Shannon bound.

Figure 13.17 Bit error rate of a (2048, 1024) Gallager code with a girth of 6 after 2, 5, 10, 20, 50, 100, and 200 iterations.

¹⁰ See Hamkins (2011) as well as Butler and Siegel (2013).

Message-passing decoders for Gallager codes can be generalized to dispersive channels and to larger signal constellations. The performance of a Gallager code decoded by message passing is difficult to quantify because the error-rate floor need not be caused by low-weight codewords as was the case for a Berrou code. Instead, for a large Gallager code, the intrinsic evidence is mingled with the extrinsic evidence when the number of iterations exceeds half the girth of the Tanner graph. These inevitable dependences can lead to small-scale “trapping” structures or sets within the code that cause stagnation and can produce an error-rate floor. For an additive white gaussian noise channel, this error-rate floor is difficult to quantify because it depends on the specific code and may even depend on numerical details of the calculation. As a consequence, practical implementations of a Gallager code may use an outer algebraic block code to improve reliability by protecting against message-passing stagnation.
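Equations (13.5.23) and (13.5.28) through (13.5.30) assemble directly into a decoder. The following Python sketch implements a flooding-schedule sum-product decoder for small binary Gallager codes on the binary phase-shift-keyed gaussian channel described above; the exclusion of the destination message is done by division, which assumes no message is exactly zero, and the clipping guards are purely numerical.

```python
# Minimal sum-product (message-passing) decoder for a binary Gallager code,
# implementing the intrinsic LLR (13.5.23), the check-node update (13.5.28),
# the bit-node update (13.5.29), and the decision statistic (13.5.30).
import numpy as np

def sum_product_decode(r, H, sigma, max_iter=50):
    """Decode bipolar senseword r using 0/1 check matrix H."""
    Hf = H.astype(float)
    Lc = 2.0 * r / sigma ** 2                 # intrinsic LLRs, (13.5.23)
    L_q = Hf * Lc                             # initialize L(q_{i->j}) with L(c_i)
    for _ in range(max_iter):
        # Check-node update (13.5.28): exclude the destination bit by division.
        T = np.where(Hf > 0, np.tanh(np.clip(L_q, -30, 30) / 2.0), 1.0)
        row_prod = T.prod(axis=1, keepdims=True)
        ext = np.where(Hf > 0, row_prod / T, 0.0)
        L_t = 2.0 * np.arctanh(np.clip(ext, -0.999999, 0.999999)) * Hf
        # Bit-node update (13.5.29) and total evidence (13.5.30).
        L_total = Lc + L_t.sum(axis=0)
        L_q = np.where(Hf > 0, L_total - L_t, 0.0)
        c_hat = (L_total < 0).astype(int)     # L(i) >= 0 -> 0, else 1
        if not np.any((H @ c_hat) % 2):       # halting test: c H^T = 0
            return c_hat, True
    return c_hat, False
```

Applied to the (6,3,3) check matrix of (13.5.16) with a lightly corrupted bipolar senseword, this sketch returns the hard-decision codeword and a flag indicating whether the halting test was met.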

13.6 Trellis-Coded Modulation

The algebraic codes described in Section 13.2 and the modulation based on the signal constellations described in Chapter 10 have been described as separate consecutive functions of the transmitter. Instead, the two functions may be merged by combining a convolutional code with a complex signal constellation. A sequence-dependent combination of a convolutional code with signal subconstellations (cf. Figure 10.2) is called a trellis code. A trellis code mingles the finite-field arithmetic of the convolutional code with the complex numbers of the signal constellation.


A trellis code defined on a finite set of points in the complex plane is called an Ungerboeck code. The resulting modulation format is called trellis-coded modulation. An $(n, k)$ Ungerboeck code maps $k$ bits in an input dataframe into $n$ codebits, which then address a complex signal constellation with $2^n$ points. The mapping of the databits depends on the recent history of the dataframes as determined by the current node of the trellis. This mapping is described by the labeling of the trellis, which overlies the internal structure of a convolutional code defined by an associated generator polynomial matrix. The branches of the trellis are labeled with the points of the signal constellation. An Ungerboeck code does not satisfy the linearity condition at the level of a code symbol because the addition of two points in a real or complex signal constellation need not be another point of that signal constellation, as is required for a linear code. Maximizing the euclidean distance between sequences of modulated symbols is not generally equivalent to maximizing the Hamming distance between the input sequences of logical symbols. The motivation for using a trellis-based code instead of a logical-symbol code is, in large part, because the euclidean distance can be treated in a simple way using a trellis.

The motivation for using a signal constellation that has $2^n$ points to encode $k$ bits can be understood through a simple example that compares the performance of uncoded quadrature phase-shift keying (QPSK) modulation with that of coded 8-ary PSK modulation, with each format transmitting two databits per symbol interval. This comparison is based on the channel capacity of a memoryless channel in additive white gaussian noise with an equiprobable prior. This capacity is derived in Chapter 14 and is shown in Figure 14.11(a) for several phase-modulation formats. The figure is redrawn in Figure 13.18 for several phase-shift-keyed modulation formats. As a reference, the figure also includes the Shannon channel capacity of the unconstrained white gaussian noise channel (cf. (14.3.4b)).

Figure 13.18 The channel capacity for several phase-modulation formats (QPSK, 8-PSK, and 16-PSK) in additive white gaussian noise, along with the Shannon capacity limit.


Uncoded QPSK transmits two bits per symbol. For the QPSK modulation format, a value of $E/N_0$ equal to 12.9 dB produces a symbol error rate $p_e$ equal to $10^{-5}$ (cf. (10.2.19)). Referring to Figure 13.18, information theory informs us that the same data rate of two bits per symbol can be achieved, in principle, for (coded) 8-ary PSK modulation at a minimum value of $E/N_0$ equal to 5.8 dB, as is shown in Figure 13.18. The difference of 7.1 dB is the improvement that is possible by using a strong code with 8-ary PSK instead of uncoded QPSK. The complexity of the code determines how much of the available 7.1 dB gain is to be realized.

Figure 13.18 also shows that increasing the size of the signal constellation to 16-ary PSK provides essentially no improvement in capacity over 8-ary PSK at a data rate of two bits per symbol. (However, it may be easier to realize most of the potential 7.1 dB gain by using 16-ary PSK rather than 8-ary PSK.) The capacity limit also shows that any modulation format at a rate of two bits per symbol for this channel can provide only 1 dB of additional improvement compared with using 8-ary PSK. This additional 1 dB cannot be realized using only phase modulation.

Trellis-coded modulation employs an expanded signal constellation that uses $2^n$ points to represent the databits encoded for each symbol interval. This provides the redundancy needed to absorb the increased number of codebits. An example is a (3, 2) 8-ary PSK trellis code with a constraint length $\nu$ equal to two. Because it has a small constraint length, this simple trellis code has only a modest performance improvement compared with the potential 7.1 dB improvement promised by Figure 13.18. It has an input of two bits at each instant and an output that is one of the $2^3 = 8$ possible modulation points of 8-ary PSK. The labeled modulation points are shown on the left side of Figure 13.19. Because $k = \nu = 2$, there are four encoder states represented by the nodes of a trellis. Because $k = 2$, two bits are encoded for each modulation interval, so that each node has four input branches and four output branches. Each branch is labeled by one of the eight possible PSK modulation points shown in Figure 13.19.

[Figure 13.19: The 8-ary PSK constellation can be partitioned into two QPSK subconstellations. In turn, each of the QPSK subconstellations can be partitioned into two BPSK subconstellations. The minimum distances at the three levels are dmin = 2 sin(π/8), dmin = √2, and dmin = 2.]


The preferred trellis structure for trellis-coded modulation maximizes the minimum euclidean distance between any two sequences permitted by the trellis. This method involves concurrently determining the branch structure that connects the nodes in the trellis and assigning a modulation point to each branch. One method to systematically assign modulation points to branches consists of partitioning the signal constellation into smaller and smaller subconstellations, with each subconstellation having a larger minimum euclidean distance. The notion of partitioning is introduced in Figure 10.2 of Chapter 10, and is shown for 8-ary PSK in Figure 13.19. In the first step, the 8-ary PSK constellation, which has a minimum distance of dmin = 2 sin(π/8) for a unit-energy symbol (cf. (10.2.17)), is partitioned into two QPSK constellations, each having a minimum distance of √2. In the second step, the two QPSK subconstellations are partitioned into four BPSK subconstellations, each having a minimum distance of 2.

The 8-ary PSK trellis is compared with the trellis for uncoded QPSK in Figure 13.20. Because there is no code, the trellis for uncoded QPSK shown in Figure 13.20(a) has only a single degenerate "encoder state" with four branches leaving it, each branch representing a point in the signal constellation connecting successive modulation intervals. The minimum distance dmin of QPSK is √2 (cf. Figure 13.19), with each point having two nearest neighbors at this distance. The trellis for coded 8-ary PSK is shown in Figure 13.20(b). The four distinct pairs of parallel branches represent the four BPSK subconstellations shown in Figure 13.19. For each encoder state represented by a node, there are four branches, organized into two sets of two parallel branches, entering and leaving the state. Each set of four branches corresponds to one of the QPSK subconstellations, denoted as either "A" or "B" in Figure 13.19. This labeling is also shown for each encoder state in Figure 13.20(b). Because the four branches entering and leaving each encoder state represent only half the possible modulation points of 8-ary PSK, the number of possible sequences using trellis-coded 8-ary PSK is equal to the number of sequences using uncoded QPSK and so has the same data rate. However, a properly designed trellis using 8-ary PSK has a larger minimum euclidean distance dmin than the uncoded QPSK trellis shown in Figure 13.20(a) and thus requires less energy per symbol.

[Figure 13.20: (a) QPSK described as a degenerate trellis code. (b) A trellis code for the 8-ary PSK constellation with the labels defined in Figure 13.19.]


The reduction in the symbol energy required in order to achieve the same data rate is the coding gain. For trellis-coded modulation, the coding gain does not require any additional bandwidth because the symbol rate has not changed. Accordingly, trellis-coded modulation is widely used in bandlimited channels such as dispersive lightwave channels.

For the example considered in this section, in a high-signal-to-noise-ratio regime the calculation of the coding gain is dominated by the minimum squared euclidean distance d²min between sequences, with the number of nearest neighbors being of less importance. For uncoded QPSK, the four parallel transitions for each modulation interval shown in the degenerate trellis in Figure 13.20(a) allow any sequence to be transmitted. Therefore, the minimum squared euclidean distance between uncoded QPSK sequences of length K is K times the minimum squared distance between symbols, which is d²min = 2E (cf. (10.2.17)). For coded 8-ary PSK, the minimum squared distance of each BPSK subconstellation is d²min = 4E. For the trellis used in the example, every subconstellation has the same minimum distance as a consequence of the labeling used in the parallel branch structure. This means that the minimum squared euclidean distance between coded sequences is also K times the minimum squared euclidean distance of the BPSK subconstellation. Accordingly, the asymptotic coding gain is

G = 10 log10 [ K d²min(coded 8-ary PSK) / K d²min(uncoded QPSK) ] = 3 dB.        (13.6.1)

This is an asymptotic coding gain because it is based on only the minimum euclidean distance dmin and ignores the effect of multiple nearest neighbors. This is appropriate in a high-signal-to-noise-ratio regime typical of a low probability of error. The coding gain in a low-signal-to-noise-ratio regime will be somewhat less than 3 dB because of the effect of other terms.

For most larger trellis-coded modulation formats, the trellis does not have parallel branches. For this case, the error-rate performance depends on the minimum euclidean distance between the correct sequence and all possible incorrect sequences weighted by their probability of occurrence. For a linear code, the performance does not depend on which codeword is transmitted, so that the error rate of a linear code when the all-zero codeword is transmitted is the same as the error rate when any other codeword is transmitted. A trellis code is nonlinear in that the euclidean distance does not have this property. Nevertheless, for a symmetric signal constellation, the performance can be determined using relative distances instead of distance pairs, as is the case for binary codes.11

11 See Zehavi and Wolf (1987).
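The partition distances and the asymptotic coding gain quoted above are easily checked numerically; the following Python sketch (ours, illustrative only) constructs the unit-energy constellations and evaluates the ratio in (13.6.1).

import cmath
import math

def psk(M):
    # Unit-energy M-ary PSK constellation.
    return [cmath.exp(2j * math.pi * k / M) for k in range(M)]

def dmin(points):
    # Minimum euclidean distance over all distinct pairs of points.
    return min(abs(a - b) for i, a in enumerate(points) for b in points[i+1:])

eight_psk = psk(8)
qpsk_sub = eight_psk[::2]   # one level of partitioning: every other point
bpsk_sub = eight_psk[::4]   # two levels: every fourth point

print(dmin(eight_psk))      # 2 sin(pi/8) ~ 0.765
print(dmin(qpsk_sub))       # sqrt(2)     ~ 1.414
print(dmin(bpsk_sub))       # 2.0

# Asymptotic coding gain (13.6.1): the factor K cancels, leaving the ratio
# of minimum squared distances, 4E for the coded BPSK subconstellation to
# 2E for uncoded QPSK.
print(10 * math.log10(dmin(bpsk_sub)**2 / dmin(qpsk_sub)**2))   # ~3.01 dB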

13.7 Modulation Codes

A modulation code is a code used to accommodate the various restrictions imposed on the patterns of symbols that are allowed to be transmitted. Only modulation codes for binary sequences will be discussed.


There are many reasons for restrictions on the allowed sequences. A modulation code can be used to insert redundancy into a symbol sequence so as to avoid troublesome or undesirable subsequences. While it may seem that this redundancy reduces the data rate, the opposite is usually true. By removing troublesome subsequence patterns that limit the transmitted rate, the rate of transmitted symbols per second can be increased. For example, by ensuring that all marks in the datastream are sufficiently spaced, it may be possible to significantly increase the transmitted symbol rate by reducing the symbol intervals while keeping the same symbol duration. A modulation code may be used to enable synchronization of the data clock from the data itself by excluding the transmission of certain specific subsequences of symbols for which it is difficult to synchronize. A long run of zeros may be such a difficult subsequence. A modulation code can be used to control or spectrally shape the transmitted power density spectrum. It can be designed to reduce peak power or to avoid specific troublesome bit sequences.

This section discusses three kinds of modulation code. These are a runlength-limited code, a spectral-notch code, and a partial-response code. These various codes are similar in that they all constrain the allowable transmitted symbol subsequences. However, they are designed from different points of view, or with different restrictions imposed.

13.7.1 Runlength-Limited Codes

A common type of constrained code is a runlength-limited code. These codes are described here for on–off intensity-modulated systems for which such a code may be well suited. There are several reasons for using such a code, as will be described. Because of various impairments such as linear or nonlinear dispersion, two closely spaced marks may be hard to resolve. This sets a lower limit on the duration of a bit interval. Let T be the width of a data pulse. Reducing the bit spacing to less than T will cause the pulses to overlap. For this reason, the data rate seems to be limited by the pulse width. However, by requiring that several spaces follow every one in the coded data, the clock rate might be increased, other things being equal, by reducing the bit spacing without reducing the pulse width. By requiring that at least two zeros follow every one and changing the bit interval to T/3 without changing the duration of a pulse, the clock rate can be tripled compared with the case in which two ones are allowed to be consecutive. For this purpose, there exists a modulation code with a code rate of one-half satisfying the constraint that at least two zeros follow every one. Using this code, the channel bit rate is tripled, but each codebit conveys only half a databit. This means that the actual data rate is half of the tripled symbol rate, resulting in a 50% improvement in the data rate seen by the user.

Another application of a runlength-limited code is to facilitate clock recovery. A long sequence of consecutive marks or consecutive spaces will make it difficult to maintain clock synchronization using only the data-modulated lightwave. For this reason, marks must not be too far apart. A typical runlength constraint is that at most k consecutive spaces can occur in the codestream.

[Figure 13.21: Truncated trellis for a (1, ∞) runlength-limited code.]

These two constraints can be combined. A code that is designed to ensure that two marks are always separated by at least d spaces and by at most k spaces is conventionally described as a (d, k) runlength-limited code. A simple example is a (1, ∞) runlength-limited code. This code requires at least one zero between two marks, but permits an arbitrary number of spaces between any two marks. A trellis of length 7 for this constraint is shown in Figure 13.21. The sequence of bits labeling the branches of any path through this trellis defines a binary word of length 7 satisfying the (1, ∞) constraint. The termination ensures that such binary words can be concatenated end-to-end while still satisfying the constraint. There are 21 words of length 7 that satisfy this constraint and end in a zero. Any 16 of these words can be selected to form a block code taking four databits into seven codebits, resulting in a (1, ∞) runlength-limited code with a code rate of 4/7. Such a code would have the form of a mapping such as

0000 ←→ 1010100
0001 ←→ 0101010
  ⋮          ⋮
1111 ←→ 0010100.                                             (13.7.1)

With this code, the bit interval of duration T is replaced by a bit interval of duration T/2, thereby allowing the channel symbol rate to double without pulse overlap while keeping the duration of the pulse the same. To use the code, the datastream is broken into four-bit blocks, with each four-bit block encoded into seven codebits. The coding penalty is 4/7, but the symbol rate is doubled. The actual data rate increases by 8/7, so the user achieves a 14% improvement in the data rate by using this simple code. A larger improvement can be obtained by increasing the blocklength, but, for this runlength constraint, there is a limit on the possible improvement. This limit is called the capacity of the binary (1, ∞) constraint.

To enable reliable clock recovery, a maximum may be imposed on the runlength of zeros. Another simple example is the constraint that at most three zeros may follow a one. This runlength constraint can be combined with the previous runlength constraint that at least one zero must follow every one. A (1, 3) runlength-limited code denotes a code with at least one zero and at most three zeros following every one. After every one, a zero must be transmitted. After three consecutive zeros, a one must be transmitted. A full trellis for a (1, 3) runlength-limited constraint requires four states labeled 1, 01, 001, and 000. However, by merging two frames into one, and by other manipulations not described here, a modified trellis can be constructed for this constraint as is shown in Figure 13.22. Each branch of this modified trellis is labeled x/yy, with x denoting one databit and yy denoting two codebits.
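A standard result (not derived in this section) is that the capacity of a (d, k) constraint equals log2 of the largest eigenvalue of the adjacency matrix of the constraint state graph. The following sketch, with a state construction of our own choosing, evaluates it.

import numpy as np

def runlength_capacity(d, k=None):
    # States track the number of zeros emitted since the last one. A one
    # is allowed only after at least d zeros; a zero is allowed only if
    # fewer than k zeros have accumulated. k=None encodes (d, infinity).
    n = (k if k is not None else d) + 1
    A = np.zeros((n, n))
    for i in range(n):
        if k is None or i < k:            # emitting a zero is allowed
            A[i, min(i + 1, n - 1)] += 1  # for (d, inf), the last state absorbs
        if i >= d:                        # emitting a one resets the count
            A[i, 0] += 1
    return float(np.log2(max(abs(np.linalg.eigvals(A)))))

print(runlength_capacity(1))      # (1, inf): ~0.6942, log2 of the golden ratio
print(runlength_capacity(1, 3))   # (1, 3):   ~0.5515

For the (1, ∞) constraint this gives about 0.6942 databits per channel bit, so the rate-4/7 ≈ 0.571 block code above necessarily falls short of this limit.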


[Figure 13.22: Trellis for a (1, 3) runlength-limited stream code. Each branch is labeled x/yy, with x denoting one databit and yy two codebits; the branch labels are 0/01, 1/00, 1/10, and 0/10.]

[Figure 13.23: Encoder and decoder for a rate-one-half stream code, converting a datastream into a codestream and back.]

With this labeled trellis, every binary datastream defines a unique path through the trellis, and so defines a unique codestream. In turn, every binary codestream defines a unique path through the trellis, and so defines a unique datastream. Inspection of the branch labels on the trellis shows that the decoder will require a one-bit latency in its output. With this code, the symbol rate is doubled, and the code rate is one-half, so the user data rate is unchanged. The codestream, however, now has frequent zeros even if the datastream does not. This property is useful in a non-return-to-zero inverse modulation format (cf. Figure 10.4) to create frequent level transitions, thereby ensuring robust clock recovery. The user sees the same data rate as before. The channel uses the same pulses as before, so the dispersion and the bandwidth are not changed. However, the waveform has frequent zeros, and so has frequent transitions.

Rather than a block code, a runlength-constrained stream code could be used. An encoder and decoder for a stream code for the (1, 3) constrained binary channel are shown in Figure 13.23. Because this encoder is not linear over GF(2), this stream code is not a convolutional code. The two operations in these circuits that are nonlinear over GF(2) are the complement operation and the AND operation. Referring to Figure 13.23, two bits of the codestream are shifted out of the encoder for every bit of the datastream shifted in. The decoder inverts the encoder and recovers the datastream. The decoder has no feedback. This property ensures that errors do not propagate. Only one binary memory cell in the encoder is essential to form the codestream. This might be inferred from the constraint length of 1, which follows because there are 2^ν states in Figure 13.22 with ν = 1. The other memory cells in the diagram provide clarity and are not essential for the encoding algorithm.


For example, by following the trellis of Figure 13.22 or the logic of Figure 13.23, initialized to the node labeled 1, the datastream (left to right) 01100011100. . . is encoded into the codestream 1010001010101000100101. . ., which satisfies the constraints. In turn, that codestream defines a unique path through the trellis. The decoder traces the path through the trellis defined by the codestream and reconverts that path back into the original datastream. Another example using Figure 13.22 or Figure 13.23 encodes a long string of ones. The codeword is evident in Figure 13.22 as . . .100010001000. . . . Even though the dataword has no transitions, the modulated waveform does have transitions. When ones are then modulated onto a pulse that is two clock intervals in duration, this will appear as . . .1100110011001100. . . in the modulated waveform.
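The encoding just traced can be reproduced by a two-state machine read directly off the trellis. The sketch below is consistent with the worked examples in the text; the state names S1 and S0 are ours, and the book's shift-register circuit of Figure 13.23 is not reproduced.

# From the starting state (the node labeled 1 in Figure 13.22) a databit 0
# emits the pair 10 and stays; a 1 emits 10 and moves to the second state,
# from which a 1 emits 00 and moves back, and a 0 emits 01 and stays.
ENC = {('S1', '0'): ('10', 'S1'), ('S1', '1'): ('10', 'S0'),
       ('S0', '0'): ('01', 'S0'), ('S0', '1'): ('00', 'S1')}

def encode(databits, state='S1'):
    out = []
    for d in databits:
        pair, state = ENC[(state, d)]
        out.append(pair)
    return ''.join(out)

def decode(codebits):
    # The pairs 00 and 01 decode unambiguously to 1 and 0. The pair 10
    # decodes by looking one pair ahead (pairs emitted from S0 begin with
    # 0), which is the one-bit decoding latency noted in the text.
    pairs = [codebits[i:i+2] for i in range(0, len(codebits), 2)]
    out = []
    for i, pair in enumerate(pairs):
        if pair == '00':
            out.append('1')
        elif pair == '01':
            out.append('0')
        else:
            nxt = pairs[i+1][0] if i + 1 < len(pairs) else '1'
            out.append('1' if nxt == '0' else '0')
    return ''.join(out)

code = encode('01100011100')
print(code)             # 1010001010101000100101, as in the text
print(decode(code))     # 01100011100
print(encode('11111'))  # 1000100010: all-ones data still produces transitions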

13.7.2 Spectral-Notch Codes

The received signal may be impaired by certain spectral regions of the modulated waveform, either because frequencies in these regions incite certain undesirable properties or because the channel preferentially attenuates these frequencies. A spectral-notch code is a modulation code that has little or no power in a specified frequency region. When this spectral notch is at zero frequency, the code is called a DC-balanced code or a DC-free code. Such a code is designed to ensure a nearly equal number of ones and zeros over any moderate running length. The design constraint on the code, which is whimsically called a charge constraint, is described by a running up–down count of zeros and ones in a codeword. A virtual counter is incremented up for each one and decremented down for each zero. The code must be designed so that every codeword or concatenation of codewords keeps the counter within ±q for some specified q. For modest values of q, a modulation code with little reduction in the data rate that has a strong spectral notch at zero frequency can be designed.

A spectral-notch code is used in intensity-modulated lightwave systems because lower frequencies in the modulation correspond to slowly varying amplitude. This can cause gain variations in a lightwave amplifier, which can affect other wavelengths or other subchannels (cf. Section 7.4.2). Reducing the low-frequency components can also suppress stimulated Brillouin scattering (cf. Section 5.2.3) because there are few frequency components within the bandwidth of this nonlinear process.
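The charge constraint is easy to state in code. The following sketch (the function name and example words are ours) checks whether a codeword keeps the running up–down count within ±q.

def within_charge_constraint(codeword, q):
    # Running up-down count: +1 for each one, -1 for each zero. The
    # codeword meets the charge constraint if the counter stays in [-q, q].
    count = 0
    for bit in codeword:
        count += 1 if bit == '1' else -1
        if abs(count) > q:
            return False
    return True

print(within_charge_constraint('1100101101', q=2))   # True: balanced word
print(within_charge_constraint('1111100000', q=2))   # False: drifts to +5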

13.7.3 Partial-Response Codes

A partial-response code, introduced in Section 9.2.4, is now viewed in this chapter as a modulation code. A large class of partial-response codes can be generated from the two basic codes described in Section 9.2.4 as the duobinary code and the dicode. These are also denoted as the 1 + D and 1 − D codes, respectively, where D denotes a unit symbol delay. With this notation, other partial-response codes can be expressed as polynomials of the form p(D) = (1 + D)^n (1 − D)^m. For example, if n = m = 1, then p(D) = 1 − D^2. This code is called a modified duobinary code. A partial-response code resembles a convolutional code, but over the real field. It can be described by a two-state trellis and demodulated by any trellis-based maximum-likelihood algorithm.


A partial-response waveform demodulated using a soft-decision Viterbi algorithm is called a partial-response maximum-likelihood modulation format. A partial-response code is used to spectrally shape the power density spectrum of the transmitted waveform so as to be compatible with the transfer function of the channel, thereby allowing a higher data rate. The width of the autocorrelation function of the modulating waveform is increased, so the width of the power density spectrum of the transmitted waveform is decreased, leading to less pulse spreading (cf. (4.5.7)). The partial-response code can be regarded as a form of controlled intersymbol interference that is known and can be accommodated at the receiver.
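The polynomial description can be made concrete with a few lines of code; the function below (ours) multiplies out p(D) = (1 + D)^n (1 − D)^m and applies a code to a bipolar symbol stream by convolution over the real field.

import numpy as np

def partial_response(n, m):
    # Coefficients of p(D) = (1 + D)^n (1 - D)^m, lowest power of D first.
    p = np.array([1.0])
    for _ in range(n):
        p = np.convolve(p, [1.0, 1.0])    # multiply by (1 + D)
    for _ in range(m):
        p = np.convolve(p, [1.0, -1.0])   # multiply by (1 - D)
    return p

print(partial_response(1, 0))   # duobinary 1 + D            -> [ 1.  1.]
print(partial_response(0, 1))   # dicode 1 - D               -> [ 1. -1.]
print(partial_response(1, 1))   # modified duobinary 1 - D^2 -> [ 1.  0. -1.]

# Encoding a bipolar symbol stream is a convolution over the real field;
# the duobinary code produces a three-level output sequence.
symbols = np.array([1, -1, -1, 1, 1, 1, -1])
print(np.convolve(symbols, partial_response(1, 0)))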

13.8 References

The general topic of channel codes for data transmission is covered in Blahut (2003), in Lin and Costello (2004), in Richardson and Urbanke (2008), in Ryan and Lin (2009), and in Blahut (2010). Bit and symbol error rates for block codes are discussed in Desset, Macq, and Vandendorpe (2004). The cutoff rate for phase-shift keying is given by Geist (1990). Partial-response codes were developed by Lender (1964) and are discussed in Kabal and Pasupathy (1975). The application of a duobinary code to a lightwave communication system is discussed in Yonenaga and Kuwano (1997). Runlength-limited codes are covered in Immink (1990) and in Widmer and Franaszek (1983). Aspects of the composite Berrou and Gallager codes are covered in Richardson and Urbanke (2008) and in Ryan and Lin (2009). Applications to lightwave systems are presented in Djordjevic, Ryan, and Vasic (2010). Coded modulation for a lightwave communication system is discussed in Beygi, Agrell, Kahn, and Karlsson (2013). A logarithmic upper bound on the minimum distance of a Berrou code was derived in Breiling (2004). Estimates of the error floor of turbo codes caused by codewords with low Hamming weight are discussed in Garello, Pierleoni, and Benedetto (2001). The application of turbo decoding to equalization is covered in Koetter, Singer, and Tüchler (2004). Gallager codes were devised in Gallager (1962). The performance of Gallager codes for an additive white gaussian noise channel is discussed in Richardson (2003), in Costello (2009), and in Butler and Siegel (2014). The rate-one-half Gallager code used for Figure 13.17 is from Hamkins (2011).

13.9 Historical Notes

A review of the history of channel coding is given in Forney and Costello (2007). The Reed–Solomon codes, introduced in Reed and Solomon (1960), popularized the use of Galois fields (named for É. Galois (1811–1832)) in coding theory. Forney (1966) showed that the use of an inner code and an outer code can reduce complexity. The barrier of the cutoff rate was first explained by Jacobs and Berlekamp (1967) by noting that the waiting time for a sequential decoder to decode one symbol is governed by a Pareto probability density function.


Viterbi (1967) introduced an algorithm to explain maximum-likelihood decoding. The patent for turbo codes was filed in 1992 by Berrou (see Berrou (1995)), and granted in 1995. The topic first appears in the open literature in Berrou, Glavieux, and Thitimajshima (1993). This work broke the long-standing barrier of the cutoff rate R0, and led to research on iterative methods of decoding. Low-density parity-check codes were rediscovered by MacKay and Neal (1996) as candidates for iterative decoding, having been devised earlier by Gallager (1962), but ignored at that time. The Tanner graph was originally introduced by Tanner (1981) as a method of constructing large codes. Message passing on a graph under the term belief propagation was originally proposed by Pearl (1982) for general problems of statistical inference. Polar codes (Arikan, 2009) are another class of codes that are decodable at rates above the cutoff rate by the method of successive cancellation. Trellis-coded modulation was introduced by Ungerboeck (1982). The method of developing a trellis for a code from a trellis for a runlength constraint using the methods of dynamical systems theory is due to Adler, Coppersmith, and Hassner (1983).

13.10 Problems

1 Binary linear codes
Let C be a binary linear code and let C̄ be the code derived by taking the complement of all codewords in C.
(a) Show that C̄ = C if and only if the all-ones codeword (1, . . ., 1) is in the code C.
(b) Is C̄ a linear code?
(c) Is the union C ∪ C̄ a linear code?

2 Code rate for a Hamming code
(a) A binary Hamming code is a linear code with a minimum distance of 3 and a blocklength n = 2^m − 1 for some integer m. By mimicking the construction of the (7,4,3) Hamming code, describe the construction of the Hamming code with n = 2^m − 1 for m > 3.
(b) Determine an expression that relates n and k for a Hamming code.
(c) Show that for large n the code rate of these codes approaches one.

3 Orthogonality and linear dependence
Give three nonzero vectors in the vector space GF(4)^8 that are pairwise orthogonal but are not linearly independent. Can this happen in the real vector space R^8? Why?

4 Relationship between euclidean distance and Hamming distance
(a) Show that, for codebit energy Ec equal to 1, the relationship between the minimum Hamming distance and the minimum euclidean distance for binary phase-shift keying is given by

dmin(euclidean) = 2 dmin(Hamming).

(b) The expression relating the squared euclidean distance d²min and the Hamming distance dmin is given by (13.2.20). For Ec equal to one it is

d²min(euclidean) = 4 dmin(Hamming).


The left side of the second equation is the square of the left side of the first equation. However, the right side of the second equation is not the square of the right side of the first equation. Why?

5 Relationship between euclidean and Hamming distance for quadrature phase-shift keying
The expression d²min = 4 dmin Ec derived in (13.2.20) relates the Hamming distance to the euclidean distance for binary phase-shift keying. Discuss the conditions under which this expression can be applied to quadrature phase-shift keying. Provide a specific counterexample for which this expression is not valid.

6 Coding gain for a repetition code
(a) Determine the probability of a block error pe for an uncoded sequence of three bits, each with an energy Eb and independent bit error probability ρ.
(b) Determine the probability of a block error pe for hard-decision decoding using a (3,1,3) repetition code.
(c) Determine the value of Eb/N0 for which the probability of a block error for an uncoded block is equal to the probability of a block error for a coded block.
(d) For values of Eb/N0 larger than the value determined in part (c), is there a coding gain? Explain.
(e) Show that the hard-decision coding gain of any (n,1,n) code is negative for a large value of Eb/N0. Explain why.

7 Asymptotic coding gain for a Hamming code
Determine the asymptotic hard-decision coding gain of an (n, k, d) = (2^m − 1, 2^m − 1 − m, 3) Hamming code in terms of the number of check bits n − k. Determine the number of check bits required for a coding gain of 2 dB.

8 Pareto random variable
Sequential decoding is governed by a Pareto random variable.
(a) Show that if y is a random variable with an exponential probability density function given by

f_y(y) = λe^(−λy) for y ≥ 0, and f_y(y) = 0 for y < 0,

then the random variable x = e^y has a Pareto probability density function given by

f_x(x) = λx^(−(λ+1)) for x ≥ 1, and f_x(x) = 0 for x < 1.

(b) Derive the mean and the variance of the Pareto probability density function.

9 Check matrix
A check matrix for a linear block code is not unique. Any row of H can be added to any other row to give a new check matrix for the same code. Why?


10 Tanner graphs
(a) Construct the Tanner graph of the (7,4,3) Hamming code described by the check matrix

        ⎡ 1 1 0 1 1 0 0 ⎤
    H = ⎢ 1 0 1 1 0 1 0 ⎥ .
        ⎣ 0 1 1 1 0 0 1 ⎦

(b) Is there an equivalent check matrix with a different Tanner graph?

11 Counting soldiers
Soldiers in a long straight line can communicate only with immediate neighbors. At a predetermined start time, each soldier with only a single neighbor says "one" to that neighbor. Each soldier upon hearing "n" from a neighbor says "n + 1" to the other neighbor.
(a) Show that eventually every soldier knows the total number of soldiers in the line.
(b) Generalize to soldiers in two perpendicular lines in the form of a "T."
(c) Generalize to any number of intersecting lines with no loops.
(d) Can this method be generalized to count soldiers in intersecting straight lines that form loops?

12 Prior probabilities
A prior p = (p0, p1) of the form p = (1, 0) or p = (0, 1) on any bit implies that the transmitted bit is not random. That bit is known. Prove that, for any bits with such prior probabilities, the marginalized posterior probability given in (13.5.3) will be the same as the prior probability. Therefore, known bits remain known.

13 The cutoff rate and capacity for phase-shift keying
Using the large-signal approximation for the capacity of the phase-shift-keyed information channel given in (14.3.15), repeated here as

C ≈ (1/2)log(4πE/eN0),

and the large-argument expansion for the modified Bessel function I0(x) of the first kind of order zero, which is given by

I0(x) ≈ (1/√(2πx)) e^x,

do the following.
(a) Show that, for the same value of E/N0, the offset in the rate between the curve for the capacity and the curve for the cutoff rate shown in Figure 13.9 approaches the constant value (1/2)log2(4/e) ≈ 0.28.
(b) Show that, for the same rate, the offset in E/N0 between the curve for the capacity and the curve for the cutoff rate approaches the constant value 4/e ≈ 1.68 dB.


14 Probability that a binary codeword has an even number of ones
The ith bit of a set of M independent random bits is a one with probability pi. Show that the probability of there being an even number of ones is given by the Gallager induction

pM(even) = 1/2 + (1/2)∏_{i=1}^{M} (1 − 2pi),

which is (13.5.25). (Hint: for M = 1, pM(even) = 1 − p1.)

14 The Information Capacity of a Lightwave Channel

The characteristics and structure of the lightwave communication channel, which topics comprise the subject of the book up to this point, now will be revisited at a deeper and more abstract level in the three final chapters of the book. These three concluding chapters discuss the transmission of information, the quantum nature of lightwaves, the relationship between these topics, and the primary role of the physical concept of energy in the abstract concept of information capacity. We begin in this chapter with a formal study of the information capacity of a lightwave channel.

The purpose of a lightwave channel is to convey information. For this reason, the theoretical determination of the maximum information rate that can be conveyed by a given channel – called the channel capacity – provides a benchmark against which a specific communication system can be judged. This benchmark depends on the physical model of that lightwave channel, including its supporting components as well as all relevant details such as the form of the noise, the nature of the nonlinearities, and the constraints on the channel inputs.

For the purposes of this chapter, the information channel is defined as a black box that contains all lightwave and electronic components comprising the fixed attributes of the information channel including amplifiers, mixers, detectors, and so forth. Any additional components at the input or the output that are deemed subject to change to obtain the desired information rate are not in the information channel. Those flexible components are in the encoder and decoder. The information channel is regarded as a black box described only by the probabilistic dependence between its input and output. The channel capacity is a statement regarding the maximum rate at which the encoder and decoder can transmit data reliably through that channel in the form specified.

The capacity of an information channel describes the maximum reliable information rate, but does not describe the coded-modulation format required to achieve that maximum rate. The capacity suggests the optimum modulation format without identifying it. If, however, the system is required to use a specific modulation format, then that requirement is regarded as a fixed attribute of the information channel, and the capacity changes accordingly, but cannot increase. Reliable transmission requires that, for all practical purposes, the reconstructed datastream out of the decoder is error-free. The capacity of a channel is a specific number with units of bits per symbol or bits per second. Because the capacity refers to the fixed information channel, it places no restrictions on the method of coding or on the complexity of the encoder and decoder.


Therefore, to determine the channel capacity from the operational point of view would require a statement describing optimal encoders and decoders. This is not a feasible procedure because we do not know how to describe or build optimal encoders and decoders. The information-theoretic statement of the capacity of an information channel is developed by an indirect approach using statistical methods. The information-theoretic capacity determines the actual capacity of the channel only to the extent that the mathematical model of the information channel is an accurate description of the actual channel as it appears from the input and output.

One purpose of the information-theoretic channel capacity is to establish a benchmark against which any proposed encoder and decoder, or modem, is to be judged. A second purpose is to discern some of the desirable structural considerations within an encoder and decoder as well as the structure of the code needed to transmit at a code rate approaching the information-theoretic channel capacity. The goal of achieving communication rates approaching the channel capacity but staying within practical cost or complexity constraints has indeed guided the evolution of modern telecommunication systems.

The information-theoretic capacity is defined for each instance of an information channel and encompasses whatever is included in that channel. The distinction between the information channel and other internal notions of a channel given in Figure 1.4 is here repeated as Figure 14.1 in a simplified form. An information channel is described by a set of symbols or waveforms at its input, a set of symbols or waveforms at its output, and a probabilistic relationship between the outputs and the inputs. This definition depends on how the fixed attributes of a communication system are conceptually segregated into the information channel, and the flexible attributes segregated into the encoder and decoder, which may include part or all of the modulation process. The structure of the user data entering the encoder is changed in many ways while passing through the communication system, and individual databits are represented differently and diffusely in the various internal stages shown in Figure 14.1.

[Figure 14.1: Communication channels. Input user data passes through an encoder, a transmitter (baseband and passband modulation), the fiber channel, and a receiver (photodetection/demodulation and detection) to a decoder that produces the output user data. Nested within this chain are the lightwave channel, the electrical channel, and the information channel (discrete or continuous).]

For an encoder that achieves a high information rate, close to the capacity, an individual bit is deeply buried in the structure of the lightwave signal, and is impaired in many ways while passing through the system. Nevertheless, the reconstructed user data out of the decoder should be an error-free reproduction of the data at the input to the encoder. Errors in the output sequence to the user, shown at the right side of Figure 14.1, should be quite rare.

Once the partitioning that defines the information channel has been specified, the capacity is defined for that fixed information channel. Because the information channel is an abstraction of the actual channel, it also depends on the attributes of the physical channel that are used to model the information channel. Lightwave systems are unique because, for the same physical channel, the information channel defined using wave optics, photon optics, or quantum optics is different because each signal model has a different set of admissible signal distributions that reflect the different physical attributes incorporated into that signal model. To the extent that the signal model used for the information channel captures the available physical attributes of the actual lightwave channel, the information-theoretic channel capacity is a mathematical statement of the maximum information rate that the operational channel can support. If the information channel were to change, then the information-theoretic channel capacity would change accordingly.

The discussion of channel capacity follows from the fundamental concepts of entropy and mutual information defined in Section 14.1 along with the properties of random photon-optics signals and random wave-optics signals developed in Chapter 6. For a memoryless channel, the capacity, though it pertains only to large blocks of data, can be calculated from the channel response to a single letter. The single-letter capacity of a single discrete information channel, corresponding to a single symbol in one degree of signaling freedom in time and space, is first derived herein using photon optics. The limit of the photon-optics capacity as the discrete energy approaches a continuous quantity is one way to define the capacity for a continuous information channel based on wave optics. This limit agrees with the statement of the capacity derived directly from wave optics and provides a formal connection between the capacity of a channel calculated using a discrete alphabet based on photon optics and the capacity of the same channel using a continuous alphabet based on wave optics. The wave-optics capacity is then applied to determine the capacity of a continuous information channel that supports multiple degrees of freedom. In space, this is a multi-input multi-output channel. In time, this is a bandlimited waveform channel. Finally, to end the chapter, the capacity of a nonlinear interference-limited lightwave channel is studied. The calculation of the capacity of this nonlinear channel poses several challenges, not the least of which is constructing an appropriate discrete-time electrical channel model from a nonlinear lightwave waveform channel.

14.1 Entropy, Mutual Information, and Channel Capacity

The information-theoretic channel capacity can be formulated for most practical, well-behaved, and well-defined channels. This formulation is best introduced for a discrete memoryless information channel that has a finite set of input symbols and a finite set of output symbols.


[Figure 14.2: An input/output depiction of a communication channel: user data enters an encoder, passes through a black box described by the channel transition matrix Q, and exits through a decoder. The matrix Q describes the fixed attributes of the channel; the encoder/decoder describes the flexible attributes of the information channel.]

14.1.1 Types of Information Channels

A memoryless channel is a channel model for which each channel output symbol depends only on the corresponding channel input symbol, and not on other input symbols. The dependence is described by a conditional probability distribution, called the transition probability, which usually remains the same for successive input symbols. This is the stationarity property of a memoryless channel. The set of all conditional probabilities between the finite set of input symbols and the finite set of output symbols is called the channel transition matrix Q from the channel input probability vector p to the channel output probability vector q. This matrix fully describes the fixed attributes of the discrete memoryless information channel. It is shown in Figure 14.2.

A discrete information channel is described by such a channel transition matrix with rows that are probability mass functions conditioned by the channel input. The dimensions of the matrix depend on the channel input alphabet and the channel output alphabet, which need not be equal. A channel transition matrix whose channel output alphabet is the same as the channel input alphabet describes a hard-decision memoryless information channel.

A continuous information channel is described by a channel transition matrix Q with a continuous output alphabet and a discrete or continuous input alphabet. Accordingly, the transitions are described by conditional probability density functions instead of conditional probability mass functions. Such a channel transition matrix describes a soft-decision information channel, with either a continuous channel input alphabet or a discrete channel input alphabet such as a signal constellation.

A waveform information channel has a continuous waveform as its input and a continuous waveform as its output. For a bandlimited random process, a waveform information channel is converted into a continuous information channel using the Nyquist–Shannon sampling theorem, and into a discrete channel by quantization of the channel input and output.
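As concrete illustrations (standard examples, not drawn from this section), the transition matrix of a hard-decision binary symmetric channel is square, while that of a binary erasure channel is not.

import numpy as np

def bsc(eps):
    # Transition matrix Q for a binary symmetric channel: a square,
    # hard-decision channel whose output alphabet equals its input alphabet.
    return np.array([[1 - eps, eps],
                     [eps, 1 - eps]])

def bec(delta):
    # A binary erasure channel: the output alphabet {0, erasure, 1} is
    # larger than the input alphabet, so Q need not be square.
    return np.array([[1 - delta, delta, 0.0],
                     [0.0, delta, 1 - delta]])

for Q in (bsc(0.1), bec(0.2)):
    assert np.allclose(Q.sum(axis=1), 1.0)   # each row is a pmf
    print(Q)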

The Prior

Randomness always exists at the input to a channel because messages are random. Information is best conveyed when this randomness is in a specific channel-dependent form. The randomness on the input alphabet is quantified by the probability distribution on that input alphabet, called the prior probability distribution or simply the prior, and denoted p(s).1 The prior is imposed by the encoder.

1 As in earlier chapters, the subscript denoting the functional form of the probability distribution is commonly suppressed for brevity, with the form implicitly conveyed by the argument of the function.

The form of the optimal prior depends on the form of the information channel. For a discrete memoryless information channel, the channel input alphabet is discrete, and so the prior is a product distribution. Each factor of the product is an identical probability mass function. For a continuous information channel, the channel input alphabet is continuous, and so the prior is a probability density function. The ideal prior p(s) derived in this chapter is based on the information-theoretic channel model. Operationally, this prior becomes a required property of an optimal channel code. This information-theoretic prior p(s) for a memoryless channel will be the ideal relative frequency of the input symbols of any optimal code for that channel.

14.1.2 Entropy

Any discrete probability distribution p(s) is summarized by a single real number defined in (6.1.1) as the entropy

H(s) = −⟨log p(s)⟩ = −∑_s p(s)log p(s),                        (14.1.1)

where the logarithm base determines the units.2 For a continuous information channel, the differential entropy (cf. (6.1.2)) is used. The entropy is a measure of uncertainty. The entropy summarizes the statistical uncertainty by a single real number in a manner analogous to summarizing a vector by its magnitude or summarizing a matrix by its determinant. The entropy is a nonnegative real number with its values between zero and log M, where M is the size of the alphabet on which the prior probability distribution p(s) is defined. The entropy equals zero if and only if one component of p(s) is equal to one. The entropy equals log M if and only if all components of p(s) are the same so that the distribution is uniform.

The combined uncertainty of the channel input and the channel output taken together is quantified by the joint entropy

H(s, r) = −⟨log p(s, r)⟩ = −∑_s ∑_r p(s, r)log p(s, r).        (14.1.2)

For a given prior p with components p(s), the uncertainty introduced by the channel is quantified by the conditional entropy

H(r|s) = ∑_s p(s)H(r|s = s)                                    (14.1.3a)
       = −∑_s p(s) ∑_r p(r|s)log p(r|s)
       = −∑_s ∑_r p(s, r)log p(r|s)
       = −⟨log p(r|s)⟩,                                        (14.1.3b)

2 The Boltzmann scaling constant k used for physical entropy in Chapter 6 is not used for information (or Shannon) entropy. The information in units of bits uses log2 x = loge x/loge 2. The information in units of nats uses a base-e logarithm. The conversion is 1 bit equals 1/loge 2 nats. Also note that (loge 2)(log2 e) = 1.

where the expectation is taken over the joint probability distribution p(s, r) and (2.2.8) has been used to write p(s)p(r|s) = p(s, r), with p(r|s) equal to the conditional probability distribution at the output of the information channel.

The definition of the conditional entropy leads to the chain rule for the entropy,

H(s, r) = H(s) + H(r|s) = H(r) + H(s|r),                       (14.1.4)

which is asked for as an end-of-chapter exercise. For independent variables, p(s, r) = p(s)p(r) and H(s, r) = H(s) + H(r). More generally, for a block of independent random variables (s1, s2, . . ., sn), this becomes H(s1, s2, . . ., sn) = H(s1) + H(s2) + · · · + H(sn). This additivity property of the entropy states that the entropy of a block of independent random variables is the sum of their entropies.
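These definitions are short enough to verify numerically. The sketch below (ours, with an arbitrary joint distribution chosen for illustration) computes the conditional entropy directly from (14.1.3) and confirms the chain rule (14.1.4).

import numpy as np

def H(p):
    # Entropy in bits of a probability array; terms with p = 0 contribute 0.
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# A joint distribution p(s, r), rows indexed by s and columns by r.
p_sr = np.array([[0.30, 0.10],
                 [0.15, 0.45]])
p_s = p_sr.sum(axis=1)

# Conditional entropy from the definition (14.1.3): the average over s of
# the entropy of each conditional distribution p(r|s = s).
H_r_given_s = sum(ps * H(row / ps) for ps, row in zip(p_s, p_sr))

# Chain rule (14.1.4): H(s, r) = H(s) + H(r|s); both sides print ~1.782.
print(H(p_sr), H(p_s) + H_r_given_s)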

14.1.3 Mutual Information

The difference between the entropy of the input to an information channel and the corresponding conditional entropy of the input to that information channel given the output defines the mutual information

I(s; r) = H(s) − H(s|r).                                       (14.1.5a)

Rewriting (14.1.4) as H(s) − H(s|r) = H(r) − H(r|s) gives

I(s; r) = H(r) − H(r|s)                                        (14.1.5b)
        = H(r) − ∑_s p(s)H(r|s = s).                           (14.1.5c)

The last form shows that the mutual information is the difference between the uncertainty in the average channel output H(r) and the average uncertainty in the channel output given by ∑_s p(s)H(r|s = s). Using (14.1.3) and the properties of the logarithm, the mutual information can be expressed in several equivalent forms:

I(s; r) = H(s) − H(s|r) = H(r) − H(r|s)                        (14.1.6a)
        = ∑_{s,r} p(s, r)log[p(s, r)/(p(s)p(r))]               (14.1.6b)
        = ∑_{s,r} p(s, r)log[p(s|r)/p(s)] = ∑_{s,r} p(s)p(r|s)log[p(r|s)/p(r)].   (14.1.6c)

The random variables s and r are independent if and only if the information channel completely randomizes the channel input so that H(r|s) = H(r), which implies that I(s; r) = 0. For this worst-case channel, the channel output r provides no information about the channel input s. For a best-case channel, s and r are equal, and H(r|s) = H(r|r) = 0. For this ideal channel, the channel adds no statistical uncertainty and the information rate is limited only by the entropy H(s) of the information source at the channel input.
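The equivalent forms can be checked numerically. The sketch below (ours) evaluates the mutual information of a binary symmetric channel with crossover probability 0.1, a standard example not specific to this section, using both (14.1.5b) and (14.1.6b).

import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

prior = np.array([0.5, 0.5])
Q = np.array([[0.9, 0.1],
              [0.1, 0.9]])           # transition matrix p(r|s)
p_sr = prior[:, None] * Q            # joint distribution p(s, r)
p_r = p_sr.sum(axis=0)               # output distribution

# Form (14.1.5b): I = H(r) - H(r|s), with H(r|s) = H(s, r) - H(s).
I1 = H(p_r) - (H(p_sr) - H(prior))
# Form (14.1.6b): I = sum over (s, r) of p(s,r) log p(s,r)/(p(s)p(r)).
I2 = (p_sr * np.log2(p_sr / np.outer(prior, p_r))).sum()
print(I1, I2)                        # both ~0.531 bits = 1 - Hb(0.1)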

14.1.4 Fano Inequality

The conditional entropy H(r|s) is a measure of the remaining uncertainty in the value of r when given the value of s. When r and s both take their values in an alphabet of size M, an error occurs whenever r and s take different values. The probability that r is not equal to s and so generates a detection error is given by

pe = ∑_{r,s: r≠s} p(s, r).                                     (14.1.7)

The uncertainty in whether or not an error occurs can be expressed in terms of the binary entropy function, which is given by

Hb(pe) = −pe log pe − (1 − pe)log(1 − pe),                     (14.1.8)

where pe is the probability of a symbol detection error. Then, if an error did take place, there are M − 1 wrong possibilities, so the conditional uncertainty is at most log(M − 1), which occurs with probability pe. This heuristic argument asserts that the conditional entropy, or uncertainty, satisfies

H(r|s) ≤ Hb(pe) + pe log(M − 1).                               (14.1.9)

This inequality3 is known as the Fano inequality. When it is convenient to do so, the two terms on the right side of the Fano inequality can be further upper-bounded without violating the inequality to write the simpler inequality

H(r|s) ≤ 1 + pe log M.                                         (14.1.10)

The Fano inequality places no restriction on the alphabet size of r and s, so it applies as well for blocks of symbols r and s of length n. Then

H(r|s) ≤ 1 + pe log M^n,                                       (14.1.11)

because M^n is the size of the block alphabet. If each component of a block is independent and identically distributed, then the joint probability density function of a discrete memoryless information channel is a product distribution, so the conditional entropy is given by

H(r|s) = nH(r|s).                                              (14.1.12)

3 For a formal proof, see Blahut (2020). The Fano inequality applies as well to any long message. It cannot be circumvented by breaking a long message into many small blocks that are individually reliable. The Fano inequality then states that, statistically, enough small blocks will be wrong that its assertion still holds.


This expression states that the uncertainty added by the channel grows linearly in the blocklength n if the components of the block are independent and the channel is memoryless. The Fano inequality then states that the probability pe of block error for a user message of length n is lower-bounded by

pe ≥ [H(r|s) − 1]/log(M^n − 1) ≥ [H(r|s) − 1/n]/log M,         (14.1.13)

where (14.1.12) has been used. For a discrete memoryless noisy information channel, this inequality states that if the symbols in a block are independent, then the probability of a message error is bounded away from zero for large n. Therefore, reliable communication with pe arbitrarily small must incorporate symbol dependences in the form of a channel code so that the conditional entropy H(r|s) of a block is smaller than n times the conditional entropy H(r|s) of a single symbol. Accordingly, the reliable transmission of a long message on a practical noisy channel always requires channel coding. This cannot be avoided.
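Rearranged, the Fano inequality gives a floor on the symbol error probability for a given conditional entropy. The sketch below (ours) inverts (14.1.9) by bisection, using the fact that the right-hand side is increasing in pe on the interval of interest.

import numpy as np

def hb(p):
    # Binary entropy function (14.1.8), in bits.
    return 0.0 if p in (0.0, 1.0) else -p*np.log2(p) - (1-p)*np.log2(1-p)

def fano_lower_bound(H_cond, M):
    # Smallest pe consistent with H(r|s) <= Hb(pe) + pe*log2(M-1); the
    # right-hand side rises from 0 to log2(M) on [0, (M-1)/M].
    lo, hi = 0.0, 1.0 - 1.0 / M
    for _ in range(60):
        mid = (lo + hi) / 2
        if hb(mid) + mid * np.log2(M - 1) < H_cond:
            lo = mid
        else:
            hi = mid
    return hi

print(fano_lower_bound(1.0, M=4))   # pe must be at least ~0.19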

14.1.5 Channel Capacity

The information-theoretic channel capacity C is defined as the maximum value of the mutual information I(s; r) over all admissible prior probability distributions,

C = max_{p(s)} I(s; r),                                        (14.1.14)

in units of bits per symbol with the mutual information expressed in units of bits. The maximum in (14.1.14) is constrained to the set of probability distributions on the channel input alphabet. The definition of the channel capacity can be best understood for a discrete memoryless information channel as shown in Figure 14.2. For this channel, both the channel input alphabet and the channel output alphabet are discrete, with the elements of each alphabet indexed by integers. However, the concept of capacity is quite general and also applies to continuous information channels as well as to waveform information channels. For a bandlimited waveform information channel with additive gaussian noise that transmits a symbol every T seconds, the capacity also is expressed in units of bits per second by referring to 𝒞 = C/T as the bandlimited capacity in units of bits/second, where, for contrast, C is now called the single-letter capacity in units of bits per symbol.

Two fundamental theorems due to Shannon pertain to data transmission. Shannon's first coding theorem or the noiseless source coding theorem pertains to the source.4

4 Classical communication theory places source coding with the application, and not with the channel.

= a² ) =

±± s0

s1

···

± ± ± · · · p(s), si −1

si +1

sn −1

(14.1.15)

where a² is an element of the input alphabet, and where p(s) is zero unless s is a codeword, and p(s) is equal to 1/ M for equiprobable codewords. The notation is understood

678

14 The Information Capacity of a Lightwave Channel

to mean the sum of p(s ) for all codewords s that have the value a² in the i th component. For a good code with a code rate close to the capacity, this marginalized single-letter distribution p(si ) for every i will closely resemble the single-letter prior p that achieves the capacity. Seen at the level of any single component of the random codeword, the code symbol s is a random variable with a probability distribution that is essentially equal to the prior p. Furthermore, seen at the level of a short subblock, the joint probability distribution for the symbols within that subblock is essentially a product distribution of this same prior. This is consistent with the fact that the capacity can be determined by optimizing over the probability distribution for a single component of the codeword without regard to the specific dependences that are incorporated into the code.
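For a discrete memoryless channel, the maximization in (14.1.14) can be carried out numerically by an alternating-maximization procedure, the Arimoto–Blahut algorithm. The sketch below is ours (the fixed iteration count stands in for a proper stopping rule) and recovers the familiar capacity 1 − Hb(ε) of a binary symmetric channel.

import numpy as np

def arimoto_blahut(Q, iters=2000):
    # Alternating maximization of I(s; r) over the prior p(s) for a
    # discrete memoryless channel with transition matrix Q = p(r|s).
    # Returns an estimate of the capacity (14.1.14) in bits per symbol
    # and the optimizing prior.
    p = np.full(Q.shape[0], 1.0 / Q.shape[0])
    safe_Q = np.where(Q > 0, Q, 1.0)
    for _ in range(iters):
        q = p @ Q                                   # output distribution
        D = (Q * np.where(Q > 0, np.log2(safe_Q / q), 0.0)).sum(axis=1)
        p = p * np.exp2(D)                          # reweight the prior
        p /= p.sum()
    return float((p * D).sum()), p

Q = np.array([[0.9, 0.1],
              [0.1, 0.9]])                          # binary symmetric channel
C, prior = arimoto_blahut(Q)
print(C, prior)    # ~0.531 bits per symbol with an equiprobable prior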

Capacity for Finite-Length Codewords

The probability of decoding error for a good code approaches zero asymptotically as the codeword or message size becomes arbitrarily large. This requires widespread internal dependences within the codeword that are not visible at the local level by viewing short segments of the codeword. For a code of finite blocklength and rate R, there is a limit on how small the probability of an error can be because the distribution of coded dependences is limited by the blocklength. The relationship between the information rate R, the probability of a codeword error pe, and the finite blocklength n for an optimum code is a difficult problem that is not considered in this book.5 The general form of the relationship between the code rate of a good code and the probability of an error is shown notionally in Figure 14.3. When viewing this figure, note that, because a long message requires many codewords of a short code and only a few codewords of a long code, a long codeword would be preferred when possible.

For any good code, the information rate R, in bits per second, can never be larger than the bandlimited capacity 𝒞. Any code with a rate larger than 𝒞 is certain to make enough errors that the received information rate is actually smaller than 𝒞.

[Figure 14.3: Notional relationship between the probability pe of a codeword error for an optimum code and the data transmission rate R as the codeword length n increases. The limiting value as n goes to infinity is defined as the channel capacity C, with pe arbitrarily small for increasing blocklengths at rates less than C.]

5 See Polyanskiy, Poor, and Verdu (2010).

14.1 Entropy, Mutual Information, and Channel Capacity

679

codeword error is, asymptotically, an exponentially decreasing function of the codeword blocklength. Accordingly, for a good code with sufficiently long codewords that achieves a satisfactory value for pe , the information rate can nearly achieve the channel capacity C . 14.1.6

Signal and Channel Models

The information-theoretic notion of channel capacity requires an appropriate probabilistic description of the information channel as expressed by a channel probability transition matrix Q (cf. Figure 14.2). Photon-optics and wave-optics models may lead to different statements of the channel capacity because the transition probabilities are specified differently. These alternative statements of the capacity are unified herein. The chapter first develops the single-letter capacity of a discrete information channel based on the discrete-energy photon-optics signal model. This photon-optics model applies both to weak signals and to strong signals. Accordingly, the capacity of the photonoptics model in the large-signal regime will be obtained by replacing the photon count with the continuous energy without invalidating the capacity expression. The single-letter capacity of a continuous information channel based on the waveoptics model is then derived directly and shown to be the same capacity as the photon-optics capacity in the large-signal regime. It will also be shown that in the smallsignal regime, the single-letter capacity using wave optics is less than the single-letter capacity using photon optics. This difference is attributed to the discrete-energy property of a lightwave signal, which is most pronounced in a small-signal regime and is not incorporated into the wave-optics model. The channel capacity of an information channel that supports multiple independent subchannels is obtained simply by summing the single-letter capacities for the independent subchannels that the channel supports. These subchannels may be defined in space, time, and polarization. Finally, the bandlimited capacity for a continuoustime waveform information channel is obtained by letting the sampling interval go to zero with the power and bandwidth held constant. This capacity is given later in the chapter, first for an ideal rectangular bandwidth, then for the general case. The single-letter capacity expressions derived for photon optics and wave optics are based on the transition probabilities from the channel input to the channel output. The information channel defined using either set of transition probabilities is herein called a classical information channel. In contrast to a classical information channel, a quantum-lightwave information channel cannot be described solely in terms of the transition probabilities. The transition probabilities of a classical information channel are replaced by an extension of a probability density function called a density matrix. This extension, which is discussed in detail in Chapter 15, incorporates the unique uncertainties and dependences that are admitted within quantum optics. The classical information capacity, expressed in bits, for a quantum-lightwave information channel is discussed in Chapter 16, where it arises naturally in the context of quantum optics.

680

14 The Information Capacity of a Lightwave Channel

In this chapter, an intermediate semiclassical approach is adopted, which is based on a composite probability distribution that substitutes for the density-matrix formalism of quantum optics. This composite probability distribution – based on the Poisson transform and specific to the photon-optics signal model – is defined so that it later conforms to the rules of quantum optics as discussed in Chapter 15. Thus, in the context of this chapter, the Poisson transform may be seen as a proxy for the density matrix.

14.2

Photon-Optics Capacity Two semiclassical photon-optics channels are described. In each case, the channel input is a passband or complex baseband waveform and the channel output is a photon stream. In each case, the instantaneous photon signal arrival rate is proportional to the instantaneous waveform power. Both waveform channels have additive gaussian noise – one has discrete-time gaussian noise and the other has continuous-time gaussian noise. Both channels are constrained in terms of the average power at the receiver. Both channels have time-varying power – one confined to be constant on discrete-time intervals, and the other confined to a frequency band. Just as the additive white gaussian noise model is the archetypical continuous waveoptics channel, so too is the additive white gaussian noise model the archetypical continuous-time photon-optics channel. It is the worst-case source of continuous noise satisfying the mean constraint. The consequences of this archetype will be substantial. The continuous-time photon-optics channel has a continuous-time bandlimited waveform at the channel input and a discrete photon stream at the channel output characterized by a time-varying mean photon arrival rate R(t ). The study of the bandlimited continuous-time channel will be restricted to the study of the discrete-time photon-optics channel. The discrete-time photon-optics channel has a constant-amplitude input signal on intervals of duration T . The amplitude varies from interval to interval. The constraint of a constant amplitude during each interval of duration T implies that the mean signal arrival rate R and the mean number of signal counts E are simply scaled by T and have the same statistics. The discrete-time photon-optics channel transmits a modulated waveform whose amplitude is a random complex value on each time interval. This random complex value x with prior p(x) is the input to the channel. Two forms of randomness ensue. First, the channel imposes additive circularly symmetric complex gaussian noise z on the signal.6 The squared magnitude | x + z|2 then produces a modulated Poisson stream in that interval, with the arrival rate determined by the squared magnitude of the noisy complex channel symbol. The capacity of this channel is the maximum theoretically possible information rate that can be conveyed by this channel. The discrete-time photon-optics channel has a constant complex input waveform on time intervals of duration T . In each interval, the lightwave signal energy E is a 6 In this section, the complex signal amplitude s is replaced by x and the complex noise amplitude n replaced

by z to avoid confusion with the discrete quantities s and n, which are expressed in terms of photon counts.

14.2 Photon-Optics Capacity

681

random variable characterized by the prior probability distribution f ( E ) imposed by the encoder. The prior on x controls the mean number of signal photons 7 E collected in each interval. The probability mass function p(s) for the random number of signal photons s is the Poisson transform of f (E). During each interval, the input to the channel is an independent realization of a random complex amplitude x. The prior on the complex channel input is the bivariate probability density function f (x). Any nonnegative real number is an allowed value of the signal energy in an interval, but the prior is constrained to have a mean energy not larger than E = ± E ². The energy E of each realization of the complex input x produces at the channel output a Poisson distribution for the number of received signal photons s in that interval for that realization. Because the energy E is random, the receiver sees a conditional Poisson distribution in each independent signaling interval of duration T . To compute the channel capacity, the optimal prior f ( E ) on the input energy in an interval must be determined indirectly in terms of the optimal prior on the input complex amplitude f (x). It is not enough to specify the prior directly on the energy. This is because the complex gaussian noise z is added, not to the energy, but to the complex signal amplitude. Moreover, because the prior probability density function f (E ) on the signal energy and the probability mass function p(s) on the photon number are related by a Poisson transform (cf. (6.3.2)), the relationship between the distribution on the energy f ( E ) and the distribution on the complex amplitude f (x) couples the wave-optics signal model and the photon-optics signal model (cf. Figure 6.5). Given this relationship, an appropriate discrete input distribution p(s ) on the photon number does lead to the appropriate continuous distribution on the energy, and so to the complex amplitude. The optimal prior p (E ) on the continuous random input energy E will be found to be an exponential probability density function and the optimal prior p(x) on the input complex amplitude x will be found to be a circularly symmetric gaussian distribution. It will be seen that the optimal prior for the energy implies that the corresponding probability distribution for the input complex amplitude must be a bivariate circularly symmetric gaussian distribution because the energy must be the sum of two squared terms, not one (cf. (6.2.3)). Under the optimal prior for the continuous energy at the channel input, the posterior probability distribution at the channel output, which is a probability mass function p(s) on the number of received signal photons, is the Gordon distribution (cf. Section 6.3.4). This is a composite probability distribution composed of the statistical uncertainty in the continuous energy E at the input to the channel (described by an exponential distribution), the additive gaussian noise, and the quantum uncertainty expressed as photon noise8 (described by a conditional Poisson distribution (cf. Section 6.3.3)). Because these forms of uncertainty have different properties, the derivation of the photon-optics capacity must respect the differences in these forms of uncertainty. 7 Note carefully the distinction between the mean number of signal photons E and the mean lightwave signal

energy E

= Eh f .

8 The randomness described by photon noise is a form of quantum uncertainty. In this chapter, it is simply

described as a random photon stream, which is a form of statistical uncertainty.

682

14 The Information Capacity of a Lightwave Channel

This analysis is rigorous for a memoryless channel. For this case, the wave-optics signal model at the channel input in each time interval is a pure sinusoid of arbitrary complex amplitude. Although phase cannot be detected in the received photon stream, a reason will soon emerge for the phase being uniformly distributed at the transmitter. Remarkably, when there is maximally random noise in the lightwave incident to the photodetector, an optimal wave-optics waveform must use phase at the transmitter to generate the appropriate prior for the signal energy even though phase is not recovered in the photon stream. 14.2.1

The Discrete Memoryless Photon-Optics Channel

The discrete-time memoryless photon-optics channel transmits a waveform with random energy E in a symbol interval of duration T , which is received as a random photon signal count s with the expected count conditioned on the waveform energy. Because the random transmitted energy E has a finite mean, the probability mass function p(s) for the received signal count also has a finite mean E = ±s ². For a memoryless photon-optics channel, the probability mass function p(s) in an interval does not depend on the signal in other intervals and p(s) is the same for all intervals. This means that the probability distribution on the number of received photons (s1 , s2, . . . , sn ) in each interval of a block of successive intervals is a product distribution given by ´ p(s1, s2 , . . . , sn ) = n²=1 p(s² ). The lossless, memoryless, discrete-time Poisson channel is composed of contiguous equal time intervals each of duration T with additional independent circularly symmetric noise z added to the wave and seen in the photon stream as a maximum-entropy Gordon distribution. These two descriptions of noise are interchangeable in the mathematics (cf. Figure 6.5). Let the random variable r denote the total number of received signal photons s and additive noise photons n in an interval. Then r

= s + n.

Within photon optics, the distribution of the photons in an interval of a photon stream is derived from a fundamental form of uncertainty described using a Poisson probability mass function, as is discussed below. Substituting r = s + n into (14.1.5b), the mutual information I (s; r) is I (s; r ) = H (r ) − H (r| s)

) ( = H (r) − H (s + n)|s (14.2.1) = H (s + n) − H (n), ) ( where the conditional entropy H (s + n)|s is equal to the entropy of the noise H (n)

because the signal s and the noise n are independent. Because the photon noise arrival rate is exponentially distributed, the entropy H (n) of the noise is the entropy of a maximum-entropy Gordon distribution given in (6.1.6). The capacity is defined in (14.1.14) as the maximum value of the mutual information I (s ; r ) over the choice of the prior p(s). But the noise term H (n) in (14.2.1) does not

14.2 Photon-Optics Capacity

683

depend on the prior. Therefore, I (s; r) is maximized by maximizing H (r ), which is maximized by the choice of p(r) under the constraint that the prior is compatible with an admissible channel input distribution. For a discrete random variable r with a finite mean, the maximum entropy distribution p(r) is also a Gordon distribution (cf. Section 6.3.4). Therefore, to achieve the channel capacity, p(s) should be chosen, if possible, so that the received probability mass function p(s+n) is a maximum-entropy Gordon distribution with the expected number of received photons Y equal to E + N 0. The energy distribution f ( E ) at the channel input required to produce this discrete distribution is determined through the inverse Poisson transform (cf. (6.3.6)). The complex-amplitude distribution f (x) required to achieve the exponential energy distribution f ( E ) will be seen to be a circularly symmetric gaussian density. This is because the complex gaussian signal and the additive complex gaussian noise combine to produce a complex gaussian random variable with an exponential probability density function for the energy. This does indeed lead to the desired Gordon distribution by means of the Poisson transform (cf. (6.3.11)). The resulting entropy at the channel output is given in (6.1.6) and repeated here as H (r) = (1 + Y)log (1 + Y) − Y log Y where

= g(Y),

(14.2.2)

.

(14.2.3)

g(u) = (1 + u )log(1 + u) − u log u is the Gordon function with the log base either 2 or e.

Statistical and Quantum Uncertainty

Within photon optics, the Gordon distribution is regarded as a composite probability distribution for the total uncertainty. It is the Poisson transform of a maximumentropy exponential distribution. The Gordon distribution may be viewed as having two parts. The constituent Poisson part of the Gordon distribution describes the quantum uncertainty. The constituent exponential part of the Gordon distribution describes the statistical uncertainty. Each form of uncertainty has distinct consequences, and both the signal and the noise have both forms of uncertainty. Statistical uncertainty occurs within photon optics because the mean number of photons over an interval T is, itself, a random variable. For the signal, it is described by a prior on the channel input. Statistical uncertainty arises within wave optics because the signal energy E , or the complex amplitude x, over an interval T is a random variable described by a prior f ( E ), or f (x), respectively. The relationship between f (E ) and f (x) is discussed in Section 6.2.1. Quantum uncertainty is expressed as a fundamental underlying Poisson process. This form of uncertainty was called photon noise in Chapter 6 and was analyzed as such. In Chapter 15, photon noise will be associated with a general form of quantum uncertainty, distinct from statistical uncertainty. This form of uncertainty occurs during a measurement operation on a conventional lightwave.

684

14 The Information Capacity of a Lightwave Channel

The calculation of the photon-optics channel capacity uses the framework of the Poisson transform derived in Section 6.3, with the Poisson transform expressing the composite probability distribution incorporating both statistical uncertainty and quantum uncertainty. Because the Gordon distribution reduces to a Poisson distribution (cf. (6.3.9)) when there is no statistical uncertainty, the constituent Poisson part of the Gordon distribution, which describes the quantum uncertainty, is regarded as fundamental because it is always present within a photon-optics signal model. Distinguishing between these two forms of uncertainty, the photon-optics channel can be described by a classical Gordon distribution that incorporates both statistical uncertainty and quantum uncertainty.9

Derivation of the Channel Capacity

The derivation of the channel capacity for a photon-optics channel must respect the fundamental difference between quantum uncertainty and statistical uncertainty. Therefore, to derive the channel capacity within a semiclassical photon-optics signal model, the effect of the quantum uncertainty is determined first using a Poisson probability mass function. Then the effect of the statistical uncertainty is incorporated using a Poisson transform with a maximum-entropy exponential probability density function for the continuous energy for both the signal and the noise. Starting with the elementary properties of the fundamental constituent Poisson process within the photon-optics signal model, when r and n are each Poisson and r = s + n, then s must be Poisson as well (cf. (6.2.30)). When the encoder generates an input distribution for the continuous signal and noise energy that is the maximum-entropy exponential distribution, the corresponding discrete distribution for the number of transmitted photons that includes quantum uncertainty is a Gordon distribution. The expected number of transmitted signal photons E is equal to the difference between the expected number of received photons Y and the expected number of noise photons N 0 added by the channel. This is because the effect of quantum noise – expressed within photon optics as photon noise – does not change the mean value of the signal or the noise (cf. (6.3.8a)). Using Y = E + N0, the maximum entropy H (r ) at the output of the channel is H (r ) = g(Y) = g(E + N 0).

(14.2.4)

Substituting this expression and the expression for the noise entropy given in (14.2.2) into (14.2.1) gives C

= H (r) − H (n) = g(E + N0) − g(N0 ),

from which we conclude that C

= log2

²

1+

E

1 + N0

³

+ (E + N0)log2

²

1+

1

E

³

+ N0 − N0 log2

(14.2.5)

²

1+

1 N0

³

(14.2.6)

expresses the single-letter capacity C of the noisy photon-optics channel in units of bits/symbol. 9 Within quantum theory, the composite uncertainty is fully described using a density matrix formalism.

14.2 Photon-Optics Capacity

685

Expression (14.2.6) is valid for a channel with an unconstrained photon-optics arrival rate of photons with a mean signal count E and a mean noise count N 0. Both are nonnegative. The capacity states how much information, measured in bits, can be conveyed per symbol through an information channel based on photon optics in terms of the mean number of photons in a single symbol. The capacity of the channel, as for other channels, does not fully inform us regarding the kind of coded modulation that should be used to achieve this capacity. Depending on the values of E and N0, different coded-modulation formats with different priors and different methods of decoding will be appropriate to achieve the channel capacity. 14.2.2

The Continuous Photon-Optics Channel

The continuous photon-optics channel has a continuous waveform confined to a passband of bandwidth B with its input constrained to have an average power not larger than P at the receiver. Circularly symmetric white gaussian noise is added to the signal. The instantaneous arrival rate of signal photons is proportional to the instantaneous lightwave power, with the mean number of photons in any time interval of duration T proportional to the lightwave energy in that time interval. The continuous bandlimited channel is not constrained to use fixed signal intervals, but it is constrained in bandwidth. The continuous channel may be regarded as more general than the discrete-time channel, which does not have limited bandwidth. To determine the capacity of this ideal bandlimited channel rigorously, the analysis must use a complete set of orthogonal bandlimited basis functions for that channel (cf. Section 6.5) on a long signaling interval, the duration of which is then sent to infinity. This analysis is not included in this book. 14.2.3

The Ideal Photon-Optics Channel

When there are no noise photons either from thermal noise or from amplifier noise, N 0 is zero.10 Then the uncertainty in the number of photons in a symbol received over an interval of duration T is the composite of the quantum uncertainty and the statistical uncertainty in the transmitted lightwave signal. In the limit as N0 goes to zero, the term g(N 0) in (14.2.5) goes to zero and the term g(E + N0) goes to g (E), where E is the expected number of signal photons per symbol. Therefore, the capacity of an ideal photon-optics channel without additive noise is C

= g(E) = log 1 + E) + µ (¶· ¸ Cw

²

E log 1

µ

¶· Cp

1

³

+E . ¸

(14.2.7)

This is the entropy of a Gordon distribution with mean E. Even without additive noise, the capacity is limited by the expected number of signal photons E. Within quantum

λ = 1500 nm and a temperature of T0 = 290 K, the expected number of thermally generated noise photons per mode is approximately 4 × 10−15 (cf. (6.1.8)) and can usually be ignored.

10 For a typical operating wavelength of

686

14 The Information Capacity of a Lightwave Channel

10

=

w+ p

Poisson )lobmys/stib( yticapaC

1

p

10 –1 w

10 –2

10 –3

10 –3

10 –2

10 –1

1

10

102

10 3

Expected Number of Signal Photons Figure 14.4 The single-letter capacity of an additive-noise-free channel. The term Cw corresponds

to the continuous-wave property of the signal. The term C p corresponds to the discrete-particle property of the signal. Also shown is the entropy of a Poisson distribution.

optics, this limitation is caused by quantum uncertainty. Within photon optics, the quantum uncertainty is described as photon noise. The expression for C in (14.2.7) is written as the sum of the two highlighted terms: Cw and Cp , which are the contributions of the two constituent probability distributions used to form a Gordon distribution. The two terms are equal when E is equal to one, which is an expected value of one photon per symbol. For this signal level, the capacity is two bits per symbol, with each term of (14.2.7) having an equal contribution. For a large expected signal level, the term Cw dominates. This term corresponds to the part of the capacity associated with the continuous-wave nature of the lightwave signal. For a small expected signal level, the term Cp dominates. This term corresponds to the part of the capacity associated with the discrete-particle nature of the lightwave signal. These two terms as well as the total capacity are individually plotted in Figure 14.4 as a function of the expected number of photons E. The entropy of the Poisson distribution is also shown in Figure 14.4. That entropy term will be shown to be equal to the limiting form of the capacity for small signal levels.

Large-Signal Regime

The large-signal regime refers to the range of E much larger than one. Examining (14.2.7), as the expected number of signal photons E becomes large, the term Cp approaches the constant value of 1/ log e 2 bits, which can be seen in Figure 14.4. The large-signal limit for the term Cw is approximately log E. This means that as E becomes large, Cp becomes insignificant compared with Cw and C is approximately equal to log E. This is equal to the large-signal limit of the entropy of a continuous exponential probability density function (cf. Table 6.1). Therefore, in a large-signal regime, the capacity based on modeling the signal energy as a discrete quantity approaches the entropy based

14.2 Photon-Optics Capacity

687

on modeling the energy as a continuous quantity. In this way, the term Cw is regarded as that part of the capacity associated with the continuous-wave property of the lightwave signal, though not meaningful as a limit within wave optics.

Small-Signal Regime

The small-signal regime refers to the range of E much smaller than one. Using a powerseries expansion for each term gives Cw

and Cp

= log(1 + E) ≈ E

² ³ = E log 1 + 1E ≈ −E log E,

(14.2.8a)

(14.2.8b)

with the total capacity given by C

≈ E(1 − log E).

(14.2.9)

The term Cp = −E log E dominates for small values of E. This means that the capacity in a small-signal regime is appropriately described by Cp alone. Because the small-signal limit of a Gordon distribution is a Poisson distribution (cf. Section 6.3.4), the capacity Cp in a small-signal regime is simply the entropy of a Poisson distribution, as is evident in Figure 14.4. For this reason, Cp can be regarded as the part of the capacity associated with the discrete-particle property of the lightwave signal. This part of the capacity cannot be derived using a limiting operation based on continuous-wave optics. The interpretation of the two terms in (14.2.7) hints at the modulation and detection methods that should be used to achieve the capacity both in a large-signal regime and in a small-signal regime. In a large-signal regime, the continuous property of the lightwave signal is most significant, and continuous passband modulation formats and detection techniques based on wave optics are appropriate. In a small-signal regime, the discrete property of the lightwave signal is most significant, so modulation and detection methods based on the discrete-energy property of photon optics are appropriate. In principle, detection techniques based on the discrete-energy property of a lightwave are sufficient to achieve the capacity for any signal level. In practice, this discrete form of detection is difficult to achieve even for modest power levels because the rate of individual photon arrivals is large.11 Instead, the Shannon capacity can be achieved by using detection techniques based on continuous passband modulation formats.

Single-Photon Capacity The discrete-particle nature of the single-letter capacity given by the term Cp in (14.2.8b) goes to zero as E goes to zero, as one should expect. However, the capacity per detected photon is given by Cp E

≈ −logE,

(14.2.10)

11 As an example, a 1 nW lightwave signal at 1500 nm corresponds to an arrival rate of signal photons of approximately 7 5 109 photons/second.

. ×

688

14 The Information Capacity of a Lightwave Channel

and becomes unbounded as E approaches zero. This means that while the single-letter capacity, expressed in units of bits per symbol, approaches zero, the capacity expressed in units of bits per photon becomes unbounded. This can be explained by considering a channel that transmits a finite arrival rate of signal photons R = P / h f (cf. Table 6.2). As the expected number of signal photons E approaches zero, the bandwidth B must approach infinity so that the photon arrival rate R = E B remains finite. Therefore, over any finite modulation interval T , the timewidth– bandwidth product T B (cf. Section 6.5) approaches infinity and the capacity in bits per photon becomes unbounded because the channel supports an infinite number of independent signaling intervals that can convey information. Therefore, expression (14.2.10) requires an infinite bandwidth and no additive noise. The next section shows that when additive noise is considered, the capacity is finite. Moreover, Section 14.4.4 shows the bandwidth of an ideal photon-optics information channel is inherently finite.

14.2.4

The Additive-Noise-Limited Photon-Optics Channel

The inclusion of additive noise has a significant effect on the capacity both in the large-signal regime and in the small-signal regime, as derived in this section. The smallsignal regime in which the discrete-particle nature of the lightwave signal is dominant is considered first. The large-signal regime in which the continuous-wave nature of the lightwave signal is dominant is considered next. The capacity for the second case is derived by taking a limit. This limiting operation leads to an expression for the capacity that is identical to the expression derived directly using wave optics.

Capacity in a Small-Signal Regime

The capacity including additive thermally generated noise photons is given in (14.2.6). To examine this capacity in the small-signal regime, (14.2.6) is now expanded in a power series in E. This shows that the capacity including additive thermal noise in the smallsignal regime is approximately C

² ³ ≈ E log 1 + N1

≈ ≈

hf loge 2 kT0 1 E loge 2 N0

0

E

bits per symbol,

(14.2.11)

because 1 + 1/N0 = eh f / kT0 (cf. (6.1.8)), E = Eh f , and N 0 = kT0 for thermal noise. The base-2 logarithm is used in order to express the information rate in bits per symbol. For a typical operating wavelength (λ = 1500 nm) and temperature (T0 = 290 K), the number of transmitted bits per photon C/E in the small-signal regime required to achieve the capacity is approximately 48 bits/photon. This limiting value, which is linear in E / N0 , cannot be derived from wave optics and is not meaningful as a limit within

14.2 Photon-Optics Capacity

689

wave optics. It is an asymptotic statement for low-rate systems based on the discreteenergy property of a lightwave signal in the limit of the photon-optics arrival rate of signal photons going to zero. The analysis does not include other sources of noise, and also places no limit on the length of a message, on the duration of a symbol interval, or on the accuracy of measuring the signal arrival time of a photon. These considerations would have further limited the capacity were they included. Given that the discrete-energy nature of a lightwave signal dominates for small signal levels, modulation formats and detection techniques based on photon counting are appropriate. The large information capacity per photon dictates the use of photon counting for small signal levels combined with on–off-keyed, pulse-position modulation in a dispersion-equalized channel. For this modulation format, a short pulse is transmitted in one of K temporal subintervals of the modulation interval T . In a practical system, the total number of temporal subintervals K is finite and is approximately the ratio of the temporal width of the received pulse ³t to the symbol interval T . Such a system can achieve high energy efficiency only at the expense of inefficient use of bandwidth, and requires precise time resolution.

Capacity of a Binary Modulation Format in a Small-Signal Regime

This section derives the capacity of on–off-keyed modulation in a small-signal regime using a photon-counting receiver (cf. Section 9.5.2). This capacity is compared with the capacity of a binary modulation format based on wave optics as well as the capacity of an unconstrained modulation format. We will show that the capacity of a simple on–off-keyed modulation format that uses a photon-counting receiver approaches the capacity of an unconstrained modulation format in a small-signal regime. Moreover, this modulation format with a photon-counting channel achieves a higher information rate than the equivalent wave-optics modulation format in that regime. A binary photon-counting channel has an on–off-keyed signal constellation at its input, transmitting either zero energy E 0 or energy E 1 . The signal output is a discrete Poisson process, with either mean E0 or E1 , where E0 is equal to zero. A hard-decision detector maps the count to a binary decision in each interval. A soft-decision detector uses those counts as the soft-decision symbol. A binary photon-counting channel with hard-decision outputs is an example of an asymmetric information channel. It is compared with a binary symmetric channel in Figure 14.5. For a binary asymmetric channel, the optimal threshold ´ and 1−ρ

p1 ρ

p0

(a)

p0

p1

p1 |1

p 1|0

p0 |1

ρ

1−ρ

p1|1

p1

p0 |0

p 0|1

p0

(b)

p0 |0 = 1

(c)

Figure 14.5 (a) A binary symmetric channel, (b) a binary asymmetric channel, and (c) a Z channel

that models a photon-counting system with no additive noise.

690

14 The Information Capacity of a Lightwave Channel

the corresponding information channel depend on the prior p = ( p0 , p1). For a binary asymmetric channel, there are four conditional detection probabilities, which are related by p1| 1 = 1 − p0|1 , p0| 0 = 1 − p1|0 .

(14.2.12a) (14.2.12b)

The mutual information given in (14.1.6c)

² p(r|s ) ³ p(s )p (r |s)log I (s; r ) = , p(r) s=0 r =0 . . is now written with p(0) = p0 and p(1) = p1 as the prior so that ( ) ( ) I (s ; r) = p0 p0| 0 log p0|0 / pr (0) + p0 p1| 0 log p1|0 / pr (1) ) ( ) ( + p1 p0|1 log p0|1/ pr(0) + p1 p1|1 log p1|1/ pr(1) , 1 1 ± ±

(14.2.13)

(14.2.14)

where pr (0) = p0 p0|0 + p1 p0|1 ,

pr (1) = p1 p1|1 + p0 p1|0 .

(14.2.15a) (14.2.15b)

For a space, zero photons are transmitted or received, written E0 = 0. For a mark, the expected number of received photons is E1. Let Eb be the expected number of signal photons per bit. With the prior probability of transmitting a mark given by p1 and with zero photons transmitted for a space, Eb = (1 − p1 )E0 + p1E1 = p1E1. Therefore, the expected number of received photons for a mark is E 1 = Eb / p1. In the presence of additive noise, the random number of received photons r in a single symbol interval is described by a Laguerre probability mass function p(r , E, N0 ) with K = 1, where N 0 is the expected number of noise photons (cf. (6.5.9)). This discrete distribution is the Poisson transform of a continuous noncentral chi-square distribution with two degrees of freedom. That distribution describes the statistical uncertainty of the squared magnitude of a lightwave signal in additive noise. The Poisson transform of this distribution incorporates the quantum uncertainty expressed as photon noise. The Laguerre probability mass function is given in terms of the two parameters E and N 0 by

² ³ ( N 0 )r E − E /(1+N ) 0 p(r, E, N0) = e Lr − , N0 (1 + N 0 ) (1 + N0)r+1 0

where L 0r (x ) is a Laguerre polynomial (cf. (2.2.52)). The corresponding binary photon channel is called a Laguerre channel, with the channel transition probabilities being given by

14.2 Photon-Optics Capacity

p0|1

=

´− ±1 =

r 0

691

p(r, Eb / p1, N0 ),

= 1 − p0|1, ´− ±1 p0|0 = p(r, 0, N 0),

p1|1

(14.2.16)

=

r 0

p1|0

= 1 − p0|0,

where the threshold ´ is determined by equating the two conditional probability distributions. When there is no additive noise, N 0 = 0 and the Laguerre distribution conditioned on a mark being transmitted reduces to a Poisson distribution with mean E1. When there is no signal but there is gaussian noise, E = 0 and the Laguerre distribution reduces to a Gordon distribution. When the expected number of noise photons remains the same from symbol to symbol, the wave-optics probability density function of the mean number of noise photons f (N 0) is a Dirac impulse, and the Gordon distribution for the number of noise photons reduces to a Poisson distribution (cf. (6.3.9)).

Signal-Dependent Information Channels An information channel can be described by a conditional probability p(k|²) on the channel output k given the channel input ². Equivalently, the information channel can be described by a probability transition matrix Q. For a conventional probabilistic information-theoretic channel, the prior resides entirely in the encoder and decoder and not in the information channel itself. This kind of information channel can be depicted as a black box for which fixed attributes of the information channel are decoupled from encoder and decoder. This depiction is shown in Figure 14.2. The sharp delineation of this black-box model is not always appropriate. The dependence of the transition probabilities given in (14.2.16) on the prior p1 leads to an information channel that can be depicted as a “gray box.” The term “gray box” indicates that the flexible attributes of encoding and decoding are not completely decoupled from attributes that define the information channel. Specifically, expression (14.2.16) shows that the transition probability p0| 1 depends on the prior because the threshold ´ depends on the prior. The optimum value of the threshold may depend on the prior, and the optimum prior depends on the threshold. This leads to a “gray box” information channel. The maximization of the mutual information is then a joint optimization of both the threshold ´ that defines the discrete information channel and the prior p = { p0 , p1 } that achieves the capacity for each instance of the information channel defined by the choice of ´. This form of joint optimization of a “gray box” channel can be incorporated within standard information theory of a “black box” channel by regarding each value of the threshold ´ as a fixed parameter of another information channel. Every value of the threshold results in a different information channel with its own optimal prior and its own capacity. The capacity C(´) then becomes a function of each fixed threshold ´. In

692

14 The Information Capacity of a Lightwave Channel

this way, the information channel is defined with ´ remaining fixed, but there are many channels, one for each threshold. The maximum capacity is then given by C

= max C(´). ´

In this formulation, the information channel continues to be viewed as fixed, with a fixed threshold, and concepts from standard information theory apply without change. For each such channel, all flexible attributes are in the encoder/decoder. The choice of the channel with the largest capacity is then an optimization that is outside of standard information theory. Quantum optics also admits information channels that depend on the prior, but with different considerations. These channels are discussed in detail in Section 16.4.

Capacity of a Binary Poisson Channel When there is no additive noise, the Laguerre distribution p(r, E, 0) reduces to a Poisson distribution conditioned on the transmitted symbol (cf. Figure 9.12). The resulting channel is called a binary Poisson channel. A binary Poisson channel with a symbol interval of duration T transmits a Poisson stream with a fixed photon-optics arrival rate of signal photons R, leading to an expected number of photons E 1 = R T for a mark and a null stream for a space.12 This is a discrete memoryless photon-optics channel. Then E0 = 0. The receiver for an ideal binary Poisson channel must accept a maximum photon count rate that is much larger than the mean photon-optics arrival rate of signal photons so that arbitrarily close photon arrivals can be resolved. For a practical system, this maximum count rate is limited by the response time of the photodetection process (cf. Section 7.3). The individually detected photons are summed over a sampling time Ts to generate an integer-valued sample at a sample rate R = 1/ Ts . The sample is used as a soft-decision detection statistic (cf. Section 9.5.2). Alternatively, a different information channel can be generated using one or more quantization thresholds to implement some form of hard-decision detection (cf. Figure 9.19), either binary or multilevel. The capacity of a binary Poisson channel is bounded by the entropy of a Poisson probability mass function, which can be written as13

∞ Ek log k! ± e − E H (E) = E − Elog e E + e nats. k! k=2

(14.2.17)

This expression is plotted in Figure 14.4. In the small-signal regime, E is much smaller than one, and the summation term in (14.2.17) is negligible compared with the first term. The sum of the remaining terms is equal to the small-signal limit of the term Cp (cf. (14.2.9)). This again justifies associating the term Cp with the discrete-energy nature of the lightwave signal. Moreover, it shows that the small-signal limit of the entropy of a Gordon distribution, which describes the photon-optics channel capacity, is well approximated by the entropy of a Poisson distribution.

µ η

η

12 The photoelectron arrival rate is R , where is the quantum efficiency. 13 See Martinez (2007), Appendix A for a derivation.

14.2 Photon-Optics Capacity

693

In a large-signal regime, E is much larger than one, and the entropy of a Poisson distribution approaches the entropy of a gaussian probability density function with the same variance. This limiting case is considered in a problem at the end of the chapter. For an ideal binary photon-counting channel with a null signal for a space, the optimal threshold for hard-decision detection has a value of zero regardless of the prior. If the detected value is zero, the hypothesis that a space was transmitted is asserted. If the detected value is not zero, the hypothesis that a mark was transmitted is asserted. This ideal asymmetric photon-counting channel is described as a Z channel, as shown symbolically in Figure 14.5(c). Two of the four conditional detection probabilities for the Z channel given in (14.2.16) do not depend on Eb . These are p0| 0 = 1 and p1| 0 = 0. The other two conditional detection probabilities do depend on Eb and the prior p1 . These are p0| 1 = e−Eb / p1 and p1|1 = 1 − p0|1 . Because two of the transition probabilities depend on the prior, the mutual information I (s; r ) is a function of the prior p1 . The mutual information of a Z channel is plotted in Figure 14.6 as a function of the prior probability p1 for a mark for several values of the expected number of photons per bit Eb . For an expected number of photons per bit Eb greater than about 10 photons, the crossover probability is small and the information channel is nearly a binary symmetric channel with p1 nearly equal to one-half. The mutual information can be treated as a symmetric function of the prior as is evident in Figure 14.6. For an expected signal level below 10 photons per bit, the channel is asymmetric. In this regime, the capacity is less than one bit per symbol and is achieved by reducing the transmission probability p1 for a mark to a value below one-half. Using the optimal value for the prior, the probability ρ of a bit error for a harddecision binary photon-counting Z channel is shown in Figure 14.7 as a function of Eb . The information rate using photon counting with an equiprobable prior ( p1 = 1/2) does not achieve the capacity, and is shown for comparison. 1.0 b = 10

)stib( noitamrofnI lautuM

0.8

2

0.6

1

0.4

0.5

0.2 0.0

b = 0.2

0.0

0.2

0.4

0.6

0.8

1.0

Prior Probability p1 of Transmitting a Mark Figure 14.6 Mutual information for a hard-decision binary photon-counting Z channel as a

function of the probability p1 of transmitting a mark. The dashed line follows the peaks of the curves and indicates the optimal prior probability for a mark.

694

14 The Information Capacity of a Lightwave Channel

1.4

10

(a) Homodyne demodulation

1.0 0.8

0.4

Photon counting (p 1 = 1/2)

0.2 10

–2

1

10–1

Photon counting (optimal p1)

0.6

0.0

)lobmys/stib( yticapaC

)lobmys/stib( yticapaC

1.2

10 1 Expected Photons per Bit E b

Photon counting (optimal p 1)

10–2 10–3

–1

(b)

10

Photon counting (p 1 = 1/2) Homodyne demodulation

10–3

10–2 10–1 1 10 Expected Photons per Bit Eb

100

Figure 14.7 (a) The capacity of a hard-decision binary photon-counting Z channel using the

optimal prior. The capacity using the suboptimal value of p1 = 1/ 2 is shown for comparison. Also shown is the capacity using homodyne demodulation given in (14.2.18), and the capacity C defined in (14.2.7). (b) The same curves with the capacity expressed on a log scale.

It is instructive to compare the single-letter capacity using ideal photon counting to the single-letter capacity using shot-noise-limited binary phase-shift keying. A channel using binary phase-shift keying with homodyne demodulation of one signal component to a real-baseband signal leads to a binary symmetric information channel with a capacity

= Hb (ρ) = −ρ log ρ − (1 − ρ)log(1 − ρ), (14.2.18) where Hb (·) is the binary entropy function (cf. (14.1.8)) and ρ is the shot-noise-limited Chomo

probability of a bit error given in (10.2.24). This expression is derived later in the chapter (cf. (14.3.10)). The capacity based on that expression is compared with the capacity based on ideal photon counting using an optimal prior in Figure 14.7. Examining Figure 14.7 shows that, in a small-signal regime for which Eb is much smaller than one, the capacity of a binary asymmetric channel using photon counting is larger than the capacity of a binary symmetric channel using homodyne demodulation of one signal component with an equiprobable prior. This improved performance is because photon counting is based on the discrete-energy nature of a lightwave and that property is more pronounced at small signal levels. Homodyne demodulation achieves a capacity for Eb slightly greater than one photon per bit because, in this regime, the wave property of a lightwave becomes more pronounced (cf. Figure 14.4). Both binary modulation formats saturate to a value of one bit per symbol for large values of Eb . The gap between the binary modulation capacity curves and the unconstrained capacity curve C given in (14.2.7) is the potential improvement due to using a different method of modulation and detection that fully exploits the dual wave/particle nature of a lightwave. These methods are discussed in Chapter 16.

Additive-Noise Channel For an additive-noise binary photon-optics channel with nonzero N 0, the optimal threshold is nonzero (cf. Figure 9.12). When additive noise is present, the capacity for each

14.2 Photon-Optics Capacity

1.2

1

1.0

Homodyne demodulation

0.8

Optimal photon counting

0.4 0.2 10 –2

Homodyne demodulation

10–1

0.6

0

)lobmys/stib( yticapaC

10

)lobmys/stib( yticapaC

1.4

695

Optimal photon counting

10–2 10–3

10–1 1 10 Expected Photons per Bit Eb

10–4

100

10–3

10–2 10 –1 1 Expected Photons per Bit E b

10

Figure 14.8 (a) The capacity of a hard-decision noisy binary photon-counting channel with

= 1. Also shown is the capacity for homodyne demodulation and the capacity C defined in (14.2.6). (b) The same curves on a log scale.

N0

instance of the information channel is determined by a joint optimization of the threshold ´ and the optimal prior probability for a mark. This capacity is shown in Figure 14.8 for N0 = 1. The curve for N0 = 1 is offset relative to the curve for N 0 = 0 shown in Figure 14.7. This offset shifts the capacity curve so that the capacity using homodyne demodulation (cf. (14.2.18)) is now greater than or equal to the capacity using photon counting for all signal levels. In contrast, for the Z channel with N 0 = 0, photon counting has a larger capacity for small signal levels. The rapid degradation of the capacity of an ideal binary photon-counting channel in the presence of additive noise demonstrates the sensitivity of the capacity of this format to additive noise (cf. Figure 9.15) when only the discrete-energy property of a lightwave is used to define an information channel. When homodyne demodulation based on the continuous-phase property of a lightwave is used to define a different information channel, the resulting channel capacity is less sensitive to additive noise. A similar sensitivity to additive noise occurs for quantum-optics signals. This is discussed in Section 16.2.6.

Large-Signal Regime

In the large-signal, additive-noise-limited regime, both N0 and E are much larger than one. For this case, using log(1 + x ) = x − x 2 /2 + · · · , the latter two terms of (14.2.6) can be expanded as

(E + N0)log2

²

1+

³

1

E

+ N0

− N0 log2

²

1+

³

1 N0

= 1 − 12 E +1 N − 1 + 12 N1 + · · · , 0

0

(14.2.19) which goes to zero as N0 becomes large. Therefore, the first term in (14.2.6), which is log 2(1 + E/(1 + N 0)), dominates and approaches log2 (1 + E/N0 ) as the expected number of noise photons N 0 becomes large. Setting E = Eh f , where E is the continuous signal energy, and setting N 0 = N0 h f , which is the power density spectrum of the additive noise, gives C

= log2

²

1+

E N0

³

bits per symbol

(14.2.20a)

696

14 The Information Capacity of a Lightwave Channel

as the single-letter capacity for an additive-noise-limited channel in a large-signal regime. This expression agrees with (14.3.4) derived in the next section directly from wave optics. An alternative derivation of the same expression that uses the limiting form of the Gordon distribution as h f becomes small compared with kT0 will be useful in other contexts. The resulting probability density function is an exponential function f ( E ) (cf. (6.2.1)) with an entropy H ( E ) = 1 + log E (cf. Table 6.1). Replacing the . Gordon function g (x ) in (14.2.6) by He (x ) = 1 + log x, which is the entropy of an exponential probability distribution function, the single-letter capacity is C

= H (r) − H (n) = 1 + log( E + N0 ) − (1 + log N 0) ² E³ = log2 1 + N bits per symbol, 0

(14.2.20b)

in agreement with (14.2.20a).

14.3

Wave-Optics Capacity The large-signal photon-optics capacity expressions given in (14.2.20a) and (14.2.20b) are based solely on the concept of energy without explicitly invoking the wave-optics concepts of magnitude/phase or in-phase/quadrature signal components. This section derives the capacity of a passband wave-optics channel based on additive gaussian transition probabilities. The expression for the capacity based on wave optics is found to be equal to the large-signal limit of the expression based on photon optics derived in the previous section. To derive the wave-optics capacity, let s be the continuous complex-baseband transmitted signal, let r be the continuous soft-detected signal, and let n be a circularly symmetric gaussian random variable for the noise. The differential entropy (cf. (6.1.2) for one signal component described by a real random variable s with a zero-mean √ 2 2 gaussian probability density function f (s) = (1/ 2πσs )e−s / 2σs is H (s ) = −

¹

¹∞

−∞ ∞

f (s )log f (s )ds

²

º

³

= f (s ) loge 2πσ + s / 2σ ds −∞ = 21 loge (2πσs2 ) + 21 loge e = 12 loge (2π eσ s2) nats, (14.3.1) where, for a zero-mean function, the mean-squared value ±s 2 ² is equal to the variance σs2 . The differential entropy in units of bits is (14.3.2) H (s ) = 21 log2 (2π eσs2) bits. 2 s

2

2 s

The differential entropy for a passband signal described by a complex-random variable s with a circularly symmetric, zero-mean gaussian probability density function 2 2 given by f (s) = (1/2πσs2 )e−|s| /2σs is

14.3 Wave-Optics Capacity

H (s) = log 2(2π eσs2 )

= log2(π eE )

bits,

697

(14.3.3)

where E = 2σs2 is the mean symbol energy. This value is twice the value given in (14.3.2) because the passband waveform has two independent components. Using (14.2.1), with r = s + n, the mutual information is given by I (s; r) = H (s + n ) − H (n). The mutual information is maximized by maximizing H (s + n ), which is maximized when s + n is gaussian. This is achieved by choosing the probability density function f (s) for the prior to be a maximum-entropy circularly symmetric gaussian probability density function with variance 2σ 2 . Replacing E with E + N0 in (14.3.3), the entropy of the probability density function f (r) is log2(π e(E + N 0)). Using (14.2.1), the single-letter capacity for a passband channel is C

= H (r) − H (n) = log2 (π e( E + N 0)) − log2(π eN0 ) ² ³ = log2 1 + NE bits/symbol, 0

(14.3.4a)

(14.3.4b)

in agreement with (14.2.20). This expression is the Shannon single-letter capacity of a gaussian passband channel. The capacity derived using a photon-optics signal model in a large-signal regime, which does not explicitly include the concept of phase, is the same as the capacity derived using a passband wave-optics signal model, which does include the concept of phase. The two signal models do not have the same capacity in a small-signal regime. These relationships are fully discussed in Chapter 16 using a quantum-optics signal model. While the two expressions ((14.2.20) and (14.3.4)) are equivalent, the wave-optics derivation is informative regarding the form of the prior that can achieve the capacity whenever the wave-optics signal model is the more appropriate description of a lightwave signal. From a practical viewpoint, this condition corresponds to a signal energy that is larger than about 10 photons per sample. This is discussed in detail in Chapter 15. 14.3.1

Capacities and Priors for Waves and Photons

The Shannon single-letter capacity for the additive gaussian noise channel within wave optics is achieved by choosing the prior probability density function f (s) as a maximum-entropy circularly symmetric gaussian probability density function. The Gordon single-letter capacity for the additive particle noise channel within photon optics is achieved by choosing the probability mass function p(s) for the discrete prior to be a maximum-entropy Gordon distribution. Each of these statements is fully supported within that signal model. Yet these statements are consistent and comport with the Poisson transform. This means that the optimal prior for one signal model can be used to determine the optimal prior for the other signal model. To show this starting from wave optics, observe that a circularly symmetric gaussian distribution for the complex amplitude corresponds to an exponential distribution f ( P ) for the power P (cf. (6.2.2) and Figure 6.5). Referring to Section 6.2.1, the probability

698

14 The Information Capacity of a Lightwave Channel

density function of the lightwave energy f ( E ) defined over a time interval T is also an exponential distribution whenever the timewidth–bandwidth product T B is smaller than one (cf. Section 6.6), which is equivalent to a discrete-time channel model that satisfies the Nyquist rate. The Poisson transform of the continuous maximum-entropy exponential distribution for the lightwave energy gives the discrete maximum-entropy Gordon distribution (cf. (6.3.11)). These steps are shown from left to right at the top of Figure 6.5. To show the converse statement starting from photon optics, observe that the inverse Poisson transform (cf. (6.3.6)) of a discrete Gordon distribution is a continuous exponential distribution for the lightwave energy E defined within wave optics. When this continuous distribution is lifted into the complex plane (cf. (6.2.3a)), it yields a circularly symmetric gaussian distribution (cf. (6.2.8)). These steps are shown from right to left at the bottom of Figure 6.5. These results show a deep connection between both the capacity and the capacityachieving prior derived using photon optics and the capacity and the capacity-achieving prior derived using wave optics. This connection may seem surprising, given that the properties of waves and particles are incompatible. Chapter 16 will show that this deep connection is a fundamental consequence of both signal models being special cases of the quantum-optics signal model.

14.3.2

Soft-Decision Capacity and Hard-Decision Capacity Using Wave Optics

The channel capacities given in (14.3.4) are based on a continuous additive-gaussiannoise information channel using soft-decision detection. The channel input symbols are all complex numbers, and range over all of the points of the complex plane (cf. Figure 9.1(c)). A different information channel is generated by restricting the channel input to a finite set of points of a signal constellation. We are interested in how this reduction in the allowable channel inputs affects the channel capacity both for soft-decision detection and for hard-decision detection.

Soft-Decision Detection in Additive Gaussian Noise

The differential entropy of the received noise n is determined using (14.3.3), with N0 2σ 2, and is H (n) = log (π eN 0).

=

(14.3.5)

The differential entropy of the soft-detected symbol r is the vector equivalent of (6.1.2), H (r) = −

¹¹ C

f (r)log f (r)dr,

(14.3.6)

where f (r) is the probability density function of the soft-detected symbol at the channel output and the integral extends over the complex plane. The conditional probability density function f (r|s² ) for r, given that s² is transmitted, is a multivariate complex gaussian probability density function (cf. (2.2.29)). The unconditioned probability

14.3 Wave-Optics Capacity

699

density function f (r) in (14.3.6) at the channel output is determined by a weighted sum using the prior p(s² ) on the input symbols. Then f (r) =

±

L −1

²=0

p(s² ) f (r|s² ),

(14.3.7)

where the s² are the elements of the designated signal constellation. The prior p(s² ) that achieves the channel capacity must be determined by maximizing the mutual information (cf. (14.1.14)) for that channel. For a symmetric modulation format such as phase-shift keying, an equiprobable prior p(s² ) = 1/ L is optimal, and (14.3.7) is f ( r) =

1 L

L −1 ±

²=0

f (r|s² ).

(14.3.8)

For a nonsymmetric modulation format such as multilevel quadrature-amplitude modulation, an equiprobable prior is not optimal, although it is often preferred for simplicity of implementation. The optimal prior uses signal points with larger energy less frequently than points with smaller energy so as to mimic a circularly symmetric gaussian prior when constrained to a discrete distribution on a finite set of complex points. This means that when an equiprobable prior is used, additional signal energy is required to achieve the same information rate at the same noise level. The reduction in the required mean energy per symbol due to replacing the equiprobable prior with the optimal prior is called the shaping gain. The soft-decision capacity using an equiprobable prior on a signal constellation can be determined by substituting (14.3.8) into (14.3.6). The resulting expression for the received entropy H (r), along with the entropy of the noise H (n) given in (14.3.5), is then substituted into (14.3.4a). Except for a few simple cases, the capacity must be determined using numerical methods. The soft-decision detection capacity for quadrature amplitude modulation using an equiprobable prior is shown in Figure 14.9. The curves collectively define a curve slightly below the Shannon bound. The gap is because an equiprobable prior rather than the optimal prior was used. The gap can be closed by using the optimal prior. Figure 14.9 shows that the shaping gain obtained by reducing the probability of using points with a large signal energy is, at most, about 1.5 dB for a large signal constellation used on a linear channel with additive white gaussian noise. 14

Soft-Decision Detection in General Noise

The envelope of the curves in Figure 14.9 suggests that the channel capacity of a continuous information channel based on a continuous input alphabet could be computed indirectly from a sequence of discrete information channels based on quadratureamplitude modulation (QAM) signal constellations. Figure 14.9 shows the capacity for several QAM signal constellations with an equiprobable prior. As the number of signal points in the QAM constellation is increased, it is evident that the envelope of the capacity for a set of discrete information channels based on QAM curves traces out 14 See Forney, Gallager, Lang, Longstaff, and Qureshi (1984).

700

14 The Information Capacity of a Lightwave Channel

10

Shannon bound

)lobmys/stib( yticapaC

8

256-QAM

6

64-QAM

4

16-QAM

2

4-QAM

0 –5

0

5

10 15 E/N0 (dB)

20

25

30

Figure 14.9 Soft-decision capacities for quadrature amplitude modulation using an equiprobable

prior.

the capacity curve of the unconstrained continuous information channel. This envelope differs from the true capacity only by the shaping gain, which would be eliminated by using the optimum prior on the signal constellation. This means that, for any E / N 0, the optimal prior on a large-signal constellation will actually allocate most of the probability to a smaller subconstellation, and nearly equiprobably. The additional outer points of the large-signal constellation will have a smaller probability so as to approximately conform with the optimal circularly symmetric gaussian prior that would be used for the waveform information channel input. The capacity for each discrete information channel used by this method of computing the capacity of a continuous channel depends only on the transition probabilities for each constellation. Accordingly, this envelope method is appropriate for both linear and nonlinear channels with signal-dependent gaussian or nongaussian noise. For linear channels with signal-independent gaussian noise, the transition probabilities depend solely on the euclidean distance between the signal points. For a linear channel with signal-dependent or nongaussian noise, the probabilities depend on the decision regions, which need not depend simply on the distance between the two points. For a nonlinear channel, the transition probabilities can be numerically calculated or treated as signaldependent noise. Once the transition probabilities have been determined, the optimal capacity-achieving prior can be calculated for each signal constellation.15 The size of the signal constellation is then increased and the process repeated to form the envelope, thereby indirectly calculating the capacity of the unconstrained waveform information channel. The envelope method is discussed in the context of a nonlinear lightwave channel in Section 14.6.

Hard-Decision Detection

For a signal constellation with L points, the hard-decision detection capacity can be expressed using the vector form of (14.1.5c), and is given by 15 See Blahut (1972) and Kschischang and Pasupathy (1993).

14.3 Wave-Optics Capacity

C

= H (r) − H (r|s) L −1 L −1 L −1 ± ± ± = − p (r²)log p(r²) + p(s j ) p(r²|s j )log p(r²|s j ). ²=0

j =0

²=0

701

(14.3.9a)

The first term, H (r), on the right is the uncertainty for the average channel output. The second term, H (r|s), is the average uncertainty for the channel output. For a symmetric channel such as that obtained using phase-shift keying, the capacity is achieved using an equiprobable prior p(s j ) = 1/ L. Moreover, because every symbol has the same energy, p (r² ) = 1/ L , so the capacity for a symmetric channel can be written as C

= log L +

±

L −1

²=0

p(r² | s0)log p (r² | s0 ),

(14.3.9b)

where p(r² | s0) is the conditional probability that the symbol r² is detected, given that symbol s0 was transmitted. Because the channel is symmetric, every term in the sum is the same. For L = 2, the channel is a binary symmetric channel, with the capacity given by C

= 1 + ρ log ρ + (1 − ρ)log(1 − ρ) = 1 − Hb(ρ),

where Hb(ρ) is the binary entropy function (cf. (14.1.8)), and bit error.

14.3.3

(14.3.10)

ρ is the probability of a

Intensity Modulation

The capacity of a channel using intensity modulation depends on the type of detection that is used to form the information channel. Only the memoryless additive-noise channel with direct photodetection demodulation will be treated. This means that all intersymbol interference has been removed from the output of the information channel by an equalization process.

Soft-Decision Detection with Unconstrained Inputs

The single-letter capacity of the soft-decision intensity-modulation channel and the prior that achieves that capacity can be deduced using simple heuristic arguments. When E / N0 as defined in (14.2.20) is large, one may posit that the capacity for intensity modulation is approximately one-half the capacity of an unconstrained format given in (14.3.4b) because only one degree of freedom is used for modulation. This capacity is C



²

E 1 log2 1 + 2 N0

³

+ O (1)

bits per symbol,

where O (1) denotes a term of order one.16 16 For example, see Mecozzi and Shtaif (2001), Lapidoth (2002), and Katz and Shamai (2004).

(14.3.11)

702

14 The Information Capacity of a Lightwave Channel

Ignoring scaling constants, let the intensity z be given by s 2 , where s is the nonnegative real amplitude of the lightwave signal. The probability density function f (z ) for the prior that achieves the capacity for intensity modulation is determined starting with the maximum-entropy distribution for the power in the complex lightwave signal. This probability density function is an exponential function. This distribution is equivalent to a central chi-square probability density function with two degrees of freedom (cf. (2.2.40)). For intensity modulation, only one degree of freedom is used for modulation. Therefore, a simple heuristic argument states that the capacity-achieving prior probability density function f (z ) on the intensity is a central chi-square probability density function with one degree of freedom as given in (2.2.37) and repeated here: f ( z) =



1

2πσ

2

z −1/ 2e− z /2σ

2

,

E

≥ 0,

where σ 2 = ± z ² is the mean. Using z = s2 and dz = 2s ds, the probability density function of the nonnegative lightwave amplitude s is a central chi probability density function with one degree of freedom (cf. (2.2.47)), repeated here as f (s ) = 2 √

1

2πσ

2

2 2 e−s /2σ

for s

> 0,

(14.3.12)

and otherwise f (s ) equals zero. This is a gaussian probability density function constrained to have only positive arguments. This probability density function, which achieves the capacity given in (14.3.11), is valid when E / N0 is large. For small E / N0 , the probability density function must be determined numerically, and approaches a discrete probability mass function.

Hard-Decision Detection with a Binary Input

The hard-decision intensity-modulation capacity is smaller than the soft-decision detection intensity-modulation capacity. The capacity of a binary( intensity-modulated signal √ E / 2N ) (cf. (10.2.1c)). using hard decisions can be derived using ρ = 21 erfc b 0 This expression is exact for additive white gaussian noise. Substituting this expression into (14.3.10) yields the hard-decision detection capacity. This capacity is plotted in Figure 14.10. For signal-dependent noise with the threshold chosen to produce a binarysymmetric channel, the expression for ρ uses an averaged noise term (cf. (9.5.27)). Optimizing the threshold to minimize ρ in a small-signal regime leads to an asymmetric information channel for which the prior cannot be decoupled from the threshold (cf. (14.2.12)). For this case, the capacity must be determined by a joint optimization of the prior and the threshold.

Comparison with Photon Counting

Intensity modulation within wave optics is not equivalent to photon counting within photon optics. The capacity for intensity modulation given in (14.3.11) in a largesignal regime is about half the capacity of an unconstrained modulation format given in (14.2.20a), which is a limiting form of the photon-optics capacity. This capacity can be realized, at least in principle, by photon counting.

14.3 Wave-Optics Capacity

703

3.0 Shannon bound

2.5 )lobmys/stib( yticapaC

Soft-decision detection

2.0 1.5 1.0

Binary hard-decision detection

0.5 0.0 –15

–10

–5

0

5

10

15

E/N 0 (dB) Figure 14.10 Capacity of intensity-modulated gaussian noise channels. Top curve, Shannon bound.

Lower curve, binary hard-decision detection. Dotted curve, soft-decision detection capacity.

The difference in the two capacities stems from the constraints imposed for each channel. The intensity-modulated channel constrains the transmitter to use only the waveform intensity and constrains the receiver to detect only the waveform intensity. The photon-optics channel discussed in Section 14.2 constrained the receiver to detect photons. However, at the transmitter, the required maximum-entropy Gordon distribution for the photon number that achieves the capacity corresponds to a circularly symmetric gaussian distribution within wave optics, which has two degrees of freedom. This is simply a consequence of the fact that the most general description of the energy (or power) is expressed using two degrees of signaling freedom (cf. (6.2.3)). Intensity modulation uses only one of those degrees of freedom, and this constraint leads to the difference in the capacities. 14.3.4

Phase Modulation

Phase modulation on an additive gaussian noise passband channel is a symmetric modulation format, with the optimal prior being√a uniform phase distribution. The received signal sample at complex baseband is r = Ee iφ + n, where E is the symbol energy. Only the phase error due to the noise need be considered. The probability density function of the received phase is then also uniform, with the corresponding received entropy given by H (r) = H (φ) = log 2π (cf. Table 6.1). The entropy of the phase noise is determined using the marginal probability density function of the phase given in (2.2.35), with F = E / N0 .

Soft-Decision Detection

An expression for the capacity of phase modulation on an additive gaussian noise channel can be obtained using the exact expression for the marginal probability density function of the phase given in (2.2.35) with F = E / N0 . The resulting capacity is given by17 17 See Wyner (1966) for a complete derivation.

704

2.0

14 The Information Capacity of a Lightwave Channel

5

)lobmys/stib( yticapaC

QPSK

1.5 1.0

BPSK

0.5 0.0

–5

0

5

10

)lobmys/stib( yticapaC

Shannon bound

(a)

(b)

PSK Limit

Shannon bound

4

16-PSK

3

8-PSK

2

QPSK

1

BPSK

0

–5

0

E/N 0 (dB)

5 10 E/N 0 (dB)

15

20

Figure 14.11 (a) Capacities for BPSK and QPSK. Solid lines, soft-decision capacities. Dashed

lines, hard-decision capacities. (b) Soft-decision capacities for phase modulation.

C

where

= log2

² h( F , x ) ³ ² 2F ³ ¹ ∞ − h ( F, x )log 2 dx , e x 0

.

h ( F, x ) = 2F xe− F

(x 2+1)

(14.3.13)

I0 (2F x ).

This capacity is shown as a function of E / N0 in Figure 14.11. Rather than derive (14.3.13) itself, an approximate derivation valid for large values of E / N0 will be given. In this large-signal regime, the marginal probability density function of the phase is well approximated by a gaussian distribution with variance (2E / N 0)−1 (cf. (2.2.36)). The corresponding entropy for the phase noise is (cf. (14.3.2)) H (n) ≈

1 log 2

² πe ³ . E / N0

(14.3.14)

Substituting this expression and the maximum received entropy H (r) = log 2π into (14.3.4a), the soft-decision detection capacity for large E / N0 in units of bits is

= H (r) − H (n) ² πe ³ 1 ≈ log 2π − 2 log E / N 0 1 ≈ 2 log(4π E /eN0 ) ≈ 12 log( E/ N 0) + 12 log(4π/e ). 1 log 2(4π/e) is equal to 1.1. This large-E / N0 2 C

(14.3.15)

limit is shown in FigThe constant ure 14.11(b) using the curve labeled “PSK Limit.” Therefore, the capacity in (14.3.15) in bits per symbol is half the capacity of an unconstrained additive gaussian noise channel given in (14.3.4b) plus 1.1 bits. This constant is of the same order as the constant for intensity modulation given in (14.3.11). The approximate factor-of-two reduction in the capacity for phase modulation or intensity modulation is readily explained by examining the form of the joint gaussian probability density function in polar form f (r, φ) given in (2.2.32). When E / N0 is much larger than one, f (r, φ) approaches a separable joint gaussian probability density function with the marginal probability density function of both the magnitude and the phase

14.3 Wave-Optics Capacity

705

being a gaussian distribution. In this large-signal-to-noise-ratio regime, the capacity of a channel constrained to use only either the magnitude or the phase is approximately half the capacity given in (14.3.4b), which uses both signal components. For L -ary phase modulation, the sample value s² is s²

√ = Eei2π²/ L ² 2π² ³¼ √ » ² 2π² ³ = E cos L + i sin L ,

(14.3.16)

for 0 ≤ ² ≤ L − 1. Each of these values defines a conditional probability density function f (r| s² ). The capacity is determined by substituting each of these functions into (14.3.8) and evaluating the summation to produce f (r) and the corresponding received entropy H (r) given by (14.3.6). The resulting expression, along with the entropy of the noise given in (14.3.5), is then substituted into (14.3.4a) to numerically determine the capacity. Figure 14.11(b) plots this capacity for several phase-shift-keyed modulation formats. For L greater than four, evaluation of the capacity requires numerical methods. Comparing Figure 14.11(b) for PSK with the soft-decision capacity of QAM given in Figure 14.9, 4-QAM is equal to QPSK. However, comparing 16-PSK with 16-QAM, there is approximately 4 dB difference in the value of E / N0 required to achieve the capacity, with QAM being the more energy-efficient format for the same number of bits per symbol. This is because the signal constellation is not constrained to lie on a circle as it is for PSK.

Hard-Decision Detection

The hard-decision detection capacity for L -ary phase-shift keying is determined using (14.3.9). The transition probability p(r² |s0 ) for phase modulation is given by p(r² |s0) =

¹

(2²+1)π/ L

(2²−1)π/ L

f (φ)dφ,

where f (φ) is the probability density function of the phase given in (2.2.35). For L = 2 and L = 4, which correspond to binary phase-shift keying (BPSK) and quadrature phase-shift keying (QPSK), respectively, the integral can be evaluated analytically. For BPSK, the channel is a binary symmetric channel and, using (14.3.10), the channel capacity is



CBPSK

= 1 − Hb

½1 2

¾

erfc E / N 0

¿

,

(14.3.17)

where ρ = 21 erfc E / N0 is the probability of a bit error for BPSK given in (10.2.1a). Because a QPSK constellation is two BPSK constellations in phase quadrature, the capacity CQPSK for QPSK is simply CQPSK

½ ½ ¾ ¿¿ = 2 1 − Hb 12 erfc E /2N0 ,

(14.3.18)

2 where the factor of two inside the erfc function accounts for the difference in dmin between BPSK and QPSK (cf. Figure 13.19). A comparison of the soft-decision detection capacity and the hard-decision detection capacity for both BPSK and QPSK is

706

14 The Information Capacity of a Lightwave Channel

shown in Figure 14.11(a) as a function of E / N 0. The difference between the capacity for a hard-decision information channel and that for a soft-decision information channel is the potential improvement which can be obtained by using a more complex codedmodulation format. This potential improvement is most pronounced at intermediate values of E / N0 .

14.4

Capacity of a Product Channel A channel that supports multiple independent degrees of signaling freedom expressed as independent information subchannels is described by a multivariate product distribution and is called a product channel. The capacity of a product channel can be derived in two steps. The first step determines the single-letter capacity for each subchannel, which depends on the noise level of that subchannel and the amount of energy assigned to that subchannel. Because these subchannels are independent, the total capacity is the sum of the subchannel capacities. The second step determines the optimal distribution of the total allowed energy among the subchannels so as to maximize the total capacity. This statement applies to any channel that is appropriately modeled using a continuous bandlimited gaussian random process with a mean power constraint. This is because there always exists a basis {ψ j (t )} and a corresponding set of components {b j } that are uncorrelated gaussian random variables and so are independent (cf. Section 6.5). In this way, a continuous bandlimited gaussian random process can be expressed as a discrete sum of uncorrelated subchannels. The resulting multivariate probability density function of the information channel in this basis is a product distribution. The overall capacity of this product channel is the sum of the capacities for each of the subchannels considered in isolation. The optimal allocation of the signal fills the subchannels with the total available signal energy using a method called water filling. Given a set of independent gaussiannoise subchannels with a different effective noise power density spectrum N k in the kth subchannel, the signal energy is first allocated to the subchannel that has the smallest effective noise power density spectrum until the signal energy plus the effective noise power density spectrum in that independent subchannel is equal to the subchannel that has the next-smallest effective noise power density spectrum. Signal energy is then allocated in equal amounts to both subchannels until the sum again equals the next-smallest noise level. This process is repeated until all the signal energy has been allocated. For the subchannels that contain signal energy, this produces equal levels of the signal energy plus the effective noise power density spectrum, as is shown pictorially in Figure 14.12. The water-filling procedure can also be applied to continuous bandlimited channels as discussed in Section 14.4.3. The total capacity is C



=

K ± k =1

²

log 1 +

Ek Nk

³

,

(14.4.1)

where E total = k E k is the total energy, with E k being the energy allocated in the kth subchannel. These subchannels can be defined in time, frequency, space, or polarization.

14.4 Capacity of a Product Channel

Noise level

Signal

707

Signal

Noise level

Subchannel 1

Subchannel K

Subchannel 1

(a)

Subchannel K (b)

Figure 14.12 (a) For the same noise variance per subchannel, the capacity is achieved by an equal

distribution of the signal energy. (b) For a different noise variance per subchannel, the capacity is achieved by “water-filling” the available signal energy.

This section discusses an information channel based on a product distribution generated from a bandlimited gaussian random process. The product distribution may have components that correspond to degrees of freedom in space, time, frequency, or polarization. A multi-input multi-output channel with a product distribution over the spatial degrees of freedom is considered first, then a bandlimited channel for which the degrees of freedom of the product distribution are related to the timewidth–bandwidth product T B. 14.4.1

Capacity of a Gaussian MIMO Channel

A channel with multiple degrees of freedom in space is called a multi-input multi-output channel. At each sample time, the input to a memoryless discrete-time mimo channel is a vector s of transmitted symbols. The channel output is a vector r of detected symbols. For a general multivariate additive, white gaussian-noise complex-baseband mimo channel, the output is given by (cf. (8.3.3)) r=

Hs + n,

(14.4.2)

where n is a vector of independent, identically distributed, circularly symmetric gaussian random variables, and H is the channel matrix. This section considers only a complexbaseband channel with K inputs and K outputs. Accordingly, the channel matrix H is square. Because the noise is gaussian and the channel is deterministic with a mean power constraint, the capacity-achieving prior f (s) on the complex-baseband channel input is evidently a multivariate gaussian probability density function, given in (2.2.30a) and repeated here as † −1 1 e−(s −±s²) V s (s −±s ²) , f (s) = K π det Vs

where Vs is the covariance matrix of the complex-baseband channel input signal (cf. (2.2.30b)). This reasonable statement is here accepted without proof. The covariance matrix Vs is part of the prior and should be chosen to maximize the mutual information subject to any constraints that may be imposed.

708

14 The Information Capacity of a Lightwave Channel

.

Define a = s − ±s². Then Vs = ±a a† ² (cf. (2.2.30b)), and the differential entropy for a complex-baseband mimo signal described by Vs is H (s) = −± loge f (s)²

=

À

1 a† V− s a

Á

+ loge (π

K

det Vs ).

(14.4.3)

1 † −1 The term a†V− s a regarded as the inner product of a and Vs a is equal to the trace of the outer product (cf. (2.1.85)). The linearity of the expectation operation then leads to

½ ¿ ±a†V−s 1 a² = trace V−s 1±a a†² = trace(V−s 1Vs ) = K.

Using this fact in (14.4.3) gives H (s) = K

+ loge (π det Vs ) ( ) = loge (π e) det Vs = loge det (π eVs ) nats. It follows that the entropy of Hs is loge det(π eHVs H† ). K

K

The entropy of white complex-baseband noise n described by V N

(14.4.4)

= N0 I

K

is

H (n ) = loge det (π eV N )

= loge det(π eN 0I ) = K loge (π eN0 ), K

(14.4.5)

where N 0 = 2σ N2 is the noise power density spectrum for a single complex-baseband subchannel. For an arbitrary signal covariance matrix Vs , the received differential entropy H (r) depends on the complex-baseband channel matrix H, which determines how the received symbols are correlated. Using r = Hs + n, the covariance matrix for the received vector r can be written as Vr = N0 IK + HVs H† , which is asked for as an end-of-chapter exercise. Substituting this covariance matrix into (14.4.4) produces H (r) = log e

½ (π e)

K

½

det N 0I K

¿¿ + HVs H† .

(14.4.6)

Combining this expression with (14.4.5), the capacity Cmimo for a complex-baseband mimo channel in terms of Vs is Cmimo

= H (r) − H (n) ½ ½ ¿¿ = loge (π e) det N0 I + HVs H† − K loge (π eN 0) ² ³ = K loge (π eN 0) + loge det I + N1 HVs H† − K loge (π eN 0) 0 ³ ² 1 (14.4.7) = loge det I + N HVs H† nats. K

K

K

K

0

14.4 Capacity of a Product Channel

709

In going from the second to the third line, the expression det( N0 IK + HVs H† ) = N 0K det(IK + (1/ N0 )HVs H† ) is used. The covariance matrix Vs of the input signal is not yet specified. The first application of (14.4.7) is to an information channel whose encoder is constrained to transmit an equal amount of energy E k = E total/ K in each of the K subchannels. Specifying the covariance matrix Vs to be E k IK and substituting into (14.4.7) gives ² ³ 1 E total † Cmimo = log2 det I K + HH bits (14.4.8) K N0 as the channel capacity. Moreover, for the special case with no coupling between the subchannels, HH† = |h |2I K , where |h |2 is the channel gain, which is set to one for convenience. For this channel, each received symbol is independent, with an entropy in the kth subchannel given by the energy Ek = E total/ K for a symbol. Using the identity det(a IK ) = a K , where a = 1 + E total/( K N0 ), the complex-baseband capacity is Cmimo

²² ³ ³ = log2 det 1 + K1 ENtotal I ² 1 E 0³ = K log2 1 + K Ntotal bits. K

0

(14.4.9)

Comparing this capacity with the single-letter capacity given in (14.3.4b) for the same total energy E total , the linear scaling of K outside the log function dominates the 1/ K scaling inside when K is smaller than E total/ N 0. This fundamental result states that, with the total energy E total constrained, an increase in the number of independent subchannels with less energy per subchannel produces a larger capacity whenever the number of subchannels K is less than E total/ N 0. However, when K is larger than E total / N0 , spreading the total energy across more subchannels provides diminishing returns, and the capacity Cmimo saturates at (E total/ N0 )/log e 2 bits. The derivation of this expression is asked for as an end-of-chapter exercise. 14.4.2

Capacity of a Random MIMO Channel

A multi-input multi-output complex-baseband channel may have a random channel matrix H (with the randomness denoted by the underscore) because of mode-dependent attenuation. A realization of this random matrix, denoted by H, is not necessarily hermitian. For a given channel realization H, let Ek be the energy in the kth uncorrelated sub∑ channel with k E k = E total. The E k need not be the same for every subchannel. Suppose that the encoder adapts to each realization of the channel. The capacity Ck for a given channel realization H for the kth subchannel is determined using the real nonnegative eigenvalues ξk of the symmetric matrix HH† . These eigenvalues quantify the energy redistribution between the uncorrelated complex-baseband subchannels caused by the coupling introduced by the channel. Accounting for this redistribution, the energy in the kth uncorrelated subchannel is modified by ξ k so that the capacity Ck of the kth subchannel of the channel realization H is

710

14 The Information Capacity of a Lightwave Channel

Ck

= log2

²

1+

Ek Nk

³

,

(14.4.10)

where Nk = N0 /ξk is the effective noise power density spectrum for the kth subchannel. The total capacity is the sum of the capacities of the independent subchannels Cmimo

=

K ± k =1

²

log2 1 +

Ek Nk

³

.

(14.4.11)

The total capacity Cmimo is now a random variable that depends on the channel realization = 1 for all k then E k = E total/ K , the capacity for every subchannel is the same, and (14.4.11) reduces to (14.4.9). The expected value of the channel capacity is determined by averaging over the probability density function of H and is

H, with the encoder adapting to that realization so as to achieve the capacity. If ξk

±Cmimo ² =

± K k=1

²

log 2 1 +

Ek Nk

³Ã

bits,

(14.4.12)

where each realization generates a different set of subchannels with different capacities {Ck }, and a different total capacity.

14.4.3

Capacity of a Bandlimited Wave-Optics Channel

The capacity of the ideal rectangular, bandlimited-waveform information channel with white gaussian noise is easily derived using the Nyquist–Shannon sampling theorem. That theorem states that a passband waveform of bandwidth B can be fully represented by B samples per second. Moreover, the Nyquist samples of ideal bandlimited white noise are independent. This means that a bandlimited waveform information channel has a discrete-time representation. This representation leads to a continuous information channel which is formally equivalent to the subchannel representation used in space. Over a time interval T , this leads to a product channel with the product distribution defined in terms of T B (cf. Section 6.5). For this case, each of T B components can convey a single complex symbol with nearly the same entropy (cf. Figure 6.8). For channels whose bandlimiting function is not rectangular, water-filling must be used to achieve the capacity, as will be discussed below. The expected energy E total over a temporal interval of duration T can be written as E b RT , where E b is the energy per uncoded databit, and R is the information rate in bits per second. For this case, the energy E k for the kth temporal component is the same for all k and is given by Ek

= =

E total TB R Eb . B

(14.4.13a) (14.4.13b)

14.4 Capacity of a Product Channel

711

Substitute (14.4.13a) into (14.3.4b) and multiply the result by the number of independent components T B to yield Cband

= T BC ² ³ = T B log2 1 + T1B ENtotal 0

bits.

(14.4.14)

Comparing (14.4.14) with the capacity Cmimo for a mimo channel in space given in (14.4.9), the number of equal-entropy independent components T B for a bandlimited channel plays the same role as the number of equal-entropy independent subchannels K for a multi-input multi-output channel. Similarly, substituting (14.4.13b) into (14.3.4b) and multiplying the result by the number of independent components yields Cband

= T B log2

²

R Eb B N0

1+

³

bits.

(14.4.15)

The bandlimited capacity C , which is the maximum information rate in bits per second, is determined by multiplying the single-letter capacity by the number of samples per second, or equivalently, dividing Cband by the modulation interval T to give

. Cband = B log C= 2 T

²

1+

R Eb B N0

= B log2 (1 + SNR)

³

bits per second,

(14.4.16a) (14.4.16b)

where (10.1.22) was used to write SNR = R E b / N0 B in the last line. This expression is the Shannon bound for the capacity of a bandlimited channel for which the single-letter capacity C is constant over the bandwidth B (cf. (14.3.4b)). When the bandwidth B is smaller than the term R E b / N 0, which is a scaled information rate, the bandlimited capacity C in bits per second is approximately linear in B. When the bandwidth B is larger than R E b / N0 , the bandlimited capacity C saturates at a value equal to ( R E b/ N0)/log e 2. The derivation of this expression is asked for as an end-of-chapter exercise. The capacity saturates because adding more degrees of freedom per unit time as expressed by the bandwidth B provides diminishing returns when B is larger than R E b/ N0 .

Water-Filling for a Bandlimited Channel

When the passband bandwidth is not an ideal rectangular function or the noise is not white, (14.4.16b) must be modified. Let N ( f ) be the noise power density spectrum at the channel output, and let H ( f ) be the channel transfer function. For this case, the effective noise power density spectrum N ´ ( f ) at the channel input is given by N ´( f ) =

N( f ) |H ( f )|2 .

(14.4.17)

712

14 The Information Capacity of a Lightwave Channel

|H(f )|2

N( f )

f

N′( f )

f

Water level

S( f )

f

f

Figure 14.13 Water-filling for a bandlimited channel.

Now modify the expression for the capacity of a set of independent subchannels given in (14.4.11) by replacing the summation on k over the subchannels by an integration over f in frequency, which leads to

² ´ ³¼ d f, (14.4.18a) N ´( f ) −∞ where a water-filling threshold parameter ´ is applied to N ´( f ) to determine the allocated signal power density spectrum S ( f ) = max[0, ´ − N ´ ( f )] at the channel input. This input power density spectrum satisfies a total power constraint 18 ¹∞ S= max [0, ´ − N ´ ( f )]d f , (14.4.18b) C ( S) =

¹∞

»

max 0, log

−∞

where S is the total signal power over the band. Sweeping the parameter ´ in (14.4.18) from zero to infinity in these two parametric equations traces out the curve C (S ). The result is depicted graphically in Figure 14.13. This figure suggests using a modulation format that partitions the frequency axis into small frequency intervals, then using a different optimized signal constellation for the modulated waveform in each frequency interval. This well-developed technique is called adaptive bit loading.

Maximum Bandwidth

The maximum information rate depends on the maximum bandwidth supported by the channel. A simple estimate for the maximum bandwidth and the corresponding information rate can be determined using concepts discussed in Chapter 6. Because the energy of a photon is directly proportional to the frequency, the maximum-entropy distribution for the average signal energy E derived in (6.1.9) along with the energy in a mode derived in (6.1.11) leads to an expression for the signal power density spectrum S ( f ), which is the average signal energy as a function of the frequency. 18 See Blahut (2020).

14.4 Capacity of a Product Channel

713

The signal power density spectrum S ( f ) is given by the first term of (6.1.12), where the thermal energy kT0 is replaced by the average signal energy E used to convey information19 so that hf S ( f ) = h f /E (14.4.19) e − 1.

Ä

The integral 0∞ S ( f )d f gives the signal power Ps which determines the maximum information rate supported by the channel. Examining Figure 6.3, the maximum-entropy power density spectrum is nearly constant for frequencies less than a frequency f max given by f max

= E / h.

(14.4.20)

Then the maximum-entropy power density spectrum rapidly decreases to zero with increasing frequency. For frequencies less than f max, the power density spectrum S ( f ) can be approximated by the average signal energy E . For frequencies greater than f max , S ( f ) can be approximated by zero. The signal power Ps is then the area of a constant power density spectrum S ( f ) of magnitude E and bandwidth fmax = E / h with Ps

≈ E 2/ h .

(14.4.21)

This is the frequency of a photon with energy E . This expression relates the average signal power Ps and the average signal energy E when the signal energy is distributed uniformly in frequency up to the approximate maximum bandwidth given by f max. Using (14.4.16b) and setting the bandwidth B equal to f max = E / h, setting the signal power Ps equal to E 2/ h, and setting the noise power equal to N0 f max = N0 (E / h) gives

Cmax

≈ ≈

²

f max log2 1 +

E 2/ h N0 (E / h)

f max log2 (1 + E / N0)

³

bits per second.

(14.4.22)

Expression (14.4.22) states that when the energy is distributed according to a maximumentropy distribution, the maximum information rate in bits per second is approximately the single-letter capacity C given by log2 (1 + E / N0 ) (cf. (14.2.20)) multiplied by the frequency f max . Because f max = E / h, this implies that the wideband channel capacity Cmax depends only on the signal energy E and the noise power density spectrum N0 as is the case for the single-letter capacity.

14.4.4

Capacity of a Bandlimited Photon-Optics Channel

The bandlimited channel capacity and the wideband channel capacity of an ideal photonoptics channel are derived in this section. For the bandlimited channel, the single-letter photon-optics capacity does not depend on the lightwave frequency. This bandlimited capacity can be directly compared with the bandlimited wave-optics capacity derived in Section 14.4.3. For the wideband channel, the single-letter photon-optics capacity 19 The second term in (6.1.12), which is the vacuum-state energy, is not included because it cannot be

controlled by the encoder.

714

14 The Information Capacity of a Lightwave Channel

does depend on the frequency because the energy of a photon depends on the frequency. Including the frequency dependence of the single-letter capacity leads to an expression for the maximum information rate that can be conveyed as the bandwidth goes to infinity.

Bandlimited Capacity

When the mean number of signal photons E per symbol is constant over a frequency interval modeled as a flat bandwidth B, the photon-optics arrival rate of signal photons R (cf. (1.2.5)) is proportional to the bandwidth with R = E B . Substitute the mean number of signal photons E = R/ B into (14.2.7), and scale the resulting expression by the bandwidth B (cf. (14.4.16a)) to give the fundamental expression

C

²

R

³

²

= B log2 1 + B + R log2 1 + = Cw + C p bits per second

B

³

R

(14.4.23)

for the bandlimited capacity C for an ideal photon-optics channel. This remarkable equation,20 called the Gordon formula, is symmetric in the mean arrival rate of signal photons R and the bandwidth B. It couples the physical and information-theoretic properties of a noiseless lightwave channel in a single expression. The arrival rate of signal photons R in photons per second describes the physical quantity used to convey information within the channel. The flat bandwidth B in cycles per second describes the degrees of signaling freedom per unit time within the channel. The bandlimited capacity C in bits per second describes the maximum information rate accepted by the channel. Equation (14.4.23) also expresses both the wave aspects and the particle aspects of the bandlimited capacity. The term Cw is the bandlimited counterpart of the single-letter capacity Cw (cf. (14.2.7)) and describes the contribution of the continuous-wave nature of the lightwave signal to the bandlimited capacity. It has the same form as the bandlimited wave-optics capacity C given in (14.4.16a), with the wave-optics term R E b/ N0 replaced by the arrival rate of signal photons R. The second term C p is the bandlimited counterpart of the single-letter capacity Cp (cf. (14.2.7)) and describes the contribution to the bandlimited capacity of the discrete-particle nature of the lightwave signal. The inclusion of this term, which cannot be derived from wave optics, leads to the symmetric form of the Gordon formula. Figure 14.14 compares the single-letter capacity C shown in Figure 14.4 and repeated in Figure 14.4(a) with the bandlimited capacity C scaled by the photon-optics signal arrival rate R. This capacity is shown in Figure 14.14(b). Examining the two figures, it can be seen that the roles of the single-letter capacity terms (Cw and Cp ) as a function of the expected number of photons are reversed compared with the roles of the bandlimited capacity terms (Cw and C p) as a function of the bandwidth B scaled by the photon-optics signal arrival rate R. Specifically, when the mean photon-optics arrival rate R is much smaller than the bandwidth B, the channel is more granular because there are more degrees of freedom 20 Attributed to Gordon (see Gordon (1962)).

14.4 Capacity of a Product Channel

10 1

p

)R /C( yticapaC

)lobmys/stib( yticapaC

10

10

= w+ p

–1

p

1

w

–1

10

w

10 –2 10 –3 10 –3

= w+

715

–2

p

10–2

–1

2

1 10 10 10 10 Expected Number of Signal Photons (a)

3

10

10–3 10 –3

10–2

10–1 1 10 Bandwidth (B/R)

102

103

(b)

Figure 14.14 (a) The single-letter capacity given in Figure 14.4 for an ideal photon-optics

channel. (b) The bandlimited capacity C as a function of the bandwidth B. Both C and B are scaled by the signal arrival rate R.

per unit time, as expressed by the bandwidth, than photons per unit time, as expressed by the arrival rate R. Therefore, the term C p is greater than the term Cw . In this “coarsegrained” regime, the most significant contribution to the bandlimited capacity is from the discrete-particle nature of the lightwave, with the channel capacity scaling approximately linearly with the mean photon-optics arrival rate of signal photons R and scaling logarithmically with the bandwidth B . When the mean photon-optics arrival rate R is much larger than the bandwidth B, the term Cw is larger than the term C p . In this “fine-grained” regime, the most significant contribution to the bandlimited capacity is from the continuous-wave nature of the lightwave, with the capacity scaling approximately linearly with the bandwidth B and scaling logarithmically with the arrival rate of signal photons R. This “wave-like” behavior is in accordance with the bandlimited wave-optics capacity (cf. (14.4.16a)), with the arrival rate of signal photons R replaced by the scaled wave-optics information rate R E b / N0 . The transition between these two regions occurs when B = R = 1, meaning that one photon per second is transmitted in one hertz of bandwidth. At this crossover point, C p and Cw are equal, and the bandlimited capacity is two bits per second. In the regime for which C p is greater than Cw , the bandwidth is larger than the arrival rate R and there are more degrees of freedom per second expressed by the bandwidth B than photons per second expressed by the arrival rate R. Therefore, increasing R has a larger effect on C than increasing the bandwidth. In the regime for which Cw is greater than C p , the arrival rate of signal photons R is larger than the bandwidth B and there are more photons per second than degrees of freedom per second. Therefore, increasing B has a larger effect on the bandlimited capacity C than does increasing the arrival rate of signal photons R.

Wideband Capacity

The bandlimited capacity of a noiseless photon-optics channel when the single-letter capacity C( f ) is a function of frequency is the topic of this subsection. To analyze this case, the mean number of signal photons per hertz E( f ) at frequency f is determined

716

14 The Information Capacity of a Lightwave Channel

by dividing the power density spectrum S ( f ) given in (14.4.19) by the photon energy h f to give

( ) = Sh( ff ) = eh f /E1 − 1 ,

E f

(14.4.24)

where E is the mean signal energy. Substituting this expression into (14.2.7) leads to a frequency-dependent single-letter capacity per unit bandwidth C( f ) given by

( )=g

²

³

1

(14.4.25) −1 , where g(x ) is the Gordon function (1+ x )log(1+x )− x log x (cf. (14.2.3)). The bandlimited capacity C in bits per second is determined by integrating C( f ) over the bandwidth B as given by ¹ ¹ ² 1 ³ C= C( f ) d f = g h f /E (14.4.26) e − 1 d f. B B . To proceed, define f ´ = f / f max , where f max is the frequency of a photon with energy C f

e h f /E

E given in (14.4.20). The bandlimited capacity can then be written as

C ( B ´ ) = f max

¹



0

g

²

ef

´

³

1

−1

d f ´,

(14.4.27)

where B ´ = B / f max and only the band from zero to B ´ is used to convey information. This expression is plotted in Figure 14.15 with both the bandwidth and the information rate scaled by f max . When the bandwidth B is much smaller than the frequency f max of a photon with mean energy E , the single-letter capacity C( f ) is nearly constant in 5

π2 /( loge8)

4

)xamf / ( etaR noitamrofnI

3 2 1 0 0

2

4 6 Bandwidth (B/fmax)

8

10

Figure 14.15 Capacity as a function of the bandwidth showing the saturation of the information rate at about B / f max = 5. Both the capacity and bandwidth are scaled by the frequency f max of a photon with energy E .

14.4 Capacity of a Product Channel

717

frequency (cf. Figure 6.3) and the information rate grows linearly as the bandwidth increases. This condition defines the bandlimited regime considered earlier. When B is larger than fmax , the single-letter capacity C( f ) rapidly decays to zero (cf. Figure 6.3) and the information rate saturates because the photon energy is larger than the mean energy E and hence is unlikely to be generated by the encoder. The maximum channel capacity Cmax is determined by letting B ´ go to infinity. This gives

Cmax

=

f max

=

f max

Ä∞

¹∞ ² g

0

π2 , log 8

ef

´

1

³

−1

df´ (14.4.28)

e

´

where the definite integral 0 g(( e f − 1)−1)d f ´ = π 2 /log e 8 has been used21 with a base-2 logarithm for the Gordon function g(x ). This limiting value of (π 2 / log e 8) f max is shown in Figure 14.15. The maximum bandlimited capacity can be written in terms of the signal power Ps . This power is the integral of the power density spectrum given in (14.4.19) Ps

=

¹∞ 0

S ( f )d f

=

¹∞ 0

hf

eh f / E

−1

df

= π6

2

E2 . h

(14.4.29)

The derivation of this expression is asked for as an end-of chapter exercise. The exact expression is a factor of π 2/6 ≈ 1.6 larger than the approximate expression Ps = E 2/ h given in (14.4.21). ¾ Solving for the¾average signal energy in (14.4.29) gives E = 6h Ps /π 2, so we have fmax = E / h = 6Ps /π 2 h. This expression substituted into (14.4.28) yields

π Cmax = log 2 e

Å

2Ps 3h

bits per second.

(14.4.30)

Expressions (14.4.30) and (14.4.28) show that in the absence of additive noise, the maximum information rate Cmax in bits per second is directly proportional to the frequency f max of a photon with energy E. The constant of proportionality is equal to π 2/ loge 8, which is approximately 5. This leads to the order-of-magnitude estimate that the maximum information rate Cmax using all possible frequencies is approximately five times the frequency f max of a photon with the average energy E . As a numerical example, substituting an input signal power Ps of one milliwatt into (14.4.30) gives the maximum information rate Cmax as 4. 5 × 1015 bits per second. This value is about five times larger than the frequency f max of a photon with energy E in hertz, which is given by E / h = 9.6 × 1014 hertz (cf. (14.4.20)). Therefore √ fmax = E / h ≈ Ps / h provides a remarkably simple order-of-magnitude estimate of the maximum information rate of a noiseless photon-optics channel. 21 This definite integral and the similar definite integral below require evaluating terms that are expressed

using polylogarithm functions.

718

14 The Information Capacity of a Lightwave Channel

14.5

Spectral Rate Efficiency For an ideal flat bandwidth B, the single-letter capacity C, expressed in bits, and the bandlimited capacity C = B C, expressed in bits/s, are related by the spectral rate efficiency r = R / B. The spectral rate efficiency r , or simply the spectral rate with units of bits/s per hertz, is the information rate R per unit bandwidth B. For a wave-optics channel with additive white gaussian noise, use (14.4.16), along with r ≤ C / B, to give ² E³ C = C r Nb , r≤ (14.5.1) B 0

where C(·) denotes the functional form of the single-letter capacity of the channel under study as a function of Eb / N 0. The single-letter capacity depends on the product of the spectral rate efficiency r and the energy efficiency E b / N0 , where E b is the energy per uncoded databit. Values of r and E b / N 0 that satisfy (14.5.1) define the region where information can be reliably conveyed over a channel. For each channel, because the information rate R must be less than or equal to the bandlimited capacity C , the spectral rate efficiency r must be less than or equal to C / B. This section discusses the spectral rate efficiency both for a wave-optics channel with additive white gaussian noise and for a photon-optics channel with only photon noise.

14.5.1

Wave-Optics Spectral Rate Efficiency

Consider an ideal bandlimited white gaussian noise wave-optics channel in which the only constraint on the modulation format is a mean power constraint. For this channel, the single-letter capacity is given in (14.4.15). Substituting this expression into (14.5.1) gives r

≤ log2

²

Eb 1+r N0

³

.

(14.5.2)

For a fixed bandwidth and E b/ N0 much larger than one, the argument of the logarithm approaches r Eb / N 0. In this large-signal-to-noise-ratio regime, to increase the spectral rate efficiency r by one bit/s per hertz, the ratio E b / N0 must double for the inequality to be satisfied. Therefore, increasing the spectral rate efficiency is exponentially expensive in the energy per bit for a fixed information rate. The boundary for the achievable spectral rate efficiency as a function of E b / N 0 is obtained by rewriting (14.5.2) as Eb N0

r ≥ 2 r− 1 .

(14.5.3)

This inequality is illustrated by the shaded region in Figure 14.16(a). Information can be reliably conveyed using wave optics only for points in the shaded region. For a fixed value of N 0, the value of E b / N 0 when r = 0 represents the minimum signal-to-noise ratio (E b / N 0)min to communicate reliably at any rate using wave optics. Because the spectral rate efficiency r = R / B must approach zero and the information

14.5 Spectral Rate Efficiency

8

8

)zH/s/stib( etaR noitamrofnI

)zH/s/stib( etaR noitamrofnI

6 4 2 0 0

719

2

4

6

Eb/N0

8

10

N0 = 0

Wave-optics spectral efficiency

4 2 0

12

N0 = 1

6

0

2

4

6

Eb

8

10

12

/

Figure 14.16 (a) Wave-optics spectral efficiency as a function of E b N0. The shaded region is

where information can be conveyed reliably for an additive white noise gaussian channel. (b) A plot of the photon-optics spectral efficiency as a function Eb for N0 = 0 and N0 = 1. Also shown for comparison is curve (a) with E b / N0 replaced by Eb .

rate R is finite, this implies a wideband regime for which the bandwidth goes to infinity. This minimum can be determined by differentiation because C(x ) is a concave function so that the derivative d C( x )/dx decreases as x increases, where x = E b/ N0 . Then ( Eb / N0 )min is given by22

ÆÆ ³−1 (14.5.4a) (Eb / N 0)min = dx C(x ) x =0 ²d ÆÆ ³ −1 = dx log2(1 + x ) x =0 = loge 2, (14.5.4b) which is equivalent to −1.6 dB. This is the minimum value of E b / N0 required to com²d

municate reliably over a memoryless channel with additive white gaussian noise at any rate. This value is called the Shannon bound for the spectral rate efficiency. This value was used in Chapter 13 to quantify the performance of channel codes. The statement E b/ N0

≥ loge 2 ≈ 0.69,

(14.5.5)

which holds for wave-optics communication in gaussian noise, unites the dissimilar concepts of energy and information for wave optics. 14.5.2

Photon-Optics Spectral Rate Efficiency

The spectral rate efficiency for a lightwave signal on an ideal narrowband photon-optics channel for which the single-letter capacity is independent of frequency is determined using the single-letter capacity given in (14.2.7) with E replaced by r Eb . Substituting r Eb into (14.2.7), the spectral rate efficiency for N 0 = 0 is bounded by r 22 See Verdu (2002).

≤ log2 (1 + r Eb) + r Eb log2

²

1+

1 r Eb

³

.

(14.5.6)

720

14 The Information Capacity of a Lightwave Channel

This inequality is illustrated in Figure 14.16(b) with the curve labeled N0 = 0. As r goes to zero, Eb goes to zero as is evident in Figure 14.16(b). Therefore, for an ideal photonoptics channel with no additive noise, reliable communication at a nonzero information rate is possible for any level of signal energy. For a nonzero value of the additive noise N0 , the spectral rate efficiency of the photonoptics channel is given by (14.2.6) with E replaced by r Eb . Reliable communication now requires a minimum signal level (Eb )min as is the case for wave optics. This shift is shown in Figure 14.16(b) for N0 = 1. For N0 = 1, (Eb )min is equal to 0.58 or − 2.36 dB. This minimum required energy is smaller than −1.6 dB for wave optics by approximately a factor of (loge 2 + 1/2)−1 ≈ 0.84 or −0.77 dB, which is also plotted in Figure 14.16(b). For N0 much smaller than one, (Eb )min goes as N0, which is asked for as an end-ofchapter exercise. For large N0 , the photon-optics spectral rate efficiency approaches that of the wave-optics channel because the channel capacity is the same (cf. (14.2.20)).

14.5.3

Spectral Rate Efficiency for Constrained Modulation Formats

The capacity of an information channel constrained to use a particular modulation format depends on that format. This means that the spectral rate efficiency also depends on the modulation format and on the functional form of the inverse of the single-letter capacity C−1. For most modulation formats, this inverse function must be determined numerically. The spectral rate efficiencies for several phase modulation formats are shown in Figure 14.17. For a phase-modulation format, the single-letter capacity C as a function of E b / N0 is also a concave function both for hard-decision detection and for soft-decision detection. To determine ( E b/ N0)min for binary hard-decision detection, the probability of a bit 3.0

PSK limit

Unconstrained limit

)zH/s/stib( ycneiciffE lartcepS

2.5

QPSK soft

2.0

QPSK hard

1.5 BPSK soft

1.0 0.5 0.0

BPSK hard ~ 2 dB

–2

0

2

4

6

8

10

E b/N 0 (dB) Figure 14.17 The spectral rate efficiency for soft-decision detection and hard-decision detection

for BPSK and QPSK. The curve at the left is the same curve as that in Figure 14.16(a).

14.5 Spectral Rate Efficiency

721



error for BPSK, which is ρ( x ) = 21 erfc x for x = E b / N0 , is substituted into the expression for the capacity of a binary symmetric channel given in (14.3.17). The resulting expression is then substituted into (14.5.4a). The derivative can be evaluated 2 √ using (d/dx )erfc(x ) = − 2e−x / π . Taking the limit of the resulting expression as x approaches zero yields the value of π log e 2/2. A general expression valid for L-ary hard-decision detection of a phase-modulated signal is 23

( Eb/ N0 )min = π2 loge 2 for L = 2, 4 log e 2 for L ≥ 4. (14.5.7) = 2 42π L sin (π/ L ) The value ( E b/ N 0)min = π loge 2/2 = 0.37 dB is the same for BPSK ( L = 2) and QPSK (L = 4) because ( E b/ N 0)min is the same on a per-component basis. The limiting value for hard-decision detection of BPSK or QPSK is approximately 2 dB higher than the Shannon bound for soft-decision detection. This difference, shown in Figure 14.17(b), is an example of the penalty of hard-decision detection compared with soft-decision detection in a wideband regime. Figure 14.18 shows the probability of a detection error pe for binary phase-shift keying, the Shannon bound of −1. 6 dB for soft-decision detection, and the limit of 0.37 dB for hard-decision detection. These limits are valid for asymptotically small values of the bit error rate. The performance of all coded-modulated formats for a white additive gaussian noise channel, regardless of the complexity of the format, lies to the right of the Shannon bound. This means that, for binary phase-shift keying, the maximum coding gain for any type of code is approximately 11.2 dB at a bit error rate of 10 −5. 0

–2 etaR rorrE tiB goL

Uncoded BPSK –4 11.2 dB –6

Hard–decision limit for BPSK

Shannon limit –8

–4

–1.6

0.37

4

8

12

E b/N 0 (dB) Figure 14.18 The maximum achievable coding gain for binary phase-shift keying using soft

decisions and hard decisions.

23 See Gursoy (2007) for details.

722

14 The Information Capacity of a Lightwave Channel

14.6

Nonlinear Lightwave Channels Much of this chapter has dealt with the capacity of an information channel derived from a linear and bandlimited lightwave waveform channel. However, this model of a waveform channel is limited by nonlinearities in the fiber. Accordingly, the channel capacity in the presence of a nonlinearity will now be studied. For this purpose, an abstract description of a nonlinear information channel suitable for analysis must be stated. This model must be simple enough to study, yet realistic enough to be informative. The capacity is defined for a specific information channel defined as a black box characterized only by the relationship between the input and the output that describes the channel to the decoder. The information-theoretic channel capacity is a parameter describing a specific information channel. It is a mathematical statement regarding the maximum rate that can be transmitted through a fixed information channel. The information channel lies between the output of the encoder and the input to the decoder. It is deemed to contain all processes that are not accessible to the encoder and decoder, and is described by a probabilistic relationship between the information-channel input and the information-channel output. Different information channels can be defined for the same physical medium. Therefore, the specification of the information channel requires a clear statement of the interface between those functions that are considered to be part of the encoder/decoder and those functions that are considered to be part of the information channel. The capacity then applies to the information channel as so described. For example, Figure 14.9 shows the Shannon capacity for the linear additive gaussian noise information channel. It also shows the capacity for a different information channel that is constrained to use a specific QAM signal constellation. Each information channel has its own channel capacity. The most important nonlinearity in the study of lightwave channels is the nonlinear phase due to the Kerr nonlinearity. Therefore, this section will idealize this nonlinear channel as a black box called the Kerr nonlinear lightwave information channel, or the Kerr lightwave channel, defined as follows. The Kerr lightwave channel, as studied herein, consists of one or more segments of a fiber span with the signal in each of N segments of length L / N satisfying the nonlinear Schrödinger equation (cf. (5.4.3)), including dispersion and attenuation, with the Kerr nonlinearity as the only nonlinearity. Each segment output signal is linearly amplified and gaussian noise is inserted prior to the noisy signal entering the next segment. Any input signal s (t ) is allowed as the input to the Kerr information channel. This signal may consist of a single carrier, possibly preprocessed within the transmitter electronics, or it may consist of multiplexed carriers that are treated separately at the transmitter and the receiver, or it may consist of multiple carriers treated as a single block mimo signal recovering all datastreams simultaneously. Each of these cases defines a different information channel.

14.6 Nonlinear Lightwave Channels

14.6.1

723

The Full Kerr Lightwave Channel

The study of the Kerr lightwave channel begins with a single wavelength carrier. The capacity of the Kerr lightwave channel – even for a single wavelength – may not be computable in a conventional way. The single-letter capacity of the Kerr information channel based on a full solution to the single-wavelength nonlinear Schrödinger equation (cf. (5.4.2)) is not known, and perhaps not knowable. This is because the nonlinearity and dispersion spread the spectrum and mix the noise with the signal. The resulting noiselike impairments at the channel output are signal-dependent, and, when modeled as a random process, are generally nongaussian with complex internal memory in the noise-like impairments. A discrete-time model based on a discrete set of basis functions derived for a set of bandlimited waveforms cannot be directly applied to the Kerr information channel. This is because the nonlinearity redistributes the signal energy in frequency (cf. Chapter 5). The waveform at the output need not have finite, signal-independent support in the frequency domain. Even when the channel output is observed only in a designated bandwidth, the output bandwidth need not be equal to the input bandwidth, and may depend on the signal power. This means that the signal space at the channel output can be larger than the signal space at the channel input, with the number of dimensions of the output signal space depending on the input signal power. The full Kerr information channel has a complex form of memory. To compute the capacity of a channel with memory, one must assign a conditional probability distribution to each output sequence, then compute a prior on the set of input sequences. The length of the sequences is made large enough to accommodate the memory. Because the complexity is exponential in the sequence length, this approach becomes intractable for systems with a large nonlinear constraint length. The channel capacity provides meaningful performance limits only for a decoder that fully accounts for the nature of the channel. Such a decoder would employ massive computation and may be unrealistic. A simplified model may be informative. Several simplications are evident. The first simplification, discussed in Section 14.6.3, considers only the effect of the nonlinearity and ignores dispersion. This simplification leads to a single-wavelength dispersionless channel corrupted by nonlinear phase noise for which the bandwidth is not constrained. The joint probability density function of the noise then can be determined analytically. Another simplification ignores the frequency spectrum that lies outside of the original band. This is an information channel appropriate for an encoder and decoder that do not use that part of the frequency spectrum, which is a form of energy attenuation. Within this simplification, the signal space at the channel output does not change compared with the signal space at the channel input. A further simplification neglects the attenuation. This means that the effect of the nonlinearity is to redistribute the signal energy within the original bandwidth without increasing the bandwidth. This is called a quasi-linear approximation and is the basis of the Kerr multiplex information channel discussed in Section 14.6.4. For that information channel, each wavelength subchannel


is processed independently at the receiver, with the nonlinear interference between multiple wavelength subchannels treated as signal-dependent noise.

14.6.2 The Memoryless Kerr Information Channel

The Kerr channel can produce substantial memory both in the signal and in the noise at the output of the information channel. One approach hides this memory at the output by standard methods of codeword interleaving. This replaces the information channel with a memoryless substitute. Although this memoryless channel has a smaller capacity, it is the capacity for an encoder/decoder that treats the output as memoryless.

Alternatively, the memory can be partially removed by using the split-step backpropagation algorithm on the output sequences as the final operation prior to the output of the information channel. The output of the information channel then consists of the resulting backpropagated sequences, which can be modeled as memoryless blocks to calculate the capacity. Because this backpropagation is a massive computation, it is currently impractical, though not intractable.
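A minimal sketch of the split-step backpropagation just described, assuming a forward model of the form sketched earlier in this section: the received samples are propagated through a virtual fiber whose dispersion, nonlinearity, and loss are all negated. The function name and step count are illustrative.

```python
import numpy as np

def backpropagate(r, dt, span_len, beta2, gamma, alpha, n_steps=500):
    """Split-step backpropagation: undo one fiber span by propagating the
    received samples through the nonlinear Schrodinger equation with
    negated dispersion, nonlinearity, and loss."""
    w = 2 * np.pi * np.fft.fftfreq(r.size, d=dt)
    dz = span_len / n_steps
    half_disp = np.exp(1j * (-beta2 / 2) * w**2 * (dz / 2))  # negated dispersion
    a = r.astype(complex)
    for _ in range(n_steps):
        a = np.fft.ifft(half_disp * np.fft.fft(a))           # half dispersion step
        a = a * np.exp(-1j * gamma * np.abs(a)**2 * dz)      # negated Kerr rotation
        a = a * np.exp(alpha * dz / 2)                       # negated loss (gain)
        a = np.fft.ifft(half_disp * np.fft.fft(a))           # half dispersion step
    return a
```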

14.6.3 Dispersionless Channel

The solution of the nonlinear Schrödinger equation in a lossless, dispersionless, single-mode, single-carrier system is given in (5.4.10) and repeated here as

a(z, τ) = s(τ) e^{−iγ|s(τ)|² z},    (14.6.1)

where a(z, τ) is a complex signal envelope, s(τ) = a(0, τ) is the transmitted lightwave signal expressed using a root-mean-squared amplitude, and γ is the fiber nonlinear coefficient defined in (5.3.10).

In a lossless, dispersionless fiber, the amplitude of the signal s(τ) launched into the fiber is not affected by the nonlinearity. Circularly symmetric gaussian noise added to the channel between segments corrupts the amplitude of the signal. However, when there is no dispersion, no additional amplitude fluctuations are generated during subsequent propagation within a segment. This means that under the simplification that the signal space of the channel output is nearly the same as the signal space of the channel input, the marginal probability density function of the magnitude of the signal plus noise is a ricean probability density function, just as would be the case if there were no nonlinearity.24 For large powers, however, this approximation breaks down.

To derive the capacity, form the mutual information for the received complex symbol r given a transmitted complex symbol s,

I(s; r) = H(r) − H(r|s),

where the conditional differential entropy is (cf. (14.1.3a))

H(r|s) = −∫_{−∞}^{∞} f(s) ds ∫_{−∞}^{∞} f(r|s) log f(r|s) dr.    (14.6.2)

24 See Mecozzi (1994a, b).


The maximum uncertainty is generated whenever the channel removes any structure between the signal components so that the magnitude and the phase are independent. Accordingly, a lower bound for the mutual information can be obtained by replacing the joint conditional probability density function f(r|s) by the product of the conditional marginal probability density functions f₁(r|A) f₂(φ|θ). This replacement removes the correlation between the magnitude r and the phase φ at the receiver. A lower bound is obtained by replacing the marginal probability density function of the phase by a maximum-entropy, uniform probability density function. Integrating over the uniform probability density function of the phase, the two-dimensional joint conditional probability density function reduces to the one-dimensional conditional probability density function of the magnitude of the complex signal s, with

f(r|s) = (1/2π) f(r|A),    (14.6.3)

where f(r|A) is a conditional ricean probability density function (cf. (2.2.33)) with A = √E and σ² = N₀/2. The term 1/2π can be suppressed. Under the approximation that the received phase is independent of the transmitted phase and is uniformly distributed, the mutual information is maximized using only the prior f(A) for the magnitude A = |s| of the input distribution. Substituting (14.6.3) into (14.6.2), the conditional entropy is

H(r|s) = −∫₀^∞ f(A) dA ∫₀^∞ f(r|A) log f(r|A) dr = H(r|A).    (14.6.4)

Repeating this process for the received entropy gives H(r) = H(r), with r = |r|. Upper-bounding the conditional entropy produces a lower bound on the capacity,

C = H(r) − H(r|s) ≥ H(r) − H(r|A) ≥ (1/2) log₂(1 + E/N₀) + O(1),    (14.6.5)

which has the same form as the soft-decision detection capacity C for intensity modulation at a large signal-to-noise ratio (cf. (14.3.11)) because the information channel using the magnitude of a signal has the same capacity as that using the magnitude squared of the signal. In a large-signal regime, C is approximately half the unconstrained capacity (cf. (14.3.11)). Therefore, for a dispersionless channel, the effect of the nonlinearity randomizing the phase is to reduce the capacity by at most a factor of two. This is a lower bound because the received phase is still partially correlated with the transmitted phase, and this correlation could be used to convey information.

An alternative method of modulation that achieves the same information rate uses only phase modulation. While this may seem counterintuitive, the nonlinear self-phase modulation in (14.6.1) depends on |s(τ)|², which is the squared magnitude of the input signal. For phase modulation, this value is a constant and generates a constant nonlinear


phase shift. This constant phase shift can be estimated and compensated using the methods discussed in Section 12.2.
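Both facts stated above (magnitude preservation under (14.6.1), and a constant phase shift for a constant-modulus input) are easy to confirm numerically; a short check with illustrative parameter values:

```python
import numpy as np

gamma, z = 1.3e-3, 80e3        # illustrative: nonlinear coefficient (1/(W m)), distance (m)
s = np.array([0.5 + 0.2j, -0.4 + 0.6j, 0.7 - 0.1j])   # arbitrary input samples

a = s * np.exp(-1j * gamma * np.abs(s)**2 * z)        # dispersionless solution (14.6.1)
print(np.allclose(np.abs(a), np.abs(s)))              # True: magnitude is unaffected

# A constant-modulus (phase-modulated) input acquires only a common phase shift.
p = 0.6 * np.exp(1j * np.array([0.1, 2.0, -1.4]))
shift = -gamma * np.abs(p)**2 * z
print(np.allclose(shift, shift[0]))                   # True: a single constant rotation
```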

14.6.4 Kerr Wavelength-Multiplex Information Channel

A fiber may be used to carry multiple wavelengths as separate multiplex subchannels with separate receivers. A Kerr multiplex information channel is a bandlimited wavelength-multiplex waveform channel in a dispersive fiber channel with a Kerr nonlinearity. Each wavelength subchannel experiences nonlinear interference from the other subchannels (cf. Section 11.6), but the receiver for each wavelength subchannel can observe only that single wavelength subchannel. Yet all wavelength subchannels contribute to the nonlinearity that is seen by each wavelength subchannel. This causes interference between subchannels.

For a quasi-linear channel, to the first order of approximation, the bandwidth of the waveform at the channel output is equal to the bandwidth of the waveform at the channel input. Each multiplex subchannel is processed separately using single-user detection (cf. Section 8.1.3), with the detection statistic based only on the received signal in that subchannel. The nonlinear interference is treated as noise.

The uncompensated dispersion in the fiber span causes the other wavelength subchannels to contribute to the nonlinear interference power P_NL in the subchannel of interest. Asserting the central limit theorem for independent subchannels, a sample of the nonlinear interference can be modeled as a circularly symmetric gaussian random variable with an interference power P_NL = 2σ²_NL (cf. (11.4.3)) that has a P_s³ power dependence, where P_s is the total signal power in all the subchannels (cf. (5.4.2)). Under this gaussian model, the total noise power is the sum of the nonlinear interference power P_NL and the independent additive-noise power P_n. Given the independence of the noise and the posited independence of the nonlinear interference, the conditional entropy H(r|s) reduces to the noise entropy H(n) (cf. (14.2.1)), with the mutual information given by (14.2.1),

I(s; r) = H(r) − H(n).    (14.6.6)

The received probability density function that maximizes the mutual information is a circularly symmetric gaussian probability density function with an expected value defined over the bandwidth B of the channel (cf. (14.4.16b)). For a rectangular bandlimiting filter, each subchannel transmits the same power. For this case, the received differential entropy can be expressed as (cf. (14.3.3))

H(r) = log(eπ(P_s + P_n + P_NL)),    (14.6.7)

where P_s is the signal power, P_n is the additive-noise power, and P_NL is the nonlinear interference power. The entropy of the noise over the same band is

H(n) = log(eπ(P_n + P_NL)).    (14.6.8)

Substituting these expressions into (14.6.6) and using C = B C̄, where C̄ is the capacity per unit bandwidth (cf. (14.4.16)), the bandlimited capacity C is

C = B C̄ = B (log(eπ(P_s + P_n + P_NL)) − log(eπ(P_n + P_NL)))
  = B log((P_s + P_n + P_NL)/(P_n + P_NL))
  = B log(1 + 1/(OSNR⁻¹ + OSIR⁻¹)),    (14.6.9)

where OSNR = P_s/P_n is the optical signal-to-noise ratio (cf. (2.2.71b)), and OSIR = P_s/P_NL is the optical signal-to-interference ratio defined for the nonlinear interference in the absence of additive noise. As the OSIR goes to infinity, meaning that there is no nonlinear interference, (14.6.9) reduces to (14.4.16b) and recovers the bandlimited capacity for a linear channel.

The effect of the nonlinear interference term P_NL on the capacity can be simulated by numerically solving the nonlinear Schrödinger equation, or may be approximated using an analytical expression. Given that the source term in the nonlinear Schrödinger equation has a P_s³ dependence, the nonlinear interference power can be written as P_NL = α_NL P_s³, where the scaling constant α_NL depends on the specific system. Using this expression,

OSIR⁻¹ = P_NL/P_s = α_NL P_s³/P_s = K OSNR²,    (14.6.10)

where K = α_NL P_n² is a power-dependent scaling factor that determines the strength of the nonlinear interference. When K is small, the effect of the nonlinearity is evident only for large values of the OSNR. When K is large, the effect of the nonlinearity is evident even for small values of the OSNR. Substituting expression (14.6.10) into (14.6.9) gives

C = B log(1 + 1/(OSNR⁻¹ + K OSNR²)).    (14.6.11)

This capacity is shown in Figure 14.19 in units of bits per second. It has a maximum value of log₂(1 + (2/3)(2K)^{−1/3}) when the OSNR equals (2K)^{−1/3}. Because the transmitted signal power is considered a flexible attribute of the information channel, the power used to achieve the capacity may be less than the maximum permitted power as specified by the OSNR. Accordingly, for values of the OSNR greater than (2K)^{−1/3}, the transmitted power used to achieve the capacity is smaller than the maximum permitted power, with the capacity given by C = log₂(1 + (2/3)(2K)^{−1/3}).

A memoryless gaussian model combining the noise and the signal-dependent interference is tractable, but may be an inaccurate model of the information channel. This is because modeling the effect of interference as a maximum-entropy gaussian distribution ignores any memory in the dispersive lightwave channel. The dispersion produces dependences in the interference, reducing the uncertainty and the corresponding conditional entropy. Moreover, because the nonlinearity is deterministic in the absence of noise, using knowledge about the set of datastreams at the transmitter and jointly detecting the datastreams at the receiver can remove a substantial amount of the nonlinear interference, leading to a different information channel model with a larger capacity. That channel is considered next.


Figure 14.19 The bandlimited capacity for a wavelength-multiplex channel given in (14.6.11), plotted against the OSNR in dB for three values of the nonlinear interference parameter K (10⁻⁸, 10⁻⁶, and 10⁻⁴), together with the Shannon bound.
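The curves of Figure 14.19 and the stated optimum can be reproduced directly from (14.6.11). A minimal numeric check of the capacity per unit bandwidth:

```python
import numpy as np

def cap_per_bandwidth(osnr_db, K):
    """C/B from (14.6.11): log2(1 + 1/(1/OSNR + K*OSNR^2))."""
    osnr = 10.0 ** (osnr_db / 10)
    return np.log2(1 + 1 / (1 / osnr + K * osnr**2))

osnr_db = np.linspace(0, 30, 3001)
for K in (1e-8, 1e-6, 1e-4):
    c = cap_per_bandwidth(osnr_db, K)
    osnr_peak = (2 * K) ** (-1 / 3)                    # OSNR at the maximum
    c_peak = np.log2(1 + (2 / 3) * (2 * K) ** (-1 / 3))
    print(f"K={K:g}: peak at {10 * np.log10(osnr_peak):.1f} dB,"
          f" numeric max {c.max():.2f} bits, formula {c_peak:.2f} bits")
```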

14.6.5 Kerr Wavelength MIMO Channel

A wavelength mimo channel regards the signal in the set of wavelength channels as a single block signal both at the channel input and at the channel output. At the channel input, controlled forms of interchannel cross-coupling may be applied to precompensate for the effects of the nonlinearity. At the channel output, the block wavelength signal is jointly processed using multi-user detection. In one form of joint processing, the additional information is used for interference cancellation. For this method, an estimate of the nonlinear interference is subtracted from each subchannel of the received block signal. This leads to a wavelength mimo information channel that is different from the wavelength-multiplex information channel based on single-user detection. Accordingly, the wavelength mimo channel has a different channel capacity. Let the received output r_k of the kth subchannel be written as (cf. (11.4.3))

r_k = s_k + c_k + n_k,    (14.6.12)

where s_k is the signal in the kth subchannel, c_k is the nonlinear interference, and n_k is a zero-mean additive-noise term. In the absence of noise, the nonlinear interference term c_k for each subchannel is a deterministic function of the block signal defined using all of the subchannels. This is an approximation because the mixing of the signal with the noise in each amplified fiber segment is random. Given estimates ĉ_jk of the nonlinear interference term, an estimate ŝ_jk of the jth sample in the kth subchannel can be computed by subtracting ĉ_jk from the received sample r_jk to partially cancel the nonlinear interference.25 This leads to an information channel with a capacity that is at least as large as the capacity of the wavelength-multiplex channel.

25 See Taghavi, Papen, and Siegel (2006) and Temprana, Myslivets, Kuo, Liu, Ataie, Alić, and Radić (2015).


The capacity of a wavelength mimo channel is at least as large as the capacity for a wavelength-multiplex channel because it uses all of the available information in the block signal instead of information in the individual wavelength subchannels considered in isolation. The relationship between the capacity of mimo channels and multiplex channels is analogous to the relationship between soft-decision detection and hard-decision detection.
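The structure of the cancellation step can be made concrete with a toy model of (14.6.12). The interference function below is a stand-in, not the actual Kerr coupling, and the estimate is genie-aided; the point is only that subtracting a deterministic interference estimate leaves the signal plus the additive noise:

```python
import numpy as np

rng = np.random.default_rng(1)
s = rng.choice(np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]), size=1000)  # QPSK block

def interference(x):
    """Stand-in deterministic interference: a memoryless cubic term plus a
    one-sample memory term (not the actual Kerr coupling)."""
    return 0.1 * x * np.abs(x)**2 + 0.05 * np.roll(x, 1)

n = 0.05 * (rng.normal(size=s.size) + 1j * rng.normal(size=s.size))
r = s + interference(s) + n          # received samples, as in (14.6.12)

s_hat = r - interference(s)          # genie-aided cancellation of the interference
print(np.mean(np.abs(s_hat - s)**2), np.mean(np.abs(n)**2))  # identical residual power
```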

14.6.6 The Capacity Using the Envelope Method

The memoryless Kerr information channel is an abstract channel with a continuous input alphabet and memoryless signal-dependent noise whose mean and variance are equal to the single-letter marginalization of the stochastic impairments of the Kerr information channel. The signal-dependent noise mean and variance can be obtained from simulation or from experimental data.

The capacity of the memoryless Kerr information channel can be computed using the envelope method (cf. Section 14.3.2). For a linear channel, this method calculates the capacity as a function of E_b/N₀ for a sequence of discrete information channels based on finite-point signal constellations of increasing size, then graphs the envelope of this set of capacities, as in Figure 14.9. The generalization of the calculation to the memoryless Kerr information channel differs from the linear case only in the form of the transition probabilities, which are modeled using a memoryless signal-dependent noise source as described earlier. This approach decouples the physical-layer channel model, which defines the transition probabilities, from the calculation of the channel capacity based on those transition probabilities. While the physical-channel model includes the difficult issue of defining the support of the channel output, once the transition probabilities have been determined, the channel capacity calculation is straightforward.

The capacity is calculated as a function of the optical signal-to-noise ratio (OSNR) for each finite-point signal constellation, with a new transition probability matrix Q calculated for each value of the OSNR. The calculation is terminated when increasing the OSNR does not increase the capacity. The number of points in the constellation is then increased, and the process is repeated until increasing the size of the signal constellation or the OSNR does not increase the capacity. The envelope of this set of curves is an indirect method that describes the channel capacity of the unconstrained nonlinear channel. This approximate method requires only the transition probability matrix Q from the physical-channel model, obtained by simulation or measurement.
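Once the transition probability matrix Q is in hand, the capacity of each finite-constellation information channel can be computed by the standard alternating-maximization (Blahut–Arimoto) algorithm cited in Section 14.7. A minimal sketch for a discrete memoryless channel:

```python
import numpy as np

def capacity_bits(Q, tol=1e-12, max_iter=10_000):
    """Blahut-Arimoto capacity of a discrete memoryless channel.
    Q[j, k] = P(output k | input j); every output must be reachable."""
    p = np.full(Q.shape[0], 1.0 / Q.shape[0])        # uniform starting prior
    for _ in range(max_iter):
        q = p @ Q                                    # induced output distribution
        ratio = np.ones_like(Q)
        mask = Q > 0
        ratio[mask] = (Q / q)[mask]                  # Q[j,k]/q[k] where defined
        c = np.exp((Q * np.log(ratio)).sum(axis=1))  # exp of per-input divergence
        if np.log(c.max()) - np.log(p @ c) < tol:    # upper and lower bounds meet
            break
        p = p * c / (p @ c)                          # Blahut-Arimoto prior update
    return np.log(p @ c) / np.log(2)                 # capacity in bits

# Check: a binary symmetric channel with crossover 0.1 gives 1 - h(0.1) ~ 0.531 bits.
print(capacity_bits(np.array([[0.9, 0.1], [0.1, 0.9]])))
```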

14.7 References

Aspects of information theory are considered in Blahut (1988, 2020) and in Cover and Thomas (1991, 2006). The channel capacity of intensity-modulated systems is covered


in Stern (1960), in Gordon (1962), in Yamamoto and Haus (1986), and in Hall (1994). The capacity for the continuous limit for intensity modulation was first addressed by Jelonek (1953) and Blachman (1953). These results were applied to a continuous lightwave channel by Gordon and Bolgiano (1965), and more recently by Lapidoth (2002), by Hranilovic and Kschischang (2004), and by Katz and Shamai (2004). Water-filling is covered in Cover and Thomas (1991, 2006). Kramer, Ashikhmin, van Wijngaarden, and Xing (2003) discuss spectral rate efficiency, as does Gursoy (2007). The spectral efficiency in the wideband regime is discussed in Verdu (2002). The capacity and spectral efficiency of lightwave channels are summarized in Kahn and Ho (2004). Computation of the optimal prior that achieves the channel capacity is discussed in Blahut (1972), in Arimoto (1972), and in Kschischang and Pasupathy (1993). The capacity of nonlinear lightwave systems is discussed in Dar, Shtaif, and Feder (2014), in Kramer, Yousefi, and Kschischang (2015), and in Agrell, Alvarado, and Kschischang (2016). The capacity for a memoryless channel in the presence of nonlinearities is covered in Turitsyn, Derevyanko, Yurkevich, and Turitsyn (2003). The capacity for a Kerr nonlinear lightwave information channel using phase modulation is covered in Ho and Kahn (2002). The envelope method is discussed in Fehenberger, Alvarado, Böcherer, and Hanik (2016).

14.8 Historical Notes

The subject of information theory is due to Shannon (1948), republished in book form by Shannon and Weaver (1949, 1998). Shannon showed that all forms of point-to-point communication are fundamentally the same, with any information source represented by logical symbols. He also defined the channel capacity and proved it to be useful. Early works incorporating quantum effects into Shannon's theory include Gabor (1953), Stern (1960), and two important papers by Gordon. The first Gordon paper (1962) showed how quantum effects can change the channel capacity compared with an analysis using only wave optics. That seminal paper derived the form of the geometric distribution that bears Gordon's name and the narrowband channel capacity. The second paper by Gordon (1964) provided a conjecture for the capacity of a lightwave channel, including the discrete-energy property of a lightwave signal. The proof of this conjecture is discussed in Chapter 16. The wideband channel capacity of an ideal photon-optics channel, which leads to the maximum information rate, was addressed later in Zador (1965), in Lebedev and Levitin (1966), and in Bowen (1967). The effect of a Kerr nonlinearity on the capacity of a lightwave system appears to have been considered first in Splett, Kurtzke, and Petermann (1993) and then later in Mitra and Stark (2001) and Stark, Mitra, and Sengupta (2001). Modulation methods based on using the eigenvalues of the nonlinear Fourier transform were first proposed in Hasegawa and Nyu (1993). These methods were later revisited by Yousefi and Kschischang (2014a, b, c), by Prilepsky, Derevyanko, and Turitsyn (2013), and by Turitsyna and Turitsyn (2013).


14.9 Problems

1 Entropy
(a) Show that H(s|s) = 0.
(b) Show that when s and z are independent discrete random variables,

H((s + z)|s) = H(z).

(c) Derive the identities H(s, r) = H(s) + H(r|s) = H(r) + H(s|r). This set of identities, generalized to multiple random variables, is called the chain rule for the entropy.

2 Mutual information for an asymmetric channel
The mutual information for a binary channel, given in (14.2.14), is

I(s; r) = p₀p₀|₀ log(p₀|₀/(p₀p₀|₀ + p₁p₀|₁)) + p₀p₁|₀ log(p₁|₀/(p₁p₁|₁ + p₀p₁|₀))
        + p₁p₀|₁ log(p₀|₁/(p₀p₀|₀ + p₁p₀|₁)) + p₁p₁|₁ log(p₁|₁/(p₁p₁|₁ + p₀p₁|₀)).

(a) Using this expression, set p₁|₀ = p₀|₁ = ρ and p₁|₁ = p₀|₀ = 1 − ρ = p_c. Show that the resulting expression is equal to the mutual information of a binary symmetric channel.
(b) Show that the prior p = (1/2, 1/2) achieves the capacity of the resulting binary symmetric channel.

3 Discrete capacity using an exponential probability density function
The large-signal limit for the capacity of a Poisson channel is

C = H(r) − H(r|s) = (1/2) log E = (1/2) C_w.

This limit can be derived using a central chi-square probability density function with one degree of freedom. Show that when this probability density function is replaced by an exponential function for p(s), the resulting capacity is smaller than (1/2)C_w by the constant term (1/2)(log_e(2π) − 1 − γ), where Euler's constant γ is 0.5772.

4 Entropy of a circularly symmetric gaussian probability density function
Determine the entropy H(s) of the probability density function

f(s) = (1/(2πσ_s²)) e^{−|s|²/2σ_s²},


where |s|² = x² + y², and show that

H(s) = log₂(2πeσ_s²) = log₂(πeE) bits,

where E = 2σ_s² is the expected signal energy, which is (14.3.3).

5 Entropy of a Poisson probability mass function

(a) Show that the entropy of the Poisson probability distribution

H(E) = −∑_{k=0}^{∞} p(k) log_e p(k) = −∑_{k=0}^{∞} (E^k e^{−E}/k!) log_e(E^k e^{−E}/k!)

can be written as

H(E) = E − E log_e E + e^{−E} ∑_{k=2}^{∞} (E^k log_e k!)/k!  nats.

(b) Using this expression, derive an approximation for the entropy of a Poisson probability mass function for E much smaller than one.

6 Upper bound on the entropy of a Poisson probability distribution in a large-signal regime
An upper bound on the entropy of a Poisson probability distribution p(s) with mean E much larger than one can be obtained by forming a continuous random variable x = s + u, where u is an independent continuous uniform random variable defined on the interval [0, 1]. Given that the gaussian probability density function is the maximum-entropy continuous distribution, the entropy H(s) can be bounded as

H(s) ≤ (1/2) log_e(2πeσ_s²),

where σ_s² is the variance of s. Using this relationship, show that

H(s) ≤ (1/2) log_e(2πe(E + 1/12)),

and thus show that for E ≫ 1

H(E) ≤ (1/2) log_e(2πeE),

confirming that, for a large mean, the entropy of a Poisson probability distribution approaches the entropy of a gaussian probability density function with the same variance.

7 Covariance matrix of the received block signal
Using r = Hs + n and the definition of the covariance matrix for a multivariate circularly symmetric gaussian random variable z,

V ≐ ⟨(z − ⟨z⟩)(z − ⟨z⟩)†⟩,


show that when s is a multivariate gaussian random variable with a channel input covariance matrix V_s, the channel output covariance matrix V_rec can be written as

V_rec = N₀ I_K + H V_s H†,

where I_K is the K × K identity matrix.

8 Magnitude-only capacity and phase-only capacity
(a) Under what conditions is the sum of the phase-only capacity and the magnitude-only capacity equal to the capacity of an unconstrained modulation format?
(b) Explain why this approximation breaks down when the condition determined in part (a) is not satisfied.

9 Binary capacity in a small-signal regime
Consider the binary detection of a lightwave signal in a small-signal regime for which the expected number of photons E_b is much smaller than one.
(a) Derive the optimal threshold in terms of E_b and for prior probability p₁.
(b) Derive the terms required to form the mutual information as a function of p₁ and E_b.
(c) Determine the capacity for E_b ≪ 1. (This requires numerical root finding.)
(d) Expand the expression for the mutual information in a power series in E_b, keeping only the first term.
(e) Using the term from part (d), determine the value of p₁ that maximizes the mutual information for a given value of E_b.
(f) For the optimal prior probability determined in part (e) with E_b much smaller than one, show that the expression for the mutual information reduces to

−E_b log₂ E_b bits,

which is the small-signal limit of the entropy of a Poisson probability distribution.

10 Capacity of a multi-input multi-output channel with correlated noise
The capacity of a multi-input multi-output channel with uncorrelated noise samples, given by (14.4.7), is

C = log_e det(I_K + (1/N₀) H V_s H†).

Using the noise entropy H(n) of a multi-input multi-output channel with correlated noise samples,

H(n) = log_e det(πe V_N)  nats

(cf. (14.4.4)), modify the expression for the capacity to account for the correlated noise.

11 Water-filling and the capacity of a multi-input multi-output channel
Consider a multi-input multi-output additive gaussian noise channel that supports


three subchannels. The effective noise energy N_k = N₀/ξ_k for each of these subchannels is shown in the figure below, where ξ_k is the kth eigenvalue of the matrix HH†, with H being the channel matrix.

(Figure: the effective noise levels N₁, N₂, and N₃ of the three subchannels.)

(a) Suppose that the total energy E available for transmission satisfies E > ∑_{k=1}^{3} N_k, where N_k is the effective noise power density spectrum in each subchannel. Graphically solve for the optimal energy allocation per subchannel using water-filling, and determine the optimal value of the energy E_k for each subchannel in terms of E and N_k.
(b) Determine the capacity C_k of each subchannel.
(c) Determine the overall capacity.
(d) Now suppose that there is a fourth subchannel with an effective noise N₄. Determine the capacity for this system. Compare this result with the result for the three-subchannel system using the same total energy.
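For checking a graphical solution of this kind, a small numeric water-filling routine may be helpful; the noise levels below are placeholders:

```python
import numpy as np

def water_fill(E, N):
    """Water-filling over subchannels with effective noise levels N (sorted
    ascending). Returns the per-subchannel energies E_k summing to E."""
    N = np.asarray(N, dtype=float)
    for m in range(len(N), 0, -1):
        level = (E + N[:m].sum()) / m          # candidate water level over m channels
        if level >= N[m - 1]:                  # all m channels lie below the water line
            break
    return np.maximum(level - N, 0.0)

N = np.sort([1.0, 0.5, 2.0])                   # placeholder noise levels N_k
Ek = water_fill(6.0, N)
print(Ek, Ek.sum())                            # allocation sums to the total energy
print(np.sum(np.log2(1 + Ek / N)))             # overall capacity in bits
```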

12 Maximum information rate in terms of the arrival rate
The expression for the maximum information rate of an ideal photon-optics channel is given by (cf. (14.4.28))

C_max = f_max π²/log_e 8,

and is expressed in terms of the frequency f_max of a photon with average energy E. In this problem, an equivalent expression is derived upon expressing the capacity in terms of the signal power P_s by relating the energy E to the signal power P_s.
(a) Integrate the power density spectrum

S(f) = hf (1/(e^{hf/E} − 1))

over an infinite bandwidth to derive an expression that relates the total signal power P_s to the energy E. The definite integral ∫₀^∞ x/(e^x − 1)dx = π²/6 will be useful.
(b) Use the expression derived in part (a) to show that f_max = E/h is equal to √(6P_s/(π²h)).
(c) Substitute the expression derived in part (b) into the expression for the bandlimited capacity to show that

C = (π/log_e 2) √(2P_s/(3h))  bits per second,

which is (14.4.30).
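The definite integral quoted in part (a) can be verified numerically; a short midpoint-rule check:

```python
import numpy as np

dx = 1e-4
x = np.arange(dx / 2, 60, dx)                   # midpoints for a midpoint-rule sum
print(np.sum(x / np.expm1(x)) * dx, np.pi**2 / 6)   # both ~1.644934
```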


13 Water-filling on a mimo channel
A 2 × 2 mimo channel has a complex channel transfer function matrix (cf. (8.1.19b)) given by

H(f) = | H₁₁(f)  H₁₂(f) |
       | H₂₁(f)  H₂₂(f) |.

The channel outputs are each contaminated by independent circularly symmetric gaussian noise. Generalize the water-filling formulas to this mimo channel.

14 Bandlimited capacity
The bandlimited capacity for a wave-optics channel can be written in terms of a scaled wave-optics information rate R_w ≐ R E_b/N₀ as

C = B log₂(1 + R_w/B) bits per second.

(a) Using a small-signal expansion of this expression, show that when B is much smaller than R_w, the bandlimited capacity scales linearly in B.
(b) Using a large-signal expansion of this expression, show that when B is much larger than R_w, the bandlimited capacity saturates to a value given by R_w/log_e 2.

15 Photon-optics spectral rate efficiency
The spectral rate efficiency of a photon-optics channel including additive noise satisfies the following inequality (cf. (14.2.6) and (14.5.1)):

r ≤ log₂(1 + rE_b/(1 + N₀)) + (rE_b + N₀) log₂(1 + 1/(rE_b + N₀)) − N₀ log₂(1 + 1/N₀).

(a) Expand the right side of this expression in a power series in rE_b up to the linear term.
(b) Set the expression derived in part (a) equal to r, and solve for E_b in terms of N₀. This expression can be used to determine (E_b)_min when r equals zero.
(c) Set N₀ = 1 in the expression derived in part (b), and show that (E_b)_min is equal to log_e 2/(log_e 2 + 1/2). This expression is smaller by a factor of (log_e 2 + 1/2) than (E_b/N₀)_min = log_e 2 for a wave-optics channel.
(d) Show that when N₀ is much smaller than one, (E_b)_min goes as N₀.

16 Asymmetric channel
An asymmetric information channel has a signal constellation with points of different energies and signal-dependent noise, so that the variance of each conditional probability distribution function is different.
(a) Using counterexamples, show that when only one of these conditions is satisfied, a symmetric information channel is generated.
(b) When both conditions are satisfied, can a symmetric information channel still be produced? Provide a quantitative justification for your answer.

15 The Quantum-Optics Model

The quantum-optics signal model is studied in this, the second of the three concluding chapters of the book. A description of a lightwave communication system at its most fundamental level requires the quantum-optics signal model. This signal model incorporates the seemingly incompatible wave properties and particle properties of a lightwave under a single unifying framework that includes both wave optics and photon optics as simplified models. The use of these two models side-by-side, as was done in the previous chapters, is now validated in this penultimate chapter. The quantum framework describes several physical properties of a lightwave signal that cannot be described using the other signal models. Accordingly, wave optics and photon optics are reconciled by this larger class of signals. This reconciliation is used in Chapter 16 to determine how to convey classical information, expressed in the form of bits, using quantum-lightwave signals.

The description of a quantum-lightwave signal and the associated description of the operations of modulation, demodulation, and detection require mathematical methods that are extensions of the methods used to describe wave optics and photon optics. To incorporate the additional properties admitted for quantum-lightwave symbols or sequences requires a mathematical object called a signal state, which is, in general, described as a vector in an infinite-dimensional signal space, distinct from conventional physical space. Moreover, measurement operations such as photodetection are described in a fundamentally different way than are classical measurements. The resulting mathematical framework provides a full and consistent description of lightwave properties, yet lacks the everyday intuition used to describe a classical communication system. Understanding of the quantum behavior is based on an understanding of the mathematical rules describing that behavior. The challenge for this chapter is to seamlessly extend the mathematical methods used to analyze classical communication systems so as to develop the methods and insight required to analyze a quantum-lightwave communication system.

15.1 An Operational View of Quantum Optics

Before developing the mathematical formalism of quantum optics, which begins in Section 15.2, and of statistical quantum optics, which begins in Section 15.4, this section provides an operational overview of quantum optics. This overview motivates the


mathematical formalism that comes in the next section, and introduces some unfamiliar concepts. Informally, quantum optics reconciles and unifies the seemingly incompatible wave properties expressed by the wave-optics signal model and the particle properties expressed by the photon-optics signal model. This chapter shows that an appropriately defined signal state within quantum optics – called a coherent state – exhibits both particle-like properties and wave-like properties, with the properties that are observed depending on the kind of measurement employed as well as on the expected signal level.

15.1.1 Lightwave Signal States

The primary differences between a classical communication system and a quantum-lightwave communication system are diagrammed in Figure 15.1. The lightwave source used within the transmitter is discussed first. A classically ideal, noise-free, unmodulated wave-optics lightwave source is described as a deterministic passband signal A cos(2πf_c t + φ) at a carrier frequency f_c with both a constant amplitude A and a constant phase φ. Within the photon-optics signal model, a classically ideal, constant-amplitude lightwave source emits a Poisson-distributed random number of photons in a time interval of duration T, with the Poisson arrival rate depending on the square A² of that constant wave-optics amplitude. The randomness in the number of photons in a time interval of duration T is incorporated by the Poisson transform of the probability distribution of the random energy defined over T using a wave-optics signal model. This relationship is shown pictorially on the right side of Figure 15.2.


Figure 15.1 A simplified comparison of a classical communication system and a quantum-optics communication system, showing the larger set of admissible operations.



Figure 15.2 Relationships between lightwave signal models.

In quantum optics, the physical properties of a lightwave source that conveys classical information are elegantly described as the outcomes of operations applied to an abstract object called a signal state taking values in an appropriately defined signal space, which is usually infinite-dimensional. Examples of signal states for conventional lightwave signals are given in Section 15.3. Others will be given later. A signal state evolves according to the rules of quantum optics.

The signals used for conveying classical information in the form of bits are produced by a state-preparation process. At the receiver, there is a corresponding state-detection process. The state-preparation process maps letters from an alphabet onto signal states. The mapping depends on the kind of lightwave signal state used for the lightwave source. Different kinds of modulated lightwave signal states can have different observable properties. Operationally, this means that the observable properties of the signal state described by the combined state-preparation/state-detection process might be wave-like, particle-like, or such that these properties cannot be readily described using either wave optics or photon optics.

A property of a lightwave is determined from the outcome of a measurement. The measurement is described by a transformation of the signal state. To determine the outcome of a measurement, a signal state is expressed in a basis suitable for that measurement. Each state in the measurement basis is associated with a measurable outcome. The set of measurement states may be finite, countably infinite, or uncountably infinite depending on the property of the lightwave to be measured. The vector (or function) of expansion coefficients that expresses an arbitrary signal state in a specified basis is called the quantum wave function of the signal state expressed in that basis. The componentwise squared magnitude of the quantum wave function then defines the probability distribution of the outcome of a measurement with respect to that basis.

Two important kinds of lightwave signal states – coherent states and photon-number states – provide contexts for the differing properties of lightwave signal states.1

15.1 An Operational View of Quantum Optics

739

kinds of lightwave signal states are admitted within quantum optics, but are not discussed.

Coherent States A coherent state (or a Glauber state) is a formal quantum-optics signal state that, for large signal values, most closely resembles a single-frequency lightwave signal. The sinusoidal output of an ideal single-mode laser (cf. Section 7.5) is described as a coherent state. A mode is a single degree of signaling freedom in time, space, and polarization. Each mode may be associated with a coherent state. The signal in each mode is then described as a coherent state with the allowed energies derived from a quantized harmonic oscillator. A single-mode coherent state can be described in terms of an expansion using an appropriately defined basis, as discussed in Section 15.3. This expansion is characterized by a single complex number α , called the Glauber number. Similarly to a classical lightwave that is demodulated and then detected to form a sample, a coherent state can be described by the measurement of the in-phase component denoted α I and the separate measurement of the quadrature component denoted α Q. Because the outcome of this kind of measurement is a detected signal state, this measurement is called state detection. It is discussed in Section 15.5.2. However, unlike in the case of a classical lightwave, there exists a fundamental dependence between α I and αQ . Because of this dependence, a coherent state is described in a phase space with components (α I , α Q ), which, in general, cannot be treated as independent variables. This dependence affects how the entropy of a quantum-lightwave signal is defined, as is discussed in Section 15.4.4. Phase space is not the same as signal space because it accounts for the unique quasi-probabilistic dependences between the two phase-space components αI and α Q permitted within quantum optics. The phase-space description of a coherent state and the quasi-probability distributions that express the dependences between the two signal components are discussed in Section 15.6. In contrast to the phase-space representation of the coherent state, the properties of a classical lightwave in a single spatiotemporal mode are described by a complex signal point s = s I + i s Q in signal space. When appropriate, this point is related to the mean number of signal photons E by the expression E = |s| 2 = | s|2 /±ω = E /± ω (cf. (10.1.4)). The measurable properties of a coherent state – expressed in phase space – approach the measurable properties of a classical monochromatic lightwave – expressed in signal space – as the Glauber number α becomes large. In this large-signal regime, the dependences between the two phase-space components αI and αQ are not evident. Accordingly, for a large Glauber number, α , approaches s and the energy E in a mode approaches ±ω|α |2 . These relationships are shown notionally in Figure 15.3(a). For large α , the energy E in a mode approaches ±ω|α |2. However, the measurable properties of a coherent state can be so associated with the measurable properties of a classical lightwave only for a strong signal characterized by a large value of α for which the dependences between α I and α Q are not evident. When the same signal is attenuated such that α becomes small, different measurable properties

740

15 The Quantum-Optics Model


Figure 15.3 (a) The relationship between a complex signal point s in signal space and a complex point in phase space for a large value of α. (b) The same relationship for a small value of α.

of the coherent state become evident that would not be measurable properties of an attenuated classical lightwave. This means that the association of α with s breaks down for small signal values as the emergent quantum properties become more pronounced, and α cannot be replaced by s. These emergent quantum properties are described by a fundamental dependence between the two components α_I and α_Q in phase space. The relationship is shown notionally in Figure 15.3(b).

These differences, though seemingly enigmatic, do have an elegant theory and a compelling structure. As an example, (15.3.25) shows that the mean energy in a mode determined from quantum optics is E = ℏω(|α|² + 1/2), differing from the classical expression by the equivalent of half a photon. This difference is not evident when α is large, so the lightwave energy is approximately ℏω|α|², as stated earlier. The difference is evident when α is small. It may be regarded in phase space as a fundamental "quantum-uncertainty cloud" in the measured in-phase component and the quadrature component of a coherent state. This intrinsic quantum-uncertainty cloud is shown notionally in Figure 15.3(b) and is discussed in Section 15.6.

The strength of the observed wave and particle properties of a lightwave depends on the signal level and on the method of measurement. For a coherent state measured using ideal photon counting, the outcome of a measurement is a point process and can be described using photon optics. This relationship is valid for any signal level. It is shown pictorially in Figure 15.2.

Each spatiotemporal mode of a system may be associated with a coherent state. While the modes of a lossless system are themselves orthogonal,2 the coherent states describing the signal within each signal mode are not orthogonal. This pairwise nonorthogonality of coherent signal states is expressed by a nonzero inner product – to be defined later. For large signal levels, however, the inner product between two coherent states approaches zero. In this large-signal regime, a set of coherent states


can be regarded as mutually orthogonal, and can be accurately described using wave optics by associating α with s. This mathematical relationship provides the conceptual bridge between a quantum-lightwave coherent state and a wave-optics signal. It is shown pictorially at the top of Figure 15.2. A signal state that can be accurately described semiclassically using a combination of wave optics and photon optics is called a classical state of light.

2 The spatial orthogonality of modes is discussed in Chapter 2. The temporal orthogonality of modes is discussed in Section 6.6.
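The approach of the inner product to zero can be illustrated with the standard coherent-state overlap |⟨α|β⟩|² = e^{−|α−β|²} (a standard result; the inner product itself is defined later in this chapter). The sketch below scales two fixed Glauber numbers by an increasing amplitude:

```python
import numpy as np

def overlap(alpha, beta):
    """Squared inner product of two coherent states, |<alpha|beta>|^2."""
    return np.exp(-abs(alpha - beta) ** 2)

a0, b0 = 1.0, -1.0                  # two antipodal Glauber numbers at unit scale
for scale in (0.1, 0.5, 1.0, 2.0, 4.0):
    print(scale, overlap(scale * a0, scale * b0))
# The overlap goes to zero as the signal grows, so the states become
# effectively orthogonal in the large-signal (classical) regime.
```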

Photon-Number States

An ideal photon-number state (or a Fock state) results from a lightwave source that emits a specified and fixed number of photons over a time interval of duration T. This state is not produced by any conventional lightwave source and cannot be generated directly using the output of a laser. Instead, only a probabilistic approximation to a photon-number state can be generated in current practice, and that only by elaborate methods.3

Because the number of photons in a given time interval is not random, a photon-number state is a nonclassical state of light. This state is fundamentally different from the random Poisson-distributed source in photon optics. A photon-number state with a nonzero number of photons cannot be described using wave optics or photon optics. The properties of a photon-number state can be understood only by using the methods of quantum optics.

In contrast to coherent states, the photon-number states are mutually orthogonal and together can be used as an orthonormal eigenbasis to express other quantum-optics signal states. In particular, a coherent state can be represented as an infinite superposition of photon-number states. This means that the coherent state – expressed by a single complex number in phase space – is expressed by an infinite number of photon-number states. The relationship between coherent states and photon-number states is shown pictorially on the left side of Figure 15.2 and is discussed in detail in Section 15.3.

Informally, it is hard to generate a photon-number state, but it is easy in principle to measure an unknown photon-number state with no quantum uncertainty. In contrast, it is easy to generate a coherent state, but practically impossible to measure a property of an unknown coherent state without some quantum uncertainty.

3 For example, see Waks, Diamanti, and Yamamoto (2006), or McCusker and Kwiat (2009).

15.1.2 Modulation

A classical coded-modulation process transforms a user datastream into a sequence of codewords that is modulated onto a lightwave carrier. Wave-optics and photon-optics modulation techniques, such as those discussed in Chapter 10, are a subset of a much larger admissible class of operations that can convey classical information using quantum-lightwave signal states. This class of operations, called state-preparation techniques, can fully exploit the unique properties of a quantum lightwave. The term "state preparation" indicates that this process may have attributes of both classical encoding


and classical modulation. These quantum-optics state-preparation and state-detection techniques, shown pictorially in Figure 15.1, subsume all classical coded-modulation and demodulation operations in a classical modem. Accordingly, the methods of state preparation and state detection describe a quantum modem.

For this larger class of signals, a prepared signal state that corresponds to a modulated pulse waveform for one symbol from a chosen signal constellation is called a component-symbol state and is described in a component signal space. Likewise, a prepared signal state analogous to a modulated waveform for a classical codeword is called a composite signal state and is described in a composite signal space. One conventional method of preparation of a composite signal state consists of the formation of a classical coded waveform of large amplitude. When this waveform is sufficiently attenuated, the quantum properties of the waveform become evident.

Some signal states in the composite signal space can be expressed directly in terms of the outer product of component-symbol states, which need not be coherent states. These composite signal states, called block-symbol product states or simply product states, are analogous to a classical product distribution. The construction of a block-symbol product state from component-symbol states is discussed in Section 15.2. Other block-symbol states in the composite signal space cannot be expressed as an outer product of component-symbol states. These entangled states are discussed later.

15.1.3 Measurements

Classical information that is encoded onto one or more properties of a quantum-lightwave signal state is observed by an appropriate measurement. The nature of the measurement determines which properties of the signal state are observed. A measurement within quantum optics can measure an individual property, such as the number of photons within a time interval, or a joint property, such as both the in-phase component and the quadrature component of the lightwave signal. It will be shown that within quantum optics, a joint measurement of the two quantum equivalents of the in-phase component and the quadrature component cannot be described as two independent measurements. The functional form of this dependence is discussed in Section 15.3.4.

The usual type of quantum measurement is called an observable measurement. Outcomes of an observable measurement are described mathematically by a self-adjoint transformation of the signal state. Depending on the transformation describing the measurement, the signal state after the measurement may or may not be the same as the signal state before the measurement. The transformation that describes the outcome of an observable measurement has orthogonal eigenstates and real eigenvalues, which are the outcomes of a measurement. The transformation is called an observable operator, or, more simply, an observable. This means that the outcome of a measurement of an observable is always a single real number, possibly an integer. Classical measurements that have complex-valued outcomes, such as heterodyne demodulation, must be described differently and will be discussed later. Because the signal state is transformed by the measurement operation into an eigenstate of the observable,


other observable properties of the signal state that are not observed by the measurement may be changed by this transformation and may be no longer measurable as such.

Quantum Uncertainty and Statistical Uncertainty

Just as a classical lightwave signal can be random, so too a quantum-lightwave signal state can be random. Such a random signal state has statistical uncertainty, which expresses incomplete knowledge about the signal state. Moreover, within quantum optics, but not within classical wave optics, there is additional quantum uncertainty in the outcome of an observable measurement of a conventional lightwave. Thus, a random quantum-lightwave signal state has two distinct forms of randomness – quantum uncertainty, which depends on how the state is measured, and statistical uncertainty, which depends on the random state. The study of quantum optics with both forms of uncertainty is called statistical quantum optics.

Alignment

Quantum uncertainty occurs whenever the set of signal states conveying the relevant information and the set of orthogonal eigenstates of an observable are not "aligned" one-to-one. Referring to Figure 15.4(a), in which a deterministic incident signal state is a measurement eigenstate, the signal state is aligned with a measurement state, and consequently there is no quantum uncertainty in the outcome of a measurement. Figure 15.4(b) diagrams an incident signal state that is a superposition of the two measurement eigenstates. The outcome of a measurement of a signal state expressed as a superposition of the two measurement eigenstates has quantum uncertainty because a superposition of two eigenstates is not, in general, an eigenstate. The uncertainty is caused by a misalignment of the measurement eigenbasis with respect to the incident signal state. This uncertainty is due solely to the measurement process, even for an incident signal state that has no statistical uncertainty.


Figure 15.4 (a) When an incident signal state is aligned with a measurement state, there is no quantum uncertainty. (b) When an incident signal state is a superposition of the two measurement states, there is quantum uncertainty.


Polarization

An example of the effect of alignment in the quantum-optics model is the measurement of polarization. Figure 15.5(a) shows a polarization measurement using a polarization beamsplitter (cf. Figure 10.14) that passes the horizontally polarized component of the light in one direction and passes the vertically polarized component of the light in another direction. In quantum optics, the photodetected lightwave signal for each orthogonal polarization component is described by an orthogonal polarization eigenstate. The photon count from the photodetector marked "V" in Figure 15.5 over a time interval of duration T is the outcome of the measurement for the vertically polarized eigenstate. Similarly, the photon count from the photodetector marked "H" in Figure 15.5 is the outcome of the measurement for the horizontally polarized eigenstate.

In Figure 15.5(a), were the incident polarization state vertically polarized and aligned with a measurement state defined by one axis of the polarization beamsplitter, the incident signal to the beamsplitter would always be detected at the photodetector marked "V." Similarly, for horizontal polarization, the incident signal would always be detected at the photodetector marked "H." Consequently, with such an alignment, there is no quantum uncertainty in the outcome of the measurement for either of these two polarization states.

Figure 15.5(b) diagrams a measurement of an equally weighted superposition of horizontally polarized and vertically polarized light comprising a linearly polarized lightwave at an angle of π/4. This signal state has quantum uncertainty because the superposition of the two eigenstates of a polarization measurement is not an eigenstate. For an incident signal described by this superposition, the outcomes of the two measurements, which are the photon counts for each photodetector, are probabilistic. The probability that the signal is detected by the "V" photodetector is equal to the probability that the signal is detected by the "H" photodetector. Each probability is equal to one-half. This quantum uncertainty is caused by a misalignment of the measurement eigenbasis defined by the axes of the polarization beamsplitter with respect to the incident signal state, even for an incident signal state with no statistical uncertainty.


Figure 15.5 (a) Schematic diagram of a polarization measurement that measures two orthogonal linearly polarized states using a polarization beamsplitter. (b) If a known polarization state is described as a superposition of horizontally and vertically polarized light, then there is quantum uncertainty in the outcome of the measurement.


This quantum uncertainty cannot be eliminated by a post-measurement rotation on the values of the detected signal state such as could be used for classical polarization-state estimation (cf. Section 12.6). The uncertainty would be eliminated were the polarization beamsplitter rotated so that a measurement state was coincident with the signal state. For a modulation format that uses orthogonal polarization eigenstates with a measurement eigenbasis aligned to those states, the received signal state would be detected in either the "V" photodetector or the "H" photodetector, and there would be no quantum uncertainty.

In the example of a polarization measurement, it is always possible to align the two orthogonal measurement states with the two orthogonal signal states. In contrast, all members of a set of coherent states are pairwise nonorthogonal. For every orientation of an orthogonal measurement eigenbasis, for at least one incident signal state, some of the received signal state is detected in at least two of the measurement states. Accordingly, there is always quantum uncertainty in the detected state when coherent states are used. This uncertainty cannot be avoided. It leads to a fundamental limit on the channel capacity when coherent states are used to convey classical information.

In general, the realignment of the measurement states is expressed as a change of basis, and the quantum uncertainty depends on the choice of the measurement basis. This kind of basis-dependent uncertainty for the outcome of a measurement is a fundamental distinguishing feature of a quantum lightwave. It does not exist for a measurement of a classical lightwave. Moreover, a joint measurement on a composite signal state corresponding to a classical codeword need not produce the same outcome as a collection of measurements on the individual component signal states for each symbol in the codeword. This is because the outer-product space admits many bases that are not product bases. Therefore, a composite codeword is more than a collection of the individual components when nonorthogonal signal states are used to convey information. Many of the key differences between quantum optics and semiclassical optics are consequences of these two statements.
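As a worked instance of the polarization example, the Born rule applied to a linearly polarized state at angle θ, expanded in the (H, V) measurement eigenbasis, gives detection probabilities cos²θ and sin²θ; rotating the basis to align with the state removes the quantum uncertainty. A short illustration:

```python
import numpy as np

theta = np.pi / 4                                 # incident linear polarization angle
state = np.array([np.cos(theta), np.sin(theta)])  # expansion in the (H, V) eigenbasis

print(np.abs(state) ** 2)                         # Born rule: [0.5, 0.5], maximal
                                                  # quantum uncertainty at pi/4

# Rotating the measurement basis to align with the state removes the uncertainty.
R = np.array([[np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])
print(np.abs(R @ state) ** 2)                     # [1.0, 0.0]: no quantum uncertainty
```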

Quantum Uncertainty in a Photon-Counting Measurement

Context for the nature of quantum uncertainty is provided by comparing an ideal photon-counting measurement of a photon-number state to an ideal photon-counting measurement of a coherent state. Because an incident photon-number state is an eigenstate of the observable that describes the outcome of an ideal photon-counting measurement, the measurement of a photon-number state is the exact number of photons over the time interval of the measurement. In this case, there is no quantum uncertainty in the measurement because the signal states and the measurement states are aligned.

In contrast, because coherent states are not photon-number states and cannot be aligned to them, each coherent state must be expressed as a superposition of photon-number states to determine the outcome of a photon-counting measurement. The form of this superposition is derived in Section 15.3.6. Because this superposition is not a single photon-number state, a coherent state cannot be aligned to a single photon-number state. In this case, the outcome of the measurement is probabilistic. The probability distribution on the set of measurement outcomes is equal to the componentwise squared magnitude of the quantum wave function, which is the vector of the

746

15 The Quantum-Optics Model

expansion coefficients of the coherent state in the photon-number-state eigenbasis. This discrete Poisson-distributed random variable, as derived in Section 15.3.6, is a consequence of the misalignment of the two bases. This statement provides the conceptual bridge between a quantum-lightwave coherent state and the photon-optics signal model. The relationship is shown schematically in Figure 15.2. Within the photon-optics signal model, this form of quantum uncertainty is attributed to the random arrival times of photons without questioning why the arrival times of photons should be random. Within quantum optics, this form of quantum uncertainty is attributed to an unavoidable mismatch between the coherent signal states and the photon-number measurement states. The intrinsic quantum uncertainty in the outcome of a measurement regarding a specific signal state is not at all the same as the incomplete knowledge about which signal state is being measured, which is a form of statistical uncertainty. A quantum state that has statistical uncertainty is called a mixed signal state. A quantum state that has no statistical uncertainty, and so is certain though possibly not known or measured, is called a pure signal state. Treating both quantum uncertainty and statistical uncertainty within a common mathematical framework requires a generalization of the notion of a probability distribution. This is because a probability distribution characterizes only the statistical uncertainty about the signal state, and not the quantum uncertainty that may be present when that state is measured. As an example, the factorization of a product state into an outer product of constituent component states does not imply that a product state can be described by a classical product distribution because a product state may have a combination of statistical uncertainty and quantum uncertainty. A product state is fully described by using a density matrix, to be described next. Moreover, a classical product distribution cannot describe an ensemble of coherent state symbols as would be the case for an ensemble of classical complex symbols. A classical product distribution is meaningful only when there is no quantum uncertainty – only statistical uncertainty. It will be shown that for a coherent state this condition is asymptotically realized with a large Glauber number α.
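The photon-counting statistics described above are easy to examine numerically. The following minimal sketch assumes the Poisson form p(m) = e^{−|α|²}|α|^{2m}/m! derived in Section 15.3.6; the Glauber number α used here is an arbitrary illustrative value.

```python
import numpy as np
from math import factorial

alpha = 1.5 + 0.5j            # illustrative Glauber number (arbitrary value)
mean = abs(alpha) ** 2        # mean photon count equals |alpha|^2

# Squared magnitudes of the number-state expansion coefficients of |alpha>:
# p(m) = e^{-|alpha|^2} |alpha|^{2m} / m!, a Poisson probability mass function.
m = np.arange(40)
p = np.exp(-mean) * mean ** m / np.array([factorial(k) for k in m])

print(np.isclose(p.sum(), 1.0))          # the distribution sums to one
print(np.isclose((m * p).sum(), mean))   # mean photon number is |alpha|^2
```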

Density Matrix

Any nonnegative function that sums or integrates to one is a valid probability distribution, either a probability mass function or a probability density function, respectively. Generalizing the notion of a probability distribution, any nonnegative-definite⁴ self-adjoint linear transformation whose diagonal elements sum or integrate to one is called a density matrix or a density operator.⁵ A density matrix may be informally viewed as a probability distribution "lifted" into a larger space. This formalism provides more flexibility than a classical probability distribution. Whereas a density matrix will remain a density matrix under any unitary transformation (generalized rotation) corresponding to a change of orthonormal basis, a diagonal density matrix, which will be shown to correspond to a probability distribution, need not remain diagonal under such a change of basis.

4 A nonnegative-definite linear transformation is one having nonnegative eigenvalues. This is also called a positive-semidefinite linear transformation.
5 When used in the context of density operators, the word "matrix" may refer to the abstract transformation or to the specific basis-dependent representation of that transformation.


Table 15.1 Kinds of uncertainty in the outcome of a measurement and the form of the density matrix that describes the signal state, as a function of the type of ensemble and the type of signal state

State with no statistical uncertainty (Pure state)
  Orthogonal signal states (a): Definite outcome (No uncertainty); density matrix with a single diagonal element equal to one.
  Nonorthogonal signal states: Quantum uncertainty (Quantum optics); rank-1 density matrix; different diagonalization for each nonorthogonal state.

State with statistical uncertainty (Mixed state)
  Orthogonal signal states (a): Statistical uncertainty (Probability distribution); diagonal density matrix.
  Nonorthogonal signal states: Both quantum uncertainty and statistical uncertainty (Statistical quantum optics); nondiagonal density matrix.

(a) Assumes the measurement states are aligned one-to-one with the signal states.

A density matrix is a generalization of a probability distribution that incorporates both quantum uncertainty and statistical uncertainty. It plays a role in quantum-lightwave communications that corresponds to the role played by a probability distribution in classical communications. A density matrix must be used whenever both quantum uncertainty and statistical uncertainty are present. For an ensemble of orthogonal signal states that are aligned with the set of orthogonal measurement states of an observable, there is no quantum uncertainty. For this case, the density matrix is a diagonal matrix and corresponds to a classical probability distribution, as is shown in Section 15.4.

The density matrix of a pure signal state is a rank-one matrix, not necessarily diagonal. It has one nonzero eigenvalue, which must equal one. A measurement eigenbasis for which the density matrix of a pure signal state is diagonal has a single element equal to one, indicating a definite measurement outcome. For a set of pure signal states, however, a measurement eigenbasis need not exist such that every pure signal state in that set is diagonal in that basis and so measured without error. The kind of uncertainty for sets of pure and mixed signal states, which need not be orthogonal, is summarized in Table 15.1 and is discussed in detail in Section 15.4.
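These properties can be exhibited directly with small matrices. The sketch below, a minimal illustration using arbitrary two-dimensional examples, constructs a rank-one density matrix for a pure state and a diagonal density matrix for a mixed ensemble, then shows that a change of orthonormal basis preserves the density-matrix properties but not the diagonal form.

```python
import numpy as np

# Pure state |psi> = (|0> + |1>)/sqrt(2): a rank-one density matrix |psi><psi|.
psi = np.array([1.0, 1.0]) / np.sqrt(2)
rho_pure = np.outer(psi, psi.conj())
print(np.linalg.matrix_rank(rho_pure))          # 1

# Mixed ensemble: |0> with probability 0.7, |1> with probability 0.3.
rho_mixed = np.diag([0.7, 0.3])                 # diagonal density matrix

theta = np.pi / 8                               # arbitrary basis-rotation angle
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
rho_rot = U @ rho_mixed @ U.conj().T            # same state in a rotated basis

print(np.isclose(np.trace(rho_rot), 1.0))       # still trace one ...
print(np.allclose(rho_rot, np.diag(np.diag(rho_rot))))  # ... but not diagonal (False)
```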

15.1.4

State Detection

Quantum optics formally admits a collection of state-detection operations that includes the classical operations of direct photodetection, heterodyne demodulation, and homodyne demodulation together with the associated method of detection. This larger collection of operations is shown pictorially in Figure 15.1(b). Just as the term "state preparation" corresponds to the classical functions of modulation and encoding, so the term "state detection" corresponds to the classical functions of demodulation, detection, and decoding.

Operationally, other state-detection techniques in this larger collection may be viewed informally as novel ways of mingling the particle properties and the wave properties of a lightwave signal so as to produce a smaller probability of a detection error than occurs with the semiclassical methods, which do not fully mingle these properties. Determining practical state-detection techniques that can produce a lower probability of detection error or a larger channel capacity within this larger collection of operations is the principal objective for a quantum-lightwave communication system that conveys classical information.

The common state-detection technique for the demodulation and detection of the in-phase signal component is denoted by the observable operator â_I. The common state-detection technique for the demodulation and detection of the quadrature signal component is denoted by the observable operator â_Q. As will be shown, these two observables do not commute and do not have a common set of eigenstates. Accordingly, a measurement such as heterodyne demodulation, which is defined by jointly measuring both the in-phase and the quadrature signal components, must introduce quantum uncertainty in at least one of the outcomes of the joint measurement.

15.1.5

Gaussian Signal States

Consider a classical product distribution of a block of symbols in additive white gaussian noise at the output of a memoryless channel. The joint probability distribution for the classical case is the product of independent, offset, circularly symmetric gaussian probability distributions, with the mth probability distribution corresponding to the complex signal component r_m of the noisy composite signal r at the channel output. For this classical case, the resulting conditional product distribution for r given the transmitted codeword s̄ is a complex multivariate gaussian probability distribution (cf. (2.2.29))

$$f(\mathbf{r}\,|\,\overline{\mathbf{s}}) = \prod_{m=1}^{M} \frac{1}{2\pi\sigma^2}\, e^{-|r_m - \overline{s}_m|^2 / 2\sigma^2} = \frac{1}{(2\pi\sigma^2)^M}\, e^{-|\mathbf{r} - \overline{\mathbf{s}}|^2 / 2\sigma^2}, \qquad (15.1.1)$$

where s̄_m is the mth complex component of the transmitted block symbol s̄. Within wave optics, this product distribution underpins essentially all classical coded modulation and detection, even for sequences that have symbol dependences. The resulting detection statistic is a gaussian random variable. For a multivariate gaussian random variable, all higher-order moments can be expressed in terms of the first-order and second-order moments.

Gaussian distributions also arise within quantum optics. Similarly to classical systems, gaussian distributions describe both noise and information sources. Even in the absence of statistical uncertainty such as additive noise, a gaussian distribution describes the minimum quantum uncertainty when a shot-noise-limited signal with no additive noise is homodyne demodulated. Every classical signal described by a gaussian distribution has a corresponding gaussian signal state. Other gaussian signal states do not correspond to a classical signal.
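As a numerical check of the two equivalent forms in (15.1.1), the short sketch below (using an arbitrary block length, noise scale, and randomly drawn block symbol) evaluates the componentwise product of the per-symbol gaussian densities and the single vector-form density, and confirms that they agree.

```python
import numpy as np

rng = np.random.default_rng(0)
M, sigma = 4, 0.8    # illustrative block length and noise scale (arbitrary)
s = rng.normal(size=M) + 1j * rng.normal(size=M)                # block symbol
r = s + sigma * (rng.normal(size=M) + 1j * rng.normal(size=M))  # noisy output

# Product of the M per-component densities (product form of (15.1.1))
f_prod = np.prod(np.exp(-np.abs(r - s) ** 2 / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2))
# Single multivariate density (vector form of (15.1.1))
f_vec = np.exp(-np.linalg.norm(r - s) ** 2 / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2) ** M

print(np.isclose(f_prod, f_vec))   # the two forms agree
```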


Gaussian signal states are discussed in Section 15.6.5, and are completely characterized by a covariance matrix that incorporates both statistical uncertainty and quantum uncertainty. The gaussian signal state that corresponds to a classical shot-noise-limited wave-optics signal that is homodyne-demodulated is a circularly symmetric classical gaussian signal state. This signal state is described by a real covariance matrix C that is a scaled identity matrix. It will be shown in Section 15.6 that this covariance matrix leads to a minimum-uncertainty coherent state.

In the general case, a gaussian signal state can be described by a covariance matrix, but it cannot be expressed as a probability distribution because of fundamental dependences between the quantum-optics equivalents of the in-phase and quadrature signal components. These dependences do not exist for a classical covariance matrix. Section 15.6 shows that the quantum wave functions for the in-phase and quadrature operators are related by a Fourier transform. This means that while a minimum-uncertainty coherent state corresponds to a classical shot-noise-limited signal, the joint statistical properties of the in-phase and the quadrature signal components of that gaussian signal state cannot be expressed as a product probability distribution as would be the case for a classical signal.

Quantum optics also admits nongaussian signal states, which are signal states whose higher-order statistics cannot be expressed in terms of first-order and second-order statistics. All nongaussian quantum signal states are nonclassical. A gaussian signal state that does not have the form of a complex circularly symmetric gaussian function (with or without an offset) is called a nonclassical gaussian signal state because it does not correspond to a minimum-uncertainty coherent state, which is the gaussian signal state that corresponds to a classical shot-noise-limited signal. Nonclassical gaussian signal states are discussed in Section 15.6.5.

A Venn diagram showing one way of classifying classical and nonclassical signals is shown in Figure 15.6. For this classification, every classical shot-noise-limited signal is one-to-one mapped to a classical gaussian signal state. However, nonclassical nongaussian signal states, such as photon-number states, have no corresponding classical signal, even if the classical signal is described by nongaussian statistics. This classification provides a conceptual bridge between the dependences admitted for classical communication systems based on the first-order and second-order statistics of gaussian random processes, and the unique dependences that are admitted within quantum optics.

[Figure 15.6 Venn diagram of quantum-lightwave signal states, partitioning classical and nonclassical signals into gaussian and nongaussian signal states. A classical gaussian signal state corresponds to a classical signal described by a circularly symmetric gaussian product distribution with or without a bias.]

15.2

A Formal View of Quantum Optics

A discussion of a quantum-lightwave signal state requires a mathematical formalism of a signal space and the linear transformations on that signal space. The conventional notation and terminology for quantum-lightwave signal states and linear transformations are different from the conventional notation and terminology for classical signals.

15.2.1

Signal States

A signal state is a mathematical object expressed in an appropriately defined signal space, distinct from physical space, that is used to describe the properties of a lightwave. A signal state is conventionally expressed as a unit-length vector written in the Dirac bra–ket notation, as will be described. The notion of length is defined in terms of an inner product as discussed below. A signal state is the concept within quantum optics that corresponds to a classical signal.

A signal state is formulated in terms of a basis-free representation, but that basis-free representation does not specify the possible measured outcomes of a property of that signal state. The measurable outcomes of a property of a signal state are not specified until a basis specific to the measurement is given. The measurable outcomes of a property of a signal state are specified by the coefficients of the representation of that signal state in a basis specific to that measurable property, for example the measurable property of the number of photons in a mode. Different measurement eigenbases specify different properties. When the property of a signal state being measured is the same as one of the basis eigenstates for the specific measurable property, the outcome of that property is certain. When the same signal state is expressed as a linear combination in a different eigenbasis associated with a different property, the measured outcome for that property is random. For a quantum-optics signal state, the probability of an outcome of a measurement is equal to the squared magnitude of the expansion coefficient for that outcome in the basis specified for that property. This means that the same signal state can have one property measured with certainty for one measurement eigenbasis and a second property that is random for a second measurement eigenbasis. For a given signal state, this randomness is the quantum uncertainty in the measured outcome.

The basis-dependent nature of the measurement of a signal state does not exist in the same way for the measurement of a classical signal. The measurable properties of a noiseless classical signal are "basis-free" in the sense that a measurement using one basis can be transformed after the measurement into another basis without error. This is because, for a classical signal, the choice of the basis does not introduce uncertainty into the measurement. This classical property is widely used for the estimation of classical signal properties such as carrier phase and polarization (cf. Chapter 12), by combining several measurements. The distinction between the "basis-dependent" properties of quantum signal states, which lead to quantum uncertainty, and the "basis-free" properties of noiseless classical signals, which have no quantum uncertainty, is fundamental. The basis-dependent nature of measurements underlies the entire formalism of quantum optics.

When a signal space is represented in terms of a discrete set of basis functions, the signal state is expressed as a column vector or a row vector. For a signal space represented by a continuous set of basis functions, a signal state is expressed as a continuous function. Thus the signal state may be expressed as a vector in an infinite-dimensional signal space defined on discrete countable indices or a "vector" in a signal space defined on the continuum. Because a signal state does not itself depend on the choice of the basis, an abstract notation is required to describe a basis-free signal state.

The Dirac notation for a signal state in column form is |ψ⟩. This is called a ket. The ket refers to an abstract basis-free column vector that becomes concrete, but not necessarily known, when a basis is defined. When described using a specific basis, the ket is represented by a column vector. The Dirac notation for a signal state in row form is ⟨ψ|. This is called a bra. The bra is the complex-conjugate transpose of a ket so that ⟨ψ| = |ψ⟩†. The same signal state can be denoted as either a ket or a bra as is convenient.

Using bra–ket notation, the inner product between two signal states, denoted ⟨ψ1|ψ2⟩, produces a complex number. The complex number is independent of the choice of basis. Using the inner product, all signal states used to express measurable properties for an orthonormal countable basis are, by convention, normalized so that ⟨ψ_i|ψ_j⟩ = δ_ij, where δ_ij is the Kronecker impulse. For an orthonormal uncountable basis defined on the continuum, ⟨ψ_y|ψ_y′⟩ = δ(y − y′), where δ(y) is a Dirac impulse and y is continuous. The outer product of two signal states defined in the same signal space, denoted |ψ1⟩⟨ψ2|, produces an abstract, basis-free linear transformation in that space. For a specific basis, this linear transformation is described by a matrix.
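In any concrete basis, these bra–ket operations reduce to ordinary linear algebra, as the following minimal sketch illustrates (the two states are arbitrary illustrative choices).

```python
import numpy as np

# Kets as column vectors in a chosen basis; the bra is the conjugate transpose.
ket1 = np.array([[1.0], [1.0j]]) / np.sqrt(2)   # |psi1>
ket2 = np.array([[1.0], [0.0]])                 # |psi2>

inner = (ket1.conj().T @ ket2).item()  # <psi1|psi2>: a basis-independent complex scalar
outer = ket1 @ ket2.conj().T           # |psi1><psi2|: a linear transformation (matrix)

print(inner, outer.shape)
print(np.isclose((ket1.conj().T @ ket1).item().real, 1.0))  # unit length: <psi1|psi1> = 1
```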

15.2.2

Operators

Within quantum theory, classical expressions for dynamical variables such as the in-phase and the quadrature signal components are replaced by operators, which are linear transformations on signal space. The replacement of variables by operators "lifts" the classical formulation into a higher-dimensional signal space. The replacement is called canonical quantization. Operators are highlighted by a caret over the letter representing an operator, such as T̂. When a signal state is represented by a vector in a specific basis, the operator is represented by a matrix that depends on that specific basis. Then the action of the operator T̂ is described by a matrix–vector product.⁶

6 If the signal state is described by a continuous set of basis functions, then the transformation is described by an integral equation.


An observable operator is an operator described by a self-adjoint transformation. A self-adjoint transformation has a set of real eigenvalues and a set of orthonormal eigenvectors that do not depend on the basis. Every real-valued measurement of a physical quantity is described by an observable operator. When a countable basis is given, the concrete form of an observable operator is described by a hermitian matrix. A hermitian matrix always has orthonormal eigenstates and real eigenvalues, which means that the outcome of a measurement, described by an observable operator acting on a signal state, is always a real number, possibly an integer.

Quantum optics is described using linear transformations in complex signal space, but there are fundamental differences between a classical complex signal and a quantum-lightwave signal state. A classical signal described by a point on the complex plane can be expressed in a one-dimensional complex signal space with a single complex basis function (cf. Section 10.4). For classical signaling, the same point can be described using a real signal space with two real basis functions representing the in-phase and quadrature components. The choice between the two versions of signal space in which to represent a classical signal is regarded as a matter of convenience.

For a quantum-optics signal state, the same statement is not true. A coherent state is always described in an infinite-dimensional Hilbert space, which is a complex signal space with an appropriate definition of an inner product. This point cannot be represented using two separate real signal spaces, even if each space is infinite-dimensional, because the real part and the imaginary part of each complex basis function are, in general, inherently interdependent. This is discussed further in Section 15.3.

15.2.3

Time Dependence

The temporal evolution of a "closed" quantum system that does not exchange energy with an external set of states is described by a time-varying unitary transformation of the signal state. Several alternative descriptions of a time-varying signal state are conventionally used. Temporal evolution can be incorporated into the signal state, or can be incorporated into the operators that act on that signal state, or can be partitioned between the signal state and the operators. The Schrödinger representation incorporates the time dependence into the signal state. The Heisenberg representation incorporates the time dependence into the operators. The Dirac representation or interaction representation partitions the time dependence between the signal state and the operators. Because the time dependence is unitary, the difference between any two of these representations amounts to a time-dependent change of basis within the Hilbert space in which the signal state is defined. Accordingly, any of these alternative representations of the time dependence can be used, as may be convenient.

15.2.4

Quantum Wave Functions

When described in terms of a basis {|a_m⟩}, a signal state is written as

$$|\psi\rangle = \sum_m a_m\, |a_m\rangle, \qquad (15.2.1)$$


where the set of expansion coefficients {a_m} is specific to the basis {|a_m⟩}. The basis states are usually chosen to be the orthonormal eigenstates of an observable operator, which is required henceforth. These orthonormal eigenstates comprise the eigenbasis. The set of basis eigenstates {|a_m⟩} used in (15.2.1) may be countable, with the index m mapped onto the integers, or the set of basis eigenstates may be uncountable, in which case the index m is continuous. An example of a countable eigenbasis is the set of eigenstates for the observable operator N̂ corresponding to photon counting. An example of an uncountable basis is the set of eigenstates of the in-phase signal component operator â_I. These operators are discussed in detail in Section 15.5. When a countable eigenbasis is specified, an observable operator is represented by a hermitian matrix. This matrix has orthogonal eigenvectors and real eigenvalues that do not depend on the choice of basis.

For an eigenbasis {|a_m⟩}, the expansion given in (15.2.1) can be written in terms of a sum of the projection operators P̂_m defined in (2.1.93), one projection operator for each coefficient a_m. Each projection operator has an eigenvalue equal to one corresponding to the eigenstate |a_m⟩ and an eigenvalue equal to zero for all other eigenstates. The sum of the projection operators gives the identity operator, Σ_m P̂_m = Î (cf. (2.1.94)).

Writing P̂_m in terms of the outer product as P̂_m = |a_m⟩⟨a_m| (cf. (2.1.93)) gives

$$|\psi\rangle = \sum_m \hat{P}_m |\psi\rangle = \sum_m |a_m\rangle\langle a_m|\psi\rangle = \sum_m \langle a_m|\psi\rangle\, |a_m\rangle, \qquad (15.2.2)$$

where the final equality follows because ⟨a_m|ψ⟩ is a scalar. Comparing (15.2.2) with (15.2.1) and recalling that {|a_m⟩} is an orthonormal basis gives a_m = ⟨a_m|ψ⟩ as the complex expansion coefficient. The expansion coefficient a_m is the inner product of the signal state |ψ⟩ and the eigenstate |a_m⟩.

The set of complex expansion coefficients {a_1, a_2, ...} forms a complex vector or a complex function. This complex vector or complex function is called the quantum wave function.⁷ The same signal state expressed in a different basis leads to a different vector of complex expansion coefficients and so to a different quantum wave function. This change of basis (cf. (2.1.88)) is expressed by a unitary transformation (or generalized rotation) of the quantum wave function.

Because the signal state is normalized with ⟨ψ|ψ⟩ = 1, the quantum wave function u(a_m) ≐ {a_m} as a function of m satisfies Σ_m |a_m|² = 1. The normalization of the signal state |ψ⟩ permits an interpretation of |a_m|² as a probability. Thus the magnitude squared of the wave function has the form of a probability distribution. Indeed, it will be seen that it is a probability distribution. For a discrete observable operator such as the photon-number state operator N̂, the basis is countable, the wave function c_m is discrete, and the magnitude squared |c_m|² is a probability mass function. For a continuous observable operator such as the in-phase signal component operator â_I with a corresponding set {α_I} of real eigenvalues, the basis is uncountable, the wave function u(α_I) is continuous, and the magnitude squared |u(α_I)|² is a probability density function.

7 The use of the word "wave" in the quantum wave function describes a mathematical property of that function. The use of the word "wave" in this context should not be associated with an electromagnetic wave.
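These statements can be verified numerically. The sketch below uses a randomly generated orthonormal eigenbasis and a random normalized signal state (both arbitrary), computes the expansion coefficients a_m = ⟨a_m|ψ⟩, reconstructs the state as in (15.2.1), and checks the normalization of the quantum wave function.

```python
import numpy as np

rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3)))
basis = [Q[:, m] for m in range(3)]        # orthonormal eigenbasis {|a_m>}

psi = rng.normal(size=3) + 1j * rng.normal(size=3)
psi /= np.linalg.norm(psi)                 # normalized signal state |psi>

a = np.array([b.conj() @ psi for b in basis])   # coefficients a_m = <a_m|psi>

print(np.allclose(sum(am * b for am, b in zip(a, basis)), psi))  # (15.2.1) holds
print(np.isclose(np.sum(np.abs(a) ** 2), 1.0))  # sum_m |a_m|^2 = 1
```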

15.2.5

Measurements

Selected properties of a quantum-lightwave signal state, called observable properties, are revealed by the outcomes of an appropriate measurement on the signal state using an observable operator. To this end, consider measuring an observable integer-valued property of a lightwave signal state |ψ⟩. This signal state need not be an eigenstate of the observable operator defining the measurement. When a basis is given, the observable operator R̂ is represented by a hermitian matrix, and the signal state is represented by a complex column vector using as a basis the eigenstates of R̂. The outcome of the measurement of an observable property described by R̂ on a signal state |ψ⟩ is given by the quantum expectation of that operator on that signal state, defined as

$$\langle \hat{R} \rangle \doteq \langle\psi|\hat{R}|\psi\rangle. \qquad (15.2.3)$$

The expectation given in (15.2.3) may yield a deterministic outcome or a probabilistic outcome, depending on the relationship between the eigenstates {|r⟩} of R̂ and the signal state |ψ⟩.

When the signal state |ψ⟩ to be measured is a specific eigenstate |q⟩ of R̂, it satisfies |ψ⟩ = |q⟩. Then R̂|q⟩ = Q|q⟩, where Q is the eigenvalue corresponding to the eigenstate |q⟩. For any signal state that is an eigenstate of the observable, the expectation is equal to the eigenvalue of that state, so that

$$\langle \hat{R} \rangle = \langle\psi|\hat{R}|\psi\rangle = \langle q|\hat{R}|q\rangle = \langle q|Q|q\rangle = Q\,\langle q|q\rangle = Q. \qquad (15.2.4)$$

As an example, suppose that the number of photons in a single mode is measured using ideal photon counting. The outcome of this measurement operation is defined by an observable operator called the photon-number state operator N̂. When this operator is applied to a photon-number state |m⟩, the outcome of a measurement is ⟨m|N̂|m⟩ = m because the photon-number state |m⟩ is an eigenstate of N̂ with an eigenvalue m. The outcome of this measurement is exactly the number of photons m in that mode because the measured state is an eigenstate of the observable operator N̂ describing the outcome of the measurement. The quantum wave function {a_m} has a single value equal to one for the component corresponding to the number m of measured photons.

In contrast, consider a coherent signal state, denoted |α⟩. This signal state is not a photon-number state and is not an eigenstate of N̂. The coherent state |α⟩ is a superposition of the eigenstates |m⟩ of the observable N̂ with |α⟩ = Σ_m c_m |m⟩ for some set of expansion coefficients {c_m} (cf. (15.2.1)). Because the eigenstates of the photon-number state operator N̂ span the signal space in which the coherent signal state is defined, any coherent state can be expressed in this form. The outcome of a photon-number measurement on a coherent state is determined by expressing the state in this way. For this case, the quantum wave function is nonzero for every m.

Measurement Uncertainty

The standard axiomatic formulation of quantum theory states that, for the observable R̂, the output signal state immediately after the measurement operation on an incident signal state |ψ⟩ is one of the eigenstates |r⟩ of R̂. The outcome of a measurement is the eigenvalue r of the eigenstate |r⟩ produced by the measurement operation.

When the incident signal state |ψ⟩ is an eigenstate of R̂, the output signal state after the measurement operation is the same as the incident signal state. Then the outcome of the measurement has no quantum uncertainty. When the incident signal state is not an eigenstate of R̂, the state after the measurement operation must still be an eigenstate of R̂, but that resulting eigenstate and the corresponding eigenvalue are random. In this case, the outcome of the measurement has quantum uncertainty.

We show next that the corresponding probability is given by the quantum expectation of the projection operator P̂_q = |q⟩⟨q| for the specific eigenstate |q⟩ of R̂ produced by the measurement. Using (15.2.2), expand the incident signal state |ψ⟩ = Σ_r c_r |r⟩ in terms of a superposition of the orthonormal eigenstates of R̂. Using P̂_q = |q⟩⟨q|, the probability p(q) that the measurement produces the eigenstate |q⟩ with the measured outcome q can then be written in several equivalent forms as

$$p(q) = \langle\psi|\hat{P}_q|\psi\rangle = \langle\psi|q\rangle\langle q|\psi\rangle = |\langle\psi|q\rangle|^2 \qquad (15.2.5a)$$
$$= \sum_r |c_r|^2\, \langle r|q\rangle\langle q|r\rangle = |c_q|^2, \qquad (15.2.5b)$$

where ⟨ψ|q⟩⟨q|ψ⟩ = ⟨ψ|q⟩⟨ψ|q⟩* = |⟨ψ|q⟩|² is simply the product of two scalars, and where ⟨r|q⟩ = δ_rq is a Kronecker impulse because the eigenstates of an observable operator are orthonormal. Moreover, Σ_q p(q) = Σ_q |c_q|² = 1, as is required for a probability distribution.

For the specific case of a coherent state measured using ideal photon counting, the probability p(m) = |c_m|² of measuring m photons is given by a Poisson probability mass function (cf. (15.3.36)), as will be shown. For an operator that measures a continuous variable r, the summation is replaced by an integral, the Kronecker impulse is replaced by a Dirac impulse, c(r) is the continuous quantum wave function for r, and f(q) = |c(q)|² is the probability density function for the continuous outcome of a measurement.

The same signal state expressed in a different orthonormal measurement basis is described by a generalized rotation which changes the orthogonal eigenstates defining the outcomes of a measurement. The signal state then will have a different set of expansion coefficients, and a different quantum wave function with respect to that basis. This results in a different probability distribution on the outcomes of a measurement. Because a quantum wave function describes an observed outcome in a specific basis, it is not the same quantum wave function as the one for the original basis from which it is derived.
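A short numerical sketch of (15.2.3) and (15.2.5) follows, using a randomly generated hermitian matrix as the observable and a random normalized signal state (both arbitrary). It confirms that the outcome probabilities sum to one and that the quantum expectation equals the probability-weighted sum of the eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
R = (A + A.conj().T) / 2               # hermitian matrix: an observable operator
vals, vecs = np.linalg.eigh(R)         # real eigenvalues, orthonormal eigenstates

psi = rng.normal(size=n) + 1j * rng.normal(size=n)
psi /= np.linalg.norm(psi)             # normalized signal state

p = np.abs(vecs.conj().T @ psi) ** 2   # outcome probabilities |<r|psi>|^2 (15.2.5)
expect = (psi.conj() @ R @ psi).real   # quantum expectation <psi|R|psi> (15.2.3)

print(np.isclose(p.sum(), 1.0))               # probabilities sum to one
print(np.isclose(expect, (p * vals).sum()))   # expectation = sum_r p(r) r
```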

Polarization Measurements

[Figure 15.7 (a) A state |ψ⟩ expressed in the polarization measurement basis {|r1⟩, |r2⟩}. (b) The probability of the outcomes p(r1) and p(r2) of a polarization measurement defined by the angle θ. (c) The same state expressed in a basis {|q1⟩, |q2⟩}, which is rotated by π/4 compared with the basis {|r1⟩, |r2⟩}. (d) The probability of the outcomes p(q1) and p(q2) of a measurement as a function of the angle θ.]

To demonstrate how the basis of the observable affects the outcome of a measurement, suppose that the polarization of a quantum-lightwave signal state is to be measured using the measurement configuration shown in Figure 15.5(a). The observable operator R̂ for polarization has two eigenstates, as shown in Figure 15.7(a). The first eigenstate, |r1⟩, has eigenvalue r1 and corresponds to horizontal polarization. The second eigenstate, |r2⟩, has eigenvalue r2 and corresponds to vertical polarization (cf. Section 2.3.4).

Consider an incident signal state |ψ⟩, as shown in Figure 15.7(a), that is a superposition of the two polarization eigenstates |r1⟩ and |r2⟩. This is given by

$$|\psi\rangle = \cos\theta\,|r_1\rangle + \sin\theta\,|r_2\rangle, \qquad (15.2.6)$$

where θ is the angle of |ψ⟩ with respect to the eigenstate |r1⟩ that defines the horizontal polarization. Apply the measurement operator R̂ = |r⟩⟨r|, where |r⟩ is the polarization state along a direction defined by the angle θ. Considered in isolation, each component polarization state, |r1⟩ or |r2⟩, in (15.2.6) is a scalar multiple of a polarization eigenstate. However, the superposition of the polarization states is an eigenstate of the observable R̂ if and only if θ = mπ/2 for some integer m.

As an example, suppose that R̂ = |r1⟩⟨r1|. Then applying (15.2.5a) to (15.2.6) gives

$$p(r_1) = |\langle\psi|r_1\rangle|^2 = |\cos\theta\,\langle r_1|r_1\rangle + \sin\theta\,\langle r_1|r_2\rangle|^2 = \cos^2\theta, \qquad (15.2.7)$$

where ⟨r1|r1⟩ = 1 and ⟨r1|r2⟩ = 0 have been used. Similarly, for R̂ = |r2⟩⟨r2|, p(r2) = sin²θ. These probabilities are plotted in Figure 15.7(b) as a function of θ.

Referring to Figure 15.7(b), for θ = 0 the eigenvalue r1 corresponding to the horizontally polarized state |r1⟩ is measured with certainty, with p(r1) = 1 and p(r2) = 0, whereas, for θ = π/2, the eigenvalue r2 corresponding to the vertically polarized state |r2⟩ is measured with certainty, with p(r2) = 1 and p(r1) = 0. For other θ, the signal state |ψ⟩ given in (15.2.6) is a superposition of the two polarization eigenstates and is not aligned with either |r1⟩ or |r2⟩. Consequently, when a signal state with this polarization is measured in the {|r1⟩, |r2⟩} polarization basis, the outcome defined by (15.2.5) provides incomplete knowledge about the state |ψ⟩. This incomplete knowledge is expressed in terms of the probabilities p(r1) = cos²θ and p(r2) = sin²θ of the outcomes of the two measurements.

A different polarization measurement operation Q̂ defined by a different polarization basis {|q1⟩, |q2⟩} as shown in Figure 15.7(c) can be expressed using a different linear combination of the original polarization eigenstates |ri⟩ as given by

$$|q_1\rangle = \tfrac{1}{\sqrt{2}}\,(|r_1\rangle - |r_2\rangle), \qquad |q_2\rangle = \tfrac{1}{\sqrt{2}}\,(|r_1\rangle + |r_2\rangle). \qquad (15.2.8)$$

This polarization measurement basis is simply a π/4 rotation of the basis used for R̂, with the two probabilities p(q1) = cos²(θ + π/4) and p(q2) = sin²(θ + π/4) shown in Figure 15.7(d). Suppose that an incident signal state specified by θ = 0 is measured in the rotated polarization measurement basis. The outcome of the polarization measurement is now random, with probability vector (p(q1), p(q2)) = (1/2, 1/2), instead of deterministic, with probability vector (p(r1), p(r2)) = (1, 0), as was the case using the original {|r1⟩, |r2⟩} basis.

This elementary example illustrates an important feature of quantum optics. It shows that the measured properties of a lightwave signal depend on the basis used for that measurement. When the incident signal state |ψ⟩ is aligned with an eigenstate of that basis, the outcome of a measurement has no quantum uncertainty, but may have statistical uncertainty. When the incident signal state |ψ⟩ is not aligned with an eigenstate of that basis, the outcome of a measurement has basis-dependent quantum uncertainty. That signal state may have additional statistical uncertainty. This combined uncertainty is characterized in Section 15.4.
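The two measurement bases can be compared directly in a few lines, following (15.2.6) and (15.2.8); the incident angle θ = 0 reproduces the deterministic and random outcome vectors quoted above.

```python
import numpy as np

theta = 0.0                                        # incident polarization angle
psi = np.array([np.cos(theta), np.sin(theta)])     # |psi> in {|r1>,|r2>} (15.2.6)

r1, r2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
q1 = (r1 - r2) / np.sqrt(2)                        # rotated basis (15.2.8)
q2 = (r1 + r2) / np.sqrt(2)

p_r = [abs(r1 @ psi) ** 2, abs(r2 @ psi) ** 2]     # (1.0, 0.0): deterministic
p_q = [abs(q1 @ psi) ** 2, abs(q2 @ psi) ** 2]     # (0.5, 0.5): random
print(p_r, p_q)
```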


Quantum uncertainty, as such, has no classical counterpart. For lightwaves treated semiclassically, quantum uncertainty is empirically described as shot noise. Within quantum optics, shot noise is a form of basis-dependent quantum uncertainty. Specifically, when a classical noiseless lightwave signal is directly photodetected, the resulting quantum uncertainty is called "counting" shot noise and is described by a Poisson probability mass function. When the same lightwave is homodyne demodulated to a real-baseband signal, the resulting quantum uncertainty is called "mixing" shot noise and is equivalent to half a photon (cf. Section 10.2.5). Section 15.3.4 shows that the minimum-uncertainty distribution for "mixing" shot noise is a gaussian distribution. In a semiclassical description, the difference between the forms of shot noise is attributed to the different kinds of demodulation. Within quantum optics, the difference is a consequence of the basis-dependent quantum uncertainty.

In the absence of quantum uncertainty, an ideal classical noise-free measurement using the basis defined by R̂ can always be converted without error into a measurement using the basis defined by Q̂ by a simple coordinate transformation applied after the measurement. This kind of post-measurement realignment operation is frequently used in classical communications for carrier-phase and polarization-state estimation (cf. Chapter 12). The corresponding statement within quantum optics is not true. Each of the two operators R̂ and Q̂ has a different form of basis-dependent quantum uncertainty. This means that the uncertainty in the outcome of a measurement depends on the measurement basis and cannot be removed by a post-measurement transformation of that basis. This unique property is an essential distinguishing feature of quantum information-processing systems.

Joint Measurements

The previous subsection showed that the quantum uncertainty in the outcome of a measurement depends on the basis used for that measurement. When the basis is aligned with the signal states, there is no quantum uncertainty. This subsection shows that, whenever the two operators describing two properties do not commute, any joint measurement that simultaneously measures those two properties must have quantum uncertainty in at least one outcome because there is no basis aligned to both.

Suppose that two observable properties described by observables Â and B̂ are to be jointly measured. To determine the effect of these operators on a signal state, form the commutator [Â, B̂] defined in (2.1.98). When the commutator is zero, the operators commute and share a common eigenbasis. Then both Â and B̂ can be diagonalized using the same basis. Consequently, a signal state |ψ⟩ that is an eigenstate of observable Â with an eigenvalue a is also an eigenstate of observable B̂ with an eigenvalue b. A joint measurement of these two observable properties has no quantum uncertainty whenever the signal state is a common eigenstate of both observables. The statement that the two operators commute means that there is no additional uncertainty for a joint measurement than for a separate measurement of each observable. Of course, each of these separate measurements may still have quantum uncertainty when the signal state is not aligned with the measurement state.


Other pairs of operators need not commute. The observable operator â_I that measures the in-phase signal component and the observable operator â_Q that measures the quadrature signal component do not commute. Indeed, it will be shown that [â_I, â_Q] = i/2 (cf. (15.3.7)). For such a joint measurement, there is no common set of eigenstates, and so the joint measurement must produce some quantum uncertainty for at least one of the observables. Informally, this means that no orientation of the measurement basis can be simultaneously aligned with both signal states. The magnitude of the uncertainty is quantified by the determinant of the commutator, which can be regarded as a measure of the misalignment of the two bases used to describe the two observables.
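The effect of a nonzero commutator can be illustrated with finite-dimensional stand-ins. The sketch below uses the 2×2 Pauli matrices, chosen here only as a convenient noncommuting pair; they are not representations of â_I and â_Q, which act on an infinite-dimensional signal space.

```python
import numpy as np

# Two observables (hermitian matrices) that do not commute.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

comm = sx @ sz - sz @ sx
print(np.allclose(comm, 0))   # False: no common eigenbasis exists, so a joint
                              # measurement must have quantum uncertainty
```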

Composite Signal Spaces

Discussion of the higher-order dependences admitted within quantum optics requires a generalization of the signal space conventionally used to describe classical signals. This composite signal space is constructed from the outer product (or tensor product) of the component signal spaces of the component-symbol states. For example, suppose that one symbol is an element of a signal space A spanned by one set of basis states {|a_i⟩}, and a second symbol is an element of a signal space B spanned by a second set of basis states {|b_j⟩}. The composite signal state is a single point in the composite signal space. One set of basis states {|c_ij⟩} for the composite signal space C can be constructed using the outer product of the component basis states |a_i⟩ and |b_j⟩ as given by

$$|c_{ij}\rangle = |a_i\rangle \otimes |b_j\rangle, \qquad (15.2.9)$$

where the outer product of two vectors in different signal spaces is defined in (2.1.70).⁸ The outer products of all basis states form a basis for the composite signal space C = A ⊗ B. This basis is one of an infinite number of bases that could be used to describe the composite signal state. Other bases can be formed as generalized rotations of C described by unitary transformations of the outer product basis. A joint measurement in the joint signal space could use any of these other bases. The quantum uncertainty in the composite signal space depends on the composite basis.

A composite signal state |C⟩ is constructed from a superposition of the basis states |c_ij⟩ as given by

$$|C\rangle = \sum_{i,j} \chi_{ij}\, |a_i\rangle \otimes |b_j\rangle, \qquad (15.2.10)$$

where χ_ij is the expansion coefficient in the outer product basis that describes the composite signal state in the composite signal space.⁹

8 When the two signal states are defined in the same signal space, the outer product is written as |a_i⟩⟨b_j|.
9 For brevity, the symbol ⊗ may be suppressed so that |C⟩ = |A⟩|B⟩. Another abbreviated notational convention is |C⟩ = |AB⟩. These abbreviations will be eschewed herein.

Now consider a subset of signal states in the composite signal space formed by the outer product of one component state |A⟩ = Σ_i α_i |a_i⟩ defined in a space with a basis {|a_i⟩}, and a second component state |B⟩ = Σ_j β_j |b_j⟩ defined in a different space with a different basis {|b_j⟩}. This outer product is


$$|C\rangle = |A\rangle \otimes |B\rangle = \sum_{i,j} \alpha_i \beta_j\, |a_i\rangle \otimes |b_j\rangle. \qquad (15.2.11)$$

For this product state, the expansion coefficient χ_ij in the composite signal space is equal to the product α_i β_j of the expansion coefficients in the component signal spaces. Informally, a product state can be "factored" into an outer product of component signal states. Only the members of a subset of the signal states in the composite signal space have this property. A quantum-lightwave modulation format that uses product states constructed from pure signal states corresponds to a classical system for which the joint probability distribution can be expressed as a product of the marginal probability distributions for each symbol.

In the context of quantum optics, a composite pure signal state |C⟩ given by (15.2.10) that cannot be expressed as a product state given by (15.2.11) for a given measurement basis is called an entangled state.¹⁰ There are two definitions of entanglement in common use. The (weak) definition of entanglement for a composite pure signal state is a relative notion, defined with respect to the choice of basis. This means that, in general, the same composite signal state can be either separable or entangled depending on the constituent basis states used to describe the composite signal state. An alternative (strong) definition of entanglement requires that the composite signal state is not separable for any choice of constituent basis states.

As an example, consider a pure signal state (unnormalized) in the composite signal space C given by

$$|C\rangle = 2|a_1\rangle \otimes |b_1\rangle - 2|a_1\rangle \otimes |b_2\rangle + |a_2\rangle \otimes |b_1\rangle - |a_2\rangle \otimes |b_2\rangle.$$

This signal state can be factored into the form |C⟩ = (2|a_1⟩ + |a_2⟩) ⊗ (|b_1⟩ − |b_2⟩). Therefore, this signal state is a product state because it can be expressed in the form of (15.2.11). A different signal state in the composite signal space C is

$$|C\rangle = 2|a_1\rangle \otimes |b_1\rangle - 2|a_1\rangle \otimes |b_2\rangle + |a_2\rangle \otimes |b_1\rangle + |a_2\rangle \otimes |b_2\rangle,$$

differing only in one sign change when expressed using the same set of bases. This signal state cannot be factored into an outer product of a linear combination of the constituent basis states {|a_i⟩} and {|b_j⟩}. This means that the signal state |C⟩ must be expressed using the general form given in (15.2.10). Therefore, the signal state |C⟩ is entangled with respect to the constituent basis states. A superposition of two distinct product states given by |a_1⟩ ⊗ |b_2⟩ + |a_2⟩ ⊗ |b_1⟩ is always entangled with respect to the constituent basis states. The proof of this statement is asked for as an exercise.

Transformations in the composite signal space follow the same rules used to construct composite signal states. Let a transformation in the component signal space A be represented by a matrix A and let a different transformation in a different component signal space B be represented by a matrix B. The matrix representation C of the tensor product¹¹ A ⊗ B of the two transformations in the enlarged composite signal space is given by the Kronecker product (cf. (2.1.99)).

10 Determining the conditions for which mixed signal states are entangled is more complex. See Horodecki, Horodecki, Horodecki, and Horodecki (2009).
11 The notation ⊗ is used for the outer product of signal states, for the tensor product of matrices denoting linear transformations, and for the tensor product of signal spaces.
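The two example composite states above can be tested for separability numerically. Arranging the expansion coefficients χ_ij as a matrix, a pure composite state is a product state exactly when that matrix has rank one; the sketch below also applies the Kronecker-product rule for composite transformations.

```python
import numpy as np

# Coefficient matrices chi_{ij} for the two example composite states above.
chi_product   = np.array([[2., -2.], [1., -1.]])  # factors as (2,1) x (1,-1)
chi_entangled = np.array([[2., -2.], [1.,  1.]])  # differs in one sign

print(np.linalg.matrix_rank(chi_product))    # 1: a product (separable) state
print(np.linalg.matrix_rank(chi_entangled))  # 2: entangled in these bases

# Composite transformations combine by the Kronecker product (cf. (2.1.99)).
A = np.array([[0., 1.], [1., 0.]])
B = np.eye(2)
print(np.kron(A, B).shape)                   # (4, 4): acts on the composite space
```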

15.3

Coherent States

The quantum-lightwave signal state that most closely resembles a classical wave-optics signal is the coherent state or Glauber state |α⟩. A coherent state |α⟩ is described using a quantized form of a harmonic oscillator. For a classical lightwave, the signal level in a single spatiotemporal mode can be expressed as a complex signal point s as given by (cf. (10.1.4))

$$s = s_I + i s_Q, \qquad (15.3.1)$$

where s_I is the in-phase signal component and s_Q is the quadrature signal component. The complex signal point s is a scaled form of a classical wave-optics sinusoidal signal expressed in a signal space defined on the complex plane (cf. Section 10.1.3). When a lightwave is described using a semiclassical approach, the energy per symbol is given by (cf. (10.1.4))

$$E = \hbar\omega \left( s_I^2 + s_Q^2 \right). \qquad (15.3.2)$$

Expression (15.3.2) states that the energy in a single spatiotemporal mode can be expressed as the sum of two squared terms, both of which are real (cf. (6.2.3)).

15.3.1

Operators for Coherent States

A classical lightwave is described by the complex signal point s = s_I + i s_Q given in (15.3.1). Two equivalent classical representations of this point are s* = s_I − i s_Q and the pair of real components (s_I, s_Q). The squared magnitude of a classical lightwave is |s|². In quantum optics, these five classical dynamical variables (s, s*, s_I, s_Q, |s|²) are "lifted" using canonical quantization. This process leads to five interrelated operators (â, â†, â_I, â_Q, N̂) that replace the classical variables. The operators used for this replacement describe the properties of coherent states and photon-number states. The properties of these five operators comprise the first topic of this section.

The observable operator â_I corresponding to s_I is called the in-phase signal-component operator or simply the in-phase operator. This operator has a continuous set of real eigenvalues {α_I} and a corresponding set of orthonormal eigenstates {|α_I⟩} that form an eigenbasis. The continuous set of eigenvalues forms the quantum wave function u(α_I) in the in-phase signal-component basis. Similarly, the operator â_Q corresponding to s_Q is called the quadrature signal-component operator or simply the quadrature operator. This operator has a continuous set of real eigenvalues {α_Q} and a corresponding continuous set of orthonormal eigenstates {|α_Q⟩} that form an eigenbasis. The continuous set of eigenvalues forms the quantum wave function U(α_Q) in the quadrature-signal-component representation.

The operator

$$\hat{a} \doteq \hat{a}_I + i\,\hat{a}_Q \qquad (15.3.3a)$$

corresponding to s = s_I + i s_Q is called the coherent-state operator. The complex conjugate transpose of this coherent-state operator is

$$\hat{a}^\dagger = \hat{a}_I - i\,\hat{a}_Q, \qquad (15.3.3b)$$

and corresponds to s* = s_I − i s_Q. The coherent-state operator â has a continuous set of complex eigenvalues {α}, called the Glauber numbers. Because the eigenvalues are complex, this operator is not hermitian and is not observable as such. The coherent-state operator â corresponds to the set of coherent states {|α⟩}. These states {|α⟩} form a basis, but the basis is not an orthogonal basis.

The operators â and â† are not observable operators. The Glauber numbers {α}, which are the eigenvalues of |α⟩, are not real as would be the case were the coherent-state operator â an observable operator. Instead, the Glauber numbers are, in general, complex and range over the entire complex plane. While neither â nor â† is an observable operator, the product â†â, which corresponds to the classical real expression |s|², is an observable operator called the photon-number state operator N̂. The operator N̂ = â†â has a discrete set of real eigenvalues {m} that correspond to the outcomes of an ideal photon-counting measurement, and a corresponding discrete set of orthonormal eigenstates {|m⟩} that forms a basis.

The properties of the coherent states and the photon-number states are specified by the properties of the five operators â_I, â_Q, â, â†, and N̂. The correspondence principle is the statement that the corresponding classical quantities s_I, s_Q, s, s*, and |s|², respectively, are the large-signal limit of the expectation of the corresponding quantum-optics operators. This statement is one reason why a large-signal quantum-optics analysis can be replaced by a semiclassical analysis.

Hamiltonian

In quantum theory, the total classical energy is replaced by an observable operator called the Hamiltonian Ĥ. Using the principle of canonical quantization, the Hamiltonian replaces the classical energy and is given by¹²

$$\hat{H} = \hbar\omega \left( \hat{a}_I^2 + \hat{a}_Q^2 \right), \qquad (15.3.4)$$

12 The notation â_I² means â_I â_I.

where Ĥ replaces E, â_I² replaces s_I², and â_Q² replaces s_Q². This expression is equivalent to the classical expression E = ℏω s*s with s*s = s_I² + s_Q². In quantum optics, as shown next, the eigenvalues of the corresponding operator â†â are not exactly a scaled form of |s|². To show this, replace s by â = â_I + iâ_Q and replace s* by â† = â_I − iâ_Q to give

the corresponding quantum-optics expression¹³ for the photon-number-state operator N̂ = â†â,

$$\hat{a}^\dagger\hat{a} = (\hat{a}_I - i\hat{a}_Q)(\hat{a}_I + i\hat{a}_Q)$$
$$= \hat{a}_I^2 + \hat{a}_Q^2 + i\left(\hat{a}_I\hat{a}_Q - \hat{a}_Q\hat{a}_I\right)$$
$$= (1/\hbar\omega)\,\hat{H} + i\left[\hat{a}_I, \hat{a}_Q\right], \qquad (15.3.5)$$

where (15.3.4) has been used and [â_I, â_Q] = â_I â_Q − â_Q â_I is the commutator of the in-phase and quadrature operators. Were â_I and â_Q variables instead of operators, the commutator [â_I, â_Q] would be zero and â†â would be equivalent to the classical squared magnitude |s|². But, as will be discussed, [â_I, â_Q] is not zero.

Q

I

I

Q

I

Q

Q

I

I

Q

Q

Q

I

Q

Q

I

15.3.2

I

Q

I

I

I

Q

Q

Canonical Commutation Relationship

The nonzero value of the commutator in (15.3.5) expresses a fundamental dependence between the in-phase component operator â_I and the quadrature component operator â_Q. The nature of this dependence is an axiom in quantum optics that can be expressed in several equivalent forms. When a signal state |ψ(a_I)⟩ is expressed in the basis of the in-phase signal component operator â_I, one form of this axiom states that the quadrature component operator â_Q in the in-phase component representation can be replaced by

$$\hat{a}_Q \longrightarrow -\frac{i}{2}\frac{d}{da_I}. \qquad (15.3.6)$$

Similarly, the expression for the in-phase component operator in the quadrature component basis is given by â_I = (i/2) d/da_Q. An equivalent expression, which is often stated as an alternative form of the axiom given in (15.3.6), can be derived from (15.3.6) by forming the commutator [â_I, â_Q] = â_I â_Q − â_Q â_I (cf. (2.1.98)). Using the in-phase component representation of a signal state and dâ_I/da_I = Î, where Î is the identity operator, apply the commutator [â_I, â_Q] to a test signal state |ψ(a_I)⟩ expressed using the in-phase component basis. Then, because â_Q → −(i/2) d/da_I in the basis of the in-phase signal component, it follows that

$$[\hat{a}_I, \hat{a}_Q]\,|\psi(a_I)\rangle = \hat{a}_I\hat{a}_Q|\psi(a_I)\rangle - \hat{a}_Q\hat{a}_I|\psi(a_I)\rangle$$
$$= \frac{i}{2}\left(-\hat{a}_I\frac{d}{da_I}|\psi(a_I)\rangle + \frac{d}{da_I}\big(\hat{a}_I|\psi(a_I)\rangle\big)\right)$$
$$= \frac{i}{2}\left(-\hat{a}_I\frac{d}{da_I}|\psi(a_I)\rangle + \hat{I}\,|\psi(a_I)\rangle + \hat{a}_I\frac{d}{da_I}|\psi(a_I)\rangle\right)$$
$$= \frac{i}{2}\,|\psi(a_I)\rangle.$$

This leads to the fundamental and far-reaching expression for the commutator

$$[\hat{a}_I, \hat{a}_Q] = \frac{i}{2}, \qquad (15.3.7)$$

13 It is conventional to write the operator â† to the left of the operator â. This is called normal ordering.


where an identity operator Î is suppressed on the right side. This expression can be asserted as an axiom of quantum theory in place of (15.3.6).

The commutator [â_I, â_Q] expresses a fundamental dependence between the in-phase component operator â_I and the quadrature component operator â_Q that does not exist classically. This fundamental dependence is called the canonical commutation relationship.¹⁴ Because the operators â_I and â_Q do not commute, there is a fundamental dependence between the quantum wave function u(α_I) for the in-phase signal component and the quantum wave function U(α_Q) for the quadrature signal component. No set of orthogonal measurement states exists that can be simultaneously "aligned" with eigenstate |α_I⟩ of the in-phase-component operator â_I and eigenstate |α_Q⟩ of the quadrature-component operator â_Q. The form of this dependence, to be described in Section 15.3.4 as a Fourier transform, means that there must be some uncertainty in at least one of the outcomes of a measurement when the in-phase and quadrature components are simultaneously measured, as would occur in a joint demodulation process. This fundamental dependence also means that the concept of entropy must be modified for a quantum-lightwave signal compared with a classical signal. This modification is discussed in Section 15.4.4.

The commutator for the coherent-state operators â and â† is determined using (15.3.3) and is given by

$$[\hat{a}, \hat{a}^\dagger] = (\hat{a}_I + i\hat{a}_Q)(\hat{a}_I - i\hat{a}_Q) - (\hat{a}_I - i\hat{a}_Q)(\hat{a}_I + i\hat{a}_Q) = -2i\,[\hat{a}_I, \hat{a}_Q] = 1, \qquad (15.3.8)$$

where (15.3.7) has been used. The canonical commutation relationships given in (15.3.7) and (15.3.8) are equivalent and describe a fundamental dependence that does not exist within wave optics or photon optics.
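Both forms of the canonical commutation relationship can be checked numerically using the standard matrix representation of â in the photon-number basis. The finite truncation used below is an assumption of the sketch: the identities hold exactly except in the last diagonal entry, an artifact of truncating the basis.

```python
import numpy as np

N = 30
a = np.diag(np.sqrt(np.arange(1, N)), k=1)   # annihilation operator, number basis
adag = a.conj().T

comm = a @ adag - adag @ a                   # [a, a_dag] = 1 as in (15.3.8)
print(np.allclose(comm[:-1, :-1], np.eye(N - 1)))

# With a_I = (a + a_dag)/2 and a_Q = (a - a_dag)/(2i) from (15.3.3):
aI = (a + adag) / 2
aQ = (a - adag) / 2j
commIQ = aI @ aQ - aQ @ aI                   # [a_I, a_Q] = i/2 as in (15.3.7)
print(np.allclose(commIQ[:-1, :-1], (1j / 2) * np.eye(N - 1)))
```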

Photon-Number-State Operator

Substituting (15.3.7) into (15.3.5) and solving for the Hamiltonian Ĥ gives

$$\hat{H} = \hbar\omega\left(\hat{a}^\dagger\hat{a} + \tfrac{1}{2}\right) = \hbar\omega\left(\hat{N} + \tfrac{1}{2}\right), \qquad (15.3.9a)$$

where

$$\hat{N} \doteq \hat{a}^\dagger\hat{a} \qquad (15.3.9b)$$

is the photon-number-state operator. Expression (15.3.9a) differs from the classical expression by the half-photon of vacuum-state energy, which is a direct consequence of (15.3.7).

The orthogonal eigenstates of the photon-number-state operator N̂ comprise the set of photon-number states {|m⟩}. The nonnegative eigenvalue m of the eigenstate |m⟩ corresponds to the number of photons detected using ideal photon counting.

14 See Dirac (1981), Chapter 4, Section 21.


Expression (15.3.9) completes the set of relationships for the five interrelated operators â_I, â_Q, â, â†, and N̂ that describe the properties of a coherent state. These are the main topics of this section.

15.3.3

Position–Momentum Representation

Within quantum theory, the properties of a lightwave described by the two observable operators â_I and â_Q, which are the in-phase and quadrature operators, are mathematically equivalent to the properties described by two observable operators called the position operator x̂ and the momentum operator p̂. A position representation uses the eigenstates of the position operator x̂ as a basis, and a momentum representation uses the eigenstates of the momentum operator p̂ as a basis. Each operator has a continuous set of real eigenvalues. This set of eigenvalues in either representation is the quantum wave function in that representation. The quantum wave function in the position representation is denoted as u(x). The quantum wave function in the momentum representation is denoted as U(p).

An in-phase component representation uses the eigenstates of the in-phase component operator â_I as a basis, and a quadrature representation uses the eigenstates of the quadrature component operator â_Q as a basis. Each operator has a continuous set of real eigenvalues. This set of eigenvalues in either representation is the quantum wave function in that representation. The quantum wave function in the in-phase component representation is denoted as u(a_I). The quantum wave function in the quadrature component representation is denoted as U(a_Q).

The pair of signal component operators (â_I, â_Q) and the pair of operators (x̂, p̂) are scaled versions of each other,¹⁵ given by

$$\hat a_I \doteq \sqrt{\frac{m\omega}{2\hbar}}\,\hat x, \qquad (15.3.10a)$$

$$\hat a_Q \doteq \sqrt{\frac{1}{2m\omega\hbar}}\,\hat p, \qquad (15.3.10b)$$

where $m$ is the mass of a system described by $(\hat x, \hat p)$. Because $\hat a_I$ and $\hat a_Q$ are dimensionless quantities (cf. (15.3.9a)), the scaling factor $\sqrt{m\omega/2\hbar}$ in (15.3.10a) has units of reciprocal length and the scaling factor $\sqrt{1/2m\omega\hbar}$ in (15.3.10b) has units of reciprocal momentum.

Using the canonical commutation relationship given in (15.3.7) and repeated here,

$$\left[\hat a_I, \hat a_Q\right] = \frac{i}{2},$$

and substituting the expressions given in (15.3.10) yields the canonical commutation relationship for position and momentum

$$\left[\hat x, \hat p\right] = i\hbar. \qquad (15.3.11)$$

15 The factor of two in this definition corresponds to a root-mean-squared magnitude. Peak values $\hat q \doteq \sqrt{2}\,\hat a_I$ and $\hat p \doteq \sqrt{2}\,\hat a_Q$ are also used as is convenient.


This expression states that the position operator $\hat x$ and the momentum operator $\hat p$ do not commute and do not share a common set of eigenstates. Similarly, starting with (15.3.11), direct substitution of (15.3.10) gives (15.3.7). Using (15.3.6) and the scaling terms given in (15.3.10), the momentum operator in one dimension can be expressed in a position representation using a spatial derivative, with $\hat p \rightarrow -i\hbar(d/dx)$. Because the two pairs of operators $\{\hat a_I, \hat a_Q\}$ and $\{\hat x, \hat p\}$ are mathematically equivalent, the pair $\{\hat x, \hat p\}$ is used whenever it is convenient.

15.3.4 Minimum-Uncertainty Coherent States

The fundamental dependence between the two quadrature signal components expressed by the canonical commutation relationship given in (15.3.7) leads to quantum uncertainty whenever a coherent state is measured. Quantifying this uncertainty requires the functional form of the dependence between the continuous eigenvalue distribution defining the quantum wave function $u(\alpha_I)$ for the in-phase signal component and the continuous eigenvalue distribution defining the quantum wave function $U(\alpha_Q)$ for the quadrature signal component. This section shows that these two quantum wave functions are related by a Fourier transform.

To show this seminal relationship, it is conventional to work with the position operator $\hat x$ and the momentum operator $\hat p$ instead of the in-phase operator $\hat a_I$ and the quadrature operator $\hat a_Q$ (cf. (15.3.10)). With reference to (15.2.2), the coefficients of a one-dimensional position representation for a signal state $|\psi\rangle$ form the quantum wave function $u(x)$ in the position representation as given by

$$|\psi\rangle = \int_{-\infty}^{\infty} \hat P_x|\psi\rangle\,dx = \int_{-\infty}^{\infty} |x\rangle\langle x|\psi\rangle\,dx = \int_{-\infty}^{\infty} u(x)\,|x\rangle\,dx, \qquad (15.3.12)$$

where $u(x) = \langle x|\psi\rangle$. Applying $\langle p|$ on the left of each side of (15.3.12) gives

$$U(p) = \langle p|\psi\rangle = \int_{-\infty}^{\infty} u(x)\,\langle p|x\rangle\,dx, \qquad (15.3.13)$$

where $U(p) = \langle p|\psi\rangle$ is the quantum wave function in the momentum representation. Expression (15.3.13) relates the quantum wave function $u(x)$ in a position representation to the quantum wave function $U(p)$ in a momentum representation in terms of the inner product $\langle p|x\rangle$.

The inner product $\langle p|x\rangle$ can be determined by writing the position operator $\hat x$ in a momentum representation. Using (15.3.6) and the scaling terms given in (15.3.10), $\hat x = i\hbar(d/dp)$. Applying this relationship to $\langle p|x\rangle$ gives

$$\frac{d}{dp}\langle p|x\rangle = -\frac{i}{\hbar}\,x\,\langle p|x\rangle, \qquad (15.3.14)$$

where $\hat x\langle p|x\rangle = \langle p|\hat x|x\rangle = x\langle p|x\rangle$. The general solution to (15.3.14), which is a differential equation in $\langle p|x\rangle$, is

$$\langle p|x\rangle = A\,e^{-ipx/\hbar}, \qquad (15.3.15)$$


where $A$ is a constant. The constant can be determined by normalizing the inner product $\langle x_1|x_2\rangle$ and is given by $1/\sqrt{2\pi\hbar}$. This is asked for in an end-of-chapter problem. Using $A = 1/\sqrt{2\pi\hbar}$ and substituting (15.3.15) into (15.3.13) gives

$$U(p) = \frac{1}{\sqrt{2\pi\hbar}}\int_{-\infty}^{\infty} u(x)\,e^{-ipx/\hbar}\,dx. \qquad (15.3.16)$$

Expression (15.3.16) shows that the wave function $U(p)$ in a momentum representation is the Fourier transform of the wave function $u(x)$ in a position representation.

The root-mean-squared (rms) width $\sigma_x$ of $|u(x)|^2$ is a measure of the uncertainty of the position. Similarly, the rms width $\sigma_p$ of $|U(p)|^2$ is a measure of the uncertainty of the momentum. Using the Fourier transform inequality given in (2.1.36) with $T_{\mathrm{rms}}$ replaced by $\sigma_x$ and $2\pi W_{\mathrm{rms}}$ replaced by $\sigma_k = \sigma_p/\hbar$, the product of the rms widths of the two wave functions is lower-bounded by

$$\sigma_x\sigma_p \ge \frac{\hbar}{2}. \qquad (15.3.17a)$$

This mathematical property of the Fourier transform, as stated in the context of quantum theory, is called the Heisenberg uncertainty relationship. It quantifies the joint uncertainty when using a canonical pair of noncommuting observable operators $\{\hat x, \hat p\}$ that do not share a common set of eigenstates. The quantum wave functions of a pair of noncommuting observable operators related by a Fourier transform are called conjugate variables.

The in-phase operator and the quadrature operator are scaled versions of the position operator and the momentum operator (cf. (15.3.10)). Accordingly, define $\sigma_{\alpha_I}$ as the rms width of the probability distribution $|u(\alpha_I)|^2$, and define $\sigma_{\alpha_Q}$ as the rms width of the probability distribution $|U(\alpha_Q)|^2$, where $u(\alpha_I)$ and $U(\alpha_Q)$ are related by a Fourier transform. Using the scaling factors given in (15.3.10) and (15.3.17a), the product of the rms width $\sigma_{\alpha_I}$ of the probability distribution for the in-phase component and the rms width $\sigma_{\alpha_Q}$ of the probability distribution for the quadrature component satisfies

$$\sigma_{\alpha_I}\sigma_{\alpha_Q} \ge \frac{1}{4}, \qquad (15.3.17b)$$

with the product of the variances given by

$$\sigma_{\alpha_I}^2\sigma_{\alpha_Q}^2 \ge \frac{1}{16}. \qquad (15.3.18)$$

Equality occurs only when $u(\alpha_I)$ and $U(\alpha_Q)$ are gaussian distributions (cf. (2.1.37)).

A minimum-uncertainty coherent state is generated whenever the inequality in (15.3.17) is satisfied with equality and, in addition, the rms widths are equal, with $\sigma_{\alpha_I}^2 = \sigma_{\alpha_Q}^2$. The equal variances are given by $\sigma_{\alpha_I}^2 = \sigma_{\alpha_Q}^2 = 1/4$. This minimum-uncertainty condition corresponds to a circularly symmetric gaussian distribution with a uniform phase.

The minimum-uncertainty coherent state is the quantum state that most closely resembles a classical shot-noise-limited system. Because the minimum-uncertainty state is achieved by a gaussian distribution, the measured shot noise for a classical additive-noise-free signal described by a complex signal $s = s_I + is_Q$ is a gaussian distribution for each signal component irrespective of the mean signal level. This validates the use of a gaussian distribution to describe the shot noise in each signal component for a semiclassical analysis.

However, the relationship between $\alpha_I$ and $\alpha_Q$ is fundamentally different than the relationship between $s_I$ and $s_Q$. For a classical complex signal, no prior relationship is presumed between $s_I$ and $s_Q$, with the joint statistical properties of a complex symbol described by a product distribution. In the absence of additive noise or shot noise, the classical in-phase and quadrature signal components can be jointly demodulated without error. In contrast, the wave functions $u(\alpha_I)$ and $U(\alpha_Q)$ are related by a Fourier transform. Therefore, a joint probability distribution relating the probability distributions $|u(\alpha_I)|^2$ and $|U(\alpha_Q)|^2$ is not meaningful, as such.
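The Fourier-transform uncertainty product can be illustrated numerically. The sketch below is an illustration, not part of the text; it assumes numpy, sets $\hbar = 1$, and uses an arbitrary gaussian width. It constructs a gaussian position wave function, obtains the momentum wave function with a discrete Fourier transform approximating (15.3.16), and confirms that the product of rms widths sits at the lower bound $\hbar/2$ of (15.3.17a).

```python
# Sketch (assumes hbar = 1): a gaussian wave function achieves the
# Heisenberg bound sigma_x * sigma_p = hbar/2 with equality.
import numpy as np

hbar = 1.0
x = np.linspace(-20, 20, 4096)
dx = x[1] - x[0]
sx = 1.3                                        # rms width of |u(x)|^2 (arbitrary)
u = np.exp(-x**2 / (4 * sx**2))                 # gaussian position wave function
u /= np.sqrt(np.sum(np.abs(u)**2) * dx)         # normalize |u(x)|^2 to unit area

p = 2 * np.pi * hbar * np.fft.fftfreq(x.size, d=dx)   # momentum grid, p = hbar*k
dp = p[1] - p[0]
U = np.fft.fft(u) * dx                                # discrete version of (15.3.16)
U /= np.sqrt(np.sum(np.abs(U)**2) * dp)               # normalize |U(p)|^2

sigma_x = np.sqrt(np.sum(x**2 * np.abs(u)**2) * dx)
sigma_p = np.sqrt(np.sum(p**2 * np.abs(U)**2) * dp)
print(sigma_x * sigma_p)        # approximately hbar/2 = 0.5 for any sx
```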

15.3.5 The Coherent-State Operator

Because the operators $\hat a_I$ and $\hat a_Q$ do not commute, the coherent-state operator $\hat a = \hat a_I + i\hat a_Q$ is not self-adjoint and does not represent an observable property. This implies that a joint measurement of both signal components must be treated differently than a measurement of one signal component. This distinction does not exist for a classical lightwave.

The eigenstates of the coherent-state operator are $|\alpha\rangle$, where $\alpha$ is the Glauber number generated when the operator $\hat a$ is applied to a coherent-state signal $|\alpha\rangle$, with

$$\hat a|\alpha\rangle = \alpha|\alpha\rangle. \qquad (15.3.19a)$$

Similarly,

$$\langle\alpha|\hat a^\dagger = \langle\alpha|\alpha^*. \qquad (15.3.19b)$$

The sum and difference of the operators are given by

$$\hat a + \hat a^\dagger = 2\hat a_I, \qquad (15.3.20a)$$

$$\hat a - \hat a^\dagger = 2i\hat a_Q. \qquad (15.3.20b)$$

These two expressions are analogous to the classical expressions $s + s^* = 2s_I$ and $s - s^* = 2is_Q$.

Time Evolution

The correspondence between the temporal evolution of the coherent-state operator $\hat a(t)$ and the temporal evolution of the classical expression $s(t)$ can be established by forming the expectation of the coherent-state operator in a Heisenberg representation for which the time dependence is carried with the operator. Consider $\hat a(t) = \hat a(0)e^{i\omega t}$ and form the expectation. Because the complex exponential factors out, this gives

$$\langle\alpha|\hat a(t)|\alpha\rangle = \langle\alpha|\hat a(0)|\alpha\rangle e^{i\omega t} = \alpha\,e^{i\omega t}, \qquad (15.3.21a)$$

where (15.3.19a) with $\hat a = \hat a(0)$ has been used to write $\langle\alpha|\hat a(0)|\alpha\rangle = \alpha\langle\alpha|\alpha\rangle = \alpha$. Similarly, using (15.3.19b) gives

$$\langle\alpha|\hat a^\dagger(t)|\alpha\rangle = \alpha^*\,e^{-i\omega t}. \qquad (15.3.21b)$$


Because the expectation is linear, using (15.3.20) and (15.3.21) gives

$$\langle\alpha|\hat a_I|\alpha\rangle = \tfrac12\left(\langle\alpha|\hat a(t)|\alpha\rangle + \langle\alpha|\hat a^\dagger(t)|\alpha\rangle\right) = \tfrac12\left(\alpha e^{i\omega t} + \alpha^* e^{-i\omega t}\right) = |\alpha|\cos(\omega t + \arg\alpha). \qquad (15.3.22a)$$

Similarly,

$$\langle\alpha|\hat a_Q|\alpha\rangle = \frac{1}{2i}\left(\langle\alpha|\hat a(t)|\alpha\rangle - \langle\alpha|\hat a^\dagger(t)|\alpha\rangle\right) = \frac{1}{2i}\left(\alpha e^{i\omega t} - \alpha^* e^{-i\omega t}\right) = |\alpha|\sin(\omega t + \arg\alpha). \qquad (15.3.22b)$$

These expressions show that the expectations of the two component operators $\hat a_I$ and $\hat a_Q$ have the same form as the corresponding classical signal components with the classical complex amplitude $s$ replaced by $\alpha$.

Electric Field Operator

By similar reasoning, the observable operator $\hat{\mathbf{E}}$ that corresponds to the electric field vector for a single frequency $\omega$ in a single spatial mode propagating in the $x$ direction with a propagation constant $\beta(\omega)$ and a polarization $\hat{\mathbf{e}}$ satisfies¹⁶

$$\hat{\mathbf{E}} \propto \left(\hat a\,e^{i(\beta x - \omega t)} - \hat a^\dagger e^{-i(\beta x - \omega t)}\right)\hat{\mathbf{e}}, \qquad (15.3.23)$$

where the proportionality constant depends both on the frequency $\omega$ and on the geometry that defines the mode. In contrast to classical optics, the noncommuting nature of $\hat a$ and $\hat a^\dagger$ given in (15.3.8) implies that the negative-temporal-frequency component, which is the first term on the right side of (15.3.23), is not equivalent to the positive-temporal-frequency component, which is the second term on the right side of (15.3.23).

Expected Values

The expected number of photons $\mathsf{E} \doteq \langle\hat N\rangle$ in a single spatiotemporal mode described by a quantized harmonic oscillator with Glauber number $\alpha$ is

$$\mathsf{E} = \langle\alpha|\hat N|\alpha\rangle = \langle\alpha|\hat a^\dagger\hat a|\alpha\rangle = \langle\alpha|\alpha^*\alpha|\alpha\rangle = |\alpha|^2\langle\alpha|\alpha\rangle = |\alpha|^2, \qquad (15.3.24)$$

where $\langle\alpha|\hat a^\dagger = \langle\alpha|\alpha^*$ and (15.2.3) have been used. The expected energy $E$ is

$$E = \langle\hat H\rangle = \langle\alpha|\hbar\omega\left(\hat N + \tfrac12\right)|\alpha\rangle = \hbar\omega\left(\langle\alpha|\hat a^\dagger\hat a|\alpha\rangle + \langle\alpha|\tfrac12|\alpha\rangle\right) = \hbar\omega\left(|\alpha|^2 + \tfrac12\right) = \hbar\omega\left(\mathsf{E} + \tfrac12\right), \qquad (15.3.25)$$

where the expected number of photons is given in (15.3.24) as $\mathsf{E} = |\alpha|^2$. Expression (15.3.25) validates the expression stated without proof in (6.1.11). It is not equal to the semiclassical expression $E = \hbar\omega\mathsf{E}$ for the energy in a single mode because the vacuum-state energy $\hbar\omega/2$ must be included.

16 See Loudon (2000), Section 4.5. For consistency with the literature, the sign convention used for this field is reversed in this chapter compared with the rest of the book.

Raising and Lowering Operators

Although $\hat a$ and $\hat a^\dagger$ are not observable operators, they have simple interpretations when applied to a photon-number state $|m\rangle$. When $\hat a^\dagger$ is applied to $|m\rangle$, it increments the number of photons in a mode by one photon, giving the (unnormalized) state $|m+1\rangle$. Similarly, when $\hat a$ is applied to $|m\rangle$, it decrements the number of photons in a mode by one photon, giving the (unnormalized) state $|m-1\rangle$.

To derive these relationships, form the eigenvalue equation for the energy

$$\hat H|m\rangle = E_m|m\rangle. \qquad (15.3.26)$$

Now use (15.3.9a) to write this as $\hbar\omega\left(\hat a^\dagger\hat a + \tfrac12\right)|m\rangle = E_m|m\rangle$ and multiply both sides from the left by $\hat a^\dagger$ to give

$$\hbar\omega\left(\hat a^\dagger\hat a^\dagger\hat a + \tfrac12\hat a^\dagger\right)|m\rangle = E_m\hat a^\dagger|m\rangle. \qquad (15.3.27)$$

Using $\hat a^\dagger\hat a = \hat a\hat a^\dagger - 1$ from the commutator relationship given in (15.3.8), write $\hat a^\dagger\left(\hat a^\dagger\hat a\right) = \hat a^\dagger\hat a\,\hat a^\dagger - \hat a^\dagger$. Substituting this expression into (15.3.27) gives

$$\begin{aligned}
\hbar\omega\left(\hat a^\dagger\hat a\,\hat a^\dagger - \hat a^\dagger + \tfrac12\hat a^\dagger\right)|m\rangle &= E_m\,\hat a^\dagger|m\rangle\\
\hbar\omega\left(\hat a^\dagger\hat a\,\hat a^\dagger + \tfrac12\hat a^\dagger\right)|m\rangle &= \left(E_m + \hbar\omega\right)\hat a^\dagger|m\rangle\\
\hbar\omega\left(\hat a^\dagger\hat a + \tfrac12\right)\hat a^\dagger|m\rangle &= \left(E_m + \hbar\omega\right)\hat a^\dagger|m\rangle\\
\hat H\left(\hat a^\dagger|m\rangle\right) &= \left(E_m + \hbar\omega\right)\left(\hat a^\dagger|m\rangle\right), \qquad (15.3.28)
\end{aligned}$$

where (15.3.9a) has been used to go from the third line to the fourth line. The expression on the last line of (15.3.28) is an eigenvalue equation for $\hat H$ with an eigenstate $\hat a^\dagger|m\rangle$ and a corresponding eigenvalue $E_m + \hbar\omega$. Given that $E_m$ corresponds to the energy in a mode with $m$ photons and that $\hbar\omega$ is the energy of a single photon, this energy eigenstate corresponds to a state with $m+1$ photons. Accordingly, the state $\hat a^\dagger|m\rangle$ corresponds to a photon-number state $|m+1\rangle$ that has one more photon. Consequently, the operator $\hat a^\dagger$ is called the raising operator.¹⁷

17 The lowering operator is also called the annihilation operator and the raising operator is also called the creation operator.


Similarly, $\hat a|m\rangle$ corresponds to a photon-number state $|m-1\rangle$ that has one photon fewer. Consequently, the operator $\hat a$ is called the lowering operator, with $\hat a|m\rangle = K|m-1\rangle$ for some $K$. To determine $K$, use $\hat N = \hat a^\dagger\hat a$ as follows:

$$\begin{aligned}
|K|^2\langle m-1|m-1\rangle &= \langle m|\hat a^\dagger\hat a|m\rangle\\
|K|^2 &= \langle m|\hat N|m\rangle\\
|K|^2 &= m,
\end{aligned}$$

or $K = m^{1/2}$, so that

$$\hat a|m\rangle = m^{1/2}\,|m-1\rangle. \qquad (15.3.29)$$

Similarly, using $\hat a\hat a^\dagger = \hat N + 1$ (cf. (15.3.8)),

$$\hat a^\dagger|m\rangle = (m+1)^{1/2}\,|m+1\rangle. \qquad (15.3.30)$$

The eigenstates of $\hat a$ are the coherent states $\{|\alpha\rangle\}$ (cf. (15.3.19)). A different, equivalent representation of the coherent states is presented in the next subsection.
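A minimal numerical check of the ladder relations (15.3.29) and (15.3.30), reusing the truncated number-basis matrices of the earlier sketch (again an illustration assuming numpy, with an arbitrary truncation dimension):

```python
# Sketch: direct check of a|m> = sqrt(m)|m-1> and a^dag|m> = sqrt(m+1)|m+1>.
import numpy as np

nmax = 20
a = np.diag(np.sqrt(np.arange(1, nmax)), k=1)   # lowering operator
adag = a.conj().T                               # raising operator

m = 6
ket_m = np.zeros(nmax); ket_m[m] = 1.0          # photon-number state |m>
lowered = a @ ket_m                             # sqrt(m) |m-1>
raised = adag @ ket_m                           # sqrt(m+1) |m+1>
print(np.isclose(lowered[m - 1], np.sqrt(m)))       # True
print(np.isclose(raised[m + 1], np.sqrt(m + 1)))    # True

vac = np.zeros(nmax); vac[0] = 1.0
print(np.allclose(a @ vac, 0))                  # the vacuum state is annihilated
```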

Vacuum State

The photon-number state $|0\rangle$ with zero photons, which is the vacuum state of a mode, must be considered separately. Because the vacuum state is the lowest-energy state of a mode, applying the lowering operator to the vacuum state cannot produce a quantum state with a lower energy. Therefore,

$$\hat a|0\rangle = 0, \qquad (15.3.31)$$

where the zero on the right side means that no state is produced because no state exists with an energy below the vacuum state. The energy in the vacuum state is determined using (15.3.26):

$$\hat H|0\rangle = \hbar\omega\left(\hat a^\dagger\hat a + \tfrac12\right)|0\rangle = \tfrac12\hbar\omega\,|0\rangle, \qquad (15.3.32)$$

where (15.3.31) has been used. The factor of one-half is the energy in an unoccupied mode with zero photons and is the zero-point energy or the vacuum-state energy. This energy corresponds to half a photon. The vacuum state is an eigenstate of the Hamiltonian energy operator. It is not an eigenstate of the in-phase component operator $\hat a_I$, the quadrature component operator $\hat a_Q$, or the coherent-state operator $\hat a$. The noncommuting nature of the signal components given in (15.3.7) leads directly to (15.3.32). The energy $\hbar\omega/2$ in the vacuum state may be regarded as expressing fundamental fluctuations that are the origin of lightwave noise.

For large signal levels, the vacuum-state energy can be neglected. For this case, the phase-space representation of a coherent state has the same functional form as the signal-space representation of a classical signal, as shown in Figure 15.3(a). Asserting the correspondence principle, $\langle\hat N\rangle$ corresponds to $|s|^2 = \mathsf{E}$ and $\langle\hat H\rangle$ corresponds to $\hbar\omega|s|^2 = \hbar\omega\mathsf{E}$, with the mean energy equal to the product of the mean number of photons and the energy of a photon $\hbar\omega$.


15.3.6 Representation of a Coherent State

The previous subsection showed that the coherent states $\{|\alpha\rangle\}$ are the eigenstates of an operator $\hat a$ that decrements the number of photons in a mode by one photon. This section now shows that a coherent state $|\alpha\rangle$ can be expressed as a linear combination of the photon-number states $|m\rangle$.

Start with the only photon-number state that has a classical counterpart. This is the vacuum state $|0\rangle$, which has zero photons. A coherent state $|\alpha\rangle$ with Glauber number $\alpha$ is constructed from the vacuum state by applying the displacement operator, which is defined as

$$\hat D(\alpha) \doteq e^{\alpha\hat a^\dagger - \alpha^*\hat a} = e^{-|\alpha|^2/2}\,e^{\alpha\hat a^\dagger}\,e^{-\alpha^*\hat a}. \qquad (15.3.33)$$

(The exponentiation of an operator is defined using a power series expansion for the exponential function (cf. (2.1.92)).) The second expression in (15.3.33) is derived from the first expression using the operator relationship¹⁸

$$e^{\hat A + \hat B} = e^{\hat A}e^{\hat B}e^{-\frac12[\hat A,\hat B]},$$

provided that $\hat A$ and $\hat B$ each commute with the commutator $[\hat A, \hat B]$. Showing the derivation of (15.3.33) by using this relationship is asked for as an end-of-chapter exercise.

The displacement operator displaces or shifts the vacuum state, with the displacement defined by the Glauber number. This displacement can be regarded as the quantum-optics equivalent of adding a bias to a zero-mean probability distribution and is shown pictorially in Figure 15.13.

The displacement $\hat D(\alpha)|0\rangle$ of the vacuum state is now determined by applying the three factors of the operator $\hat D(\alpha)$ given in (15.3.33), in order, to the vacuum state $|0\rangle$. The rightmost operator $e^{-\alpha^*\hat a}$ with the exponential function written as a series gives

$$e^{-\alpha^*\hat a}|0\rangle = \left(1 - \alpha^*\hat a + \cdots\right)|0\rangle = |0\rangle,$$

where $\hat a|0\rangle = 0$ is used on all terms except the first term. Applying the remaining two factors of (15.3.33), a coherent state $|\alpha\rangle$ is defined as a displaced vacuum state given by

$$\hat D(\alpha)|0\rangle \doteq |\alpha\rangle = e^{-|\alpha|^2/2}\,e^{\alpha\hat a^\dagger}|0\rangle = e^{-|\alpha|^2/2}\sum_{m=0}^{\infty}\frac{\alpha^m}{m!}\left(\hat a^\dagger\right)^m|0\rangle = e^{-|\alpha|^2/2}\sum_{m=0}^{\infty}\frac{\alpha^m}{\sqrt{m!}}\,|m\rangle, \qquad (15.3.34)$$

where, in the $m$th term of the sum, the raising operator defined in (15.3.30) has been applied $m$ times to the vacuum state to give $\sqrt{m!}\,|m\rangle$.

18 This relationship is called the Campbell–Baker–Hausdorff identity. It is suggested by inspection of the series expansion of an exponential. For a proof, see Section 10.11.5 in Mandel and Wolf (1995).


Equation (15.3.34) expresses a coherent state as an expansion in an infinite-dimensional signal space using a basis $\{|m\rangle\}$ of photon-number states parameterized by the Glauber number $\alpha$. Using $\langle n|m\rangle = \delta_{nm}$ for the eigenbasis of photon-number states, the expansion coefficients $\{c_m\}$ of a coherent state expressed in a photon-number-state basis are

$$c_m = \langle m|\alpha\rangle = e^{-|\alpha|^2/2}\,\frac{\alpha^m}{\sqrt{m!}}. \qquad (15.3.35)$$

The probability $p(m)$ that $m$ photons are measured is given by (15.2.5b), so that

$$p(m) = |\langle m|\alpha\rangle|^2 = |c_m|^2 = e^{-|\alpha|^2}\,\frac{\left(|\alpha|^2\right)^m}{m!} = e^{-\mathsf{E}}\,\frac{\mathsf{E}^m}{m!}. \qquad (15.3.36)$$

This is the Poisson probability mass function with an expected number of photons given by $\mathsf{E} = |\alpha|^2$ (cf. (15.3.24)). This expression states that when a coherent state is expressed using the photon-number-state basis, the outcome of an ideal photon-counting measurement is random and is described by a Poisson probability mass function. This statement justifies the use of the Poisson probability mass function within photon optics.

The classical description of the same probability mass function postulates random photon arrival times (cf. Section 6.2.3) and then derives the Poisson probability mass function as the limiting case of the binomial probability mass function. While this approach leads to the same probability distribution, the interpretation of the randomness is markedly different. The semiclassical expressions do not explicitly account for quantum uncertainty. Accordingly, the Poisson probability mass function is viewed as a form of statistical uncertainty associated with random photon arrivals. Within quantum optics, the same Poisson probability mass function is regarded as a form of quantum uncertainty, which is distinct from statistical uncertainty.
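The expansion (15.3.34), the displacement operator (15.3.33), and the Poisson statistics (15.3.36) can be cross-checked in a truncated photon-number basis. The sketch below is illustrative only; it assumes numpy and scipy (for the matrix exponential), and the Glauber number and truncation size are arbitrary choices.

```python
# Sketch: build |alpha> two ways -- from the number-state expansion (15.3.34)
# and by applying the displacement operator (15.3.33) to the vacuum -- then
# check the Poisson photon statistics (15.3.36).
import numpy as np
from scipy.linalg import expm
from math import factorial

nmax, alpha = 30, 1.5 + 0.5j
a = np.diag(np.sqrt(np.arange(1, nmax)), k=1)
D = expm(alpha * a.conj().T - np.conj(alpha) * a)    # displacement operator
vac = np.zeros(nmax); vac[0] = 1.0
ket_alpha = D @ vac                                   # |alpha> = D(alpha)|0>

m = np.arange(nmax)
c = np.exp(-abs(alpha)**2 / 2) * alpha**m / np.sqrt([factorial(k) for k in m])
print(np.allclose(ket_alpha, c, atol=1e-6))           # the two constructions agree

E = abs(alpha)**2                                     # mean photon number (15.3.24)
poisson = np.exp(-E) * E**m / [factorial(k) for k in m]
print(np.allclose(np.abs(ket_alpha)**2, poisson, atol=1e-6))   # (15.3.36)
```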

Phase-Space Representation of a Coherent State

A different representation, called a phase-space representation, can be used to represent a coherent state parameterized by the Glauber number $\alpha$. A phase-space representation superficially resembles the classical representation of a complex signal point $s$ on the complex plane in terms of the in-phase and quadrature components. However, a phase-space representation is fundamentally different than either the in-phase/quadrature representation of a classical complex signal point or the photon-number-state representation of a coherent state given in (15.3.34). This is because there is an inherent dependence between the real part and the imaginary part of the states described by the Glauber number that does not exist for the other representations (cf. (15.3.16)). This dependence is most pronounced at small Glauber numbers. For large Glauber numbers, the dependence is not evident, with the complex Glauber number $\alpha$ equivalent to the complex signal point $s$. Quasi-probability distributions that characterize the joint properties of the in-phase and quadrature components of a coherent state are discussed in Section 15.6.

15.3.7 The Pairwise Nonorthogonality of Coherent States

The identity operator $\hat I$ can be expressed in terms of the coherent states using the following closure property of the coherent states:

$$\hat I = \frac{1}{\pi}\int_\alpha |\alpha\rangle\langle\alpha|\,d\alpha. \qquad (15.3.37)$$

The proof of this closure property is considered in an end-of-chapter exercise. The operator $|\alpha\rangle\langle\alpha|$ in (15.3.37) is generated by the outer product of the coherent state $|\alpha\rangle$ with itself, and the integral is evaluated over the phase space of the complex Glauber numbers $\alpha$. Expression (15.3.37) implies that the coherent states span the space. As such, they form an over-complete basis¹⁹ that can be used to express other signal states.

The outer-product operators $\{|\alpha\rangle\langle\alpha|\}$ used in (15.3.37) are not orthogonal projection operators. Accordingly, the corresponding set $\{|\alpha\rangle\}$ of coherent states is pairwise nonorthogonal, as is quantified by the inner product. Using (15.3.34), the inner product of two coherent states, denoted $\kappa$, is given by

$$\kappa \doteq \langle\alpha_1|\alpha_0\rangle = e^{-\left(|\alpha_1|^2+|\alpha_0|^2\right)/2}\sum_{m,n}\frac{\left(\alpha_1^*\right)^n}{\sqrt{n!}}\,\frac{\alpha_0^m}{\sqrt{m!}}\,\langle n|m\rangle = e^{\alpha_1^*\alpha_0 - \left(|\alpha_1|^2+|\alpha_0|^2\right)/2} = e^{-|\alpha_1-\alpha_0|^2/2}. \qquad (15.3.38)$$

The series expansion of the exponential function has been used to collapse the sum, with $\langle n|m\rangle = \delta_{nm}$ for orthogonal photon-number states. The inner product is real, with square

$$\kappa^2 \doteq e^{-|\alpha_1-\alpha_0|^2} = e^{-d_{01}^2}, \qquad (15.3.39)$$

where $\mathsf{E}_1 = |\alpha_1|^2$ and $\mathsf{E}_0 = |\alpha_0|^2$ are the mean numbers of photons for the two coherent states, and $d_{01}^2 = |\alpha_1 - \alpha_0|^2$ is the squared euclidean distance (cf. (2.1.73)) between the two Glauber numbers $\alpha_1$ and $\alpha_0$ that specify the two coherent states.

Expression (15.3.39) states that no two coherent states are orthogonal. This is true even when the two complex Glauber numbers that specify the two coherent states, viewed as vectors, satisfy $\boldsymbol{\alpha}_1\cdot\boldsymbol{\alpha}_0 = 0$ and are said to be orthogonal complex numbers. The orthogonality of two Glauber numbers does not imply that the corresponding two Glauber states are orthogonal. Specifically, when the total mean signal level $\mathsf{E} = \mathsf{E}_1 + \mathsf{E}_0$ is large, the squared euclidean distance $d_{01}^2$ is large and $\kappa^2$ in (15.3.39) goes to zero irrespective of the inner product $\boldsymbol{\alpha}_1\cdot\boldsymbol{\alpha}_0$ of the two Glauber numbers that specify the coherent states. In this case, any two coherent states, though never truly orthogonal, can be treated as orthogonal. When $\mathsf{E}$ is small, two members of any pairwise subset of coherent states are not orthogonal irrespective of the inner product $\boldsymbol{\alpha}_1\cdot\boldsymbol{\alpha}_0$ of the two Glauber numbers. This is stated informally as the nonorthogonality of the set of coherent states.

The nonorthogonality of the coherent states may be viewed as an expression of the dual particle/wave nature of a quantum lightwave. At small signal levels, the nonorthogonality of a set of coherent states is pronounced, with the particle nature of a quantum lightwave evident. At large signal levels, the set of coherent states becomes nearly orthogonal, and the classical wave nature of a quantum lightwave is most pronounced. The signal-dependent nature of the inner product for a set of pairwise nonorthogonal signal states is shared by other properties of a quantum-lightwave communication system. This is an essential difference between classical communications and quantum-lightwave communications, and is most pronounced for small signal levels.

19 Formally, a basis is a minimal set of functions, which need not be orthogonal, that spans the signal space. For quantum optics, the countable set of photon-number states is such a basis. Because the expansion in (15.3.37) uses an uncountable set of coherent states that span the same signal space, the set of coherent states is an over-complete uncountable basis.

Classical Large-Signal Limit

The nonorthogonality of a set of coherent states may be viewed as quantifying the different dependences between a set of coherent states and a set of classical signals. For large signal levels, the pairwise inner product of a set of coherent states approaches zero (cf. (15.3.39)), so the coherent states become effectively orthogonal and behave as classical signal points. This equivalence provides the conceptual bridge between quantum optics and wave optics, as is illustrated in Figure 15.2. Then replacing the set of coherent states by a set of classical signals gives an accurate analysis, with the classical complex signal point $s$ equivalent to the Glauber number $\alpha$ (cf. Section 10.2.5). This case is shown notationally in Figure 15.3(a).

15.3.8 Antipodal Coherent States

Binary antipodal coherent-state modulation involves two Glauber numbers, $\alpha_0$ and $\alpha_1$, satisfying $\alpha_1 = -\alpha_0$. The squared euclidean distance between these two complex numbers is

$$d_{01}^2 = |\alpha_1 - \alpha_0|^2 = 4\mathsf{E}_b, \qquad (15.3.40a)$$

and the squared inner product $\kappa_{01}^2 = \langle\alpha_0|\alpha_1\rangle^2$ is given by

$$\kappa_{01}^2 = e^{-d_{01}^2} = e^{-4\mathsf{E}_b}, \qquad (15.3.40b)$$

where $\mathsf{E}_b \doteq |\alpha_1|^2$ is the mean number of photons per bit.

Binary antipodal coherent-state modulation is not the same as classical binary phase-shift keying. While the two Glauber numbers $\alpha_1$ and $\alpha_0$ that define the two antipodal coherent states are related by a simple sign change, as would be the case for binary phase-shift keying, the two antipodal coherent states $|\alpha_1\rangle$ and $|\alpha_0\rangle$ specified by those two Glauber numbers are not scaled versions of each other. This means that the state $|-\alpha_1\rangle$ is not equal to the state $-|\alpha_1\rangle$. This statement can be directly verified using (15.3.34). Generalizing, the elements of a constellation of coherent states cannot be expressed as scalar multiples of a single "basis" coherent state, as would be the case for a classical signal constellation defined on the complex plane. This statement has important consequences for the detection of coherent states and is discussed in Chapter 16.

As a numerical example, let $\mathsf{E}_b$ equal 2, corresponding to an average of two photons. Then $\kappa_{01}^2 = e^{-4\mathsf{E}_b} = e^{-8} = 3.35\times 10^{-4}$. Given that the energy of each photon is $\hbar\omega$, $\mathsf{E}_b$ equal to 2 corresponds to a signal power density spectrum of $2\hbar\omega = -156$ dBm/Hz at a wavelength of 1550 nm. For an average signal level $\mathsf{E}_b$ much larger than this value of two, though they are never orthogonal, the two coherent states $|\alpha_0\rangle$ and $|\alpha_1\rangle$ may be regarded as orthogonal. In this large-signal regime, a quantum-optics analysis may be replaced by a continuous-wave-optics analysis augmented by a semiclassical description of quantum noise when appropriate (cf. Section 10.2.5).
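A short sketch (illustrative, assuming numpy) reproduces this numerical example, computing the squared inner product of the two antipodal coherent states both from the closed form (15.3.40b) and directly from the number-state expansion (15.3.34):

```python
# Sketch: squared inner product between antipodal coherent states for Eb = 2.
import numpy as np
from math import factorial

Eb = 2.0
alpha1 = np.sqrt(Eb)                # alpha0 = -alpha1 (binary antipodal modulation)
kappa_sq_closed = np.exp(-4 * Eb)   # (15.3.40b): e^{-8} = 3.35e-4

m = np.arange(60)
coeff = lambda al: np.exp(-abs(al)**2 / 2) * al**m / np.sqrt([factorial(k) for k in m])
kappa = np.vdot(coeff(-alpha1), coeff(alpha1))      # <alpha0|alpha1> via (15.3.34)
print(kappa_sq_closed, abs(kappa)**2)               # both about 3.35e-4
```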

15.4 Statistical Quantum Optics

Section 15.2 deals only with the quantum uncertainty introduced by the process of measurement for a pure signal state, which has no statistical uncertainty. Now, in this section, the signal state is random. This randomness is called statistical uncertainty. The statistical uncertainty must be reconciled with the quantum uncertainty. This reconciliation is accomplished by means of the density matrix $\hat\rho$. Any nonnegative-definite hermitian matrix with trace one is a valid density matrix. Because a density matrix is more general than a probability distribution, the class of transformations that preserve the properties of a density matrix is larger than the class of transformations that preserve the properties of a probability distribution. Our challenge is to understand how the transformations in this larger class differ from the transformations on probability distributions.

For a classical memoryless channel described by a product probability distribution, the probability distribution for each individual symbol is defined either on the real axis or on the complex plane. The quantum-lightwave signal state that corresponds to a classical product distribution is a product state composed of component-symbol states (cf. (15.2.11)). The component-symbol states used to form a product state, such as a set of coherent states, are described in a higher-dimensional signal space than are classical symbols (cf. (15.3.34)). This means that a transformation on a density matrix, such as the propagation through a lightwave channel, is a mapping from one subspace to another subspace rather than a mapping on the real axis or the complex plane.

The eigenvalues of the density matrix express the combined effect of quantum uncertainty and statistical uncertainty. The eigenvalues of the density matrix correspond to a classical probability distribution when the quantum signal states in an ensemble are orthogonal and can be "aligned" one-to-one with the orthogonal basis states that define an observable property. When the signal states in the ensemble are not orthogonal, which is the case for coherent states, this alignment is not possible for any observable property, and thus quantum uncertainty is always present in the outcome of a measurement. For this case, the eigenvalues of the density matrix express a mixture of statistical uncertainty and quantum uncertainty. These eigenvalues cannot be interpreted as a classical probability distribution unless the mean signal level is large enough that the coherent states can be regarded as orthogonal and the quantum uncertainty can be neglected with respect to other forms of uncertainty such as classical noise.

15.4.1 Derivation of the Density Matrix

Our discussion of a density matrix starts with the special case in which the set of signal states used to form a statistical ensemble is the set $\{|r_\ell\rangle\}$ of orthonormal eigenstates of the observable $\hat R$, so that $\langle r_\ell|r_m\rangle = \delta_{\ell m}$, where $\delta_{\ell m}$ is the Kronecker impulse. For this special case, the quantum expectation of the observable operator $\hat R$ for each signal state $|r_\ell\rangle$ in the ensemble is

$$\langle\hat R\rangle_\ell \doteq \langle\psi_\ell|\hat R|\psi_\ell\rangle = \langle r_\ell|\hat R|r_\ell\rangle = r_\ell. \qquad (15.4.1)$$

This is simply the eigenvalue for that eigenstate (cf. (15.2.4)). Using the quantum expectation for a measured outcome, form the statistical expectation over an ensemble of orthogonal signal states using a measurement basis aligned to that set of signal states. This expectation is

$$\langle\hat R\rangle = \sum_\ell p(\ell)\,\langle r_\ell|\hat R|r_\ell\rangle = \sum_\ell p(\ell)\,r_\ell, \qquad (15.4.2)$$

where $p(\ell)$ is the probability distribution on the ensemble of signal states. When the ensemble is a set of orthogonal signal states that are measured in a basis aligned with that set, there is no quantum uncertainty in the measurement process. Therefore, the expectation $\langle\hat R\rangle$ of the observable $\hat R$ depends only on the statistical uncertainty, which is characterized by the probability distribution $p(\ell)$.

The expectation $\langle r_\ell|\hat R|r_\ell\rangle$ is the inner product of the two states $\langle r_\ell|\hat R$ and $|r_\ell\rangle$. The inner product is equal to the trace of the outer product. This is given by $\langle r_\ell|\hat R|r_\ell\rangle = \mathrm{trace}\left(|r_\ell\rangle\langle r_\ell|\hat R\right)$, where the trace is defined in (2.1.85). Because the trace is a linear operation (cf. (2.1.84)) invariant for any basis describing the operator, the expectation in (15.4.2) can be written as

$$\langle\hat R\rangle = \sum_\ell p(\ell)\,\mathrm{trace}\left(|r_\ell\rangle\langle r_\ell|\hat R\right) = \mathrm{trace}\left[\left(\sum_\ell p(\ell)\,|r_\ell\rangle\langle r_\ell|\right)\hat R\right] = \mathrm{trace}\left(\hat\rho\,\hat R\right), \qquad (15.4.3)$$

where the term in brackets on the second line of (15.4.3) is written as

$$\hat\rho \doteq \sum_\ell p(\ell)\,|r_\ell\rangle\langle r_\ell|, \qquad (15.4.4)$$

which is a density matrix. When the members $\{|r_\ell\rangle\}$ of the ensemble are restricted to being orthogonal, each outer product $|r_\ell\rangle\langle r_\ell|$ in (15.4.4) is a projection operator $\hat P_\ell$ (cf. (15.2.2)) for the state $|r_\ell\rangle$. Each projection operator has one diagonal element equal to one and all other elements, diagonal and off-diagonal, equal to zero. Therefore, the sum of these projection operators leads to a density matrix $\hat\rho$ that is a diagonal matrix with the diagonal elements $\hat\rho_{\ell\ell}$ describing the probability distribution $p(\ell)$. This means that, for the restricted case of a set of orthogonal signal states expressed in a basis aligned to those states, the density matrix is a diagonal matrix that is equivalent to the classical probability distribution $p(\ell)$. This statement provides the bridge between probability distributions used for classical communications and density matrices used for quantum optics.

For an arbitrary ensemble $\{|\psi_\ell\rangle\}$ of normalized signal states, which need not be individual component-symbol states and need not be the pairwise orthogonal eigenstates of an observable, the density matrix is given by

$$\hat\rho = \sum_\ell p(\ell)\,|\psi_\ell\rangle\langle\psi_\ell|, \qquad (15.4.5)$$

where the set of outer products $\{|\psi_\ell\rangle\langle\psi_\ell|\}$ is not a set of orthogonal projection operators when the signaling states $\{|\psi_\ell\rangle\}$ are pairwise nonorthogonal.

Pure and Mixed Signal States

A quantum-lightwave signal that has no statistical uncertainty is called a pure signal state. A pure signal state can be a component-symbol state or it can be a block-symbol state defined in an enlarged signal space. A pure signal state is the quantum-lightwave equivalent of a deterministic classical signal that has no statistical uncertainty. However, the outcome of a measurement on a pure state may have quantum uncertainty if the pure state is not aligned with an eigenstate of the observable. The density matrix of a pure state is an outer product given by

$$\hat\rho = |\psi\rangle\langle\psi|. \qquad (15.4.6)$$

It has one eigenvalue equal to one and all other eigenvalues equal to zero.

When a quantum-lightwave signal has statistical uncertainty, it is called a mixed signal state. The density matrix of a mixed signal state is given in (15.4.5). When each term $|\psi_\ell\rangle\langle\psi_\ell|$ on the right side is a pure signal state, the mixed signal state is a statistical mixture of pure signal states. This is the conventional case for communication systems. This mixture corresponds to a classical random signal described by a probability density. When there is only one nonzero term in the sum, a mixed signal state reduces to a pure signal state.

A mixed signal state at the channel output is generated when a pure signal state at the channel input interacts with a set of unknown external states within the channel. This interaction generates a signal state at the channel output described in an enlarged signal space. In principle, were the external states known with certainty, the signal state in the enlarged signal space could be expressed as a pure signal state. The lack of knowledge about the external states leads to a mixed signal state in the original signal space characterized by statistical uncertainty, which is considered as noise. For either a pure signal state or a mixed signal state, quantum uncertainty exists whenever the signal state is not an eigenstate of the measurement operator.


Density Matrix for a Product State

The density matrix $\hat\rho_{\mathrm{blk}}$ for a block-symbol state that is a product state is the outer product of the density matrices for each component-symbol state $\hat\rho_{\mathrm{sym}}$. Each member of a set $\{\hat\rho_{\mathrm{blk}}(\ell)\}$ of such block-symbol states can be written as

$$\hat\rho_{\mathrm{blk}}(\ell) = \hat\rho_{\mathrm{sym}}(1,\ell)\otimes\hat\rho_{\mathrm{sym}}(2,\ell)\otimes\cdots\otimes\hat\rho_{\mathrm{sym}}(K,\ell), \qquad (15.4.7)$$

where $\hat\rho_{\mathrm{sym}}(i,\ell)$ is the $i$th component-symbol state of the $\ell$th block-symbol state. When the component-symbol states are pure signal states, $\hat\rho_{\mathrm{sym}}(i) = |\psi_i\rangle\langle\psi_i|$ (cf. (15.4.6)) and the block-symbol state is a pure signal state. When the component-symbol states are mixed signal states, $\hat\rho_{\mathrm{sym}}(i) = \sum_i p(i)\,|\psi_i\rangle\langle\psi_i|$ (cf. (15.4.5)) and the block-symbol state is a mixed signal state.

The density matrix $\hat\rho_{\mathrm{blk}}$ of the product state in the enlarged composite signal space is the outer product (cf. (2.1.99)) of the multiple component-symbol states $\hat\rho_{\mathrm{sym}}(i)$. When the size of the signal space for each component-symbol state is the same, the outer product $\hat\rho_{\mathrm{blk}} = \hat\sigma\otimes\hat\mu$ of two component-symbol states $\hat\mu$ and $\hat\sigma$ can be written as a Kronecker product (cf. (2.1.99)) of the following form:

$$\hat\rho_{\mathrm{blk}} = \begin{bmatrix}
\sigma_{11}\hat\mu & \sigma_{12}\hat\mu & \cdots & \sigma_{1n}\hat\mu\\
\vdots & \vdots & \ddots & \vdots\\
\sigma_{n1}\hat\mu & \sigma_{n2}\hat\mu & \cdots & \sigma_{nn}\hat\mu
\end{bmatrix}. \qquad (15.4.8)$$

Given the density matrix $\hat\rho_{\mathrm{blk}}$ for the block-symbol product state, the density matrix of the component-symbol state $\hat\mu$ can be recovered from (15.4.8) by summing the sub-blocks along the diagonal of $\hat\rho_{\mathrm{blk}}$ as given by

$$\sigma_{11}\hat\mu + \sigma_{22}\hat\mu + \cdots + \sigma_{nn}\hat\mu = \hat\mu, \qquad (15.4.9a)$$

because $\mathrm{trace}\,\hat\sigma = 1$. Reducing a density matrix defined in a composite signal space to a density matrix defined in a component signal space is called taking a partial trace. For a block-symbol product state²⁰ $\hat\rho_{\mathrm{blk}}$ composed of two component-symbol states $\hat\sigma$ and $\hat\mu$, the partial trace for each component-symbol state is simply (cf. (15.4.9a))

$$\mathrm{trace}_\sigma\left(\hat\sigma\otimes\hat\mu\right) = \hat\mu, \qquad \mathrm{trace}_\mu\left(\hat\sigma\otimes\hat\mu\right) = \hat\sigma. \qquad (15.4.9b)$$

The partial-trace operation extends to a block-symbol product state composed of multiple component-symbol states and, similarly to the trace operation, the partial trace does not depend on the basis.

The partial trace for a block-symbol product state can be viewed as the quantum-optics equivalent of the classical marginalization of a product probability distribution to the marginal probability distribution of a single component (cf. Section 11.3.4). Accordingly, the partial-trace operation "marginalizes" the density matrix for a block-symbol state to the marginal density matrix that describes a component-symbol state. This equivalence applies only to product states, and is not generally true for entangled states.

20 The partial trace for a general composite signal state, which need not be a product state, is different. For details, see Section 2.4.3 of Nielsen and Chuang (2000).
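The Kronecker-product construction (15.4.8) and the diagonal-sub-block recovery (15.4.9a) can be checked directly. The sketch below is illustrative only; it assumes numpy, and the two component density matrices are arbitrary valid examples.

```python
# Sketch: a two-symbol product state as a Kronecker product, and recovery
# of a component state by the partial trace of (15.4.9).
import numpy as np

sigma = np.array([[0.7, 0.1], [0.1, 0.3]])      # component-symbol density matrices
mu = np.array([[0.6, 0.2], [0.2, 0.4]])
rho_blk = np.kron(sigma, mu)                     # block-symbol product state

# Partial trace over sigma: sum the diagonal sub-blocks, cf. (15.4.9a).
n = mu.shape[0]
mu_rec = sum(rho_blk[i*n:(i+1)*n, i*n:(i+1)*n] for i in range(sigma.shape[0]))
print(np.allclose(mu_rec, mu))                   # True: trace_sigma(sigma x mu) = mu
```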


Density Matrix for a Quantum-Lightwave Signal Constellation

The elements of a quantum-lightwave signal constellation depend on the state-preparation process. Two kinds of quantum-lightwave signal constellations are distinguished. For the first kind of signal constellation, the elements are pure component-symbol states. This quantum-lightwave signal constellation is equivalent to a classical signal constellation defined in signal space (cf. Section 10.4). For this case, the density matrix for each pure component-symbol state is $\hat\rho_{\mathrm{sym}}(i) = |\psi_i\rangle\langle\psi_i|$. The average density matrix $\hat\rho_{\mathrm{sym}}$ at the input to the lightwave channel is

$$\hat\rho_{\mathrm{sym}} = \sum_\ell p_{\mathrm{sym}}(\ell)\,\hat\rho_{\mathrm{sym}}(\ell) = \sum_\ell p_{\mathrm{sym}}(\ell)\,|\psi_\ell\rangle\langle\psi_\ell|, \qquad (15.4.10)$$

where $p_{\mathrm{sym}}(\ell)$ is the prior probability for the component-symbol state $\hat\rho_{\mathrm{sym}}(\ell)$. This average density matrix is a statistical mixture of the pure component-symbol states in the signal constellation.

For the second kind of signal constellation, the elements are pure block-symbol states $\hat\rho_{\mathrm{blk}}(\ell)$ defined in an enlarged signal space. For this case, each pure block-symbol state $\hat\rho_{\mathrm{blk}}(\ell)$ in the signal constellation is a product state composed of pure component-symbol states. This pure block-symbol state can be written as (cf. (15.4.7))

$$\hat\rho_{\mathrm{blk}}(\ell) = \left(|\psi_{1\ell}\rangle\langle\psi_{1\ell}|\right)\otimes\left(|\psi_{2\ell}\rangle\langle\psi_{2\ell}|\right)\otimes\cdots\otimes\left(|\psi_{K\ell}\rangle\langle\psi_{K\ell}|\right), \qquad (15.4.11)$$

where $|\psi_{i\ell}\rangle$ is the $i$th pure component-symbol state used to compose the $\ell$th pure block-symbol state.

When viewed at the level of the enlarged signal space, each block-symbol state is appropriately described as an element of a signal constellation defined in that enlarged signal space. When viewed at the level of a component-symbol state, the set of block-symbol states $\{\hat\rho_{\mathrm{blk}}(\ell)\}$ is appropriately described as a statebook, which is analogous to a classical codebook. The composition of a statebook and the difference between a statebook and a codebook are discussed in Section 16.4.3.

The average density matrix $\hat\rho_{\mathrm{blk}}$ of a signal constellation $\{\hat\rho_{\mathrm{blk}}(\ell)\}$ of pure block-symbol states defined in an enlarged signal space is given by

$$\hat\rho_{\mathrm{blk}} = \sum_\ell p_{\mathrm{blk}}(\ell)\,\hat\rho_{\mathrm{blk}}(\ell). \qquad (15.4.12)$$

This density matrix is a statistical mixture of pure block-symbol states, where $p_{\mathrm{blk}}(\ell)$ is the prior probability on the $\ell$th block-symbol state.

A general notation that describes the average density matrix $\hat\rho$ for both kinds of signal constellation is convenient and is given by

$$\hat\rho = \sum_s p(s)\,\hat\rho_s, \qquad (15.4.13)$$

where $\hat\rho$ is the average density matrix of the signal constellation, and where $s$ indexes the elements $\hat\rho_s$ of the signal constellation. Each pure signal state occurs with a prior probability $p(s)$. For this general form, the set $\{|\psi_s\rangle\}$ of signal states that comprise the signal constellation may be component-symbol states (cf. (15.4.6)), block-symbol states that are product states (cf. (15.4.7)), or even block-symbol states that are entangled states.

When the pure signal states $\{|\psi_s\rangle\}$ that comprise the signal constellation given in (15.4.13) are orthogonal, every outer product $|\psi_s\rangle\langle\psi_s|$ in (15.4.13) is a projection operator $\hat P_s$. Summing the projection operators produces a diagonal density matrix with the prior probability of each signal state lying along the diagonal of the matrix. This corresponds to classical signaling. When the set of signal states $\{|\psi_s\rangle\}$ is pairwise nonorthogonal, the density matrix expresses a mixture of statistical uncertainty and quantum uncertainty. For this case, the system cannot be analyzed classically.

15.4.2 Representation of a Density Matrix

The representation of the density matrix depends on the basis because the quantum uncertainty depends on the basis. Consider a mixed signal state described by a statistical ensemble of two pure signal states, $|r_0\rangle$ and $|r_\phi\rangle$, that differ by a generalized angle $\phi$. These states need not be coherent states. The first state $|r_0\rangle$ has probability $p$ and the second state $|r_\phi\rangle$ has probability $1-p$. Let $\kappa = \langle r_0|r_\phi\rangle = \cos\phi$ be the inner product between the two signal states, where $\kappa$ is assumed to be real and the states are normalized so that $\langle r_0|r_0\rangle = \langle r_\phi|r_\phi\rangle = 1$. The density matrix is given by (15.4.5):

$$\hat\rho = \sum_\ell p(\ell)\,|\psi_\ell\rangle\langle\psi_\ell| = p\,|r_0\rangle\langle r_0| + (1-p)\,|r_\phi\rangle\langle r_\phi|. \qquad (15.4.14)$$

Express $|r_\phi\rangle$ using a basis consisting of the basis state $|r_0\rangle$ and another state $|r_1\rangle$ orthogonal to $|r_0\rangle$. Because the squared coefficients sum to one,

$$|r_\phi\rangle = \kappa\,|r_0\rangle + \sqrt{1-\kappa^2}\,|r_1\rangle.$$

Substituting this expression into (15.4.14), the density matrix in the $\{|r_0\rangle, |r_1\rangle\}$ basis is

$$\begin{aligned}
\hat\rho &= p\,|r_0\rangle\langle r_0| + (1-p)\left(\kappa|r_0\rangle + \sqrt{1-\kappa^2}\,|r_1\rangle\right)\left(\kappa\langle r_0| + \sqrt{1-\kappa^2}\,\langle r_1|\right)\\
&= \left(p + (1-p)\kappa^2\right)|r_0\rangle\langle r_0| + (1-p)\kappa\sqrt{1-\kappa^2}\,|r_0\rangle\langle r_1|\\
&\qquad + (1-p)\kappa\sqrt{1-\kappa^2}\,|r_1\rangle\langle r_0| + (1-p)\left(1-\kappa^2\right)|r_1\rangle\langle r_1|. \qquad (15.4.15)
\end{aligned}$$

Writing this as a matrix expression with each element $\rho_{\ell m}$ given by the coefficient of the outer-product term $|r_\ell\rangle\langle r_m|$ gives the density matrix

$$\hat\rho = \begin{bmatrix}
p + (1-p)\kappa^2 & (1-p)\kappa\sqrt{1-\kappa^2}\\
(1-p)\kappa\sqrt{1-\kappa^2} & (1-p)\left(1-\kappa^2\right)
\end{bmatrix}. \qquad (15.4.16)$$

The eigenvalues of this density matrix are

$$\lambda_{0,1} = \tfrac12\left[1 \pm \sqrt{1 + 4p\left(\kappa^2(1-p) + p - 1\right)}\,\right]. \qquad (15.4.17)$$

When the pairwise inner product $\kappa$ of the signal states lies in the interval between zero and one, the two signal states $|r_0\rangle$ and $|r_\phi\rangle$ are nonorthogonal, with $|r_\phi\rangle$ expressed as a superposition of the two eigenstates $\{|r_0\rangle, |r_1\rangle\}$. For this case, the eigenvalues given in (15.4.17) depend both on the statistical uncertainty expressed by $p$ and on the quantum uncertainty expressed by the inner product $\kappa$. This case emphasizes the basis-dependent nature of the outcome of a measurement on an ensemble of nonorthogonal signal states.

The off-diagonal elements in (15.4.16) are called the pairwise quantum coherence of the set of signal states expressed in the basis used to define the density matrix. This quantum coherence is typically quantified as a quantum phase term $e^{i\phi}$, with the generalized angle $\phi$ related to the inner product by $\cos\phi = \kappa$. The basis-dependent quantum coherence terms may have a correlation structure with higher-order statistics that cannot be expressed in terms of first-order and second-order statistics, as would be the case for a gaussian signal state corresponding to a classical signal. The presence of such higher-order coherence is evidence of a nonclassical signal state.
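The following sketch (illustrative, assuming numpy; the values of $p$ and $\kappa$ are arbitrary) constructs the density matrix (15.4.16) and confirms that its eigenvalues match the closed form (15.4.17):

```python
# Sketch: the 2x2 density matrix of (15.4.16) and its eigenvalues (15.4.17).
import numpy as np

p, kappa = 0.5, 0.8
off = (1 - p) * kappa * np.sqrt(1 - kappa**2)
rho = np.array([[p + (1 - p) * kappa**2, off],
                [off, (1 - p) * (1 - kappa**2)]])

evals = np.linalg.eigvalsh(rho)                         # ascending order
lam = 0.5 * (1 + np.sqrt(1 + 4 * p * (kappa**2 * (1 - p) + p - 1)))
print(np.isclose(np.trace(rho), 1.0))                   # valid density matrix
print(evals, np.isclose(evals[-1], lam))                # matches (15.4.17)
# With kappa = 0 the eigenvalues reduce to the classical probabilities {p, 1-p}.
```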

Representation of a Pure Signal State

When the pairwise inner product $\kappa$ between two pure signal states is one, the two signal states are coincident. This is a single pure signal state with the eigenvalues in (15.4.17) given by $\lambda_0 = 1$ and $\lambda_1 = 0$. In the basis that diagonalizes the density matrix, the pure signal state is equal to a single basis state in that representation. Therefore, the density matrix has only one nonzero diagonal element, which is equal to one. Equivalently, the matrix representation of a pure signal state is a rank-one matrix. The same pure signal state expressed in a different basis is a superposition of the basis states in that representation. Accordingly, the same pure signal state has quantum uncertainty when measured in a basis for which the density matrix is not diagonal.

These observations show that a pure signal state can be distinguished from a mixed signal state using the trace of the square $\hat\rho^2$ of the density matrix, which is invariant under the choice of the basis. For a pure signal state, the following properties hold:

$$\hat\rho^2 = \hat\rho, \qquad (15.4.18a)$$

$$\mathrm{trace}\,\hat\rho^2 = 1. \qquad (15.4.18b)$$

Because the trace is invariant under the choice of basis, these relationships are readily verified for the density matrix of a pure signal state expressed in the basis for which $\hat\rho$ is diagonal.

Representation of a Mixed Signal State

The density matrix for a mixed signal state has a rank larger than one, with $\mathrm{trace}\,\hat\rho^2 < 1$, so $\hat\rho^2 \ne \hat\rho$. Smaller values of $\mathrm{trace}\,\hat\rho^2$ indicate more statistical uncertainty.

The representation of a mixed signal state generated from an ensemble of two pure signal states with one basis state aligned to one signal state is given in (15.4.16). When the pairwise inner product $\kappa$ between the two signal states is zero, the signal states are orthogonal and the eigenvalues in (15.4.17) are $\lambda_0 = p$ and $\lambda_1 = 1-p$. For this case, the density matrix reduces to a diagonal matrix because the measurement states can be aligned one-to-one with the signal states. Therefore, there is no quantum uncertainty, and the statistical uncertainty is characterized by the probability $p$. The outer product $|\psi_\ell\rangle\langle\psi_\ell|$ in (15.4.5) is then a projection operator $\hat P_\ell$ for $\ell = 0, 1$. Summing the two projection matrices, each with a single diagonal element $p(\ell)$, expression (15.4.5) is reduced to

$$\hat\rho = p\hat P_0 + (1-p)\hat P_1 = p\begin{bmatrix}1 & 0\\0 & 0\end{bmatrix} + (1-p)\begin{bmatrix}0 & 0\\0 & 1\end{bmatrix} = \begin{bmatrix}p & 0\\0 & 1-p\end{bmatrix}, \qquad (15.4.19)$$

with the set of diagonal elements corresponding to a classical probability mass function $p(\ell)$. This mixed signal state exhibits no quantum coherence effects or quantum uncertainty. It is called a noncoherent signal state and can be analyzed classically.

In summary, a density matrix represents a combination of statistical uncertainty and quantum uncertainty. The statistical uncertainty is quantified by the value of the trace of $\hat\rho^2$, which is invariant with respect to the choice of basis. Quantum uncertainty is present whenever the density matrix is not diagonal in the measurement basis. This form of uncertainty is always present for a set of nonorthogonal signal states and may be present for a pure signal state when the pure signal state is not one of the eigenstates of the measurement basis.

15.4.3 Decoherence

The off-diagonal elements of the density matrix in (15.4.16) describe the pairwise quantum coherence of the set of signal states in the basis used to define that density matrix. Interaction with a set of external states can reduce or eliminate quantum coherence. An interaction of this form is called decoherence.

Consider a density matrix $\hat\rho_\phi$ for a binary modulation format at the channel output defined by a statistical ensemble of pure signal states given by

$$\hat\rho_\phi = \begin{bmatrix} p & e^{-i\phi}\\ e^{i\phi} & 1-p \end{bmatrix}, \qquad (15.4.20)$$

where $\phi$ is a random phase that is uniformly distributed over $[0, 2\pi)$ with a probability density function $f_\phi(\phi) = 1/2\pi$. This density matrix models the interaction of the input signal state with an unknown set of external states. The interaction can be further modeled as randomizing the quantum phase $\phi$ in the signal space in which $\hat\rho_\phi$ is defined. Using the continuous equivalent of (15.4.13), the expected density matrix $\hat\rho$ at the channel output is

$$\hat\rho = \int_0^{2\pi} f_\phi(\phi)\,\hat\rho_\phi\,d\phi = \begin{bmatrix} p & 0\\ 0 & 1-p \end{bmatrix} \qquad (15.4.21)$$

because the density matrix elements that contain the random quantum phase $\phi$ average to zero. The averaging over the random quantum phase leads to a noncoherent signal state, with (15.4.21) equal to (15.4.19). This averaging does not affect the probability of an outcome of a measurement. Therefore, the mean density matrix of a statistical ensemble of pure-state symbols, each with a random quantum phase, reduces to a noncoherent signal state, which can be analyzed classically. This form of quantum phase-averaging is a simple model of the effect of decoherence on a quantum-lightwave signal state.

Decoherence is caused by an interaction between a signal state and one or more external states. The strength of the decoherence depends on the spacing of the energy levels of the signal states compared with the energy levels of an external set of states that may interact with the signal state. A description of the channel output signal state as a statistical ensemble of pure signal states with a random quantum phase is often appropriate for low-frequency systems for which the energy of a photon $\hbar\omega$ is much smaller than the mean thermal energy $kT_0$ (cf. Section 6.1.1). For this case, there are many external states that can interact with the signal state, thereby producing a mixed signal state that can be treated as an ensemble of pure signal states with a maximum-entropy, uniformly distributed quantum phase. The resulting noncoherent mixed signal state given in (15.4.21) can be analyzed classically. The difficulty of isolating a low-frequency system from the external environment is one reason why quantum coherence is not observed in lower-frequency systems.

For high-frequency lightwave systems, the energy of a photon $\hbar\omega$ is much larger than the mean thermal energy $kT_0$ at a normal temperature. This large energy difference means that the number of external states that can interact with a quantum-lightwave signal state may be limited. In this regime, the quantum coherence of a signal state can often be observed, maintained, and controlled. Moreover, the statistical properties of this interaction are not necessarily described by a gaussian random process because there are not enough interaction events for the assertion of the central limit theorem.
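The phase-averaging model of decoherence is easy to reproduce numerically. The sketch below (illustrative, assuming numpy; the value of $p$ is arbitrary) averages the density matrix (15.4.20) over a uniformly distributed quantum phase and recovers the diagonal noncoherent state (15.4.21):

```python
# Sketch: numerical phase averaging of (15.4.20) over a uniform quantum phase.
import numpy as np

p = 0.3
phis = np.linspace(0, 2 * np.pi, 1000, endpoint=False)   # uniform phase samples
rho_avg = np.zeros((2, 2), dtype=complex)
for phi in phis:
    rho_avg += np.array([[p, np.exp(-1j * phi)],
                         [np.exp(1j * phi), 1 - p]]) / phis.size

print(np.round(rho_avg, 12))    # off-diagonal coherence terms average to zero
```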

15.4.4 Quantum Entropy

A density matrix is not a probability vector and does not have a Shannon entropy as such. However, the matrix diagonal does have the form of a probability vector, and the Shannon entropy of the diagonal is formally defined. The relevance of the Shannon entropy of a density matrix is not asserted at this time. Instead, an alternative notion of entropy, called the von Neumann entropy, is relevant to the notion of a density matrix.

The composite statistical and quantum uncertainty of a quantum-lightwave signal state described by a density matrix $\hat\rho$ is characterized by the von Neumann entropy $S(\hat\rho)$, which is defined by

$$S(\hat\rho) \doteq -\mathrm{trace}\left(\hat\rho\log\hat\rho\right). \qquad (15.4.22)$$

The von Neumann entropy corresponds to the classical Shannon entropy $H$ with the probability distribution function $p(s)$ replaced by the density matrix $\hat\rho$ and the summation replaced by the trace. The logarithm of the density matrix can be expressed as

$$\log\hat\rho = A\left(\log\hat\rho'\right)A^{-1},$$

where $\hat\rho' = A^{-1}\hat\rho A$ is diagonal and $\log\hat\rho'$ is the element-by-element logarithm of the diagonal elements.


The difference between the von Neumann entropy and the classical Shannon entropy is that the von Neumann entropy includes the effect of quantum uncertainty. The classical Shannon entropy does not. When there is only statistical uncertainty, the density matrix is a diagonal matrix, and the von Neumann entropy is the same as the Shannon entropy.

The density matrix $\hat\rho$ can be expressed in terms of its eigenvalues $\lambda_i$ and projection operators $\hat P_i$ using (2.1.95) to give $\hat\rho = \sum_i \lambda_i\hat P_i$. The logarithm of this quantity is determined using (2.1.96) and is $\log\hat\rho = \sum_i \log\lambda_i\,\hat P_i$. Substituting these expressions into (15.4.22) gives the von Neumann entropy as

$$S(\hat\rho) = -\mathrm{trace}\left(\sum_i \lambda_i\log\lambda_i\,\hat P_i^2\right) = -\sum_i\lambda_i\log\lambda_i\;\mathrm{trace}\,\hat P_i^2 = -\sum_i\lambda_i\log\lambda_i, \qquad (15.4.23)$$

where, in going from the second to the third expression, $\mathrm{trace}\,\hat P_i^2 = \mathrm{trace}\,\hat P_i = 1$ for a projection operator. By convention, $\lambda_i\log\lambda_i \doteq 0$ for $\lambda_i = 0$ so that $S(\hat\rho)$ is well-behaved. Because the eigenvalues $\lambda_i$ of any density matrix are nonnegative with $0 \le \lambda_i \le 1$ for all $i$, the von Neumann entropy is nonnegative. The eigenvalues $\lambda_i$ depend both on the statistical uncertainty and on the quantum uncertainty, so the von Neumann entropy depends on both.

As an example, the von Neumann entropy of the density matrix given in (15.4.16) with eigenvalues given in (15.4.17) is

$$S(\hat\rho) = -\lambda_0(p,\kappa)\log\lambda_0(p,\kappa) - \lambda_1(p,\kappa)\log\lambda_1(p,\kappa), \qquad (15.4.24)$$

where the dependence of $\lambda_i$ on $p$ is due to statistical uncertainty and the dependence of $\lambda_i$ on $\kappa$ is due to quantum uncertainty expressed as the inner product between the two signal states. When the inner product $\kappa$ equals zero, the states are orthogonal, the eigenvalues are the statistical probabilities (cf. (15.4.19)), and the von Neumann entropy reduces to the classical Shannon entropy $H(s)$ (cf. (14.1.1)), where $p(s)$ is composed of the diagonal elements of $\hat\rho$. For this case, the diagonal elements of the matrix are equal to the eigenvalues of the matrix because the matrix is diagonal in the measurement basis. When $\kappa$ equals one and the signal state is a pure signal state, there is only a single nonzero eigenvalue, so that $S(\hat\rho) = 0$. This simply states that there is no statistical uncertainty for a pure signal state. When $\kappa$ is between zero and one, the two signal states are nonorthogonal and the von Neumann entropy depends on both forms of uncertainty. This basis-dependent entropy mirrors other basis-dependent properties of quantum-lightwave signal states.

Because the von Neumann entropy incorporates both statistical uncertainty and quantum uncertainty, it is upper-bounded by the Shannon entropy of the diagonal, with²¹

$$S(\hat\rho) \le H(s), \qquad (15.4.25)$$

with equality achieved for a set of orthogonal signal states that have no quantum uncertainty when measured in a basis composed of those signal states.

Classical information is represented using statistical uncertainty imposed as a prior by an encoder at the channel input. Additional quantum uncertainty is not controlled once a measurement basis is given and does not increase the ability of the channel to convey classical information. Instead, the presence of quantum uncertainty may make it more difficult to discriminate the information conveyed using statistical uncertainty. This is why the relationship between the von Neumann entropy and the Shannon entropy is expressed as an inequality.

For a set of pairwise-nonorthogonal signal states, (15.4.25) is a strict inequality because the density matrix is not diagonal in the measurement basis. This inequality may be viewed in two ways. One view is that the nonorthogonality expresses intrinsic dependences between the signal states, which limits the amount of statistical randomness in an ensemble of nonorthogonal signal states compared with a set of orthogonal signal states with no dependences. A complementary viewpoint is that the nonorthogonality leads to quantum uncertainty, which makes it more difficult to discriminate between signal states.

21 This inequality is a consequence of the Schur–Horn theorem.
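A numerical illustration of the inequality (15.4.25) for the two-state ensemble of Section 15.4.2 is sketched below (assuming numpy; base-2 logarithms are used so the entropies are in bits, and the values of $p$ and $\kappa$ are arbitrary):

```python
# Sketch: von Neumann entropy (15.4.23) versus the Shannon entropy of the
# diagonal of the density matrix (15.4.16), illustrating (15.4.25).
import numpy as np

def shannon(probs):
    probs = probs[probs > 1e-12]
    return -np.sum(probs * np.log2(probs))

def von_neumann(rho):
    return shannon(np.linalg.eigvalsh(rho))

p, kappa = 0.5, 0.8
off = (1 - p) * kappa * np.sqrt(1 - kappa**2)
rho = np.array([[p + (1 - p) * kappa**2, off],
                [off, (1 - p) * (1 - kappa**2)]])
print(von_neumann(rho))          # about 0.47 bits: statistical plus quantum
print(shannon(np.diag(rho)))     # about 0.68 bits: Shannon entropy of the diagonal
# kappa = 0 gives equality; kappa = 1 (a pure state) gives S(rho) = 0.
```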

Quantum Entropy of a Product State

The von Neumann entropy $S(\hat\rho_1\otimes\hat\rho_2)$ of a product state is additive, being given by

$$S(\hat\rho_1\otimes\hat\rho_2) = S(\hat\rho_1) + S(\hat\rho_2). \qquad (15.4.26)$$

This additivity property can be derived starting with (2.1.100), which states that the eigenvalues $\zeta_k$ of the outer product $\hat\rho_1\otimes\hat\rho_2$ are the pairwise products of the eigenvalues $\lambda_i$ of the component signal state described by the density matrix $\hat\rho_1$ and the eigenvalues $\mu_j$ of the component signal state described by the density matrix $\hat\rho_2$. Therefore, using (15.4.23), we can write

$$\begin{aligned}
S(\hat\rho_1\otimes\hat\rho_2) &= -\sum_k \zeta_k\log\zeta_k = -\sum_{i,j}\lambda_i\mu_j\log\left(\lambda_i\mu_j\right)\\
&= -\sum_i\lambda_i\sum_j\mu_j\log\mu_j - \sum_j\mu_j\sum_i\lambda_i\log\lambda_i = S(\hat\rho_1) + S(\hat\rho_2), \qquad (15.4.27)
\end{aligned}$$

where (15.4.23) has been used, and $\sum_i\lambda_i = \sum_j\mu_j = 1$ because $\mathrm{trace}\,\hat\rho = 1$ for any density matrix. This property states that quantum product states are additive in the von Neumann entropy in the same way that a product distribution is additive in the Shannon entropy.

Measurements on Density Matrices

The measurement of a signal state described by a density matrix differs from a measurement of a pure signal state, which is one with no statistical uncertainty. For a pure

15.4 Statistical Quantum Optics

787

signal state |ψ±, the measurement projects the signal state at the channel output onto an orthogonal eigenstate |r ± of an observable using the projection operator ± Pr = |r ±²r | for that eigenstate. This measurement produces an outcome r with a probability p(r ). Because the density matrix ± ρ is a composite of basis-dependent quantum uncertainty originating in the measurement process and the statistical uncertainty about the signal state, a more general expression for a measurement is required. To proceed, let { On } be a set of outcomes of a measurement indexed by n, and let p(n) be the probability that the outcome On is measured. When the signal state is described by a density matrix ± ρ, the probability p(n ) of outcome On is given by

.

p(n) = trace

(±ρ Y± ) . n

(15.4.28)

The observable operator ± Yn in this expression is called a generalized measurement operator. Its properties are the topic of this section. To understand how this definition of the probability of a measurement outcome differs from the definition given in (15.2.5a), suppose that the set {± Y n } is a set of projection ±n } = {|ηn ±²ηn |}, and that the state to be measured is a pure signal state with operators { P ±ρ = |ψ±²ψ|. For this case, (15.4.28) reduces to (15.2.5b), with the probability p(n) for the outcome On given by

(±ρ ±Y ) n = trace(|ψ±²ψ|ηn ±²ηn |) = |²ψ|ηn ±|2 ,

p(n ) = trace

(15.4.29)

using the fact that the trace converts an outer product to an inner product. This expression shows the outcome of a generalized measurement using (15.4.28) reduces to (15.2.5a) when the signal state is a pure signal state. The generalized measurement operator given by (15.4.28) is satisfied by other measurement operators that are not projection operators. The requirement is that the set of measurement operators {± Yn } gives a valid probability distribution p(n ). One require∑ ment is that n p(n ) = 1. Applying this condition on the right side of (15.4.28) gives

³

(±ρ ±Y ) = 1 n n Í ³ Î trace ± ρ ±Yn = 1, trace

(15.4.30)

n

where the linearity of the trace operation has been used. Because the measurement must produce a definite outcome, we also require that

³± n

where ± I is the identity operator.

Yn

= ±I ,

(15.4.31a)

788

15 The Quantum-Optics Model

Table 15.2 Relationship between quantities used in classical detection and quantum-optics detection

Classical

Quantum

Description

Variable

Description

Signal Average signal Output probability distribution Region defining outcome Sum of the regions n

s

∑ ps i i i f (s ± )

Pure state Average state Output density matrix for symbol state ± Operator defining outcome Sum of the operators

R

Probability of outcome

R∑n

R

n n (entire region) Ï f (r |s± )dr n

R

Probability of outcome

Variable

|ψ±²ψ| ∑ p |ψ ±²ψ | i i i i ρ±± Y±n ∑ Y± = ±I n n (entire( space) ) trace ± ρ± Y±n

The second requirement is that p(n) is nonnegative. This requirement means that ± Yn must be a nonnegative definite transformation so that Y±n

≥0

for all n.

(15.4.31b)

A set {± Yn } of measurement operators that satisfies (15.4.31) is called a positive-operatorvalued measure. This kind of generalized measurement is not restricted to use only projective measurements. It can lead to a simplified analysis when only the probabilistic outcomes of a measurement are required. The analysis of a communication system also involves conditional probabilities. The conditional probability p(n |±) of an outcome On given that signal state |ψ± ± described by density matrix ρ±± is transmitted is

.

(

p(n |±) = trace Y±n ± ρ±

)

( ) = trace ±ρ± ±Yn ,

(15.4.32)

noting that the trace operation is invariant with respect to the ordering of the operators. Classical detection (cf. Chapter 9) provides a useful context for the form of this expression. For classical detection, the conditional probability p(n |±) that Ï the symbol rn is detected given that the symbol s± was transmitted is p(n|±) = Rn f (r |s± )dr, where f (r |s± ) is the conditional probability distribution that, given the symbol s± was transmitted, the value r is detected (cf. Section 10.2.3). The probability p(n ) of a detected Ï classical outcome On is p(n) = Rn f (r )dr, where f (r ) is the probability distribution for the sample r at the channel output, n is the decision region on the complex plane ∑ that defines the nth detected outcome On , and the sum of the integration regions n n covers the entire complex plane. Table 15.2 compares classical detection with the corresponding generalized quantum-optics detection. Quantum-optics detection is discussed in detail in Chapter 16. Each generalized measurement operator ± Yn is constructed from a set of outer products {|ηk ±²ηk |} generated from a set of measurement states {|ηk ±}. However, the constraints given in (15.4.31) do not require that the set {|ηk ±} of measurement states be orthogonal so that the set {|ηk ±²ηk |} of outer products need not be projection operators. This means

R

R

15.5 Classical Methods for Quantum-Lightwave Signals

789

that, in general, the elements Y±n of the set {Y±n } of measurement operators need not commute and need not share a set of common orthogonal eigenstates. The possible noncommuting nature of the set of measurement operators {Y±n } can be seen as a consequence of defining the generalized measurement in a signal space only according to the states of the incident signal without considering how other states may interact with the signal state during the measurement process. Instead, the analysis can be enlarged by appending additional states, called ancilla states. This means that when an ancilla state is included in the analysis, the signal space used to describe the outcome of a measurement is enlarged. With this enlarged signal space so defined, the set of potentially noncommuting measurement operators {Y±n } of a generalized measurement, defined in that signal space, can be “lifted” into an enlarged signal space by including the ancilla states that interact during the measurement. When a generalized measurement is expressed in a suitable enlarged signal space, an equivalent set of commuting, orthogonal projection operators of an observable measurement can always be defined.22 This principle will be demonstrated using heterodyne demodulation in Section 15.5.3. It is also discussed in an end-of-chapter problem.

15.5

Classical Methods for Quantum-Lightwave Signals Classical homodyne and heterodyne demodulation were described in earlier chapters using a heuristic combination of wave optics and photon optics. Now these demodulators will be reformulated within a quantum-lightwave signal model using coherent states so as to formally justify the earlier treatment within the formalism of quantum optics. The conclusions will be the same as in the earlier chapters, but now the justification and explanations will be deeper. This is the subject of this section.

15.5.1

Lightwave Couplers for Coherent States

Phase-synchronous demodulation requires a lightwave coupler to combine the local oscillator signal and the lightwave signal within a single spatial mode before the photodetector. Couplers were analyzed using wave optics in Section 7.1.1. The operation of a lightwave coupler in terms of quantum optics and coherent states is discussed in this section. Suppose that two coherent states are the inputs to a symmetric directional coupler.23 The coupler is described by a coupling matrix T (cf. (7.1.1)). The input coherent states have an associated set of coherent-state operators ± ai and ± a j . The coupler transforms these operators, generating a new set of operators at the output of the coupler that describe the output coherent states. Classically, the coupling matrix T generates a linear combination of the input signals at each of the coupler outputs. Within quantum optics, each output state is also a linear 22 This extension using ancilla states is called a Nuemark extension. (See Nuemark (1943).) 23 The word “coupler” is formally equivalent to the word “beamsplitter”.

790

15 The Quantum-Optics Model

combination of the two input component signal states. The input constituent signal states are coherent states defined by the coherent-state operators ± ai and ± a j . Therefore, the input composite signal state is a product state. The output constituent signal states are also described by the coherent-state operators ± a1 and ± a2 with a corresponding output composite product state. The coupling matrix T transforms the coherent-state operator for each input state into the corresponding operator for each output state in the same way that the transformation affects a classical complex signal. Therefore24

Ë ±a Ì Ë ±a Ì 1 i ±a2 = T ±a j .

For a 180-degree hybrid coupler with operators are

T

given by (7.1.4), the output coherent state

√ ±a1 = (1/ 2)(±ai + ±a j ),

15.5.2

(15.5.1)

√ ±a2 = (1/ 2)(±ai − ±a j ).

(15.5.2)

Homodyne Demodulation to Real Baseband

Consider the measurement operator described by a balanced homodyne demodulator shown in Figure 15.8 for which the carrier frequency f c and the polarization are the same as the frequency f LO and the polarization of the local oscillator. The homodyne demodulation to real baseband is phase sensitive. This means that the local oscillator mixes only with the corresponding phase components of the incident lightwave signal. The demodulator shown in Figure 15.8 is equivalent to the demodulator shown in Figure 8.10 for a shot-noise-limited semiclassical signal. The incident coherent signal state |α ± and the local oscillator coherent state | ALO ± are at the same frequency. Each direct-photodetection operation is implemented by ideal photon counting described by a ± =. ±a †±a (cf. (15.3.9b)). When summed over an interval photon-number state operator N of duration T , this gives the sample statistic that is used for detection. The two coherent-state operators at the input to the coupler shown in Figure 15.8 are ±LO . Using the input signal-state operator ± a and the local oscillator signal-state operator A (15.5.2), the coherent-state operators ± a1 and ± a2 at the output of the coupler are

√ ±a1 = (1/ 2)(±a + ±A ), LO

√ ±a2 = (1/ 2)(±a − ±A ). LO

(15.5.3)

For homodyne demodulation to real baseband using balanced photodetection, a single quadrature component is measured. The output state after demodulation is described by a ALO

180-degree hybrid coupler

Photon-counting receiver

+

Photon-counting receiver



Ahomo

Figure 15.8 Homodyne demodulation of a coherent state to real baseband. 24 See Mandel and Wolf (1995) Section 12.12, and Prasad, Scully, and Martienssen (1987) for a formal

description of couplers within quantum optics.

15.5 Classical Methods for Quantum-Lightwave Signals

791

a measurement operator ± Ahomo that is the difference between the two signals measured separately by photon counting at the output of the balanced photodetector as shown in Figure 15.8. This operator can be written as

±Ahomo = ±N1 − N±2 = ±a1†±a1 − ±a†2±a2 µ ¶( ¶( ) µ ) = 12 ±a† + ±A† ±a + ±A − 21 ±a † − ±A† ±a − ±A µ ¶ = 21 ±a†±a + ±A† ±a + ±a † ±A + ±A† ±A µ ¶ − 21 ±a †±a − A±† ±a − ±a† A± + A±† ±A = ±A† ±a + ±a † ±A . (15.5.4) ±a = The use of balanced photodetection cancels out the photon-number operator terms N † † ±a†±a and N± = A± A± in (15.5.4), leaving only the two mixing terms A± ±a and ±a† A± . LO

LO

LO

LO

LO

LO

LO

LO

LO

LO

A

LO

LO

LO

LO

LO

LO

LO

LO

For this reason, the properties of the shot noise in the demodulated electrical signal using the operator ± Ahomo are different than the properties of shot noise when the photon± number operator Na = ± a†± a is used. This is the quantum-optics origin of the difference between “counting” shot noise and “mixing” shot noise (cf. Section 10.2.5). When the magnitude of the local oscillator is large, the coherent state operator ± A LO may be replaced by a scalar A LO , which is taken to be real so that ± A†LO ≈ A∗LO = A LO . Substituting these expressions into (15.5.4) and using ± a +± a† = 2± aI (cf. (15.3.20)) gives

±Ahomo = 2A ±a . LO

(15.5.5)

I

This expression shows that when the local oscillator signal is large, the measurement operator ± A homo for balanced homodyne demodulation to real baseband is proportional to the in-phase signal component operator ± aI with a mixing gain given by 2A LO . The expectation of the product state expressed by the operator in (15.5.4) corresponds to the expectation of a classical product distribution. The classical expectation is determined by taking the expectation over each marginal probability distribution that comprises the classical product distribution. Similarly, the expectation of a quantumoptics product state is determined by taking the expectation over each component signal state that comprises the quantum-optics product state. Using (15.5.4), this expectation is

Ä = ²α |²A | µA±† ±a + ±a† ±A ¶ |A ±|α± homo = ²α |±a|α±²A | ±A† |A ± + ²α|±a†|α±²A | A± |A ± = α A∗ + α ∗ A = 2|α||A |cos φ, (15.5.6) where φ = φs − φ is the phase difference between the two complex Glauber numbers à ±A

LO

LO

LO

LO

LO

LO

LO

LO

LO

LO

LO

LO

LO

LO

that specify the two component coherent states. The expectation in (15.5.6) has the same form as (15.3.22a) and is equal to the output of a shot-noise-limited homodyne demodulator given in (8.2.4c) followed by a sampled matched filter. This equivalence is obtained by using fIF = 0, no (t ) = 0,

792

15 The Quantum-Optics Model

√ |α | = |s|/ 2, |A | =



ALO / 2, and φ = φs under the condition that the Glauber number ALO for the local oscillator is real. 25 The equivalence of the expectation of the quantum-lightwave homodyne demodulation operator with classical homodyne demodulation given in (8.2.4c) is an illuminating example of the correspondence principle. LO

15.5.3

Joint Demodulation

This section discusses three methods of state detection described by the joint demodulation of the in-phase and quadrature signal components within quantum optics. It will be shown that, for each of these three methods of joint demodulation, the signal state interacts with at least one ancilla state that has its own fundamental quantum noise – even when that ancilla state is unoccupied. Accounting for this ancilla state produces a set of commuting measurement operators for the in-phase signal component and the quadrature signal component at the expense of additional quantum noise that is equivalent to half a photon. The uncertainty in each component is larger than the uncertainty of one signal component demodulated using homodyne demodulation to real baseband because no ancilla state couples into that form of demodulator. Within quantum optics, the joint measurement of the in-phase signal component and the quadrature signal component is a generalized measurement because the operators ± aI and ± aQ do not commute (cf. (15.3.7)). To derive an equivalent measurement in terms of a set of operators that do commute, any ancilla states that couple with the signal state when the state is measured must be considered. This lifts the analysis into an enlarged signal space for which commuting observables can be defined. Including one or more ancilla states, the measured signal state is expressed as a product state in a signal space constructed as the outer product of the signal space for the incident signal state and the signal space for the ancilla state. This holds even when the ancilla state is a vacuum state corresponding to an unoccupied mode. This extension reconciles a generalized measurement based on a set of noncommuting operators defined in the signal space of the incident signal with a set of observable measurements based on a set of commuting operators defined in an enlarged signal space constructed using ancilla states.

Heterodyne Demodulation to Passband

The first of the three methods of joint demodulation is the heterodyne demodulation of a lightwave signal to a passband signal. A heterodyne demodulator is shown in Figure 15.9. Heterodyne demodulation differs from homodyne demodulation in that the frequency of the local oscillator is not equal to the carrier frequency. This means that the two coupler inputs are in different temporal modes. When these two modes are coupled, the terms on the last line of (15.5.4) are no longer at zero frequency. To incorporate the time dependence of the demodulated signal, a Heisenberg representation is used so that the time dependence is carried with the operators. Explicitly showing this dependence, the two terms on the last line of (15.5.4) are modified to read 25 The different scaling factors for the amplitudes are a consequence of defining the Glauber number using a

root-mean-squared value and defining the classical complex signal using a peak value (cf. (8.1.1)).

15.5 Classical Methods for Quantum-Lightwave Signals

ae i2

f ct

180-degree hybrid coupler

π

ALOe

i2πf LOt

Photon-counting receiver

+

Photon-counting receiver



793

A hetero

Figure 15.9 Heterodyne demodulation of a coherent state.

±A† e−i2π f t ±ae i2π f t = ±A† ±aei2π f t , ±a† e−i2π f t A± ei2π f t = ±a† A± e−i2π f t , LO

LO

c

c

LO

LO

(15.5.7a)

IF

LO

(15.5.7b)

IF

LO

where f IF = fc − fLO is the intermediate frequency (cf. Section 8.2.1). Summing these two terms while ignoring any ancilla modes that may affect the measurement, the generalized operator for heterodyne demodulation ± Ahet is

±Ahet = A±† ±aei2π f t + ±a† ±A e−i2π f t (15.5.8a) ¶ ¶ µ µ † = ±A ±a + ±a† A± cos (2π f t ) + i ±A† ±a − ±a† ±A sin(2π f t ), (15.5.8b) where Euler’s identity (eix = cos x + i sin x ) has been used. The heterodyne demodLO

IF

IF

LO

LO

LO

IF

LO

LO

IF

ulation operator defined in (15.5.8) reduces to the homodyne demodulation operator defined in (15.5.4) for f IF = 0. Applying the results of (15.5.6) to (15.5.8a), the expectation is

à ±A Ä = α A∗ ei2π f t + α ∗ A e−i2π f t het = 2|α ||A |cos(2π f t + φ) (signal only) , (15.5.9) where φ = φs − φ is the phase difference between the carrier and the local oscillaIF

LO

LO

LO

IF

IF

LO

tor. The expectation given in (15.5.9) reduces to the expectation given in (15.5.6) when fIF = 0. Similarly, the expectation is equal to the output of a shot-noise-limited heterodyne demodulator given in (8.2.4c) using the same scaling factors as used for homodyne demodulation. The heterodyne measurement operator defined in (15.5.8) may be viewed as equivalent to two real-valued measurement operators in phase quadrature. These are the in-phase component operator ± A I , which has a cosine time dependence, and the quadra±Q, which has a sine time dependence. Suppressing the time ture component operator A dependence, these operators are given by

±A = ±A† ±a + ±a † ±A , ± = A±† ±a − ±a † ±A . A I

LO

LO

(15.5.10a)

Q

LO

LO

(15.5.10b)

In this form, it is seen that (15.5.10a) is equal to the homodyne measurement operator ±Ahomo for a single component (cf. (15.5.4)). ±Q, gives a real-valued measurement outcome, but these operEach operator, ± A I or A ators do not commute and do not share a common set of eigenstates. Accordingly, the in-phase signal component and the quadrature signal component cannot be measured simultaneously without increasing the uncertainty in at least one of the two signal components. The proof of this statement is asked for as an exercise at the end of the chapter.

794

15 The Quantum-Optics Model

–f LO

fLO

–f c

fc

Image mode

Image mode

2fIF

fIF

–fIF

Shifted negative frequencies

Frequency Shifted positive frequencies

Sum

Signal-plus-imagemode quantum noise Figure 15.10 Heterodyne demodulation mixes the vacuum-state fluctuations of an ancilla state of

an unoccupied image mode with the signal state. This produces a term at the intermediate frequency that contains fluctuations from both modes.

Image Modes Heterodyne demodulation mixes the local oscillator state with an ancilla state thereby producing additional fluctuations in the demodulated signal state. This ancilla state is the classical image mode as shown in Figure 15.10. This section shows that accounting for the quantum noise in the image mode, heterodyne demodulation can be described using a set of two commuting operators defined in an enlarged signal space, which is larger than the signal space used to define the generalized measurement described by (15.5.8). This enlarged signal space is constructed from the outer product of the signal space S for the signal state and the signal space N for an unoccupied image mode that is a vacuum state. This image mode is shown pictorially in Figure 15.10. The vacuum-state fluctuations in the image mode are described by a zero-mean vacuum-state operator ± aN . Including the vacuum-state fluctuations from the image mode centered at ´ f c ∓ 2 f IF , the operator ± Ahet given in (15.5.8a) is modified to read

µ ¶ µ ¶ ±A + = ±A† ±a + ±a † ±A ei2π f t + ±a† A± + ±A† ±a e−i2π f t (15.5.11) (signal mode plus image mode), ± + is the modified heterodyne operator defined in the enlarged signal space that where A includes the image mode. When ± a = 0 and f = 0, (15.5.11) reduces to the homodyne S N

LO

N

IF

LO

LO

LO N

IF

S N

N

IF

demodulation operator given in (15.5.4). For comparison, when homodyne demodulation to a real-baseband signal is used, the image mode and the signal mode are coincident, with both modes defined in the same signal space with the same vacuum-state fluctuations. This is made evident by setting

15.5 Classical Methods for Quantum-Lightwave Signals

795

fIF to zero in Figure 15.10 so that the spectra for positive and negative frequencies overlap near zero frequency. There is only a single mode that contributes vacuum-state fluctuations to the demodulated signal. These fluctuations are equivalent to half of a photon. Intuitively, the image mode adds its own uncorrelated quantum noise with a mean of half of a photon to the quantum noise in the signal mode. Together, this produces a mean quantum-noise level that is equivalent to one photon per mode instead of half of a photon per mode generated for the homodyne demodulation of one signal component as discussed in Section 15.5.2. This is the quantum origin of the additional noise in the semiclassical expression for E b / N 0 for shot-noise-limited heterodyne demodulation given in (8.2.15) and validates that expression. ±LO Under the condition that the magnitude of the local oscillator is large, the operator A † ± ± may be replaced by a real number ALO (cf. (15.5.5)). Then A LO = ALO = A LO. Making this substitution and following the same steps that were used to derive (15.5.8) and (15.5.10), the in-phase component operator ± A I and the quadrature component operator ±AQ defined in (15.5.10) for the signal-plus-image-mode noise are

±A = 2 A ±A = 2 A I

LO

Q

LO

(±a + ±a ) , µ ¶ ±a − ±a , I

(15.5.12a)

NI

Q

(15.5.12b)

NQ

where (15.3.20) has been used both for the signal mode and for the image mode. Expanding the commutator as ± AI ± AQ − ± AQ ± A I leads to

¼ · ¼ » µ· ¸¶ ·± ± ¸ ¸ » a ,± − ± a ,± a − ± a ,± a a A , A = 2A ± a ,± a − ± ¿ ÀÁ  ¿ ÀÁ  ¿ ÀÁ  ¿ ÀÁ  = 0, i/ 2 0 I

Q

LO

I

Q

NI

NQ

i/2

I

NQ

Q

NI

(15.5.13)

0

where (15.3.7) has been used. The last two terms are equal to zero because the components of the signal and the image-mode noise are independent, so the operators describing these modes commute. This shows that ± A I and ± A Q commute when the vacuum-state fluctuations from the image mode are included in the analysis. The analysis of heterodyne demodulation shows that a generalized measurement defined using a set of noncommuting measurement operators can always be lifted into an enlarged signal space by appending one or more ancilla states. These ancilla states for quantum-lightwave communication systems are often vacuum states. The effect of these ancilla states can be regarded as accounting for additional uncertainty in the measurement when that measurement is expressed in an enlarged signal space. For heterodyne demodulation, the single ancilla state is an image mode.

Homodyne Demodulation to Complex Baseband

The second of the three methods of joint demodulation is the simultaneous homodyne demodulation of both the in-phase and the quadrature signal components to a complexbaseband signal. A semiclassical functional description of this method of demodulation is shown in Figure 8.11. A quantum-optics functional description of this method of demodulation is shown in Figure 15.11. In comparison with heterodyne demodulation, homodyne demodulation to complex baseband requires an additional signal path and an

796

15 The Quantum-Optics Model

a Open port A LO Open port

90-degree hybrid coupler

N N N N

+

In-phase – component + Quadrature – component

Figure 15.11 Homodyne demodulation to complex-baseband using a 90-degree hybrid coupler.

⏐α 〉

Coherent state

High-gain phase-insensitive amplifier

s Classical signal

Classical joint demodulation

In-phase component Quadrature component

Figure 15.12 Joint demodulation using a high-gain phase-insensitive lightwave amplifier followed by classical joint demodulation.

additional coupler (cf. Figure 7.7). The signal state is one input to the coupler. A vacuum state is the other input to the coupler. The output of the composite product state is defined by the two input constituent signal states. This is equivalent to the product state defined for heterodyne demodulation using a vacuum-state image mode. Here the vacuum state enters the spatial mode of the coupler rather than the temporal mode of the intermediate frequency f IF . Therefore, homodyne demodulation to complex baseband has similar properties to heterodyne demodulation, with a pair of commuting operators describing the two noisy demodulated baseband components. This statement parallels similar statements given in Section 10.2.5 for shot-noise-limited heterodyne demodulation and shot-noise-limited homodyne demodulation to complex baseband.

Large-Signal Classical Joint Demodulation

The last of the three methods of joint demodulation uses a classical method of joint demodulation on a large-amplitude signal generated from a high-gain phase-insensitive lightwave amplifier. A block diagram of this demodulator is shown in Figure 15.12. For this case, the product state describing the output of the high-gain phase-insensitive lightwave amplifier is defined using the signal space of the incident coherent-state signal and the signal space of an uncorrelated gain site in the amplifying medium. The gain sites of a phase-insensitive amplifier have their own fundamental quantum fluctuations. These fluctuations add to the fluctuations in the incident signal state producing an equivalent of one photon of quantum noise at the input to the lightwave amplifier (cf. Section 7.7). The noise level is the same level as that generated using the other two methods of joint demodulation. The high gain of the lightwave amplifier means that the coherent state at the output of the amplifier is specified by a large Glauber number. Therefore, the output signal state can be analyzed classically, with the classical signal point s replacing the Glauber number α. The quantum noise of one photon per mode is then augmented to this classical analysis as was done in Sections 8.2.1 and 10.2.5, leading to a demodulator that has the same performance as the two other joint demodulators studied in this section.

15.6 Quantum-Lightwave Signal Distributions

15.6

797

Quantum-Lightwave Signal Distributions A density matrix ± ρ provides a complete description of a quantum lightwave signal state. The importance of the density matrix motivates equivalent representations that will be described in this section. These alternative representations reveal other properties of a density matrix that are similar, but not equivalent, to the properties of a classical probability distribution. These alternative representations, called quasi-probability distributions, are discussed in this section. A quasi-probability distribution is derived from a density matrix, which itself is a composite of quantum uncertainty and statistical uncertainty. A quasi-probability distribution is not a true probability distribution, which characterizes only statistical uncertainty. A classical probability distribution cannot describe a coherent state because of the fundamental dependence between the value α I measured using the in-phase operator ± aI and the value αQ measured using the quadrature operator ± aQ . This fundamental dependence is described by the commutation relationship given in (15.3.7). This dependence does not exist for a classical signal model, for which the in-phase and quadrature signal components are treated independently (cf. (6.2.8)). The next three subsections describe three quasi-probability distributions used to represent the joint statistical properties of αI and α Q . Each of these distributions is equivalent to the density matrix, but expressed in forms that resemble probability distributions. Each of these distributions is useful in a different context.

15.6.1

The

P

Quasi-probability Distribution

The first quasi-probability distribution is useful when the signal states of the ensemble are coherent states. This distribution expresses the density matrix ρ± in a coherent-state representation using the closure relationship given in (15.3.37). For this coherent-state representation, the density matrix is expressed as a weighted superposition of the outer products | α±²α | of the coherent states

¾ ρ± = P (α)|α ±²α|dα, α

(15.6.1)

where P (α) is a continuous function and the integration is over the Glauber numbers α in the complex plane. The term P ( α) is called the P quasi-probability distribution or simply the P distribution. Were the set of outer products {|α ±²α |} in (15.6.1) a set of orthogonal projection operators, then on comparing (15.6.1) with (15.4.5) the P distribution would be a probability distribution. Because the set {|α ±} of coherent states does not have this property, the P distribution is a quasi-probability distribution because it does not have all the properties required of a probability distribution.26 When the density matrix ± ρ describes a statistical ensemble of coherent states, the corresponding P distribution P (α) has the same form as a classical probability distribution 26 See Klauder and Sudarshan (1968) Section 8.4 and Chapter 9.

798

15 The Quantum-Optics Model

p(s) for the complex signal point s. For this restricted case, P (α) has the same functional form as p(s). With this correspondence in mind, consider the classical probability density function f (s) for the measured value s for a signal with a constant amplitude A. This is a Dirac impulse given by f (s) = δ(s − A ). Using the correspondence between P (α) and f (s), the P distribution in this case is

P(α ) = δ(α − A).

(15.6.2)

For a joint gaussian probability density function with mean A expressed in terms of the mean number of additive-noise photons N 0, the P distribution is 1 2 P (α) = π N e−| −A| /N , α

0

0

(15.6.3)

where the real and imaginary parts of α are dependent. The additive gaussian noise can be either spontaneous emission noise so that N0 = N amp (cf. (7.7.5)) or thermal noise so that N0 = kT0/± ω. Substituting (15.6.3) into (15.6.1), the density matrix for a coherent state in additive gaussian noise expressed using a P distribution is

±ρ = π1N

¾

0

α

2 e−|α−A | / N0 |α ±²α|dα .

(15.6.4)

The corresponding probability mass function p(m) for the number of measured photons using photon counting is the mth diagonal element ρmm of the density matrix expressed in a photon-number representation. This probability is p(m) = ρ mm = ²m|± ρ |m±

(Ï ) = ²m| C P (α)|α ±²α|dα |m± ¾ = P (α )|²m|α±|2 dα ( |α|2)m ¾ −| | d α, = P (α )e m! α

α

2

(15.6.5)

α

where (15.3.36) has been used, and the integration is over the complex α plane.

The Poisson Transform within Quantum Optics

The Poisson transform was introduced empirically in Chapter 6 to bind the statistical uncertainty of wave optics to the quantum uncertainty of photon optics expressed as photon noise. The Poisson transform can be validated by a formal derivation from (15.6.5), provided that P (α) is circularly symmetric so that P (α) = (1/π) P (| α|). Integrating out the uniform phase, the probability p(m) given in (15.6.5) can be determined using only the marginal distribution P (|α|). This distribution can be expressed in terms of the mean number of photons E = |α |2 (cf. (15.3.24)) and the corresponding probability density function f (E). To do so, write |α |2 = E and P (| α|)d|α | = f (E)dE. Substituting these expressions into (15.6.5) gives p ( m) =

¾∞ 0

f (E)e−E

Em m

! dE.

(15.6.6)

15.6 Quantum-Lightwave Signal Distributions

799

This expression is the Poisson transform of the continuous probability density function f (E) for the average number of photons (cf. (6.3.2)). For ideal photodetection, (15.6.6) relates the average number of photons E over an interval T within the wave-optics signal model to the probability mass function p(m) for the number of photons within the photon-optics signal model. The Poisson transform is appropriate when the wave-optics signal probability density function has a maximum-entropy uniform phase so that it is circularly symmetric. A communication channel that satisfies this condition is a phase-insensitive channel. For this case, the Poisson transform given in (15.6.6), which incorporates both statistical uncertainty and quantum uncertainty, replaces the more general expression given in (15.6.5). When P(α ) is not circularly symmetric, p(m) must be determined using the general form given in (15.6.5). In this case, a photon-optics description of the signal state based on the Poisson transform is not appropriate because the phase is not uniform.

15.6.2

The Wigner Quasi-probability Distribution

The second quasi-probability distribution is the Wigner quasi-probability distribution. This representation of a density matrix ± ρ shows the statistical properties of a quantumlightwave signal in terms of the homodyne demodulation of one component to a realbaseband signal. . √ a I and ±p =. √2 ±aQ The Wigner distribution is defined using peak values ± q = 2± instead of a root-mean-squared value.27 One form of the Wigner distribution of the quantum wave function u(q ) for a pure signal state is28

.

W (q , p) =

¾

∞ 1 u(q − x /2)u ∗ (q + x /2)ei px dx . 2π −∞

(15.6.7)

The Wigner distribution can also be defined as the two-dimensional Fourier transform of a quantum characteristic function C (ωq , ω p ). This function is a generalization of (2.2.14) and can be written as

Ñ .Ð ¾ ∞¾ ∞ = W (q , p)ei(ω q+ω

q +ω p ± p) C (ωq , ω p ) = ei(ωq ±

q

−∞ −∞

pp

) dq dp.

(15.6.8)

The Wigner distribution has the property that the probability distribution |U ( p)|2 for the variable p is the projection of the Wigner distribution along the p axis. This projection is determined by integrating W (q , p) with respect to q. The same relationship holds for q, so that

|U ( p)|2 = |² p|ψ±|2 = (±, ±)

¾∞

−∞

W (q , p)dq ,

(15.6.9a)

(±, ±)

27 The pair of operators q p is also mathematically equivalent to the pair of operators x p (cf.

(15.3.10)), but has different units. 28 See Leonhardt (2010).

800

15 The Quantum-Optics Model

|u (q )| = |²q |ψ±| = 2

2

¾∞ −∞

W (q, p)d p.

(15.6.9b)

The expression states that the projection of W (q , p) onto either the p axis or the q axis gives the probability density√ for the measurement √ of that variable. Rescaling (15.6.9a) and (15.6.9b) using αQ = p/ 2 and α I = q / 2 yields | U (α Q )|2 and |u(α I )|2 expressed using root-mean-squared values. While the projections of the Wigner distribution onto the α I or α Q axis do lead to valid probability density functions, the Wigner distribution itself, W (α I , α Q ), is not a joint probability density function. In particular, it can take on negative values. For all classical states of light and both classical and nonclassical gaussian states of light, to be defined below, the Wigner distribution is nonnegative, but it is still not a joint probability density as will be discussed below. For other states of light, the Wigner distribution may be negative, which always indicates a nonclassical, nongaussian signal state. To compare the Wigner distribution to the P distribution, consider a vacuum state. The Wigner distribution W (α I , α Q ) for this state is a circularly symmetric gaussian function with variance 1/4 given by W (α I , α Q ) =

2 −2(α2I +α2Q ) e

π

(15.6.10)

This distribution is shown pictorially in Figure 15.13(a). For a nonzero mean, the Wigner distribution is displaced from the origin by the mean value (A I , AQ ) so that W (α I , α Q) =

2 −2((α I −AI ) 2+(α Q−A Q )2) e .

π

(15.6.11)

This is shown pictorially in Figure 15.13(b). The details of the derivation of (15.6.10) and (15.6.11) are asked for in a problem at the end of the chapter. Although the Wigner distribution in (15.6.10) has the mathematical form of a twodimensional circularly symmetric gaussian probability function, α I and αQ are not independent because the two marginal distributions u(α I ) and U (α Q ) are related by a Fourier transform (cf. (15.3.16)). This means that each point in Figure 15.13 describes a possible value of the in-phase component α I , measured using homodyne demodulation, and a separately measured value of the quadrature component α Q , also measured using (a)

(b)

αQ

αI

αQ

αI

α I and a separately measured quadrature component α Q for a minimum-uncertainty vacuum state. (b) A minimum-uncertainty coherent state is generated by displacing the vacuum state.

Figure 15.13 (a) Representation of a separately measured in-phase component

15.6 Quantum-Lightwave Signal Distributions

801

homodyne demodulation. The Wigner distribution incorporates the minimum amount of uncertainty introduced by the homodyne demodulation of one signal component to real baseband. The Wigner distribution W (α I , α Q ) can be generated from the P distribution P (α I , α Q) by using a two-dimensional smoothing function so that29 W (α I , α Q ) = P (α I , α Q) ± ± e−2(αI +αQ ) , 2

where ±± denotes a two-dimensional convolution over both

2

α

I

(15.6.12)

and α Q . The smoothing

2 2 function e−2(α I +αQ ) has a variance equal to 1/4, which corresponds to the minimum

uncertainty for a measurement of a single quadrature component. While the uncertainty shown in Figure 15.13 appears to resemble the separable probability density function of a classical noisy signal (cf. Figure 10.8), the two forms of uncertainty are not at all similar. A “classical” noise cloud describes only statistical uncertainty and, accordingly, can be modeled using independent in-phase and quadrature noise components. The Wigner distribution expresses the fundamental quantum uncertainty, with the wave functions u (α I ) and U (α Q ) for each signal component related by a Fourier transform.

15.6.3

The Husimi Quasi-probability Distribution

The third and final quasi-probability distribution is the Husimi distribution. This representation of a density matrix ± ρ expresses the statistical properties of a quantumlightwave signal measured using a joint demodulation method such as heterodyne demodulation or homodyne demodulation of both signal components to complex baseband (cf. Section 15.5). In contrast to the Wigner distribution, the Husimi distribution includes the additional quantum uncertainty that arises in a joint measurement, which corresponds to half of a photon. The Husimi distribution is defined using the set {|α±} of coherent states. It is given by

. 1 ²α|±ρ|α ± = 1 trace(±ρ |α ±²α|), π π

Q (α I , α Q ) =

(15.6.13)

where α = (α I , α Q ) and ρ± is the density matrix describing the signal state. Integrating both sides of (15.6.13) with respect to the Glauber number α , applying (15.3.37), and using the linearity of the trace operation gives

¾

α

Q (α I , α Q )dα

¹ ¾ º ( ) = trace ρ± π1 |α ±²α|dα = trace ±ρ±I = 1, ¿ ÀÁ Â

(15.6.14)

α

±I

where ± ρ±I =Ï ±ρ and trace ±ρ = 1 have been used. Because α Q(α I , α Q )dα = 1 and is nonnegative, several properties that hold for a classical joint probability density function also hold for the Husimi distribution. 29 See Lai and Haus (1989), and Leonhardt (2010).

802

15 The Quantum-Optics Model

The Husimi distribution approaches a classical joint probability density function f (α ) for large values of α. In this regime, the coherent states can be regarded as orthogonal. . Defining Q (α I , α Q ) = Q (α) the set of outer products {|α±²α |} approaches a set of orthogonal projections. In this large-signal regime, Q (α ) approaches f (s), where s is a classical complex signal (cf. (15.3.1)) with s equivalent to the Glauber number α . For any signal level, the Husimi distribution is the two-dimensional convolution of the Wigner distribution W (α I , α Q ) with a circularly symmetric gaussian smoothing function e−2(α I +αQ ) , so that 2

2

Q (α I , α Q ) = W (α I , α Q ) ±± e−2(α I +αQ ) . 2

2

(15.6.15)

Comparing (15.6.15) with (15.6.12), the Husimi distribution is related to the Wigner distribution in the same way that the Wigner distribution is related to the P distribution. Starting from the P distribution, each of these distributions is generated by using a circu-

larly symmetric smoothing function e −2(α I +αQ ) . The Wigner distribution is a smoothed version of the P distribution that incorporates the fundamental quantum uncertainty of half of a photon per mode as expressed by a minimum-uncertainty state (cf. (15.3.17)). The Husimi distribution is a smoothed version of the Wigner distribution that incorporates an additional half of a photon of quantum uncertainty generated by a joint measurement as a consequence of an ancilla state that is coupled into the demodulation process. The equivalence of the Husimi distribution for a coherent state to a classical product probability density function f (s) in a large-signal regime provides the conceptual bridge between the three kinds of quasi-probability distributions that describe quantumlightwave signal states and a product distribution that describes a classical lightwave. These relationships are shown notionally in Figure 15.14. 2

15.6.4

2

Representations of Classical Signals

The density matrix of a classical signal can be expressed using a photon-number-state representation or a coherent-state representation. This section derives several representations of classical signals that will be used to characterize the performance of quantum-lightwave channels in Chapter 16.

Classical Noiseless Signal

The quantum-optics equivalent of a classical noiseless lightwave signal is a coherent state. The density matrix ρ±E describing this coherent symbol state is

±ρ = |α±²α|. E

P distribution

Half of a photon of uncertainty

Wigner distribution

Half of a photon of uncertainty

Husimi distribution

Large

Classical product distribution Q( ) ≈ f(s)

Figure 15.14 Notional relationship between the three quasi-probability distributions and a classical product distribution.

15.6 Quantum-Lightwave Signal Distributions

803

√ 2 To express this state in a photon-number representation, use ²m| α± = e−|α| /2 α m / m!, (cf. (15.3.35)). Then the elements of the density matrix ± ρE (k, m) are

±ρ (k, m) = ²k|±ρ |m± = ²k|α±²α|m± − = √e E(m+k)/2 ei(m−k)φ , (15.6.16) m!k! √ where the Glauber number is written as α = Eei φ , with φ being the phase. For k = m, the diagonal elements ²m| ± ρ |m± = p(m) comprise a Poisson probability mass function E

E

E

E

(cf. (15.3.36)), with each element being the probability that m photons are measured. The coherence properties of a quantum-lightwave signal state – both classical coherence and higher-order quantum coherence – are expressed by the off-diagonal elements. For a density matrix representing an ensemble of pure coherent signal states with the phase being uniformly distributed over [0, 2π), the off-diagonal elements in (15.6.16) of the mean density matrix are zero (cf. (15.4.21)). This mean density matrix represents a noncoherent signal state that can be described using a Poisson probability distribution and can be analyzed using photon optics.

Classical Noise in a Photon-Number Representation

The density matrix of classical noise in a photon-number-state representation is given by

±ρ = N0

³ m

p(m)|m±²m|,

(15.6.17)

where p(m) is the probability mass function of the ensemble of photon-number states that describes the noise. Within quantum optics, classical noise is modeled by setting p(m) in (15.6.17) to a maximum-entropy distribution. The maximum-entropy distribution for a discrete probability mass function subject to a constraint on the mean energy was derived in Section 6.1.1. It is the Gordon distribution given in (6.1.5). Substituting that expression into (15.6.17) gives

±ρ = N0

∞ ³

Nm 0

m+1 = (1 + N0)

m 0

|m±²m|.

(15.6.18)

Expression (15.6.18) is the photon-number-state representation of classical additive noise. When the noise is generated by a set of external states in thermal equilibrium, N 0 is the mean number of thermal noise photons given by (cf. (6.1.8)) N0

= e±ω/ kT1 − 1 . 0

When the noise is generated from a phase-insensitive lightwave amplifier that operates at the photon-noise limit, then N0 = Namp , where Namp is the classical noise source given in (7.7.5) that excludes the effect of quantum noise.

804

15 The Quantum-Optics Model

Classical Noise in a Coherent-State Representation

The density matrix ± ρN0 of classical noise in a coherent-state representation is given by (15.6.4), with s = 0 and 2σ 2 = N0, so that

±ρ = π1N 0 N0

¾

α

2 e−| α| /N 0 | α±²α| dα .

(15.6.19)

The density matrix can be written in a coherent-state representation and a photonnumber-state representation. Because a classical noise source has a uniform phase distribution, the diagonal elements of the density matrix in these two representations are related by a Poisson transform (cf. (15.6.6)). Specifically, the discrete probability mass function p(m) of the diagonal elements in the density matrix given in (15.6.18) expressed in a photon-number representation is a Gordon distribution (cf. (6.3.11)). That distribution is the Poisson transform (cf. (15.6.6)) of the continuous exponential probability density function f (E). The squared-magnitude of the diagonal elements in the density matrix given in (15.6.19) expressed in a coherent-state representation are an exponential distribution with E = |α| 2. With the derivation of (15.6.18) and (15.6.19), classical additive noise modeled by a maximum-entropy distribution can be described in four interrelated ways using three different signal models that are connected via the Poisson transform. In wave optics, the complex amplitude of additive noise is described by a continuous circularly symmetric gaussian random variable with variance σ 2 per signal component. The noise energy is described by an exponential random variable. In photon optics, the noise is described by a discrete random variable with a Gordon probability mass function. The mathematical relationships between the three descriptions of noise are shown in Figure 6.5. In quantum optics, the density matrix for the noise in a photon-number representation is given by (15.6.18), whereas the density matrix for the noise in a coherent-state representation is given by (15.6.19). The probability distributions described by the diagonal elements of each of these representations of noise are related by a Poisson transform.

Bias Plus Noise in a Photon-Number-State Representation

A constant classical bias in additive white gaussian noise can also be described using a density matrix ± ρE+ N0 expressed in a photon-number-state representation. The elements of this matrix ± ρkm are given by30

⎧ ¹ º1/2 ⎪⎪⎪ k! Nm 0 −E /(1+N ) ⎪⎪ m! (1 + N0 )m+1 e ⎨ ±ρ + (k, m) = ²k|ρ±+ |m± = ⎪ × ¹ α ∗ ºm−k Lm−k ¹− E º k ⎪⎪ N0 N0 (1 + N 0 ) ⎪⎩ ∗ ²m|ρ± + |k± 0

E N0

E N0

E N0

form ≥ k for m < k, (15.6.20)

30 See Lachs (1965), Helstrom (1976), Chapter 5, Section 4, and Yoshitani (1970) for a complete derivation.

15.6 Quantum-Lightwave Signal Distributions

805

2 is the mean number of signal photons, N is the mean number of noise where E = |α|√ 0 photons, α = Eeiφ is the Glauber number, and Lm ( x ) is a Laguerre polynomial defined k in (2.2.52). The probability mass function p(m) for the number of photons is given by the diagonal elements of ± ρE +N0 and is

p(m) = ²k|ρ±E+N0 |m± =

Nm 0

(1 + N0 )m+1

¹ º E . m − N (1 + N )

e−E /(1+N 0 ) L0

0

0

(15.6.21)

This is the Laguerre probability mass function derived in (6.5.9) for K equal to 1 and N sp equal to N0 . The Laguerre probability distribution, given in (15.6.21), reconciles several of the principal differences among quantum optics, photon optics, and wave optics. The offdiagonal elements in (15.6.20) incorporate both classical and quantum coherence effects that cannot be described using photon optics. A mean density matrix can be defined using a quantum phase term φ that is uniformly distributed over [0, 2π) (cf. (15.6.21)). For this case, each of the off-diagonal elements in (15.6.20) averages to zero. The resulting density matrix describes a noncoherent signal state with the diagonal elements forming a Laguerre probability mass function. This mean density matrix can be analyzed using semiclassical photon optics. While the mean density matrix can be described classically, any given realization of the density matrix may exhibit quantum coherence effects. This is similar to a classical noncoherent signal (cf. Section 8.2.2). When the mean number of photons is large, the discrete Laguerre probability mass function is well approximated by the continuous noncentral chi-square probability distribution given in (6.5.5) (cf. Figure 6.7). In this case, a classical noncoherent wave-optics signal model based on the probability density function for the wave-optics signal energy is appropriate.

Bias Plus Noise in a Quasi-probability Representation

A constant bias signal in classical additive gaussian noise can be described using any of the three quasi-probability distributions. The P distribution for this classical state was defined in (15.6.3). The Wigner distribution for this classical state can be derived from the P distribution using (15.6.12). The Husimi distribution for this classical state can be derived from the Wigner distribution using (15.6.15). These three distributions are

P (α I , α Q ) = W (α I , α Q ) = Q (α I , α Q ) =

1

π N0

(

)

2 2 e− (α I −AI ) +(α Q −AQ ) / N0 ,

(15.6.22a)

(

)

(

)

1

2 2 e− (α I −AI ) +(α Q −AQ ) /(N0 + 1/ 2) ,

1

2 2 e− (αI −A I ) +(α Q −AQ ) /(N 0 +1) .

π(N0 + 12 ) π(N0 + 1)

(15.6.22b) (15.6.22c)

When the mean additive-noise level N0 is much greater than one photon, (15.6.22b) and (15.6.22c) differ little from (15.6.22a), which has the same form as a classical wave-optics product distribution with N0 = 2σ 2, and with s equivalent to the Glauber number α . This equivalence is valid for large values of α. It is not valid for small values of α because of the inherent dependence between the in-phase and quadrature signal

806

15 The Quantum-Optics Model

components. That is why each function in (15.6.22) is written in terms of the separate components. 15.6.5

Gaussian Signal States

Gaussian signal states were introduced in Section 15.1.5. These states have several properties similar to the properties of a classical signal based on a gaussian probability density function. In particular, they have the property that a linear transformation of a gaussian state will produce another gaussian state. For this reason, gaussian quantum signal states can often be analyzed using classical methods. Here we consider a multivariate circularly symmetric gaussian signal state. For this signal state, the density matrix ± ρ is represented by a P distribution that has the same form as the probability density function of a zero-mean multivariate circularly symmetric gaussian random variable given in (2.2.30a) with 31 1 † −1 e −¶α V α¶ , (15.6.23) π det V where α ¶ = [α 1, α2 , . . . , α M ]T is a column vector with complex components that are the Glauber numbers for each coherent state |α j ± in the multivariate distribution, where α ¶ † = [α ∗1 , α ∗2, . . . , α∗M ] is a row vector, and where V = trace(¶αρ α¶ † ) is a hermitian covariance matrix which has the same form as the classical expression given in (2.2.30b). The multivariate circularly symmetric gaussian signal state corresponds to a classical multivariate circularly symmetric gaussian random vector z for which eiθ z has the same multivariate probability density function for all θ (cf. (2.2.28)). Accordingly, a multivariate circularly symmetric gaussian signal state |¶α± has the property that it has the same representation under the generalized rotation eiθ . A gaussian signal state can be described by a gaussian Wigner distribution, such as the signal state given in (15.6.22b). Equivalently, a gaussian signal state is a signal state described by a gaussian characteristic function (cf. (15.6.8)). When a multivariate circularly symmetric gaussian signal state is expressed using a Wigner distribution, that distribution is described by a real covariance matrix C, instead of a complex covariance matrix. This is because the Wigner distribution expresses the statistical properties of the gaussian signal state in terms of separate homodyne measurements for each signal component described by the operators ± a I and ± a Q. Using a real 2M × 2M covariance matrix C, the Wigner distribution W (x) for a multivariate circularly symmetric gaussian signal state can be written as (cf. (2.2.21))

P (¶α ) =

W (x) =

(2π)

M

1 M



det C

T −1 e−x C x/2 ,

·

(15.6.24)

where ¸ x is a real column vector of length 2M given by x = αI 1, . . . , α I M , α Q1, . . . , αQ M T. The real components α I j and αQ j describe the in-phase and quadrature components of each coherent state | α j ± in the multivariate circularly symmetric gaussian 31 See Helstrom (1976), Section 5 of Chapter 5, and Holevo, Sohma, and Hirota (1999), Section IIIa.

15.7 References

807

signal state. These components may be viewed as the separately measured in-phase and quadrature signal components using homodyne demodulation of a single component to real baseband (cf. Figure 15.13). The 2 M × 2M real covariance matrix C that specifies the Wigner distribution for a gaussian signal state can be expressed in block form using the M × M complex covariance matrix V that specifies the P distribution. To do so, use (2.2.31) and augment the diagonal terms with the quantum noise of half of a photon to account for the transformation of the P distribution into a Wigner distribution (cf. (15.6.12)). This gives

C=

1 2

Ë

Re V + IM /2 Im V

Ì −Im V , Re V + I /2 M

(15.6.25)

where I M is the M × M identity matrix. As an example, for a circularly symmetric gaussian state with a complex covariance matrix V = SI M , using (15.6.25) gives the real covariance matrix C used for the Wigner distribution as

C = 21 (S + 21 )I2 M ,

(15.6.26)

where S is the mean signal. For a minimum-uncertainty vacuum state, the mean signal S is equal to zero and

Cvac = 14 I2M ,

(15.6.27)

which agrees with (15.3.17b) for each component of the covariance matrix. The same covariance matrix is generated for a state-preparation process that produces a displaced vacuum state with ²x± being nonzero (cf. Figure (15.13(b)) because the mean value does not affect the form of the covariance matrix or the von Neumann entropy of the gaussian signal state. For a gaussian thermal noise state, the mean value S is equal to N0 (cf. (6.1.8)). For a gaussian lightwave amplifier noise state, the mean value S is equal to N amp (cf. (7.7.5)). For a gaussian signal state generated by a state-preparation process, the mean value S is equal to the mean signal E.

15.7 References

General material on quantum theory that is applicable to quantum-lightwave signals can be found in Griffiths (2002), in Schumacher and Westmoreland (2010), and in Sakurai and Napolitano (2011). The quantum statistics of lightwaves is discussed in Peřina (2012). The relationship between quantum theory and information theory is discussed in Zurek (1980). An amusing and informative introduction to entanglement in the form of quantum cakes is given in Kwiat and Hardy (2000). Material on the quantization of the electromagnetic field, photon-number states, and coherent states is given in Loudon (2000), in Mandel and Wolf (1995), and in Agrawal (2013). Quasi-probability distributions for coherent states are detailed in Klauder and Sudarshan (1968), in Lai and Haus (1989), and in Lee (1995). The classical limit of coherent states is presented in Hepp (1974). A rigorous quantum-optics treatment of the state detection of a homodyne-demodulated coherent state is given in Braunstein (1990). A quantum description of heterodyne demodulation is given in Leonhardt and Paul (1995) and in Haus (2000b). Nonclassical states of light are reviewed in Bužek and Knight (1995).

15.8 Historical Notes

The density matrix was introduced by von Neumann (1932). The definition of the coherent state is due to Glauber (1963). The Fock states were named after Vladimir Fock, who introduced this concept in Fock (1932). It appears that generalized measurements were first considered in Gordon and Louisell (1966) and developed in Davies and Lewis (1970), in Holevo (1973), and in Helstrom (1976). The P distribution was introduced in Sudarshan (1963) and in Glauber (1963). The Wigner distribution was introduced in Wigner (1932). It is the two-dimensional Fourier transform of the radar ambiguity function introduced by Woodward (1953). The Husimi distribution was introduced in Husimi (1940). The relationship between the P distribution and the Poisson transform was discussed in Mandel and Wolf (1966).

15.9 Problems

1 Coherent states
Show that no nonzero eigenstate exists for the operator â†. This means that there does not exist a nonzero eigenstate |A⟩ such that â†|A⟩ = A|A⟩.

2 Coherent states as a basis
Prove (15.3.37) by writing α in polar coordinates and performing the resulting integrations using ∫₀^∞ r e^{−r²} r^{n+m} dr = Γ((n+m+2)/2)/2, where Γ(k) = ∫₀^∞ x^{k−1} e^{−x} dx is the gamma function. Note that Γ(j+1) = j!, where j is an integer.

3 Trace of a mixed signal state
Starting with

ρ̂ = Σ_{j,k} ρ_{jk} |x_j⟩⟨x_k|,

show that the trace of a density matrix for a mixed signal state equals 1 for any basis {|x_j⟩}.

4 Quantized harmonic oscillator
Using the eigenstates |a_I⟩ of the in-phase component operator â_I as a basis, the Schrödinger equation for a quantized harmonic oscillator expressed in terms of the peak amplitude can be written as

d²u(α_I)/dα_I² + (2E/ℏω − α_I²) u(α_I) = 0,


where u(α_I) is the quantum wave function in the in-phase component representation, and E is the eigenvalue for the energy in the quantized harmonic oscillator, which is independent of the representation.
(a) Using the scaling factor for the peak amplitude given by α_I = √(mω/ℏ) x (cf. (15.3.10a)), rewrite this equation in a position representation.
(b) Using the expression derived in part (a), identify the potential function V(x) in the Schrödinger equation in a position representation and discuss the physical significance.

5 Eigenstates and eigenvalues of a quantized harmonic oscillator
The differential equation

d²φ_n(y)/dy² + (1 + 2n − y²) φ_n(y) = 0,

where n is an integer, has the solution

φ_n(y) = H_n(y) e^{−y²/2},

where H_n(y) is the nth Hermite polynomial (cf. (3.3.50)). The functions are normalized so that

∫_{−∞}^{∞} φ_n(y) φ_m(y) dy = 2^n √π n! δ_{mn},

with δ_{mn} being the Kronecker impulse. Using this solution and the orthogonality relation, show that the eigenvalues of the quantized harmonic oscillator are given by

E = ℏω(n + ½),

which is (15.3.25), with the quantum wave function in the in-phase component representation given by

u_n(α_I) = (2^n √π n!)^{−1/2} H_n(α_I) e^{−α_I²/2}.

6 Classical and quantum harmonic oscillators
(a) Show that the maximum displacement of a classical harmonic oscillator from its equilibrium position is given by

x_max = √(2E/mω²).

(b) Using the classical solution for the harmonic oscillator, plot the probability distribution p(α_I) for the in-phase component α_I. (Hint: use the mathematical equivalence between the in-phase component α_I and the position x given in (15.3.10a) and work with x. The probability p(x) is the fraction of time that the particle spends in a region dx, and is inversely proportional to its velocity v(x).)


(c) Plot the quantum wave functions

u_n(α_I) = (2^n √π n!)^{−1/2} H_n(α_I) e^{−α_I²/2}

for the first four eigenfunctions n = 1, 2, 3, 4.
(d) Show that as n becomes large, the probability |u_n(α_I)|² approaches the classical probability distribution derived in part (b).

7 Commutation relationships
(a) Prove that [N̂, â] = −â, where N̂ = â†â.
(b) Prove that [N̂, â†] = â†.

8 Commuting transformations
The Rubik cube is a well-known recreational puzzle. This problem discusses commuting and noncommuting transformations using the group of moves on a Rubik cube.
(a) Organize the moves on a Rubik cube into groups of transformations that have common properties.
(b) Describe two sets of moves on a Rubik cube that commute.
(c) Describe two sets of moves on a Rubik cube that do not commute.
(d) Define the commutator for the group operations on a Rubik cube. Note that there is no such thing as subtraction of operations. (Hint: see Problem 2.9.)
(e) Would the Rubik cube be an interesting puzzle if every pair of operations commuted?

9 Commutation in an enlarged signal space
Let A and B be two n × n hermitian matrices for which [A, B] ≠ 0.
(a) Prove that the two 2n × 2n matrices

\begin{pmatrix} A & B \\ B & A \end{pmatrix}  and  \begin{pmatrix} B & A \\ A & B \end{pmatrix}

do commute. This shows that operators that do not commute can be embedded into a larger signal space, called an ancilla embedding, in which they do commute. This motivates the definition of an ancilla state.
(b) Prove that trace(AB) = trace(BA) (cf. (2.1.84c)) even if [A, B] is not equal to zero.

10 Phase operator
Consider a phase operator ê^{iφ} defined by the two equations

â = √(N̂ + 1) ê^{iφ},    â† = ê^{−iφ} √(N̂ + 1).

(a) Write the two phase operators ê^{iφ} and ê^{−iφ} in terms of â, â†, and N̂.
(b) Prove that [N̂, ê^{iφ}] = −ê^{iφ}.
(c) Prove that [N̂, ê^{−iφ}] = ê^{−iφ}.


11 Density matrix for a pure signal state
(a) Show that ρ̂ = ρ̂² without explicitly expressing the density matrix in terms of a basis. Do so by expressing the density matrix for a pure signal state in terms of a projection operator of the form |ψ⟩⟨ψ|.
(b) A density matrix is given by

ρ̂ = |ψ⟩⟨ψ|
   = (1/√2)(|r₁⟩ + e^{iφ}|r₂⟩) (1/√2)(⟨r₁| + e^{−iφ}⟨r₂|)
   = ½(|r₁⟩⟨r₁| + e^{−iφ}|r₁⟩⟨r₂| + e^{iφ}|r₂⟩⟨r₁| + |r₂⟩⟨r₂|).

Show that the state described by this density matrix is a pure signal state by showing that ρ̂² = ρ̂.

12 Entangled states
Let |a_i⟩ be a basis state defined in signal space A. Let |b_j⟩ be a basis state defined in signal space B. Show that a superposition of product states of the form

|a₁⟩ ⊗ |b₂⟩ + |a₂⟩ ⊗ |b₁⟩

is entangled by showing that this state cannot be written as a product state (cf. (15.2.11)).

13 Displacement operator
Starting with the Campbell–Baker–Hausdorff identity

e^{Â+B̂} = e^{Â} e^{B̂} e^{−[Â,B̂]/2},

set Â = αâ† and B̂ = −α*â.
(a) Show that [Â, B̂] = |α|².
(b) Show that both Â and B̂ commute with [Â, B̂], and thus show that

e^{αâ† − α*â} = e^{−|α|²/2} e^{αâ†} e^{−α*â},

which is (15.3.33).

14 Small-signal coherent state
Consider a coherent state with |α|² much smaller than 1.
(a) Using the expression for a coherent state

|α⟩ ≐ e^{−|α|²/2} Σ_{m=0}^{∞} (α^m/√(m!)) |m⟩,

expand the right side keeping only the first two terms.
(b) The state is photodetected by ideal photon counting. In terms of |α|², determine the probability that zero photons are generated and the probability that one photon is generated.
(c) Determine the value for |α|² such that the probability that more than one photon is generated is smaller than 10⁻³.
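A quick numerical check of parts (b) and (c) is possible because the photon-count distribution of a coherent state is the Poisson distribution with mean |α|² (cf. (15.3.36)). A minimal Python sketch under that assumption (the grid and names are our choices):

```python
import numpy as np

# Photon counting on |alpha>: p_m = e^{-E} E^m / m!  with  E = |alpha|^2
def p_more_than_one(E):
    return 1.0 - np.exp(-E) * (1.0 + E)   # 1 - p_0 - p_1

# Largest E = |alpha|^2 with P(more than one photon) < 1e-3
E = np.linspace(1e-4, 0.2, 20000)
print(E[p_more_than_one(E) < 1e-3].max())   # approximately 0.045
```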


15 Large-signal coherent state
Consider a coherent state with |α|² much larger than 1.
(a) Using the expression for the pairwise inner product of two coherent states given in (15.3.38), determine the value of |α|² for antipodal coherent-state modulation such that the real inner product κ is less than 10⁻⁴ (−40 dB).
(b) Using the expression for the probability p_m that m photons are measured that is given in (15.3.36), determine the value of |α|² such that the signal-to-noise ratio of the photodetected signal is greater than 40 dB.
(c) Discuss the results of parts (a) and (b) with respect to the statement that, for |α|² much larger than 1, a coherent state approaches a classical noiseless signal described by a complex number s.
(d) How does the time interval T of the measurement affect the conclusions of part (c)?

16 Shannon and von Neumann entropies
A 2 × 2 density matrix is given by

ρ̂ = \begin{pmatrix} 0.5 & 0.1 \\ 0.1 & 0.5 \end{pmatrix}.

Compute the Shannon entropy and the von Neumann entropy. Comment on your results.
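A minimal numerical sketch for this problem (Python with NumPy assumed; our code): the Shannon entropy is evaluated from the diagonal probabilities of ρ̂ in the given basis, and the von Neumann entropy from its eigenvalues:

```python
import numpy as np

rho = np.array([[0.5, 0.1],
                [0.1, 0.5]])

def h(p):
    # Entropy in bits of a probability vector, ignoring zero entries
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

shannon = h(np.diag(rho))                  # outcome probabilities in the given basis
von_neumann = h(np.linalg.eigvalsh(rho))   # basis-independent, uses the eigenvalues

print(shannon, von_neumann)                # 1.0 and about 0.971 bits
```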

17 Normalization of the Heisenberg uncertainty relationship
The constant A in (15.3.15) can be determined by forming the inner product ⟨x₁|x₂⟩.
(a) Using (2.1.94), write the identity operator Î as an integral of the projection operators |p⟩⟨p|.
(b) Using the expression for Î determined in part (a), write ⟨x₁|x₂⟩ = ⟨x₁|Î|x₂⟩ as an integral.
(c) Using properties of the Fourier transform, show that the integral evaluates to 2πℏ|A|²δ(x₁ − x₂), where δ(x) is a Dirac impulse.
(d) Given the normalization condition ⟨x₁|x₂⟩ = δ(x₁ − x₂), show that A = 1/√(2πℏ).

18 Partial trace
The partial trace of the product state ρ̂ = σ̂ ⊗ μ̂ that recovers the density matrix σ̂ of the component signal state is given by (15.4.9a). Determine an explicit expression for the partial trace of ρ̂ that recovers the density matrix μ̂ of the other component signal state.

19 Large-signal regime
Suppose that a coherent state is regarded as orthogonal when the pairwise inner product κ = e^{−4E_b} is 10⁻⁵, where E_b is the mean number of photons.
(a) Determine the signal power density spectrum required to achieve this condition for the following wavelengths: 1 nm, 1000 nm (1 micron), and 10⁶ nm (1 mm).


(b) Determine the thermal noise power density spectrum at 290 K given by N₀ = kT₀, where k is Boltzmann's constant and T₀ is the temperature in kelvin.
(c) Suppose that the nonorthogonal nature of the coherent states is evident when κ is larger than one-half and that the signal-to-noise ratio E/N₀ is larger than 20 dB. For these parameters, what is the largest wavelength for which the nonorthogonal nature of a coherent state may be evident?

20 Noise from phase-insensitive amplification
Let the two operators

x̂ = â_I + n̂_I,    ŷ = â_Q + n̂_Q

be noisy versions of the in-phase operator and the quadrature operator, where the terms n̂_I and n̂_Q account for the additional noise from phase-insensitive amplification.
(a) Write down the necessary condition for x̂ and ŷ to be jointly observed without additional uncertainty.
(b) Using this condition, solve for the relationship between n̂_I and n̂_Q such that the condition in part (a) is satisfied.

21 Heterodyne demodulation versus homodyne demodulation
Redraw Figure 15.10 for a homodyne demodulator for which f_c is equal to f_LO. Discuss what happens when the signal is not filtered to reject noise before demodulation.

22 Composite signal states
Let one signal state |s_A⟩ be described by a density matrix ρ̂_A given by

ρ̂_A = \begin{pmatrix} p & \sqrt{p(1-p)} \\ \sqrt{p(1-p)} & 1-p \end{pmatrix}.   (15.9.1)

Let a second signal state |s_B⟩ be described by a density matrix ρ̂_B given by

ρ̂_B = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.

(a) Are these signal states mixed signal states or pure signal states?
(b) Determine the density matrix ρ̂_AB of the composite signal state of these two component signal states using the Kronecker product given in (2.1.99).
(c) Is this composite signal state mixed or pure?

23 Husimi distribution
Starting with the density matrix for classical noise in a photon-number representation

ρ̂ = Σ_{m=0}^{∞} (N₀^m/(1 + N₀)^{m+1}) |m⟩⟨m|,


which is (15.6.18), derive the Husimi distribution given in (15.6.13) and show that it equals the expression given in (15.6.22c) with A = 0.

24 Even and odd coherent states
(a) An even coherent state is defined as

|even⟩ ≐ (1/√N₊)(|α⟩ + |−α⟩),

where N₊ is a normalization constant. When ⟨even|even⟩ equals 1, using (15.3.38), repeated here as

⟨α₁|α₀⟩ = e^{−|α₁−α₀|²/2},

show that N₊ = 2(1 + e^{−2|α|²}).
(b) Repeat for the odd coherent state given by

|odd⟩ = (1/√N₋)(|α⟩ − |−α⟩).

(c) Derive approximate expressions for the even and odd coherent states when |α| is large and comment on the result.
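The normalization constant in part (a) can be verified numerically by expanding the coherent states in a truncated photon-number basis. A minimal Python sketch (the truncation size nmax and names are our choices):

```python
import numpy as np
from math import factorial

def coherent(alpha, nmax=60):
    # Photon-number expansion |alpha> = e^{-|alpha|^2/2} sum_m alpha^m/sqrt(m!) |m>
    c = np.array([alpha**k / np.sqrt(factorial(k)) for k in range(nmax)])
    return np.exp(-abs(alpha)**2 / 2) * c

alpha = 1.3
ket_p = coherent(alpha) + coherent(-alpha)   # unnormalized even state
N_plus = np.vdot(ket_p, ket_p).real          # <even|even> before normalization
# Part (a): N+ = 2(1 + e^{-2|alpha|^2})
assert np.isclose(N_plus, 2 * (1 + np.exp(-2 * abs(alpha)**2)))
```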

16 The Quantum-Lightwave Channel

The transmission of classical information through a quantum-lightwave information channel is studied in this, the last of the three concluding chapters of the book. The development herein provides the formal science that underlies the transmission of classical information in the form of bits using a lightwave. This treatment validates the use of the semiclassical treatment in earlier chapters and provides a deeper understanding of the dual wave/particle nature of a lightwave. Moreover, the discussion uncovers methods that, in principle, can provide an increase in the information rate compared with semiclassical methods.

The principal differences between a quantum-lightwave information channel and a classical information channel, such as those considered in Chapters 10 and 14, are the larger class of admissible state-preparation processes at the channel input and the larger class of admissible state-detection operations at the channel output. This larger class of admissible operations is shown notionally in Figure 15.1. These operations can, in principle, exploit the complete wave/particle properties of a lightwave signal and so have the potential to produce a lower probability of a detection error, and a larger information-theoretic channel capacity for classical information, than methods that rely solely on the wave-like properties or the particle-like properties of a lightwave.

Exploiting the full range of properties of a lightwave to convey classical information in the form of a conventional bitstream requires methods of state preparation and state detection that are significantly different from classical methods. The classical methods describe the function of a modem. The methods of state preparation and state detection describe a quantum modem. Different methods for quantum-lightwave signals are necessary because of the presence of quantum uncertainty. When the signal states that comprise the signal constellation in state space are pairwise nonorthogonal, it will be seen that the detection of an individual pure signal state requires a measurement state that is, in general, a superposition of the complete set of pairwise-nonorthogonal pure signal states of the signal constellation.

Moreover, the detection of a component-symbol state is not the same as the detection of a block-symbol state such as a product state composed of component-symbol states. This is because the quantum uncertainty of a block-symbol state defined in an enlarged signal space may be different from the quantum uncertainty of a separately measured component-symbol state defined in the component signal space. The difference in the two methods of quantum-optics state detection, which exists even when the


channel is memoryless, does not exist for a classical memoryless channel, for which the individual detection of all component symbols is equivalent to the detection of a block symbol. To understand these fundamental differences, expressions are derived for the probability of a detection error both for the detection of a component-symbol state and for the detection of a block-symbol state composed using the outer product of a set of pure pairwise-nonorthogonal coherent states. These expressions are compared with their classical counterparts. Using methods of detection appropriate for quantum lightwaves, the classical information capacity of an ideal quantum-lightwave channel is discussed in Section 16.4, along with the classical information capacity of several forms of gaussian quantum-lightwave channel. The classical additive white-gaussian-noise channel is the large-signal limit of one kind of gaussian quantum-lightwave channel.

16.1 Methods of Quantum-Optics State Detection

Detection errors for classical signaling are caused by statistical uncertainty in the channel. Classical methods of detection for real-valued and complex-valued signal constellations are discussed in Sections 9.5 and 10.2, respectively. Classical methods of detection in signal space are discussed in Section 10.4. Detection errors in quantum optics are caused by a combination of statistical uncertainty and quantum uncertainty. Even when the channel is classically ideal and has no statistical uncertainty, there can be detection errors caused by quantum uncertainty. Within a semiclassical analysis, this quantum uncertainty is attributed to photon noise transcribed into a form of statistical uncertainty. Within quantum optics, this kind of error is understood to be caused by quantum uncertainty, which is distinct from statistical uncertainty.

To incorporate both quantum uncertainty and statistical uncertainty, classical detection based on hypothesis testing must be replaced by quantum-optics state detection based on detection operators. This kind of quantum-optics state detection incorporates both quantum uncertainty and statistical uncertainty, jointly expressed by a density matrix. The classical probability distribution is replaced with that density matrix. This replacement "lifts" the formalism of classical detection into a higher-dimensional formulation appropriate for quantum optics.

This section provides an overview of the principal differences between classical detection and quantum-optics detection. The formalism of quantum-optics detection operators is then presented. This formalism is applied first to the detection of a set of pure signal states that has no statistical uncertainty. This set of signal states has quantum uncertainty when the set cannot be aligned, one-to-one, with a set of measurement states. The formalism is then applied to the detection of a set of mixed signal states with a combination of quantum uncertainty and statistical uncertainty expressed by a density matrix.

16.1.1 Classical Channels and Detection

An overview of the fundamental differences between classical detection and quantum-optics detection is the topic of the next two subsections. This subsection considers classical channels and detection. The next subsection considers quantum-lightwave channels and detection.

Classical Channels

The properties of a classical electrical channel are discussed in Chapter 8. The properties of a classical information channel are discussed in Chapter 9. A classical information channel can be described by a transition matrix Q (cf. Section 13.4) with

q = Q p,   (16.1.1)

where p is the channel input probability vector and q is the channel output probability vector.

Classical Detection on the Real Line or the Complex Plane

For a classical binary signal constellation {s_ℓ : ℓ = 0, 1}, the probability of a detection error is (cf. (9.5.3b))

p_e = 1 − (p₀ p_{0|0} + p₁ p_{1|1})
    = 1 − (p₀(1 − p_{1|0}) + p₁ p_{1|1})
    = p₁ − (p₁ p_{1|1} − p₀ p_{1|0})
    = p₁ − ∫_{R₁} (p₁ f(r|1) − p₀ f(r|0)) dr,   (16.1.2)

where (p₀, p₁) is the prior, p_{k|ℓ} = p(k|ℓ) is the conditional probability, p_{0|0} = 1 − p_{1|0} for a binary modulation format, and f(r|s_ℓ) or f(r|ℓ) is the conditional probability density for r given the ℓth channel input symbol s_ℓ. The classical decision region R₁ (cf. Figure 9.8) for a binary signal constellation on the real line includes every value r at the channel output for which hypothesis H₁ has the largest probability. This condition defines the decision region

R₁ = {r : p₁ f(r|1) > p₀ f(r|0)}   (16.1.3a)

or, equivalently,

R₁ = {r : p₁ f(r|1) − p₀ f(r|0) > 0},   (16.1.3b)

as was discussed in Chapter 9. The decision region R₀ is defined in a similar way.

When there are multiple symbols at the channel input, there are multiple hypotheses. The classical detection process then uses a generalization of the rule stated in (16.1.3) to partition the real line or the complex plane into a finite set of real or complex decision regions R_ℓ corresponding to the hypotheses H_ℓ. The quantum-optics generalization of (16.1.2) and (16.1.3) is the principal topic of this section.
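As a concrete classical baseline, the decision rule (16.1.3b) and the error integral (16.1.2) can be evaluated numerically. The following minimal Python sketch assumes unit-variance gaussian conditional densities centered at ∓1; the priors and the integration grid are illustrative choices:

```python
import numpy as np

p0, p1 = 0.6, 0.4                        # prior probabilities
f = lambda r, s: np.exp(-(r - s)**2 / 2) / np.sqrt(2 * np.pi)
f0 = lambda r: f(r, -1.0)                # conditional density for symbol s_0
f1 = lambda r: f(r, +1.0)                # conditional density for symbol s_1

r = np.linspace(-8, 8, 200001)
R1 = p1 * f1(r) - p0 * f0(r) > 0         # decision region of (16.1.3b)
dr = r[1] - r[0]

# Probability of a detection error from (16.1.2)
pe = p1 - np.sum((p1 * f1(r) - p0 * f0(r))[R1]) * dr
print(pe)                                # about 0.154 for these priors
```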



Classical Detection in Signal Space

More generally, in Section 10.4, a classical signal constellation is described as a collection of points in a signal space. A set of functions {s_ℓ(t)} for ℓ = 0, ..., L−1 in that signal space comprises a signal constellation of size L. For any infinite set of basis functions {y_k(t)} for that signal space (cf. (10.4.1a)), each function in the signal constellation can be expressed as (cf. (10.4.2))

s(t) = M y(t),   (16.1.4)

where s(t) is a column vector of length L with components s_ℓ(t), where y(t) is an infinite column vector whose components y_k(t) are the basis functions, and where the elements m_{ℓk} of the signal-constellation matrix M are given by (10.4.1b). The matrix M fully specifies the signal constellation in signal space. Because the L points of the signal constellation lie in a subspace of dimension at most L, the basis {y(t)} can be chosen so that L of the basis functions span a subspace containing all the points of the constellation. Then the matrix M reduces to an L × L matrix. When the dimension of the subspace containing the signal constellation is smaller than L, the matrix M does not have full rank.

For a classical complex signal constellation with L points, the signal-constellation matrix M is an L × 1 complex matrix. For classical L-ary orthogonal signaling with equal-energy symbols, the signal-constellation matrix M is an L × L square matrix equal to the identity matrix I. For this case, the set of basis functions {y_k(t)} can be mapped one-to-one to the set {s_ℓ(t)} of signals. In principle, a bank {y_k(t)} of matched filters, with one matched filter for each point in the signal constellation, is used for classical hypothesis testing (cf. Figure 10.17). The decision rule for orthogonal signals partitions the classical signal space into decision subspace regions R_ℓ so as to minimize the probability of a detection error. This decision rule asserts the hypothesis corresponding to the largest matched-filter output sample.

For classical L-ary nonorthogonal signaling spanning an L-dimensional subspace, the signal-constellation matrix M is still an L × L full-rank square matrix, but it is not the identity matrix I. This means that each signal s_ℓ(t) in the signal constellation is a linear combination of the set {y_k(t)} of basis functions as given by the corresponding row of the signal-constellation matrix M. Inverting M gives each orthogonal basis function y_k(t) in terms of a linear combination of the pairwise-nonorthogonal set {s_ℓ(t)} of signals, with y(t) = M⁻¹s(t) (cf. (10.4.8)). For this case, optimal classical detection uses a bank {y_k(t)} of detection filters that is not one-to-one matched to the set {s_ℓ(t)} of signals. Instead, each detection filter is a linear combination of the set of pairwise-nonorthogonal signals. The decision rule remains the same, asserting the hypothesis H_ℓ corresponding to the largest detection-filter output. It will be shown that detection for a signal constellation of coherent states is similar to detection for classical nonorthogonal signaling.

16.1.2 Quantum-Lightwave Channels and State Detection

An overview of quantum-lightwave channels and methods of quantum-optics state detection is the topic of this subsection.


Quantum-Lightwave Channels

The quantum-lightwave channel input and the quantum-lightwave channel output are related by a lightwave channel transformation T̂ with

ρ̂_out = T̂ ρ̂_in,   (16.1.5)

where ρ̂_in is the density matrix describing the channel input and ρ̂_out is the density matrix describing the channel output. Indeed, this expression can be taken to be the definition of the quantum-lightwave channel T̂, which excludes the state-preparation process and the state-detection process. The density matrix ρ̂_in at the channel input may describe a component-symbol state, or it may describe a block-symbol state composed of component-symbol states. When considering a prior probability distribution on the set of input signal states, an averaged density matrix, which is a mixed signal state, may also be defined.

The corresponding output density matrix ρ̂_out depends on the channel transformation T̂. The possible transformations T̂ are constrained to those that preserve the properties of a density matrix. Therefore, T̂ must maintain the trace of the density matrix equal to one. The density matrix ρ̂_out at the channel output is then a nonnegative-definite matrix. When T̂ is equal to the identity operator Î, the channel is ideal and adds no statistical uncertainty. In this case, a channel input that is a pure signal state will remain a pure signal state at the channel output. When T̂ adds statistical uncertainty, the channel output state will be a mixed signal state even when the channel input state is a pure signal state.
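As an illustration of a transformation that preserves these properties, the following minimal Python sketch (our example, not a channel model from the text) mixes the input density matrix with a maximally mixed state. The output remains a unit-trace nonnegative-definite matrix, and a pure input state becomes a mixed output state:

```python
import numpy as np

def noisy_channel(rho, eps=0.3):
    """Trace-preserving map that mixes the input with the maximally
    mixed state; a pure input becomes a mixed output when eps > 0."""
    d = rho.shape[0]
    return (1 - eps) * rho + eps * np.eye(d) / d

psi = np.array([1.0, 0.0])
rho_in = np.outer(psi, psi.conj())        # pure input state, trace one
rho_out = noisy_channel(rho_in)

print(np.trace(rho_out))                  # 1.0: the trace is preserved
print(np.trace(rho_out @ rho_out))        # < 1: the output is a mixed state
```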

State-Preparation Process

The state-preparation process maps a letter or block from an input alphabet onto a channel input state described by the density matrix ρ̂_in. The input signal state may be a component-symbol state (cf. (15.4.6)), a block-symbol state that is a product state (cf. (15.4.7)), or a block-symbol state that is an entangled state. A signal state is an element of a signal constellation in signal space. When expressed in terms of the coherent states (cf. Section 15.3.7), an individual element of a signal constellation may be described by a Glauber number α chosen from a signal constellation in the complex plane. The resulting coherent states are nonorthogonal, but linearly independent.

A block-symbol state that is a product state is composed of the outer product of component coherent-state symbols. This state is an element of a signal constellation in an enlarged signal space and is analogous to a classical block symbol. However, the detection of a block-symbol state may have quantum uncertainty that does not exist for the classical detection of a block symbol.

Quantum-Optics Detection in Signal Space

Informally, quantum-optics state detection may be viewed as "lifting" classical detection in signal space into a higher-dimensional signal space so as to accommodate the combined effects of quantum uncertainty and statistical uncertainty. The symbol s_ℓ is replaced by the channel input signal state |ψ_ℓ⟩. The combined uncertainty in the signal state |ψ_ℓ⟩ at the channel output is expressed by replacing the conditional probability distribution f(·|ℓ) in classical detection by the density matrix ρ̂_ℓ. This replacement concisely incorporates the composite effect of both statistical uncertainty and quantum uncertainty in the quantum-optics state-detection process. This form of detection in signal space is not equivalent to detection on the real line or on the complex plane.

Quantum-Optics Detection of Coherent States

The elements of a coherent-state signal constellation cannot be expressed as scalar multiples of a single "basis" coherent state, as would be the case for a classical signal constellation defined on the (one-dimensional) complex plane. This is because coherent states with different Glauber numbers are pairwise nonorthogonal with a nonzero inner product (cf. (15.3.39)), but are linearly independent. Thus, while the complex Glauber numbers themselves may be chosen from a set of points in the complex plane, and so may be scalar multiples of each other, the coherent states described by those Glauber numbers are not so related. Specifically, the coherent state |−α⟩ is not equal to the state −|α⟩, as would be the case for classical antipodal signaling. This means that there is a separate hypothesis defined by a decision subspace region for every point in a coherent-state signal constellation. This property means that quantum-optics state detection must be described in a signal space.

The property that two quantum coherent states can never be expressed as multiples of a single basis coherent state also means that classical notions of detection as a partition of the real line or the complex plane are not meaningful in the context of the quantum-optics detection of coherent states. The quantum-optics relationship between the pairwise nonorthogonality of the signal states and the resulting quantum uncertainty does not exist for classical signaling. This is an essential difference between classical detection in signal space and quantum-optics state detection in signal space. Indeed, an ideal classical channel is noiseless. It has no statistical uncertainty, so there are no detection errors for any reasonable method of signaling. The same statement need not be true in quantum optics. For a noiseless quantum-lightwave channel with no statistical uncertainty that uses coherent states, detection errors result from basis-dependent quantum uncertainty even when the signal state at the channel output is a pure signal state with no statistical uncertainty.

16.1.3 Detection Operators

A quantum-optics hard-decision state-detection process uses a set {D̂_n, n = 0, ..., L−1} of operators called detection operators to partition an appropriately defined signal space into decision subspace regions R_ℓ, one subspace region for each hypothesis H_ℓ, with the number of hypotheses equal to the size L of the signal constellation. Other methods – such as soft-decision detection – that may partition the signal space into a larger number of subspaces are not considered herein. The form of the detection operator D̂_n depends on the state-preparation process, the uncertainty in the channel, and the state-detection process.


Sampling Eigenstates

The decision subspaces {R_ℓ, ℓ = 0, ..., L−1} are chosen according to a rule that is the quantum-optics equivalent of (16.1.3). This rule is described in terms of elementary projection operators P̂_k. There is one projection operator for each sampling eigenstate used in the state-detection process, potentially an infinite number for each decision subspace. The sampling eigenstates {|η_k⟩} comprise the basis used to represent the measured outcome at the channel output. These are the eigenstates of an observable operator describing the single measurement operation corresponding to classical demodulation followed by detection. The associated real eigenvalues {η_k} correspond to the classical samples used for detection. Each orthogonal sampling eigenstate |η_k⟩ corresponds to one potential sample. The subspace spanned by a subset of the sampling eigenstates corresponds to a set of potential measured outcomes.

The number of sampling eigenstates corresponding to the potential measured outcomes depends on the form of the uncertainty in the channel. Three cases are evident.
(i) The first case has only quantum uncertainty, so the channel is noiseless. The signal state at the channel output is equal to the signal state at the channel input. A pure signal state is a point of a signal constellation. The number of sampling eigenstates can then be set equal to the number L of signal states in the signal-space constellation, as would be the case for a classical noiseless channel.
(ii) The second case has only statistical uncertainty because the channel is noisy. This corresponds to classical detection of noisy orthogonal signals. In this case, there is an infinite number of potential sample values when the statistical uncertainty is continuous, as is the case for a classical noisy channel. Therefore an infinite number of basis functions is typically needed to express the channel output – one basis function for each potential noisy measured outcome.
(iii) The third case has a combination of quantum uncertainty and statistical uncertainty. The signal state at the channel output is a mixed signal state that must be described by a density matrix. This corresponds to classical detection of nonorthogonal signals, and an infinite number of sampling eigenstates is necessary to express all possible samples of the mixed signal state at the channel output.

For a signal constellation of L signal states, the quantum-optics state-detection process organizes the possibly infinite set of sampling eigenstates {|η_k⟩} into L decision subspace regions defined by disjoint subsets of the complete set of sampling eigenstates. Because the sampling eigenstates are orthogonal, the decision subspaces are pairwise orthogonal.

Linearly Independent Pure Signal States

In general, the detection operators {D̂_n} that minimize the probability of a detection error need not commute.¹ However, when the signal states {|ψ_ℓ⟩} are linearly independent pure signal states, the detection operators in the set {D̂_n} do commute pairwise and share a common basis of sampling eigenstates {|η_k⟩}.² Because the detection operators in the set {D̂_n} share a common basis, they are regarded as observable detection operators that, in principle, can be measured simultaneously without additional error.

For this case, every detection operator D̂_n can be constructed using the set {P̂_k = |η_k⟩⟨η_k|} of common elementary projection operators P̂_k associated with the set {|η_k⟩} of orthogonal sampling eigenstates. When an elementary projection operator P̂_k is applied to a channel output state, the corresponding eigenvalue η_k may be viewed as a sample that is used to determine a decision subspace.

¹ For example, see Holevo (1968).
² For details, see Kennedy (1973b).

Quantum-Optics Decision Rule

The set of elementary projection operators {P̂_k} is organized into a corresponding set of detection operators, {D̂_n}, that defines a set of decision subspaces {R_ℓ}. Subspace R_ℓ corresponds to hypothesis H_ℓ. The organization of the elementary projection operators {P̂_k} uses the quantum-optics equivalent of the classical decision rule. For binary detection, this rule can be suggested without a formal derivation by mimicking (16.1.3):

R₁ = ⋃_k { P̂_k : trace(p₁ρ̂₁P̂_k) > trace(p₀ρ̂₀P̂_k) },   (16.1.6)

where P̂_k is an elementary projection operator, (p₀, p₁) are the prior probabilities, and (ρ̂₀, ρ̂₁) are the density matrices describing the two binary signal states at the channel output. This expression will be formalized in Section 16.2.1.

For the general case, the detection operator D̂_ℓ is constructed from the elementary projection operators P̂_k that satisfy the rule given in (16.1.6), and written as

D̂_ℓ = Σ_{k: P̂_k ∈ R_ℓ} P̂_k,   ℓ = 0, ..., L−1,   (16.1.7)

where the notation indicates that only the elementary projection operators P̂_k satisfying (16.1.6) are used to construct D̂_ℓ. Each detection operator D̂_ℓ corresponds to one hypothesis chosen with a probability p(ℓ). Because the sum of the probabilities must equal one, this means that (cf. (15.4.31a))

Σ_n D̂_n = Î,   (16.1.8)

where Î is the identity operator. For the binary case, there are only two detection operators, {D̂₀, D̂₁}, with D̂₀ = Î − D̂₁. Substituting (16.1.7) into (16.1.6) gives the decision rule

R₁ = { trace(p₁ρ̂₁D̂₁) > trace(p₀ρ̂₀D̂₁) },   (16.1.9)


now written in terms of only the detection operator D̂₁. This detection operator projects the channel output signal state described by the density matrix ρ̂ onto the decision subspace R₁ corresponding to the hypothesis H₁. The detection operator D̂₀ corresponding to the hypothesis H₀ is determined using (16.1.8) and is D̂₀ = Î − D̂₁.

For the general case, there are L detection operators, where L is the size of the signal constellation. The detection operator D̂_n is determined by the application of a decision rule similar to (16.1.9). Then, given the signal state density matrix ρ̂_ℓ at the channel input, the conditional probability p(n|ℓ) at the channel output that the hypothesis H_n is chosen is given by (cf. (15.4.32))

p(n|ℓ) = trace(ρ̂_ℓ D̂_n) = trace(D̂_n ρ̂_ℓ),   (16.1.10)

where (2.1.84c) has been used. Expression (16.1.10) shows that the combination of the density matrix ρ̂_ℓ and the detection operator D̂_n determines the probabilistic form of the quantum-lightwave information channel.
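Expression (16.1.10) is straightforward to evaluate for a small example. The following minimal Python sketch (our construction) forms two nonorthogonal pure signal states in a two-dimensional signal space, takes the projection operators onto an arbitrarily oriented orthogonal sampling basis as the detection operators, and tabulates the transition probabilities p(n|ℓ):

```python
import numpy as np

theta = np.pi / 5                              # angle between the two signal states
psi = [np.array([1.0, 0.0]),
       np.array([np.cos(theta), np.sin(theta)])]
rho = [np.outer(v, v) for v in psi]            # density matrices of the pure states

phi = np.pi / 10                               # orientation of the sampling basis
eta = [np.array([np.cos(phi), np.sin(phi)]),
       np.array([-np.sin(phi), np.cos(phi)])]
D = [np.outer(v, v) for v in eta]              # detection operators, summing to I

# Transition probabilities p(n|l) = trace(rho_l D_n) of (16.1.10)
Q = np.array([[np.trace(rho_l @ D_n) for D_n in D] for rho_l in rho])
print(Q)                                       # each row sums to one
```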

16.1.4 Detection for Pure Symbol States

The quantum-optics detection process for a signal constellation composed of pure signal states and a channel with no statistical uncertainty is the topic of this subsection. For this ideal channel, the set {|η_k⟩, k = 0, ..., L−1} of sampling eigenstates has the same number of elements as the set {|ψ_ℓ⟩, ℓ = 0, ..., L−1} of pure signal states. In contrast, a channel with statistical uncertainty produces a mixed signal state at the channel output. That case is discussed in Section 16.2.1.

A classical detection process for a channel with no statistical uncertainty produces no detection errors. A quantum-optics state-detection process for the same channel with no statistical uncertainty can produce detection errors because of the basis-dependent quantum uncertainty. For this case, the optimal alignment of a set of L orthogonal sampling eigenstates {|η_k⟩} from which the detection operators {D̂_ℓ} are constructed may be viewed geometrically as aligning the set {|η_k⟩} of L sampling eigenstates to the set {|ψ_ℓ⟩} of L possibly nonorthogonal signal states specified by the signal constellation. This alignment minimizes the probability of a detection error by minimizing the quantum uncertainty. To determine the optimal alignment, express the set {|ψ_ℓ⟩} of L nonorthogonal signal states in signal space using a basis composed of the L orthogonal sampling eigenstates |η_k⟩. An illustration with a set of three pairwise-nonorthogonal signal states and a set of three orthogonal sampling eigenstates in signal space is shown in Figure 16.1.

Pairwise-Orthogonal Signal States

When the elements of a linearly independent set of pure signal states {|ψ_ℓ⟩} are pairwise orthogonal, the signal states |ψ_ℓ⟩ can each be simultaneously aligned, one-to-one, with a sampling eigenstate |η_k⟩ by means of a generalized rotation in signal space. These pure signal states may be component-symbol signal states or block-symbol signal states.


Figure 16.1 Three pairwise-nonorthogonal signal states |ψ_ℓ⟩ and three orthogonal sampling eigenstates |η_k⟩ in signal space used to determine the most likely pure signal state at the channel input.

Each detection operator D̂_ℓ for a pure signal state |ψ_ℓ⟩ is the corresponding elementary projection operator P̂_ℓ = |ψ_ℓ⟩⟨ψ_ℓ| for that signal state. Accordingly, the detection of an orthogonal set of pure signal states with no quantum uncertainty reduces to the classical detection of an orthogonal set of signals in signal space, with the set of orthogonal sampling eigenstates {|η_k⟩} corresponding to a set of classical matched filters {y_k(t)} followed by sampling (cf. Figure 10.17). For this case, the quantum-optics detection process for a set of pure signal states has no quantum uncertainty (cf. Section 15.2.5), and generates no detection errors.

Pairwise-Nonorthogonal Signal States

For a set of pairwise-nonorthogonal linearly independent pure signal states such as the set of coherent states, the set of sampling eigenstates {|η_k⟩} cannot be simultaneously aligned one-to-one with the set of signal states {|ψ_ℓ⟩} for any generalized rotation of signal space. Consequently, for at least one signal state, quantum uncertainty cannot be avoided. For this case, each pure signal state |ψ_ℓ⟩ can be expressed as a linear combination of L orthogonal sampling eigenstates |η_k⟩ as given by

|ψ_ℓ⟩ = Σ_{k=0}^{L′} P̂_k |ψ_ℓ⟩ = Σ_{k=0}^{L′} m_{ℓk} |η_k⟩,   (16.1.11)

where P̂_k = |η_k⟩⟨η_k| is the elementary projection operator, L′ ≐ L − 1, and

m_{ℓk} ≐ ⟨η_k|ψ_ℓ⟩   (16.1.12)

(16.1.12)

is the complex expansion coefficient defined by the inner product of the signal state |ψ± ± and the sampling eigenstate |ηk ±. This is the quantum-optics equivalent of the expansion of a classical signal constellation in signal space as given in (10.4.1). For a pure signal state, the conditional probability of a detection error is (cf. (15.2.5)) p(k|±) = |²ηk |ψ± ±|2

= |m ±k |2,

(16.1.13)

where (16.1.12) is used. Using the set of expansion coefficients defined in (16.1.12), the set of pure signal states {|ψ± ±} is related to the set of sampling eigenstates {|ηk ±} by the matrix equation

16.1 Methods of Quantum-Optics State Detection

825

⎡ |ψ ± ⎤ ⎡ m · · · m ³ ⎤ ⎡ |η ± ⎤ 0 00 0 0 ⎢⎣ .. ⎥⎦ = ⎢⎣ .. . . . .. ⎥⎦ ⎢⎣ .. ⎥⎦ , (16.1.14) . . . . |ψ ³ ± m ³ ··· m ³ ³ |η ³ ± where the L × L hermitian matrix, denoted M , is complex. This matrix is the quantumL

L

L 0

L L

L

optics equivalent of the classical signal-constellation matrix in signal space defined in (10.4.2). The coefficients m ±k of the signal-constellation matrix M are derived using an L × L Gram matrix K (defined in Chapter 2), whose elements κi j are given by the inner products ²ψ j |ψi ± composed from the set of signal states. The Gram matrix is a nonnegative-definite hermitian matrix that quantifies the correlation of the signal states and does not depend on the basis. When the signal states are pairwise orthogonal, the off-diagonal elements, κi j for i ´ = j , are zero and the Gram matrix reduces to an L × L diagonal matrix. This corresponds to classical orthogonal signaling. The elements {κ i j } of the Gram matrix K can be determined from the elements {m ik } of the signal-constellation matrix M as follows:

κ_ij ≐ ⟨ψ_j|ψ_i⟩ = ⟨ψ_j|Î|ψ_i⟩
     = ⟨ψ_j| ( Σ_k |η_k⟩⟨η_k| ) |ψ_i⟩
     = Σ_{k=0}^{L−1} ⟨η_k|ψ_i⟩⟨ψ_j|η_k⟩
     = Σ_{k=0}^{L−1} m_{ik} m*_{jk},   (16.1.15)

where (16.1.12) has been used and k ranges over the L sampling eigenstates. Written in matrix form, this set of equations is

K ≐ M†M,   (16.1.16)

where K is the Gram matrix.

Optimal Sampling Eigenstates

The solution to (16.1.16) determines M up to an overall complex scaling factor of unit norm, and so determines M⁻¹ as well. Multiplying (16.1.14) by M⁻¹, the optimal detection basis is

\begin{pmatrix} |η_0⟩ \\ ⋮ \\ |η_{L′}⟩ \end{pmatrix} = M^{-1} \begin{pmatrix} |ψ_0⟩ \\ ⋮ \\ |ψ_{L′}⟩ \end{pmatrix}.   (16.1.17)

Expression (16.1.17) is the quantum-optics equivalent of the classical expression for nonorthogonal signaling given in (10.4.8). Examining (16.1.17), each optimal sampling eigenstate |η_k⟩ is a superposition of the L signal states |ψ_ℓ⟩ in the signal constellation, with the weighting coefficients given by


the elements of the kth row of M⁻¹. Because each sampling eigenstate is a superposition of signal states, the optimal detection operator is not, in general, a simple projection onto the subspace of a signal state at the channel output, as would be the case for classical orthogonal signaling. This is a fundamental consequence of conveying classical information using a set of pairwise-nonorthogonal signal states such as coherent states.

Probability of a Detection Error

The probability of a detection error is

p_e = 1 − p_c = 1 − Σ_{ℓ=0}^{L−1} p_ℓ p(ℓ|ℓ) = 1 − Σ_{ℓ=0}^{L−1} p_ℓ |m_{ℓℓ}|²,   (16.1.18)

where (16.1.13) has been used to write p(ℓ|ℓ) = |m_{ℓℓ}|². For some pure-state signal constellations, explicit expressions can be derived both for the probability of a detection error and for the optimal set {D̂_n, n = 0, ..., L−1} of detection operators. For other signal constellations, either an approximation to the optimal set of detection operators can be obtained or the solution to (16.1.16) can be determined by numerical methods.³
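One convenient numerical solution of (16.1.16) is the hermitian square root M = K^{1/2}, which satisfies M†M = K. The following minimal Python sketch (our code) applies this choice to two coherent states with inner product κ₀₁ = e^{−2E_b} and evaluates (16.1.18) for an equiprobable prior; the result agrees with the closed-form expression ½(1 − √(1 − e^{−4E_b})) derived in Section 16.2.3:

```python
import numpy as np

Eb = 1.0                                   # mean number of photons per bit
kappa = np.exp(-2 * Eb)                    # inner product of the two coherent states
K = np.array([[1.0, kappa],
              [kappa, 1.0]])               # Gram matrix of (16.1.16)

# Hermitian square root of K satisfies M^dag M = K
w, U = np.linalg.eigh(K)
M = U @ np.diag(np.sqrt(w)) @ U.T

# Probability of a detection error from (16.1.18) with an equiprobable prior
pe = 1 - 0.5 * sum(abs(M[l, l])**2 for l in range(2))

# Closed form for antipodal coherent states (Section 16.2.3)
pe_closed = 0.5 * (1 - np.sqrt(1 - np.exp(-4 * Eb)))
assert np.isclose(pe, pe_closed)
```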

Generalized Angles

For a set of pairwise-nonorthogonal signal states, the squared magnitude of an off-diagonal element of the Gram matrix can be visualized as the squared cosine of a generalized angle θ_ij between a pair of signal states. Recalling that |⟨ψ|ψ⟩|² = 1, the generalized angle is defined in terms of the inner product by the expression

cos²θ_ij ≐ |⟨ψ_j|ψ_i⟩|² = |κ_ij|².   (16.1.19)

One such angle θ₁₂ is shown in Figure 16.1. A different set of generalized angles {φ_ℓk} is defined in terms of the elements {m_ℓk} of the signal-constellation matrix M. The generalized angle between a signal state |ψ_ℓ⟩ and the corresponding sampling eigenstate |η_k⟩ is given by

cos²φ_ℓk ≐ |⟨η_k|ψ_ℓ⟩|² = |m_ℓk|².   (16.1.20)

The generalized angles {φ_ℓk} specify the orientations of the set of signal states {|ψ_ℓ⟩} with respect to the set of sampling eigenstates {|η_k⟩} in the L-dimensional signal space spanned by the set of sampling eigenstates. Using (16.1.13) and (16.1.20), the generalized angle φ_ℓℓ defines the probability p(ℓ|ℓ) = cos²φ_ℓℓ of a correct state-detection event. These generalized angles are also shown in Figure 16.1. When viewed in terms of generalized angles, minimizing the probability of a detection error requires jointly minimizing the set of generalized angles {φ_ℓk} between the set of signal states {|ψ_ℓ⟩} and the set of sampling eigenstates {|η_k⟩}. This minimization can also be regarded as minimization of the total squared error Σ_k ⟨e_k|e_k⟩, where |e_k⟩ = |ψ_k⟩ − |η_k⟩.

³ See Eldar, Megretski, and Verghese (2003).

16.2 State Detection for Binary Modulation Formats

The basic case of the state detection of a quantum lightwave with a binary modulation format is the topic of this section. The general case of the binary detection of a mixed signal state is presented first. The binary detection of a general pure signal state follows. These results are applied to coherent-state modulation formats. The section concludes with a discussion of binary state detection for a quantum-lightwave channel with classical additive noise.

16.2.1 Detection for Binary Mixed-Signal-State Modulation

This section discusses the general problem of detection for a binary modulation format⁴ when the signal state at the channel output is a mixed signal state with a mixture of quantum uncertainty and statistical uncertainty as described by the density matrix ρ̂_ℓ. When statistical uncertainty such as noise is present, an infinite number of sampling eigenstates is required in order to express the possible measured outcomes. Therefore, the expression for a detection error is modified compared with the expression for which only quantum uncertainty is present (cf. Section 15.4), with p(ℓ|ℓ) = trace(ρ̂_ℓD̂_ℓ) (cf. (16.1.10)) and D̂₀ = Î − D̂₁ (cf. (16.1.8)). For this general case, the probability of a detection error for a binary modulation format can be written as

p_e = 1 − p₀p_{0|0} − p₁p_{1|1}
    = 1 − trace(p₀ρ̂₀D̂₀) − trace(p₁ρ̂₁D̂₁)
    = 1 − trace(p₀ρ̂₀(Î − D̂₁) + p₁ρ̂₁D̂₁)
    = p₁ − trace(p₁ρ̂₁D̂₁ − p₀ρ̂₀D̂₁),   (16.2.1)

where trace(p₀ρ̂₀Î) = p₀ trace ρ̂₀ = p₀ (cf. Section 15.4), 1 − p₀ = p₁, and the properties of the trace operation (cf. (2.1.84)) have been used.

Comparing (16.2.1) with the classical expression given in (16.1.2), the term trace(p₁ρ̂₁D̂₁) corresponds to the classical term ∫_{R₁} p₁f(r|1)dr, and the term trace(p₀ρ̂₀D̂₁) corresponds to the classical term ∫_{R₁} p₀f(r|0)dr. Therefore, the density matrix ρ̂₁ corresponds to the classical conditional distribution f(r|1) given in (16.1.2), and the density matrix ρ̂₀ corresponds to the classical conditional distribution f(r|0). This correspondence means that the observable operator p₁ρ̂₁ − p₀ρ̂₀ corresponds to the classical expression p₁f(r|1) − p₀f(r|0).

Because the operator p₁ρ̂₁ − p₀ρ̂₀ is observable, it can be expressed using the projection operators P̂_k = |η_k⟩⟨η_k| defined by the sampling eigenstates |η_k⟩ of the operator

⁴ Detection for linearly independent M-ary mixed signal states at the channel input is discussed in Eldar (2003).


p₁ρ̂₁ − p₀ρ̂₀. These sampling eigenstates give the elementary projection operators P̂_k from which the set of detection operators {D̂₀, D̂₁} is constructed. This correspondence leads to the decision rule given in (16.1.9). The elements of the set {|η_k⟩} of sampling eigenstates are the solutions to the following eigenvalue equation:

(p₁ρ̂₁ − p₀ρ̂₀)|η_k⟩ = η_k|η_k⟩.   (16.2.2a)

In general, when the density matrix expresses a combination of quantum uncertainty and statistical uncertainty, (16.2.2a) has an infinite number of solutions. This means that when statistical uncertainty is included in the analysis, the set {|η_k⟩} of sampling eigenstates no longer has the same size as the set {|ψ_ℓ⟩} of signaling states, as would be the case were there no statistical uncertainty. Given the set {|η_k⟩} of sampling eigenstates of (16.2.2a), as well as the corresponding set of eigenvalues {η_k}, the observable operator p₁ρ̂₁ − p₀ρ̂₀ can be written as an infinite sum of elementary projection operators P̂_k as given by (cf. (2.1.95))

p₁ρ̂₁ − p₀ρ̂₀ = Σ_{k=0}^{∞} η_k P̂_k.   (16.2.2b)

The expressions given in (16.2.2) are the basic equations used to construct the set of binary detection operators {D̂₀, D̂₁} from the infinite set of sampling eigenstates {|η_k⟩}.

Construction of the Detection Operators

The quantum-lightwave detection operator corresponding to the classical decision region R₁ is D̂₁. By analogy to the classical decision region R₁, which is constructed from the values of r for which the classical expression p₁f(r|1) − p₀f(r|0) is positive, the quantum-optics detection operator D̂₁ is constructed from the observable operator p₁ρ̂₁ − p₀ρ̂₀ given in (16.2.2b).

Similarly to the classical case, for which only the values of r for which p₁f(r|1) − p₀f(r|0) is positive are used to construct the decision region R₁, only those elementary projection operators P̂_k on the right side of (16.2.2b) that have positive eigenvalues are used to construct the detection operator for the decision subspace region R₁. This means that for each positive eigenvalue η_k of (16.2.2a), the corresponding elementary projection operator P̂_k = |η_k⟩⟨η_k| for that sampling eigenstate is used to construct D̂₁. The complete expression for D̂₁ is given by (cf. (16.1.7))

D̂₁ = Σ_{η_k > 0} P̂_k,   (16.2.3)

where the notation η_k > 0 below the summation symbol indicates that only the projection operators corresponding to the sampling eigenstates of (16.2.2a) with positive eigenvalues are used to construct D̂₁. In general, this results in an infinite number of projection operators P̂_k. Informally, this may be seen as the quantum-optics analog of the statement that for a noisy channel there is an infinite number of values of r in each decision region.


Formally, the detection operator D̂₁ is a projection of the signal state at the channel output onto the linear subspace spanned by those eigenstates of the observable operator p₁ρ̂₁ − p₀ρ̂₀ that have positive eigenvalues. This leads to the rule given in (16.1.9). This rule is the quantum-optics equivalent of the classical detection expressions given in (16.1.3). The sampling eigenstates {|η_k⟩} determined using (16.2.2a) that have negative eigenvalues define the detection operator D̂₀, with D̂₀ + D̂₁ = Î.

Substitute (16.2.3) into (16.2.1) and take the trace. Using (16.2.2a), and noting that the trace of every projection operator P̂_k = |η_k⟩⟨η_k| is equal to 1, the probability of a detection error is

p_e = p₁ − trace((p₁ρ̂₁ − p₀ρ̂₀)D̂₁) = p₁ − Σ_{η_k > 0} η_k.   (16.2.4)

Expression (16.2.4) is the probability of a detection error for an arbitrary binary modulation format. Different binary modulation formats are characterized by different density matrices. Therefore, the corresponding operator p₁ρ̂₁ − p₀ρ̂₀ for each binary modulation format may have a different set of eigenstates as given by (16.2.2a) and a different set of positive eigenvalues. These eigenvalues determine the probability of a detection error as given by (16.2.4).
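Equations (16.2.2) and (16.2.4) translate directly into a numerical recipe: diagonalize the observable operator p₁ρ̂₁ − p₀ρ̂₀ and sum its positive eigenvalues. A minimal Python sketch (our code; the two example states are arbitrary mixed states in a two-dimensional basis):

```python
import numpy as np

def binary_pe(rho0, rho1, p0=0.5, p1=0.5):
    """Minimum probability of a detection error from (16.2.2a) and (16.2.4):
    pe = p1 minus the sum of the positive eigenvalues of p1*rho1 - p0*rho0."""
    eta = np.linalg.eigvalsh(p1 * rho1 - p0 * rho0)
    return p1 - eta[eta > 0].sum()

# Example: two mixed states, each a pure state blended with the maximally mixed state
v0 = np.array([1.0, 0.0])
v1 = np.array([np.cos(0.4), np.sin(0.4)])
eps = 0.1
rho0 = (1 - eps) * np.outer(v0, v0) + eps * np.eye(2) / 2
rho1 = (1 - eps) * np.outer(v1, v1) + eps * np.eye(2) / 2
print(binary_pe(rho0, rho1))
```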

16.2.2 Detection for Binary Pure-State Modulation

The density matrix for each binary pure signal state is simply the outer product for that state as given by (cf. (15.4.6))

ρ̂₀ = |ψ₀⟩⟨ψ₀|  and  ρ̂₁ = |ψ₁⟩⟨ψ₁|.   (16.2.5)

When the two signal states {|ψ₀⟩, |ψ₁⟩} are orthogonal, the two optimal detection operators {D̂₀, D̂₁} are equal to the two elementary projection operators {P̂₀, P̂₁} for the two pure signal states |ψ₀⟩ and |ψ₁⟩ using a sampling basis aligned with the two orthogonal signal states. For this case, the conditional probability p(k|ℓ) is given by (16.1.13). This means that each detection operator in {D̂₀, D̂₁} projects the signal state at the channel output onto one of the two signal states {|ψ₀⟩, |ψ₁⟩}. This state-detection process corresponds to classical matched filtering followed by sampling.

When two signal states are nonorthogonal, the two orthogonal sampling eigenstates {|η₀⟩, |η₁⟩} that span the two-dimensional signal space cannot be simultaneously mapped, one-to-one, onto the set {|ψ₀⟩, |ψ₁⟩} of nonorthogonal signal states for any orientation of the sampling basis. Instead, each signal state |ψ_ℓ⟩ at the channel output is expressed as a linear combination of the sampling eigenstates {|η_k⟩} as given by (cf. (16.1.14))

\begin{pmatrix} |ψ_0⟩ \\ |ψ_1⟩ \end{pmatrix} = \begin{pmatrix} m_{00} & m_{01} \\ m_{10} & m_{11} \end{pmatrix} \begin{pmatrix} |η_0⟩ \\ |η_1⟩ \end{pmatrix}.   (16.2.6)

Explicit expressions for the coefficients m_{ℓk} can be derived for specific pure-state modulation formats.


The minimum probability of a detection error p_e can be derived without using the expansion coefficients m_{ℓk} or the specific form of the optimal detection operator D̂_k. To do so, substitute (16.2.5) into (16.2.2a) to give

p₁|ψ₁⟩⟨ψ₁|η⟩ − p₀|ψ₀⟩⟨ψ₀|η⟩ = η|η⟩,   (16.2.7)

where |η⟩ is an unspecified sampling eigenstate with an eigenvalue η. Apply ⟨ψ₀| and ⟨ψ₁|, respectively, to both sides of (16.2.7). Writing the resulting equations in matrix–vector form gives

\begin{pmatrix} -(p_0 + η) & p_1 κ^*_{01} \\ -p_0 κ_{01} & (p_1 - η) \end{pmatrix} \begin{pmatrix} ⟨ψ_0|η⟩ \\ ⟨ψ_1|η⟩ \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},   (16.2.8)

where κ₀₁ = ⟨ψ₁|ψ₀⟩ is the inner product of the two pure signal states and ⟨ψ_ℓ|ψ_ℓ⟩ = 1. This matrix equation has a nonzero solution only when the determinant is zero. The quadratic equation for the determinant,

p₀p₁|κ₀₁|² − (p₀ + η)(p₁ − η) = 0,

when solved for η gives the two eigenvalues

η = ½[(p₁ − p₀) ± √(1 − 4p₀p₁|κ₀₁|²)],   (16.2.9)

where p₀ + p₁ = 1 has been used. For an equiprobable prior, p₀ = p₁ = 1/2 and the eigenvalues are

η = ±½√(1 − |κ₀₁|²) = ±½√(1 − cos²θ) = ±½ sin θ,   (16.2.10)

where cos²θ = |κ₀₁|², with θ being the generalized angle between the two pure signal states defined by the inner product. This angle is shown in Figure 16.2(a). The associated angles φ₀ and φ₁ satisfying cos²φ₀ ≐ |⟨ψ₀|η₀⟩|² = p_{0|0} and cos²φ₁ ≐ |⟨ψ₁|η₁⟩|² = p_{1|1} specify the conditional probabilities for a correct detection event. These two angles are also shown in Figure 16.2(a).

In general, for a nonequiprobable prior with p₀ not equal to p₁, substitution of the only positive eigenvalue given in (16.2.9) into (16.2.4) gives the probability p_e of a bit error as

= p1 − η = 21

1−

À

1 − 4p0 p1|κ01 |2

.

(16.2.11)

For an equiprobable prior with η given in (16.2.10), pe is pe

¿

À

= 1 − 1 − |κ01| = 21 (1 − sin θ). 1 2

2

Á (16.2.12)

For a binary modulation format, a symmetric sampling basis is a basis for which the bisector of the angle between the two orthogonal sampling eigenstates {|η₀⟩, |η₁⟩},


Figure 16.2 (a) An optimal sampling basis {|η₀⟩, |η₁⟩} for a binary pure-state modulation format that is symmetric with respect to a set of signal states {|ψ₀⟩, |ψ₁⟩}. (b) Classical photon counting in which the sampling basis is rotated so that one sampling eigenstate |η₀⟩ is aligned with the "space" signal state |ψ₀⟩. This sampling basis corresponds to conventional photon counting and produces an asymmetric information channel.

regarded as vectors, is coincident with the bisector of the angle between the two signal states {|ψ₀⟩, |ψ₁⟩} so that φ₀ = φ₁ = π/4 − θ/2. A symmetric sampling basis produces a binary symmetric information channel with p_{0|1} = p_{1|0}. An asymmetric sampling basis produces a binary asymmetric information channel with p_{0|1} ≠ p_{1|0} (cf. Section 14.2.4). One asymmetric sampling eigenbasis is shown in Figure 16.2(b). In Section 16.2.4, it will be shown that this sampling eigenbasis corresponds to conventional photon counting.

Detection for Antipodal Coherent-State Modulation

For antipodal coherent-state modulation, the Glauber numbers satisfy α1 = −α 0 as complex numbers, but this equality does not extend to the corresponding pair of coherent states. Thus | α1 ± ´ = −|α 0±. This is because, using (15.3.40), the inner product of the . |α 1|2 is the mean 2 = e−4Eb , where E = two antipodal coherent states is real, with κ01 b 2 number of photons per bit (cf. (15.3.24)). Substituting κ 01 = e−4Eb into (16.2.12), the probability of a detection error for an equiprobable prior in the absence of statistical uncertainty introduced by the channel is pe

Ã Â Ä = 21 1 − 1 − e−4E . b

(16.2.13)

This expression is compared with other binary modulation formats in Table 16.1. A plot of pe as a function of Eb is shown in Figure 16.3. When E b is large so that e −4E b is much smaller than one, the square-root function in (16.2.13) can be expanded in a power series to give the approximation pe

≈ 41 e−4E

b

(Eb > 1).

(16.2.14)

The argument of the exponential function in this expression is a factor of two larger than that in the expression for classical shot-noise-limited homodyne demodulation of one signal component given in (10.2.24). This means that half as many photons are

832

16 The Quantum-Lightwave Channel

Table 16.1 Probability of a detection error for several binary modulation formats and methods of detection

Binary modulation formata

Error probability ( pe )

pe for Eb

Classical shot-noise-limited homodyne demodulation

1 2 erfc 2Eb (10.2.24)

 1 e −2E 2 2π Eb

Antipodal coherent state with optimal detection On–off-keyed coherent state with optimal detection Conventional photon counting

1 2 1



1 1 2



Â

Ã

Ä

Â

Ä

Detection operator b

²1 = ¼ 0∞ |η1 ±²η1|dη1 D ²D0 = ²I − ²D1

1 − e−4Eb (16.2.13)

1 −4Eb 4e

See (16.2.18)

1 − e−2Eb (16.2.19)

1 e−2Eb 4

See Problem 5

1 −2Eb (16.2.20) and (9.5.36b) 2e

1 −2Eb 2e

1 −4Eb (16.2.24) and (9.5.43) 2e

1 −4Eb 2e

Ã

Displacement receiver

Â

¶1

²0 = |0±²0| D ²D1 = ²I − |0±²0| ²0 = |0±²0| D ²D1 = ²I − |0±²0|

a All formats are for a noise-free channel with unity quantum efficiency and an equiprobable prior. Classical formats are shot-noise limited.

OOK 0.4

Shot-noise-limited homodyne

10 –8

0.2

Coherent-state antipodal

0.2

1

Coherent-state OOK

pe

2

Shot-noiselimited homodyne 0.0 Coherent-state antipodal 0.0 0.2 0.4 0.1

–16

0.1

Displacement receiver

0.3

Displacement receiver

10 –12 10

0.5

OOK Coherent-state OOK

10 –4

pe

(b)

(a)

1

5

10

Expected number of photons Eb per bit

0.6

0.8

Expected number of photons E b per bit

1.0

Figure 16.3 The probability of a detection error using an equiprobable prior for several classical

binary modulation formats and coherent-state binary modulation formats. (a) Large mean signal levels on a log–log plot. (b) Small mean signal levels on a linear plot.

required for antipodal coherent state modulation as for shot-noise-limited homodyne demodulation to achieve the same probability of a detection error. These expressions are compared with other modulation formats in, some discussed later, in Table 16.1.

Optimal Detection Operator

The optimal detection operator ² D1 for antipodal coherent-state modulation can be determined using (16.1.15), which expresses the pairwise inner product κ i j between two signal states in terms of the elements {m ik } and {m ∗jk } of the signal-constellation matrix M. Substituting κ 01 = ²α 0| α1 ± = e−2Eb (cf. (15.3.40)) into (16.1.15) and setting L equal to two gives

κ01 = e−2E = cos θ = b

1 ´ k =0

m 0k m ∗1k

= m 00m ∗10 + m 01m ∗11,

(16.2.15a)

16.2 State Detection for Binary Modulation Formats

833

where θ is the generalized angle between the two coherent states defined by the inner product κ . Using (16.1.13) and requiring a definite outcome so that p0|0 + p1|0 = 1 generates another equation: p0|0 + p1|0

= |²η0|ψ0 ±|2 + |²η1|ψ0 ±|2 = |m 00|2 + |m 01 |2 = 1.

(16.2.15b)

Antipodal coherent-state modulation is an equal-energy modulation format. This means that p0|0 should be equal to p1|1 so that the optimal sampling basis is symmetric with m 00 = m 11 (cf. (16.1.13)). The matrix M is also hermitian, so m 01 = m ∗10. The solution for the two sampling eigenstates can be determined when m± k is real so that m 01 = m 10. Then m 00 = m 11, m 01 = m 10, and (16.2.15) can be solved to give √ √ m 00 = (1 + sin θ)/2 and m 01 = (1 − sin θ)/2, These expressions are used to form the symmetric matrix M (cf. (16.1.14)). Inverting this matrix gives

Æ Å √ √ 1 + sin θ − 1 − sin θ M−1 = √ . √ √ 2 sin θ − 1 − sin θ 1 + sin θ 1

(16.2.16)

Substituting this expression into (16.1.17) gives the two optimal sampling eigenstates as

|η0± = √ |η1± = √

Ã√

1



2 sin θ

1 − sin θ |α 1±

Ä

, Ã √ Ä √ − 1 − sin θ |α 0± + 1 + sin θ |α 1±

2 sin θ 1

1 + sin θ | α0 ± −

(16.2.17a) (16.2.17b)

up to a constant phase factor. Because each detection operator is an outer product of the sampling eigenstate with itself, the constant phase factor does not affect the form of the detection operator. Each sampling eigenstate in (16.2.17), called a cat state, is a superposition of two classically incompatible signal states | α0 ± = |α ± and |α 1± = | − α ±.5 The two cat sampling eigenstates in (16.2.17) are used to construct the two detection ²0 = |η0±²η0| and D²1 = |η1 ±²η1 |. As an example, using (16.2.17a), the operators D optimal detection operator ² D0 for the state |α 0± is

²0 D

1

Ç

(1 + sin θ)|α 0±² α0 | + (1 − sin θ)|α 1±²α 1| È − cos θ(|α 0±² α1 | + |α1 ±²α 0|) , (16.2.18) À 2 Â where sin θ = 1 − κ 01 = 1 − e−4E , and cos θ = e−2E = κ . A similar detection operator can be derived for the coherent state |α1 ±. =

2

2sin

θ

b

b

5 These sampling eigenstates are referred to as “cat” states because they are incompatible in the same sense

as the famous superposition of two states discussed by Schrödinger (1935) of a state |li ve cat ± that represents a live cat and a state | dead cat ± that represents a dead cat. Optical cat states have been generated experimentally. See Ourjoumtsev, Tualle-Brouri, Laurat, and Grangier (2006).

834

16 The Quantum-Lightwave Channel

Large-Signal Regime

When the mean number of photons Eb , which is equal to |α| 2 (cf. (15.3.24)) for either Glauber number α0 or α 1, becomes larger than one, cos θ = e−2Eb approaches zero, sin θ approaches 1, and the two coherent states become nearly orthogonal. In this large-signal regime, the probability of a detection error approaches zero (cf. (16.2.20)), with the optimal detection operator ² D0 for the state | α0 ± given in (16.2.18) approaching the operator |α 0±² α0 |. For large values of the Glauber number α0 , this operator can be regarded as a projection operator because the coherent states are nearly orthogonal in this regime (cf. Section 15.3.7). Similarly, the optimal detection operator for the state |α 1± approaches the projection operator |α 1±²α 1 |. The two projection operators used for state detection then correspond to the two classical matched filters in signal space followed by sampling that are used for classical detection (cf. (10.4.3)). The projection operators are designed to project the signal state at the channel output onto the subspace for each of the two possible orthogonal signal states at the channel input. This is similar to classical orthogonal modulation such as frequency-shift keying (cf. Section 10.1.3), but with the “matched filters” defined in a higher-dimensional signal state space. The symbol state that produces the largest detected outcome is asserted as the most likely symbol at the channel input. This equivalence provides the quantitative correspondence between optimal classical detection of symbols and optimal quantumoptics detection of pure symbol states.

16.2.4

Detection for On–Off-Keyed Coherent-State Modulation

The Glauber numbers for ideal on–off-keyed coherent-state modulation with an √ equiprobable prior are α0 = 0 and α 1 = 2Eb (cf. Figure 10.3(c)). Then | α1 − α 0| 2 = 2Eb . The probability of a detection error using a symmetric sampling basis on an ideal channel with no statistical uncertainity is (cf. (16.2.13)) pe

Ã Â Ä = 21 1 − 1 − e−2E ≈ 14 e−2E . b

(16.2.19)

b

Comparing this expression with the expression given in (9.5.36b) using a photon-optics signal model shows that the probability of a detection error using a symmetric sampling basis is a factor of two smaller than the probability of a detection error using conventional photon counting. The factor of two is explained by recalling that the probability of a detection error for photon counting given in (9.5.36b) is based on the premise that when zero photons are transmitted, zero photons are detected, so that p0| 0 = 1. This means that within quantum optics, the corresponding method of state detection uses a sampling eigenstate |η0± that is aligned with the “space” signal state |ψ0± as shown in Figure 16.2(b). Therefore, p0| 0 = |²ψ0 |η0 ±|2 = 1 and p1|1 = 1 − p0|1 = 1 − e−2Eb . The probability of a detection error for an equiprobable prior for this asymmetric sampling basis is given by pe

= 1 − pc = 1 − 21

( p + p ) = 1 e−2E , 0|0 1| 1 2 b

(16.2.20)

16.2 State Detection for Binary Modulation Formats

835

which agrees with the conventional expression given in (9.5.36b). Therefore, within quantum optics, conventional photon counting uses one sampling eigenstate aligned with the signal state for a “space” as shown in Figure 16.3(b). This produces an asymmetric information channel, with the minimum probability of a detection error typically achieved using a nonequiprobable prior (cf. Section 14.2.4).

16.2.5

Other Methods of State Detection

This section discusses other methods of state detection. These methods can be significantly less complex than the method of detection discussed in the previous subsection, but may not achieve the minimum probability of a detection error permitted within quantum optics, which is given by (16.2.11). These methods are based on wave-optics techniques, photon-optics techniques, or a mingling of these techniques that may involve feedback or feedforward.

Homodyne Demodulation

²homo given in (15.5.4) corresponds to classical The homodyne state-detection operator A shot-noise-limited homodyne demodulation of one signal component to a real-baseband signal followed by binary detection (cf. Section 10.2.5). The summation of the positive eigenvalues corresponding to a discrete set of sampling eigenstates given in (16.2.3) becomes an integral over the continuous set of positive eigenvalues for the sampling eigenstates {|ηi ±} of the homodyne demodulation operator ² A homo . The detection ²1 for homodyne demodulation can be written as operator D ²1 D

=

±∞ 0

|η1 ±²η1|dη1 .

(16.2.21)

²homo that produce posThe integral includes only the sampling eigenstates {|η1±} of A itive eigenvalues. The probability of a detection error using this detection operator corresponds to classical homodyne demodulation followed by a sampled matched filter and hard-decision binary detection with a threshold at zero. Table 16.1 compares this detection operator with several other detection operators for binary modulation formats. Figure 16.3 shows that shot-noise-limited homodyne demodulation of one signal component leads to a probability of a detection error that is larger than for optimal quantum-optics detection. Displacement Receiver

Another method combines homodyne demodulation with photon counting to produce the displacement receiver. The displacement receiver was analyzed using a mingling of wave optics and photon optics in Section 9.5.4. In this section, the same receiver is analyzed using quantum optics with the signal states shown in Figure 16.4. The incident pure signal state is homodyne-demodulated with the magnitude and phase of the local oscillator state |α LO ± matched to one of the two antipodal coherent states, |α 0 ± or |α 1±, where | α0 ± = |−α1 ±. When the other antipodal coherent state | α0 ± = |−α1 ± is incident to the asymmetric 180-degree hybrid coupler, the output state is the vacuum state

836

16 The Quantum-Lightwave Channel

| α〉 |α LO〉 Phase and amplitude adjusted

180-degree coupler

Photon-counting operator Nˆ

>
0}. The set {|ηk ±} of sampling eigenstates and the corresponding set {ηk } of eigenvalues do not depend on the basis used to express ² ρE+ N0 − ²ρN0 . The set of positive eigenvalues {ηk > 0} for (16.2.25) can be determined by expressing the density matrix ²ρE+N0 describing the signal plus noise in a photon-number representation as given by (15.6.20). This is not a diagonal matrix. The density matrix ² ρN0 describing the noise in a photon-number representation is a diagonal matrix given by (15.6.18), with the diagonal elements being the maximum-entropy Gordon distribution. When the two density matrices are expressed using a photon-number-state basis, they are infinite-dimensional because an infinite number of photon-number eigenstates is required to express classical noise. This means that, in general, the |ηk ± sampling eigenstates of (16.2.25), are infinite in number, with the corresponding set of eigenvalues {ηk } giving the possible sample values of the output of a noisy channel. Because the probability of a detection error is at most one-half, the smallest eigenvalue of the infinite-dimensional square matrix (² ρE+N0 − ²ρN0 )/2 goes to zero. Therefore, an approximation for pe can be obtained by truncating the infinite-dimensional square matrix to a finite square matrix, then summing the positive eigenvalues of that finite matrix to determine the probability of a detection error using (16.2.4). Following these steps, the probability of a detection error pe is plotted in Figure 16.5 for several values of the mean number of additive noise photons N 0. The curve for N0 = 0 is for an ideal noiseless channel given by (16.2.13). The other solid curves for nonzero N0 are determined numerically. For comparison, the probability of a detection error for classical homodyne demodulation followed by matched-filter detection given in (10.2.23) is also shown using a dashed line for several values of N 0. The pair of curves for N0 equal to 1 show little difference between optimal quantum state detection and semiclassical shotnoise-limited detection (cf. Section 10.2.5). Therefore, using an optimal state-detection operator is advantageous only when the additive-noise term N0 is significantly smaller than the inherent quantum noise of half of a photon. The sensitivity of optimal state detection to external additive noise essentially precludes any type of phase-insensitive lightwave amplification (cf. Section 7.4) if the advantages of optimal quantum-lightwave state detection are to be realized. This statement is true even when the phase-insensitive lightwave amplifier is operating at the quantum noise limit of N 0 = 1 or Namp = 0 (cf. Section 7.7). For this minimum amount of noise, the two curves for N 0 = 1 shown in Figure 16.5 are nearly coincident. For this

16.3 State Detection for Multilevel Modulation Formats

839

0.1 rorre noitceted a fo ytilibaborP

N0 = 1

10 –3

N0 = 0 N0 = 0.5

10 –5 N0 = 0.01

10 –7

N0 = 0.1

N0 = 0.05

10 –9

1

3

2

4

5

6

8

8

10

Expected number of photons Eb per bit Figure 16.5 The probability of a detection error for quantum antipodal modulation using pure

coherent states with additive noise given by (15.6.18). The expression for the N0 = 0 curve is given by (16.2.13). The dashed curves use classical homodyne demodulation given in (10.2.23).

reason, a quantum information processing system might use only one quadrature component along with a phase-sensitive measurement so as to produce the minimum level of quantum noise.

16.3

State Detection for Multilevel Modulation Formats Methods of detection for multilevel formats are an extension of the methods for binary detection given in (16.2.2). For a multilevel modulation format with L linearly independent pure signal states, the decision rule for the decision subspace region { ± : ± = 0, . . . , L − 1} is the generalized form of (16.1.6). It can be written as

R

R± =

³´ ² k

µ P±k : trace( p± ² ρ± P²±k ) > trace( p j ²ρ j P²±k ) for all j ´= ± ,

(16.3.1a)

where j = 0, . . . , L − 1. The set of sampling eigenstates {|ηk ±} is common for a set of linearly independent pure signal states (cf. Section 16.1.3). Consequently, a different set { P²±k } of elementary projection operators define each decision subspace region ±, with ²± for the ±th subspace given by (cf. (16.1.7)) the detection operator D

R

²± D

R

=

´ ² P±k .

²±k ∈ R± k:P

(16.3.1b)

The decision subspace region ± defined by (16.3.1) is the intersection of the L − 1 pairwise decision subspaces ± j for each pairing of another signal state |ψ j ± with signal state |ψ± ±. Each pairwise decision subspace region ± j specifies a subspace for which

R

R

840

16 The Quantum-Lightwave Channel

the signal state |ψ± ± is more likely than the signal state |ψ j ± for j ´ = ± . The intersection of these L − 1 subspaces defines ± as described by (16.3.1a). Referring to Figure 16.1, the multilevel detection task for a set of pure signal states at the channel output can be viewed as optimally simultaneously aligning an orthogonal set of sampling eigenstates {|ηk ±} to a set of potentially pairwise-nonorthogonal signal states {|ψ± ±}. Because this alignment is defined in a multidimensional space, the optimal alignment is much harder to determine than for the binary case. Approximations analogous to the union bound are often used.

R

16.3.1

Square-Root Detection Basis

For a set of orthogonal signal states, the Gram matrix K, whose elements are the pairwise inner products κi j = ²ψ j |ψi ± of the signal states, is equal to the identity matrix I. Because M = K1/2 as indicated by (16.1.16), this means that M = I. This is exact for orthogonal signal states. It suggests that, for a set of pairwisenonorthogonal signal states whose Gram matrix K is a small perturbation of an identity matrix I, an approximate state-detection basis can be described by the unique matrix M ³ given by

M³ = K1/2,

(16.3.2)

with nonnegative square roots of the nonnegative eigenvalues of the nonnegative-definite hermitian Gram matrix K. This approximation to M is appropriate when the signal-state pairwise inner products κi j are small. When this condition is satisfied, the states are nearly orthogonal and the Gram matrix K is nearly the identity matrix.

−d 2 / 2

For a signal constellation of pure coherent states, κi j = e i j is real (cf. (15.3.39)), where di2j is the squared euclidean distance between the Glauber numbers that specify the coherent states. When every pairwise distance di j in the signal constellation is large, the matrix M³ defined in (16.3.2) can be used to determine an approximate basis, called the square-root detection basis. Detection operators constructed using this basis are optimal for a set of symmetric coherent states. These states have the property that the eigenvalues of the Gram matrix K are equal.7 A signal constellation that satisfies this condition defines a symmetric set of signal states. To determine the form of M³ for a general Gram matrix when the eigenvalues are not equal, write the nonnegative-definite hermitian Gram matrix K as

⎡ ⎢ K = A⎢ ⎢⎣

λ1 0

.. .

0

0

λ2 .. . 0

··· ··· ... ···

⎤ ⎥⎥ † .. ⎥⎦ A , . λ 0 0

L

7 See Ban, Kurokawa, Momose, and Hirota (1997) and Sasaki, Kato, Izutsu, and Hirota (1998).

(16.3.3)

16.3 State Detection for Multilevel Modulation Formats

841

where the λk are the nonnegative eigenvalues of K, and A is a unitary matrix whose columns are the eigenvectors of K. The matrix M ³ = K1/ 2 is

M³ = ABA† ,

(16.3.4)

where the matrix B is a diagonal matrix whose diagonal elements bk are the nonnegative √ square roots λk of the nonnegative eigenvalues λ k of K.

Probability of a Detection Error

When appropriate, an approximation8 to the probability pe of a detection error for a set of pairwise-nonorthogonal signal states can be derived by using a series expansion

K = I −E + ···

to write the Gram matrix K as the difference of two terms, where I is the identity matrix and E a perturbation matrix of the off-diagonal inner products κ i j for i ´ = j that represents the pairwise-nonorthogonal nature of the set of signal states, stated here to be small. Using this expression, expand the matrix M³ in a series up to the second-order terms in E ,

M³ = K1/2 ≈ I − 21 E − 81 E 2, where all diagonal elements of E are zero. The diagonal elements m³ii of M³ up to terms in E2ii and the squared diagonal elements (m ³ii )2 are obtained as

Ã Ä = 1 − 81 E2 ii , Ã Ä (m ³ii )2 = 1 − 41 E2 ii , m ³ii

(16.3.5a) (16.3.5b)

where Eii = 0 because it describes the perturbations from an orthogonal state. Substituting this expression into (16.1.18), the probability of a detection error for a signal constellation with an equiprobable prior is pe

=1− =

1 L

L −1 ´ i =0

(m ³ii )2

1 ´Ã L −1

L

i =0

1 − (m ³ii )2

Ä

= 4L1 trace E2 L −1 ´ L −1 ´ |κi j |2 . = 4L1 i =0 j =0 j ´=i

(16.3.6)

The definition of the trace and expression (16.3.5b) are used to derive the third line from the second line. The fourth line expresses the trace of E2 in terms of |κi j | 2 using K = I − E. This expression is valid when |κi j |2 is small for all i and j . 8 This section is based on Chapter VI, Section 3(c) of Helstrom (1976).

842

16 The Quantum-Lightwave Channel

Quantum Union Bound When the signal states are coherent states κi2j

= e−d (cf. (15.3.40)) and L −1 L −1 1 ´ ´ −d e , pe ≈ 4L 2 ij

2 ij

i =0 j =0 j ´=i

(16.3.7)

which is structurally similar to the classical expression given in (10.2.11) for the probability of a detection error pe based on the union bound. Accordingly, (16.3.7) can be viewed as a quantum union bound that can be used to approximate the probability of a detection error pe for a set of coherent states based on the quantum-optics equivalent of a minimum euclidean distance between the Glauber numbers that specify the coherent states. . 2 = To show this correspondence, define the minimum squared euclidean distance dmin mini ´= j di2j for a set of coherent states in the same way that the classical minimum distance is defined in (9.2.1). For a set of large amplitude signal states, |κ i j |2 is much

−d2

smaller than 1 for all i and j . Therefore, given that κ i2j = e i j , the minimum distance dmin is larger than 1. The minimum squared euclidean distance for a set of coherent states is the minimum pairwise squared euclidean distance over the set of Glauber numbers that specify the coherent-state signal constellation. Then, the probability of a detection error for an equiprobable prior for any pure coherent-state modulation format can be approximated as

∑ L−1 n = 1/ L ±= 0 ±

pe

≈ 41 ne−d , 2 min

(16.3.8)

where n is the average number of symbols at distance dmin (cf. 2 = d012 = 4Eb (cf. (10.2.13)). For antipodal coherent-state modulation, n = 1 and dmin (15.3.40a)). For this case, the probability of a detection error given in (16.3.8) reduces to the binary case given in (16.2.14) for Eb much larger than one. Now compare the exponent in expression (16.3.8) with the exponent for classical shot-noise-limited demodulation (cf. (10.2.28)) repeated here as pe (homodyne) ≈ e−dmin / 2, 2

pe (heterodyne) ≈ e−dmin / 4. 2

The exponent of the exponential function for classical shot-noise-limited homodyne demodulation of one signal component is half the exponent given in (16.3.8) using the square-root detection basis when the minimum pairwise distance between the Glauber numbers is equal to the minimum pairwise distance between the two classical signals. Using classical shot-noise-limited heterodyne demodulation reduces the exponent by an additional factor of two. This means that in a signal regime for which a minimum distance dmin is meaningful, methods of quantum-optics detection with 3 dB less signal energy have the same probability of a detection error as that for optimal classical shot-noise-limited homodyne demodulation. With 6 dB less signal energy, they have the same probability of a detection error as that for optimal classical shot-noise-limited heterodyne demodulation. While this result may seem unexpected, it is a direct result

16.4 Quantum-Lightwave Information Channels

843

of methods of state preparation and detection based on the full set of properties of a quantum lightwave. The quantum union bound given in (16.3.8) can also be applied to other linearly independent pure-state modulation formats that do not use coherent states, provided that each off-diagonal element of the Gram matrix K is small. For this case, the term 2 . e−dmin is replaced by the minimum pairwise inner product κ min = mini , j κ i j . In summary, for a coherent-state signal constellation with a minimum distance dmin larger than one, the signal states become nearly orthogonal. For this case, the probability of a detection error using a square-root detection basis can be approximated using the quantum-lightwave equivalent of the minimum euclidean distance dmin. This quantumlightwave minimum distance is twice the minimum distance of the corresponding classical modulation format.

16.4

Quantum-Lightwave Information Channels The final four sections of the book discuss the classical information-theoretic channel capacity of a quantum-lightwave channel. The capacity is the largest rate, expressed in bits, for which it is possible to transmit classical information on a quantum-lightwave channel. This section discusses the relationship between the quantum uncertainty of a quantum-lightwave channel and the corresponding information channel. The next section, Section 16.5, discusses channel capacity for four information channels of increasing complexity. For one of these channels, the channel capacity is further studied in detail in Section 16.6. The book concludes with a discussion of the classical channel capacity of a gaussian quantum-lightwave information channel in Section 16.7.

16.4.1

Signal-Dependent Information Channels

A signal-dependent information channel was first discussed in Section 14.2.4, where it was shown that the optimal value of the threshold depends on the prior. Quantumlightwave channels are similar. The combined process of state preparation and state detection along with the mean signal level may lead to an information channel that requires a joint optimization over the prior and the method of state preparation and detection. This joint optimization leads to a “gray box” information channel. This channel is distinguished from a “black box” information channel, for which the optimization is only over the prior, with the information channel being fixed. This statement is a direct consequence of basis-dependent and signal-level-dependent quantum uncertainty, which is an inherent part of a quantum-lightwave information channel. This dependence does not occur for a classical memoryless information channel because that channel has no quantum uncertainty. When the information channel is depicted as a gray box, the computation of the channel capacity requires an optimization over the prior, and the combined process of state preparation and state detection. The difference between these two quantum-lightwave channels is illustrated in the next two subsections. The first information channel is based on a process of state

844

16 The Quantum-Lightwave Channel

preparation and state detection that uses component-symbol states. This information channel is fixed and is equivalent to a conventional memoryless classical information channel. The second quantum-lightwave information channel is based on a process of state preparation and state detection that uses block-symbol states. This information channel is depicted as a gray box that depends on the composition of the block-symbol states and the method of block-symbol state detection. 16.4.2

Component-Symbol State Preparation and Detection

The first information channel uses a state-preparation process that maps a letter into a component-symbol state. For this case, the density matrix ² ρin at the channel input is denoted as ² ρsym . The corresponding detection process uses component-symbol-state detection that maps a component-symbol state into a classical letter. The individually detected letters are used to construct a block of logical symbols that is passed to a classical decoder to determine the most likely transmitted codeword.

Classical Component Symbols and Quantum-Lightwave Component-Symbol States A quantum-lightwave information channel that uses component-symbol states is compared in Figure 16.6 to a classical memoryless information channel that uses component symbols. The attributes of the classical information channel are shown in Figure 16.6(a). The classical information channel has only statistical uncertainty. The attributes of a quantum-lightwave channel that uses component-symbol state preparation and component-symbol-state detection are shown in Figure 16.6(b). When a set of nonorthogonal signal states such as a set of coherent states is used for signaling, the quantum-lightwave information channel has additional quantum uncertainty. For a signal constellation {|ψ± ±} of pure component-symbol states (cf. (15.4.10)), which need not be orthogonal, each lightwave channel input is a pure componentsymbol state with a density matrix |ψ± ±²ψ± | and a prior probability psym (±) . The density

(a)

Classical user datastream Classical encoder

Classical user datastream

Classical Memoryless Information Channel Demodulation/ detection

Lightwave channel

Modulator

Classical decoder

(b) Classical user datastream Classical encoder

Single-Symbol Quantum-Lightwave Information Channel Single-symbol state preparation

ρsym

Quantum lightwave channel T

ρout

Single-symbol state detection Dk

Classical user datastream Classical decoder

Figure 16.6 (a) A classical memoryless information channel. (b) A memoryless quantum-lightwave channel that uses a state-preparation process and a state-detection process based on component-symbol states.

16.4 Quantum-Lightwave Information Channels

845

matrix ² ρsym of the average transmitted signal state is a statistical mixture of pure component-symbol states. It is given in (15.4.10) and repeated here as

²ρsym =

´

L −1

±=0

psym (±)|ψ± ±²ψ± |,

(16.4.1)

where the sum is over the L component-symbol states in the signal constellation.

Channel Transition Probabilities

The conditional probability p(k |±) that a classical channel input symbol s± is detected as the channel output symbol sk is discussed in Chapter 9 for real-valued symbols and in Chapter 10 for complex-valued symbols. The conditional probability p(k |±) that a quantum-lightwave channel input component-symbol state |ψ± ± is detected as the channel output component-symbol state |ψk ± is described by a component-symbol-state detection operator D²k with p (k |±) given by (cf. (16.1.10))

(

) = trace ( D² T²²ρ (±)) , k sym

²k ²ρout (±) p(k |±) = trace D

(16.4.2)

where ² ρout (±) = T²²ρsym (±) (cf. (16.1.5)). Because the channel can introduce statistical uncertainty, the density matrix ² ρout at the lightwave channel output need not describe a pure signal state. For either a classical channel or a quantum-lightwave channel, the conditional probability p(k|±) specifies the mutual information as given by (14.1.6c), which is repeated here as É Ê ´´ p(k |±) I (±; k ) = p(±) p(k|±)log ∑ , (16.4.3) j p ( j ) p (k | j )

±

k



where the posterior probability p(k) is written as j p( j ) p(k | j ). Expression (16.4.2) shows that the quantum-lightwave information channel specified by the conditional probability p(k |±) depends on the lightwave channel transformation ² T , which characterizes the statistical uncertainty in the channel, as well as on the combination of the set of signal states {|ψ± ±} used for state preparation and the ²k } used for state detection. Although our corresponding set of detection operators { D interest in this section is the component-symbol-state preparation and detection process, (16.4.2) is general and describes the classical information channel for any method of state preparation and state detection.

Component-Symbol-State Information Channel

When the information channel is defined by component-symbol-state preparation and component-symbol-state detection, the set of component-symbol states does not depend on the prior probability. Similarly, the set of component-symbol-state detection opera²k } does not depend on the prior probability. Therefore, the information channel tors { D described by the conditional probability distribution can be decoupled from the encoder and decoder. This means that the information channel is fixed and can be depicted as a black box as shown by the dashed line in Figure 16.6(b).

846

16 The Quantum-Lightwave Channel

²k } of detection For a set of orthogonal pure signal states {|ψ± ±}, the optimal set {D operators is matched to that set of states (cf. Section 16.1.4), with the state-detection process having no quantum uncertainty. The conditional probability p(k |±) for an ideal lightwave channel with ² T =² I then reduces to ( ) = trace(|ψk ±²ψk |ψ±±²ψ± |) = ²ψk |ψ±±trace(|ψk ±²ψ± |) = |²ψk |ψ±±|2 = δk ±,

p(k|±) = trace ² Dk ² ρ±

(16.4.4)

because the trace of the outer product is equal to the inner product. Here δk ± is the Kronecker impulse. For this elementary case, a correct detection event pc occurs with probability 1 when k is equal to ±. The channel capacity of this classical noiseless channel is described in the formalism of quantum optics in Section 16.5.1. When the channel transformation T² is not equal to the identity operator ² I and the symbol states in the signal constellation are not orthogonal, the quantum-lightwave channel has a combination of statistical uncertainty and quantum uncertainty. The channel capacity of that information channel is discussed in Section 16.5.4.

16.4.3

Block-Symbol-State Preparation and Detection

The second information channel replaces the state-preparation process that uses component-symbol states with a state-preparation process that uses a set of blocksymbol states constructed from a set of component-symbol states. For this case, the density matrix ² ρin at the channel input (cf. (16.1.5)) is denoted as ²ρblk. The block-symbol-state channel also replaces component-symbol-state detection by blocksymbol-state detection. This information channel, shown in Figure 16.7, replaces the information channel shown in Figure 16.6(b).

Classical Block Symbols and Quantum-Lightwave Block-Symbol States Classical block symbols are generated by the encoder, which is not considered to be part of a classical information channel. Block-symbol states in quantum optics are different. Product-State Information Channel Classical user datastream

Product-state preparation (prior and signal states) pblk(l) {ρblk(l)}

ρ blk

Quantum channel T

ρout

Block-symbolstate detection

Classical user datastream

Figure 16.7 A quantum-lightwave information channel defined using a set of block-symbol

product states and block-symbol-state detection. For this channel, the specification of the information channel is not decoupled from the encoding process that defines a statebook.

16.4 Quantum-Lightwave Information Channels

847

These states are considered to be part of the information channel because they are used to decrease the probability of a detection error caused by quantum uncertainty. A block-symbol state such as a product state is defined in an enlarged signal space compared with the signal space of a component-symbol state. Referring to (15.4.7), the density matrix ² ρblk(±) for the ±th block-symbol state used in the state-preparation process is expressed as a K -fold outer product. This pure block-symbol state is written here as the product state,

²ρblk(±) = ²ρsym(1, ±) ⊗ ²ρsym(2, ±) ⊗ · · · ⊗ ρ²sym(K , ±), (16.4.5) where the density matrix ρ²sym (i , ±) for the i th component-symbol state of the ± th blocksymbol state is chosen from a signal constellation {|ψi ±²ψi |} of L pure component-

symbol states such as a set of coherent states. Other block-symbol states can be defined using entangled states. This kind of block-symbol state is considered in Section 16.5.5. The state-preparation process generates a set {² ρblk (±)} of such block-symbol states according to a composition rule. The choice of that rule is discussed later in this section. The set of block-symbol states defined by this rule is called a statebook.9 The concept of a statebook is the concept for state preparation analogous to a codebook used in classical coding. The difference is that a statebook should adhere to an additional constraint to minimize the quantum uncertainty, which does not exist for a classical codebook.

Block-Symbol-State Information Channel Because the basis-dependent quantum uncertainty is an inherent part of the information channel, the composition rule for a statebook that minimizes the quantum uncertainty changes the form of the information channel. This means that the state-preparation process that defines the statebook is different from classical encoding and modulation. This leads to the depiction of the information channel as a “gray box” to denote the coupling of the encoding and form of information channel instead of a “black box” for which the attributes of encoding are decoupled from the information channel. This coupling is shown notionally in Figure 16.7 by the dashed line running through the state-preparation process and the corresponding block-symbol-state detection process. This dashed line indicates that the information channel should not be decoupled from the state-preparation process that defines the statebook {² ρblk (±)} and the corresponding block-symbol-state detection process that determines the most likely transmitted blocksymbol state. The channel capacity is then determined using a joint optimization over the combined state-preparation and state-detection process and the prior. The resulting average density matrix ρ²blk at the output of the state-preparation process is a mixed signal state given by (cf. (16.4.1))

ρ²blk =

´ ±

pblk (±)² ρblk (±),

where pblk (±) is the prior probability on the set statebook. 9 Also called a quantum codebook.

(16.4.6)

{ρ²blk(±)} of block-symbol states in the

848

16 The Quantum-Lightwave Channel

The information-theoretic channel capacity of this channel when the channel transformation ² T is the identity operator ² I is discussed in Section 16.5.3. The informationtheoretic channel capacity of this channel when the channel transformation ² T does not ² equal I is discussed in Section 16.5.4.

Detection of Classical Block Symbols and Quantum-Lightwave Block-Symbol States

A properly constructed statebook and method of block-symbol-state detection leads to a set of detected block-symbol states that have essentially no quantum uncertainty. This section discusses how methods of state preparation and state detection differ from classical methods of block-symbol encoding and detection for a memoryless channel. Consider a classical discrete memoryless information channel as shown in Figure 16.6(a) for which the lightwave channel adds white gaussian noise. When no symbol dependences are imposed by a code and the channel is memoryless, a block of independent symbols at the channel input produces a block of independent symbols at the channel output. When long-range symbol dependences are imposed by a large code, the symbols at the local level seen by the demodulator can be regarded as independent. At the global level seen by the decoder, there are long-range symbol dependences, but these dependences are not seen by the demodulator. For a properly designed code, the local independence of the symbols means that the joint probability density function describing the channel transition on a block of symbols can be regarded as a product distribution on the component symbols. Accordingly, for the classical memoryless information channel there is no difference in performance between separately detecting each symbol and detecting a block of symbols. Now consider the corresponding operation on a quantum block-symbol state ² ρblk(±) such as the product state given in (16.4.5). Suppose that the block-symbol state is constructed using a set of pairwise-nonorthogonal component-symbol states {² ρsym (±)} such as coherent states. The quantum uncertainty seen at the local level of a componentsymbol state arises from the nonorthogonality of the component-symbol states. This leads to quantum uncertainty at the local level of a component-symbol state characterized by the pairwise inner product κsym between component-symbol states. For component-symbol states such as coherent states, the inner product is a function only of the mean number of photons per symbol Eb (cf. (15.3.40b)). The quantum uncertainty seen at the global level of a block-symbol state is different because a block-symbol state is an object in an enlarged signal space compared with the signal space of a component-symbol state. The quantum uncertainty at the global level of a block-symbol state is characterized by the pairwise inner product κ blk between blocksymbol states in the statebook. In contrast to component-symbol states, the pairwise inner product between block-symbol states depends both on the mean signal level and on the combined process of state preparation that defines the statebook and state detection that defines the probability of a detection error. These functions comprise a quantum modem. The quantum modem has all of the functions of a classical modem, should also account for the quantum uncertainty when a block-symbol state is detected.

16.4 Quantum-Lightwave Information Channels

849

For an appropriately composed statebook and the corresponding method of blocksymbol-state detection, the quantum uncertainty in block-symbol-state detection can be made arbitrarily close to zero compared with that for component-symbol-state detection. This difference in the quantum uncertainty leads to a different information channel characterized by a different conditional probability distribution. This difference does not exist for a classical memoryless channel that has no quantum uncertainty. One method of block-symbol-state detection is the detection of the entire blocksymbol state regarded as a single entity. Other methods use a modified form of component-symbol-state detection in which the dependences of a block-symbol state are inferred using feedback or feedforward in the state-detection process. 10

Block-Symbol-State Signaling using Coherent-State Symbols

The state preparation and state detection of block-symbol states using component coherent-state symbols are the topics of this subsection. This method of blocksymbol-state signaling is compared to the method of component-symbol-state signaling discussed in Section 16.4.2. When the set of block-symbol states is constructed from component coherent-state symbols, a rule for constructing the statebook can be cast as the maximization of the minimum pairwise euclidean block distance between the sequences of Glauber numbers that specify the block-symbol states in the statebook. This statement is analogous to maximizing the minimum euclidean distance between blocks of classical symbols so as to minimize the probability of a block-detection error. This correspondence will be used herein to help describe the block detection of a block-symbol state composed of component coherent-state symbols. When the members of a set {αi } of Glauber numbers describing the component coherent-state symbols are large, the component coherent-state symbols are nearly orthogonal. For this case, the information channel defined by processing component coherent-state symbols can be regarded as the same information channel defined by processing block-symbol states because there is hardly any quantum uncertainty at the level of a component-symbol state. This is the classical result. When the members of a set {α i } Glauber numbers describing the component coherent-state symbols are small, the coherent-state symbols become pairwise nonorthogonal. Working in an enlarged signal space with more total energy for a block-symbol state than for a component-symbol state, the goal of constructing a statebook is to suppress the component-level quantum dependences described by the nonorthogonality of a set of component-symbol states. This leads to less quantum uncertainty. For a properly designed statebook with each entry {α± j } = [α 1 j , α 2 j , . . . , αk j ] described by a block of Glauber numbers, the blocksymbol states become more nearly orthogonal and have less evident quantum-level dependences, and so less quantum uncertainty. For a classical channel code, the form of Fano’s inequality given in (14.1.3) states that statistical dependences must be introduced at the block level seen by the decoder to ensure the reliable transmission of information (cf. (14.1.3)). This classical need to 10 For example, see Dolinar (1973) and Chen, Habif, Dutton, Lazarus, and Guha (2012).

850

16 The Quantum-Lightwave Channel

impose controlled dependences is modified for a quantum-lightwave channel by the use of a statebook to suppress the undesired quantum-level dependences.

Statebook Construction To illustrate a block-symbol detection operator, consider a state-preparation process that generates four block-symbol states consisting of three independent antipodal coherentstate symbols. These four blocks are used to transmit two bits of information. 11 For . . this purpose, the abbreviated notation |−± = |−α ± and |+± = |α ± will be used. The inner product of each state with itself is simply ²+|+± = ²−|−± = |α |2 = Eb (cf. (15.3.40)), with Eb being the mean number of photons per bit (cf. (15.3.24)). The inner product of two component antipodal coherent-state symbols is given by κsym =

²+|−± = ²−|+± = e− d

/2

2 e−2Eb , where dsym = 4Eb. Suppose that of the eight possible block-symbol states constructed from the three component coherent-state symbols, a statebook of four symmetric block-symbol states is used to represent two bits of information. The statebook chooses sequences of Glauber numbers that are separated by the maximum euclidean distance. Using this rule, the four block-symbol states in the statebook can be written as {|+++±, |+−−±,|−−+±, |−+−±}. For this set of symmetric signal states, the square-root detection operator is optimal.12 2 The pairwise squared euclidean distance dblk for the set of symmetric block-symbol states with coherent-state components can be expressed in the same way as the classical expression for binary phase-shift keying. This is given by (13.2.20): 2 sym

=

2 dblk

= 4Ecdmin = 4Ebdmin Rc ,

(16.4.7)

where Ec is the mean number of photons per coded component coherent-state symbol, where dmin is the minimum Hamming distance between the coded binary blocks (cf. Section 13.2.2), where R c is the code rate, and where the mean number of photons Ec per coded coherent-state symbol is given by Rc Eb . The inner product κblk between two block-symbol states is determined using the inner product κsym for the antipodal component coherent-state symbols given by κ sym =

e−dsym /2 (cf. (15.3.40b)). To generalize the expression for a symbol to a block, replace 2 between the Glauber numbers that specify the two the squared euclidean distance dsym antipodal component coherent-state symbols by the pairwise squared euclidean distance 2 dblk between the blocks of Glauber numbers {α± } that specify the block-symbol states given in (16.4.7). The pairwise inner product κ blk for the set of symmetric block-symbol states can then be written as 2

j

κblk = e−d

2 blk

/2 = e−2Eb dmin Rc

= (κsym )d

min Rc

,

(16.4.8)

where κ sym = e−2Eb is the pairwise inner product between the blocks of antipodal component coherent-state symbols (cf. (15.3.40b)). Because the inner product κ sym for the component coherent-state symbols is always smaller than one, the pairwise inner 11 This discussion is based on Sasaki, Kato, Izutsu, and Hirota (1997). 12 See Eldar and Forney (2001).

16.4 Quantum-Lightwave Information Channels

851

product κ blk for the set of symmetric block-symbol states is always smaller than the inner product κsym of component coherent-state symbols for a code with dmin Rc greater 4/3 than 1. For our example, R c = 2/3 and dmin = 2, so κ blk = κ sym .

Statebook Detection The probability of a block-symbol-state error using the square-root detection basis is determined by forming the 4 ×4 Gram matrix K defined in (16.1.15), where the elements of the matrix are the inner products of the block-symbol states instead of componentsymbol states. The elements of this matrix are determined using the inner product κblk between the symmetric block-symbol states, with κblk = 1 for k = ±. This gives

⎡ 1 κblk κblk κblk ⎤ ⎢ κblk 1 κblk κblk ⎥⎥ K=⎢ (16.4.9) ⎣ κblk κblk 1 κblk ⎦ . κblk κblk κblk 1 To determine the conditional detection probability p(k|±) = |m ±k |2 of the kth blocksymbol state at the channel output given the ±th block-symbol state at the channel input, the elements m ±k of the signal constellation matrix M that is the square root K1/2 of the Gram matrix are required. The signal constellation matrix M is determined by diagonalizing the symmetric Gram matrix K using the matrix A formed from the column eigenvectors of K. The eigenvalues are 1 − κ blk, 1 − κblk , 1 − κblk , and 3κ blk + 1, with eigenvectors (0, 0,−1, 1)T , (0,−1, 0, 1)T, (−1, 0, 0, 1)T , and (1, 1, 1, 1)T, respectively, so that

⎡ ⎢ A=⎢ ⎣

0 0 −1 1

0 −1 0 1

−1

1 0 1 0 1 1 1

⎤ ⎥⎥ ⎦.

Then AKA−1 is a diagonal matrix with the four eigenvalues on the diagonal. Taking the square root of the diagonal elements leads to

M = K1/2

⎡ √1 − κblk 0 0 √1 − κ 0 ⎢ 0 0 0 blk √ = A−1 ⎢⎣ 1 − κblk 0 0 0 √ 0 0 0 3κblk + 1

⎤ ⎥⎥ ⎦ A.

Multiplying out the matrices, the four diagonal elements m ±± of M have the same value, (√1 + 3κ + 3√1 − κ ). All off-diagonal elements m of M have the which is 41 blk (√1 + 3κ blk− √1 − κ ). Squaring each of±kthese terms and same value, given by 14 blk blk using p(k |±) = |m ±k |2 (cf. (16.1.13)), the conditional block-symbol-state detection probabilities are p (±|±) = |m ±± |2

= (1/16 )

ÃÂ

Â

1 + 3κblk + 3 1 − κblk

Ä2

for all ±,

(16.4.10a)

852

16 The Quantum-Lightwave Channel

1

Block-symbol detection

cp

0.75

noitceted tcerroc fo ytilibaborP

Individual-symbol detection

0.5

0.25

0 0.0

0.2

0.4 0.6 0.8 Expected number of photons per bit Eb

1.0

Figure 16.8 Probability of a correct detection event for block-symbol-state detection and

component-symbol-state detection as a function of the mean number of photons per bit Eb .

p(k|±) = |m k ± |2

= (1/16)

ÃÂ

1 + 3κblk −

Â

1 − κ blk

Ä2

for k

´= ±.

(16.4.10b)

For an equiprobable block-symbol-state prior pblk, the probability of a correct blocksymbol-state detection event pc is equal to p(±|±). Using κ blk = (κsym )dmin Rc (cf. (16.4.8)), dmin Rc = 4/3, and κ sym = e−2Eb (cf. (15.3.40b)) gives κ blk = e−8Eb /3 . Substituting this expression into (16.4.10a), Figure 16.8 plots the probability pc of a correct block-symbol-state detection event as a function of the mean number of photons per bit Eb .

Component-Symbol-State Detection When the state-preparation process uses component-symbol states instead of blocksymbol states (cf. Section 16.4.2), each symbol is detected separately. The corresponding block is constructed from the individually detected component coherent-state symbols. For antipodal coherent-state modulation with an equiprobable prior, the prob2 ability pe of a bit error is given by (16.2.12), with κ sym = e−4Eb (cf. (15.3.40b)). Therefore, the probability pc of a correct detection event for a block of two independent antipodal coherent-state symbols is pc

à  Ä2 = (1 − pe )2 = 41 1 + 1 − e−4E . b

(16.4.11)

This probability of a correct detection event using component-symbol-state detection is also plotted in Figure 16.8. For both methods of detection on a memoryless channel, the probability pc of a correct block-symbol-state detection event approaches 1 as Eb becomes large compared with one photon per bit because the set of component antipodal coherent states is approximately pairwise orthogonal, which results in little quantum uncertainty at the level of a single component-symbol state. This is the classical case for which

16.5 Classical Channel Capacity of a Quantum-Lightwave Channel

853

component-symbol-state detection gives the same probability of a detection error as block detection. When Eb approaches zero, the probability of a correct decision approaches 1/4, meaning that each of the four block-symbol states becomes equally likely to be detected because then there is hardly any signal. Between these two limiting values, the probability pc of a correct detection event using a statebook and block-symbol-state detection is larger than pc for component-symbol-state preparation and detection. This is because the statebook and the corresponding block-symbol-state detection operator are defined in an enlarged signal space that reduces the quantum uncertainty compared with using a component-symbol-state detection operator defined in a smaller signal space. The small improvement shown in Figure 16.8 is a consequence of the small size of the statebook and the simple composition rule used for this example. The difference in the channel capacity between an information channel defined using state-preparation and detection methods based on block-symbol states and methods based on component-symbol states is discussed in Section 16.6.3.

16.5

Classical Channel Capacity of a Quantum-Lightwave Channel The five subsections of this section discuss, in turn, the classical channel capacity of four lightwave channels of increasing complexity. In all cases, by channel capacity we mean the classical information channel capacity expressed in bits or bits per second. The first information channel to be studied is a classically ideal, noiseless, memoryless information channel based on a classical product probability distribution. This classical channel, however, is expressed within quantum optics using a density matrix for a block-symbol state composed of orthogonal component-signal states. Orthogonal signal states detected in the same basis have no quantum uncertainty. Because the channel is ideal, there is no statistical uncertainty. Therefore, the conditional entropy H (s| r) (cf. (14.1.3)) is equal to zero. For this case, the channel capacity is equal to the source entropy H (s ), but expressed in the formalism of quantum optics to facilitate comparison with other quantum-lightwave channels and as a precursor to their description. The second information channel introduces quantum uncertainty by considering a method of state preparation and state detection using block-symbol states that are product states. These product states are composed of pairwise-nonorthogonal componentsymbol states. This noiseless product-state channel is still classically ideal because no statistical uncertainty is added by the channel. However, because there is quantum uncertainty, this leads to a modified form of the classical channel capacity. This modification is expressed in terms of the von Neumann entropy S (² ρ ), which does incorporate quantum uncertainty, instead of the Shannon entropy H (s), which does not. The expression derived for this channel reduces to the expression for the first channel for a set of orthogonal signal states. The third information channel introduces statistical uncertainty within the channel. This leads to a noisy product-state channel with the signal state at the channel output being a mixed signal state instead of a pure signal state. Classically, the capacity of this

854

16 The Quantum-Lightwave Channel

channel would be described by Shannon’s channel coding theorem (cf. Section 14.1.5). For the corresponding quantum-lightwave channel, the capacity is described by the quantum-lightwave equivalent of that theorem. The fourth and most general information channel places no restriction on the process of state preparation and state detection other than the standard finite-energy constraint. This leads to an expression for the channel capacity that is markedly different from the expressions for the other three information channels because of the unique dependences permitted within quantum optics. Informally, these dependences are an irreducible form of memory. This leads to a channel capacity based on an optimization over a blocksymbol-state prior p (s) instead of a single-letter prior p(s) as would be the case were the signal states constrained to be product states.

16.5.1 An Ideal Classical Channel

The first case is an ideal classical discrete memoryless information channel. The information-theoretic capacity of this channel, limited only by the entropy at the channel input, is described in the language of quantum optics to prepare a backdrop for the discussion of the other three channels. For an ideal quantum-lightwave channel, the density matrix ρ̂_r that describes the channel output state before detection and decoding is equal to the density matrix ρ̂_s that describes the channel input signal state after state preparation. For state-preparation and state-detection processes that use a signal constellation of orthogonal, pure signal states, which need not be component-symbol states, there is no quantum uncertainty. The optimal detection operator D̂_s for the signal state |s⟩ is a projection |s⟩⟨s| onto that state (cf. (16.2.5)). This corresponds to a classical sampled matched filter (cf. Section 9.4.2).

Conditional Probability

Using r = s for an ideal channel, and D̂_s = |s⟩⟨s| for an orthogonal set of signal states, the conditional probability p(r|s) that the channel output is r given that the channel input is s (cf. (15.4.32)) is

p(r|s) = trace(ρ̂_s D̂_r) = ⟨r|ρ̂_s|r⟩ = ⟨r|r⟩⟨r|r⟩ = 1.    (16.5.1)

To conclude, trivially, there is no statistical uncertainty or quantum uncertainty for an ideal classical channel. This discussion provides a starting point for the introduction in this section of quantum uncertainty and statistical uncertainty for other information channels for which p(r|s) ≠ 1. The calculation of the classical information capacity requires expressions both for the entropy H(r) of the channel output and for the conditional entropy H(r|s) describing the uncertainty added by the channel. The relationships will be written in general terms


for later use. These general expressions are then specialized to the noiseless classical channel by constraining the signal states to be orthogonal.

Conditional Entropy

We start with the conditional entropy. The conditional entropy is given in (14.1.3b):

H(r|s) = Σ_s p(s) H(r|s = s)    (16.5.2a)
       = −Σ_s p(s) Σ_r p(r|s) log p(r|s).    (16.5.2b)

The prior probability distribution p(s) is defined for the channel input signal state |s⟩ described by the density matrix ρ̂_s. For a classical memoryless channel, the input signal states are orthogonal. For this case, distinguishing between a set of component-symbol states and a set of block-symbol product states is unnecessary because the channel has no quantum uncertainty, and so the information channels are the same (cf. Section 16.4). When the signal states are nonorthogonal, the two information channels are different.

The conditional entropy H(r|s = s) is the Shannon entropy of the conditional probability distribution p(r|s). For orthogonal signal states and a classical noiseless channel, p(r|s) = 1 (cf. (16.5.1)). For the general case, to be considered later, p(r|s) is not equal to one. For this case, because the channel adds no statistical uncertainty, the density matrix ρ̂_s is a diagonal matrix with elements p(r|s) that are the eigenvalues λ_j of the density matrix ρ̂_s (cf. (15.4.23)). Replacing p(r|s) with λ_j in the second term on the right side of (16.5.2b) gives

−Σ_r p(r|s) log p(r|s) = −Σ_j λ_j log λ_j = S(ρ̂_s),    (16.5.3)

where S(ρ̂_s) is the von Neumann entropy of the channel input signal state. Replacing H(r|s = s) on the right of (16.5.2a) with (16.5.3), the conditional entropy can be written as

H(r|s) = Σ_s p(s) S(ρ̂_s).    (16.5.4)

For an ideal classical channel, expression (16.5.4) equals zero. For a nonideal channel, the conditional entropy H(r|s) is not zero. That case is considered later.

The Shannon entropy H(r) at the channel output is determined using the posterior probability distribution p(r). The general form of the expression for p(r) uses the conditional probability p(r|s) = ⟨r|ρ̂_s|r⟩ given in (16.5.1). For the general case, p(r|s) is not equal to 1. Using the general expression and the prior p(s), the posterior probability distribution p(r) is

p(r) = Σ_s p(s) p(r|s) = Σ_s p(s)⟨r|ρ̂_s|r⟩ = ⟨r| (Σ_s p(s) ρ̂_s) |r⟩ = ⟨r|ρ̂|r⟩,    (16.5.5)


where

ρ̂ = Σ_s p(s) ρ̂_s    (16.5.6)

is the summary expression for the density matrix describing the channel input (cf. (15.4.13)). This expression will be used for the three other information channels by replacing the density matrix ρ̂_s for a component-symbol state by the density matrix for a block-symbol state, which may be a product state or an entangled state.

Channel-Output Entropy

The second classical term used to derive the channel capacity is the Shannon entropy H(r) at the channel output. For a set of orthogonal signal states that have no quantum uncertainty, the Shannon entropy at the channel output is equal to the von Neumann entropy (cf. (15.4.25)),

H(r) = S(ρ̂),    (16.5.7)

where the posterior probability p(r) and the average density matrix ρ̂ describing the channel input are related by (16.5.5). Expression (16.5.7) states that for an ideal classical channel H(r) = H(s) = S(ρ̂). The input and output entropies are equal because the channel adds no statistical uncertainty. The Shannon entropy is equal to the von Neumann entropy because the set of orthogonal signal states adds no quantum uncertainty.

Mutual Information

Using H(r) = H(s) = S(ρ̂) and (16.5.4), the mutual information I(r;s) is (cf. (14.1.6))

I(r;s) = H(r) − H(r|s)
       = S(ρ̂) − Σ_s p(s) S(ρ̂_s)    (16.5.8a)
       = S(ρ̂) = H(s)    (ideal classical channel),    (16.5.8b)

where the second term in (16.5.8a) is zero because the von Neumann entropy for a pure signal state is zero (cf. Section 15.4.4). This expression reproduces the classical result that, for a noiseless classical channel, the capacity is equal to the Shannon entropy H(s) at the channel input, which is controlled by the probability distribution p(s) on the prior generated by the encoder. This classical result, expressed in the language of quantum optics, is developed into three other information channels by adding, in turn, quantum uncertainty, statistical uncertainty, and the use of general quantum-lightwave states, which may be entangled.

16.5.2 Holevo Information

The three remaining information channels admit pairwise-nonorthogonal signal states and thus have quantum uncertainty. This uncertainty leads to detection errors even when the channel has no statistical uncertainty.


To proceed, define the right side of (16.5.8a) as

χ(Î) ≐ S(ρ̂) − Σ_s p(s) S(ρ̂_s)
     = S(Σ_s p(s) ρ̂_s) − Σ_s p(s) S(ρ̂_s),    (16.5.9)

where (16.5.6) has been used for the first term on the right. While (16.5.9) was developed for a classically ideal channel, it holds in general.¹³ For a channel that has quantum uncertainty, the form of the information channel depends on the method of state preparation and detection, as shown in Section 16.4. Therefore, the signal state ρ̂_s and the associated prior p(s) in (16.5.9) may correspond to a component-symbol state, or to a block-symbol state consisting of a product state or an entangled state. Each of these methods of state preparation and state detection defines a different information channel.

This expression for χ(Î), called the Holevo information, is similar to the expression for the classical mutual information given in (14.1.5c) and repeated here:

I(s;r) = H(r) − Σ_s p(s) H(r|s = s).    (16.5.10)

The notation χ(Î) indicates that (16.5.9) is defined for an ideal channel with a channel transformation T̂ equal to the identity operator Î, so that the density matrix ρ̂_out describing the channel output signal state is the same as the density matrix ρ̂_s describing the channel input signal state.

Holevo Information and Classical Mutual Information

Just as the von Neumann entropy corresponds to the Shannon entropy and includes the effect of quantum uncertainty, so does the Holevo information correspond to the classical mutual information and include the effect of quantum uncertainty. To show the correspondence, consider the terms on the right side of the classical expression given in (16.5.10). The first term H(r) is the uncertainty in the average channel output, whereas the second term is the average uncertainty in the channel output. The mutual information is the difference between the two terms. For an ideal classical channel with no statistical uncertainty, the second term in (16.5.10) is zero, and I(s;r) = H(r) = H(s).

Similarly, the first term, S(ρ̂), on the right side of the quantum-optics expression given in (16.5.9) is the uncertainty in the average channel output expressed by the von Neumann entropy. The second term, Σ_s p(s) S(ρ̂_s), is the average uncertainty in the channel output. The Holevo information is the difference between the two terms. The Holevo information differs from the classical mutual information in that the Holevo information includes the effect of quantum uncertainty. The classical mutual information does not.

¹³ See Holevo (1973).
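To make (16.5.9) concrete, the following minimal Python sketch (our own illustration, not from the text; the function names and the example value of κ are assumptions) evaluates the Holevo information for an ensemble of two pure signal states with inner product κ, using the eigenvalue form of the von Neumann entropy from (16.5.3):

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -sum_j lambda_j log2 lambda_j over the eigenvalues of rho."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]                     # by convention, 0 log 0 = 0
    return float(-np.sum(lam * np.log2(lam)))

def holevo_information(priors, states):
    """chi = S(sum_s p(s) rho_s) - sum_s p(s) S(rho_s), per (16.5.9)."""
    rho_bar = sum(p * rho for p, rho in zip(priors, states))
    avg = sum(p * von_neumann_entropy(rho) for p, rho in zip(priors, states))
    return von_neumann_entropy(rho_bar) - avg

# Two pure signal states with inner product kappa, represented in the
# two-dimensional signal space that they span.
kappa = 0.67                                   # assumed value for illustration
theta = np.arccos(kappa)
psi0 = np.array([1.0, 0.0])
psi1 = np.array([np.cos(theta), np.sin(theta)])
states = [np.outer(v, v) for v in (psi0, psi1)]
print(holevo_information([0.5, 0.5], states))  # less than one bit when kappa > 0
```

For pure states the second term vanishes, so the printed value is simply S(ρ̂) for the equiprobable mixture; it approaches one bit as κ → 0 and zero as κ → 1.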


For an ideal channel with no statistical uncertainty and a state-preparation process that uses a signal constellation of pure signal states, the corresponding channel output signal states are pure signal states, and the second term on the right of (16.5.9) is zero. For this classically ideal channel, the Holevo information is equal to the von Neumann entropy of the average density matrix ρ̂ (cf. (16.5.6)) at the channel input, with χ(Î) = S(ρ̂).

16.5.3 A Noiseless Product-State Channel

The expressions given in the previous subsection are now used to derive the channel capacity of an ideal quantum channel with quantum uncertainty but no statistical uncertainty. The signal states used in the state-preparation process are block-symbol states that are constrained to be pure product states. When the lightwave channel is ideal, the average density matrix ρ̂_out at the channel output of a quantum-lightwave channel is equal to the average density matrix ρ̂_blk at the channel input given in (16.4.6). It is repeated here as

ρ̂_blk = Σ_ℓ p_blk(ℓ) ρ̂_blk(ℓ),    (16.5.11)

where ρ̂_blk(ℓ) is the density matrix for the ℓth block-symbol state in the statebook given in (16.4.5), and where p_blk(ℓ) is the prior probability on that block-symbol state. Because the members of the set {ρ̂_blk(ℓ)} of block-symbol states are pure product states and the channel is ideal, the second term in (16.5.9) is zero. For this case, the Holevo information

χ(Î) = S(ρ̂_blk)    (16.5.12)

reduces to the von Neumann entropy of the average density matrix ρ̂_blk for the block-symbol state at the channel input given in (16.5.11).

Marginalization to a Component-Symbol State

The maximization of the von Neumann entropy S(ρ̂_blk) at the channel output is determined by reasoning similar to that used in the classical case studied in Section 14.1.5. Each pure block-symbol state ρ̂_blk(ℓ) in the statebook has no statistical uncertainty. When connected to an encoder, which is internal to the state-preparation process (cf. Figure 16.7), the block-symbol states are distributed randomly as described by the average density matrix ρ̂_blk given in (16.5.11).

Define ρ̂_sym(i) as the average density matrix for the ith component-symbol state of the average block-symbol-state density matrix ρ̂_blk. The density matrix for the component-symbol state is determined by taking a partial trace (cf. (15.4.9b)) over the density matrix for each block-symbol state ρ̂_blk(ℓ) in the statebook. For a block-symbol product state, the partial trace gives ρ̂_sym(i, ℓ), which is the density matrix for the ith component-symbol state of the ℓth block-symbol state in the statebook. The average density matrix ρ̂_sym(i) for the ith component-symbol state is determined using the block-symbol-state prior p_blk(ℓ). It is given by

ρ̂_sym(i) = Σ_ℓ p_blk(ℓ) ρ̂_sym(i, ℓ).    (16.5.13)


This operation is equivalent to the classical marginalization (cf. (14.1.15)) of a product distribution to a single component. A proper statebook construction produces a set of nearly orthogonal block-symbol states with a pairwise inner product that can be made arbitrarily close to zero as the length of the block-symbol states in the statebook goes to infinity (cf. Section 16.4.3). The "marginalized" density matrix ρ̂_sym(i) given in (16.5.13) for every component-symbol state then approaches the average density matrix ρ̂ = Σ_s p(s) ρ̂_s describing a single component-symbol state (cf. (16.5.6)). This means that the "marginalized" density matrix ρ̂_sym(i) for every component-symbol state is the same and can be written as

ρ̂_sym(i) = Σ_s p(s) ρ̂_s,    (16.5.14)

where ρ̂_s = |ψ_s⟩⟨ψ_s| is the density matrix of a pure component-symbol state, and where p(s) is the single-symbol (or single-letter) prior probability distribution on that pure component-symbol state. Expression (16.5.14) is analogous to the classical case, for which a properly designed code results in the same marginal probability distribution for every component of the codeword (cf. Section 14.1.5).

Single-Letter Capacity

Using the additivity property of the von Neumann entropy for product states given in (15.4.26), the von Neumann entropy S(ρ̂_blk) of the statistical mixture of block-symbol states of length K is given by

S(ρ̂_blk) = K S(ρ̂),    (16.5.15)

where ρ̂ is the average density matrix of a component-symbol state given in (16.5.6). Maximizing the von Neumann entropy S(ρ̂_blk) for the block-symbol state to achieve the channel capacity is now reduced to maximizing the von Neumann entropy S(ρ̂) of the density matrix ρ̂ that describes a component-symbol state, as would be the case for a classical memoryless channel. Accordingly, the single-letter classical channel capacity of an ideal quantum-lightwave channel using product states is determined using a joint maximization of the Holevo information as given by

C = max_{p(s), ρ̂} S(ρ̂)    (ideal product-state channel).    (16.5.16)

In this expression, χ(Î) = S(ρ̂) because the channel is ideal and has no statistical uncertainty. The notation {p(s), ρ̂} denotes that the maximization is over both the prior p(s) on the set of component-symbol states and the state-preparation and state-detection process. Because the channel capacity involves a joint optimization, the methods of state preparation and state detection used to achieve the capacity need not be the same as those that minimize the probability of a detection error by minimizing the quantum uncertainty. For an information channel defined using nonorthogonal component-symbol states, the optimization over the single-letter prior on the component-symbol states maximizes the statistical uncertainty that can convey classical information. The optimization


over the state-preparation process and the detection process minimizes the quantum uncertainty for each information channel specified by the prior.

For a set of orthogonal symbol states for which S(ρ̂) = H(s) (cf. (15.4.25)), expression (16.5.16) reduces to an ideal classical information channel with the single-letter capacity given by

C = max_{p(s)} H(s)    (ideal classical channel).    (16.5.17)

In this case, the maximization is over only the prior p(s) for the channel input symbol states because there is no quantum uncertainty in the state-detection process.

16.5.4 A Noisy Product-State Channel

The third information channel removes the constraint that the channel has no statistical uncertainty. The resulting noisy product-state channel corresponds to a classical noisy channel. The introduction of statistical uncertainty means that the channel transformation operator T̂ is no longer the identity operator Î. The density matrix ρ̂_out at the channel output is now given by T̂ρ̂, with the Holevo information χ(T̂) given by (cf. (16.5.9))

χ(T̂) ≐ S(T̂ρ̂) − Σ_s p(s) S(T̂ρ̂_s),    (16.5.18)

where ρ̂ and ρ̂_s are related by (16.5.6). For this case, the second term on the right is now nonzero because each channel output symbol state T̂ρ̂_s is a mixed signal state that has a nonzero von Neumann entropy S(T̂ρ̂_s). This term corresponds to the classical conditional entropy H(r|s) (cf. (16.5.4)) and quantifies the reduction in the Holevo information for a noisy channel.

Noisy Channel Coding Theorem for Product States

The capacity of a noisy product-state channel is described by a form of Shannon's noisy channel coding theorem (cf. Section 14.1.5) modified for quantum optics. This theorem states that, for an information rate R in bits per second smaller than the channel capacity C, a statebook exists for pure channel input signal states such that the probability of a block-symbol detection error can be made arbitrarily small. The statebook is not specified by the theorem. For an information rate R larger than C, no statebook exists that can achieve an arbitrarily small value of the block-symbol error rate. These statements are not proved here.¹⁴ The proof follows the same general line of reasoning as for the noiseless product-state channel discussed in Section 16.5.3. However, the details of the proof are more involved because the expression for the probability of a block-symbol-state detection error for mixed signal states (cf. Section 16.2.6), which have a combination of quantum uncertainty and statistical uncertainty, is more complicated than the corresponding expression for pure signal states (cf. Section 16.3.1), which have only quantum uncertainty.

¹⁴ For a proof see Schumacher and Westmoreland (1997), and Holevo (1998a).


Table 16.2 Expressions for the single-letter classical information capacity of several product-state quantum-lightwave information channels

Type of channel | Orthogonal component-symbol states (no quantum uncertainty) | Nonorthogonal component-symbol states (quantum uncertainty)
Ideal channel (no statistical uncertainty) | C = max_{p(s)} H(s) | C = max_{p(s), ρ̂} S(ρ̂)
Nonideal channel (statistical uncertainty) | C = max_{p(s)} I(s;r) | C = max_{p(s), ρ̂} χ(T̂)

Single-Letter Capacity

The statistical uncertainty introduced by the channel means that the second term on the right side of (16.5.18) is nonzero. Accounting for this nonzero term, the single-letter channel capacity C for a noisy product-state channel can be written as

C = max_{p(s), ρ̂} χ(T̂)    (noisy product-state channel),    (16.5.19)

where χ(T̂) ≤ S(T̂ρ̂). This is because the channel output states are now mixed, so that the second term on the right of (16.5.18) is nonzero. When T̂ = Î and the channel is ideal, (16.5.19) reduces to (16.5.16). Expressions for the single-letter capacity for orthogonal component-symbol states and pairwise-nonorthogonal component-symbol states are summarized in Table 16.2, both for an ideal channel and for a nonideal channel.

16.5.5 The General Quantum-Lightwave Channel

The fourth and final information channel of this section is a general memoryless quantum-lightwave information channel that places no restrictions on the set of signal block-symbol states used to convey classical information. These block-symbol states include entangled states with dependences that cannot be expressed using product states. In the context of describing the resulting information channel, the dependences in an entangled block-symbol state may be viewed as an inherent form of memory that exists even when the channel itself is memoryless. This kind of memory does not exist for a classical information channel or for a quantum-lightwave channel constrained to use block-symbol states that are product states.

Accounting for this inherent form of memory requires an optimization over a prior p(s) for a set of block-symbol states {|ψ_s⟩} for K uses of the channel, where s denotes the entire block-symbol state. This optimization is different from the optimization over a prior p(s) on a single use of the channel, denoted by s, as would be the case when the block-symbol state is constrained to be a product state. This is because the von Neumann entropy of the density matrix ρ̂_s for a general block-symbol state |ψ_s⟩ need no longer be K times the entropy of a component-symbol state, as would be the case were the block-symbol state a product state (cf. (16.5.15)). This maximization is analogous to


a classical information channel whose transition probabilities cannot be expressed as a product distribution.¹⁵

Single-Letter Capacity

Because there may be inherent "memory" in the block-symbol states, the single-letter capacity must be determined using a limit. To proceed, let χ(T̂^⊗K) be the Holevo information for K uses of the channel. Generalizing (16.5.19), the capacity C_K for K uses of the channel can be written as

C_K = max_{p(s), ρ̂_s} χ(T̂^⊗K),    (16.5.20)

where the maximization is over the set {ρ̂_s} of all possible density matrices describing a set of input blocks s of length K and the associated prior p(s) on that block. The blocks are constrained so that the total mean number of signal photons E_total in a block is finite. Defining N̂_K ≐ Σ_{ℓ=1}^K N̂_ℓ as the photon-number-state operator for the input block, this finite-mean constraint can be written as

trace(N̂_K ρ̂_s) ≤ E_total.    (16.5.21)

The capacity C_K for K uses of the channel is determined by calculating the capacity for each block of length K. The single-letter capacity C is defined as the limit of C_K/K as K goes to infinity, with

C ≐ lim_{K→∞} (1/K) C_K
  = lim_{K→∞} (1/K) max_{p(s), ρ̂_s} χ(T̂^⊗K).    (16.5.22)

When the capacity C of a general quantum-lightwave channel is equal to the capacity C of a product-state channel, the additivity property of the von Neumann entropy holds (cf. (15.4.26)). This means that, compared with a product-state channel, the additional dependences evident in an entangled block-symbol state are not needed to achieve the channel capacity. Quantum-lightwave channels that have this property are discussed in Section 16.7. For these quantum-lightwave channels, there exists a natural correspondence between a memoryless product-state channel and a classical memoryless information channel. For other quantum-lightwave channels, the additivity property may not hold. For these channels, the capacity is not achieved by using product states, and there is no simple correspondence between the quantum-lightwave information channel and a classical information channel based on a product probability distribution.

¹⁵ See Verdú and Han (1994).

16.6 The Ideal Quantum-Lightwave Channel

Exact expressions for the classical information capacity of some quantum-lightwave channels are known. This section discusses one case, which is the capacity of an ideal quantum-lightwave channel with T̂ = Î. This channel adds no statistical uncertainty such as noise, but may still have detection errors caused by quantum uncertainty when the set of signal states is pairwise nonorthogonal. For such an ideal channel, the Holevo information is equal to the von Neumann entropy (cf. (16.5.12)). The capacity is achieved by maximizing the von Neumann entropy S(ρ̂) at the channel input by a joint maximization over the state preparation, the state detection, and the single-letter prior p(s) (cf. (16.5.16)). The upper bound of the von Neumann entropy is the Shannon entropy H(s) (cf. (15.4.25)) of the diagonal. Because quantum optics is a discrete-energy model, the Shannon entropy H(s) for a set of discrete-energy signal states subject to a finite mean constraint is maximized using a Gordon distribution (cf. (6.1.6)). Accordingly, this suggests that the single-letter classical information capacity of an ideal quantum-lightwave channel is achieved when S(ρ̂) = H(s).¹⁶ This capacity is equal to the capacity of a discrete-energy photon-optics channel (cf. (14.2.7)), which is repeated here:

C = log(1 + E) + E log(1 + 1/E) = g(E),    (16.6.1)

where g(x) is the Gordon function defined in (14.2.3). A simple heuristic argument justifying this statement can be developed by considering a set of orthogonal nonclassical photon-number states. When ideal photon counting is used for the detection of a photon-number state, there is no quantum uncertainty. Moreover, for an ideal channel, there is no statistical uncertainty. These conditions result in a classical ideal channel with the capacity given by the Shannon entropy H(s) at the channel input, which, for a discrete-energy channel, is maximized using a Gordon distribution.
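As a quick numerical check of (16.6.1), a one-function Python sketch (our own, not from the text) evaluates the Gordon function:

```python
import numpy as np

def gordon(E):
    """Gordon function g(E) = log2(1 + E) + E log2(1 + 1/E), per (16.6.1)."""
    return np.log2(1 + E) + (E * np.log2(1 + 1 / E) if E > 0 else 0.0)

print(gordon(1.0))   # 2.0 bits per symbol at a mean of one photon
```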

16.6.1 Capacity for Conventional Binary Modulation Formats

The classical information capacity of several ideal quantum-lightwave channels using various conventional binary modulation formats is discussed in this section. Channel input signal states |r_0⟩ and |r_θ⟩, which need not be coherent states, have probabilities p and 1 − p, respectively. The density matrix of this mixed signal state is given in (15.4.16), with the eigenvalues given in (15.4.17) and repeated here:

λ_{0,1}(p, κ) = ½(1 ± √(1 + 4p(κ²(1 − p) + p − 1))),    (16.6.2)

where κ = ⟨r_0|r_θ⟩ = cos θ is the inner product between the two signal states, with θ being the generalized angle between the two states. For an asymmetric binary information channel, κ may be a function of the prior p, as is the case for a classical asymmetric binary information channel (cf. (14.2.14)).

¹⁶ For a formal proof, see Yuen and Ozawa (1993).


Statistical and Quantum Uncertainty

For a classical signal, the prior p determines the Shannon entropy given in (14.1.1). For a quantum-lightwave signal, the eigenvalues of a density matrix determine the von Neumann entropy given in (15.4.23). These eigenvalues depend both on the prior p, which characterizes the statistical uncertainty, and on the inner product κ, which characterizes the quantum uncertainty. Determining the channel capacity in the presence of both forms of uncertainty requires a joint optimization both over the prior p and over the inner product κ. The optimization over the prior maximizes the statistical uncertainty that can be used to convey classical information in the form of bits. The optimization over the inner product κ, when possible by an alignment of the measurement basis, minimizes the quantum uncertainty. The von Neumann entropy is determined using (15.4.23) and is

S(ρ̂) = S(p, κ) = −λ_0(p, κ) log λ_0(p, κ) − λ_1(p, κ) log λ_1(p, κ),    (16.6.3)

where λ_{0,1}(p, κ) are the eigenvalues of ρ̂ given in (16.6.2).
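The eigenvalues (16.6.2) and the entropy (16.6.3) are straightforward to evaluate numerically. A minimal sketch (ours, not from the text; log is taken as log2 so the entropy is in bits):

```python
import numpy as np

def eigenvalues(p, kappa):
    """lambda_{0,1}(p, kappa) from (16.6.2)."""
    root = np.sqrt(1 + 4 * p * (kappa**2 * (1 - p) + p - 1))
    return 0.5 * (1 + root), 0.5 * (1 - root)

def entropy(p, kappa):
    """S(p, kappa) from (16.6.3), in bits."""
    return -sum(lam * np.log2(lam) for lam in eigenvalues(p, kappa) if lam > 0)

print(entropy(0.5, 0.0))   # orthogonal states at p = 1/2: exactly 1 bit
```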

Orthogonal Signal States

When the signal states are orthogonal, the inner product κ is zero and there is no quantum uncertainty. For this case, the eigenvalues given in (16.6.2) are p and 1 − p. The corresponding information channel is classical, with the von Neumann entropy S(ρ̂) reducing to the classical binary entropy H_b(p) (cf. (14.1.8)),

S(ρ̂) = H_b(p) = −p log p − (1 − p) log(1 − p).    (16.6.4)

The maximum entropy, achieved when p = 1/2, equals one bit. This is the classical entropy of a binary symmetric channel.

Nonorthogonal Signal States

When the signal states are pairwise nonorthogonal, the inner product κ is not zero. Then there is quantum uncertainty. When the inner product κ does not depend on the prior p, the maximum value of the von Neumann entropy again occurs when p = 1/2. However, when quantum uncertainty is present, the maximum entropy does not necessarily equal one bit, as would be the case for a classical channel. This is because the von Neumann entropy monotonically decreases as κ increases. For κ = 1, the eigenvalues given in (16.6.2) are λ_0 = 0 and λ_1 = 1, so that the von Neumann entropy given in (16.6.3) is equal to zero (cf. (15.4.23)). This simply states that no information can be conveyed when the two signal states are coincident, because the states cannot be discriminated at the receiver.

Antipodal Coherent-State Modulation

For antipodal coherent-state modulation, the inner product between the two signal states is κ = e^{−d²/2} = e^{−2E_b} (cf. (15.3.40b)). This expression does not depend on the prior. The resulting information channel is a binary symmetric channel. The channel capacity is achieved for an equiprobable prior with p = 1/2.

Figure 16.9 (a) The von Neumann entropy for antipodal coherent-state modulation as a function of the prior probability p_1 for a mark (a logical "one") for several values of the mean signal E_b per bit. (b) The von Neumann entropy for coherent-state on–off-keyed modulation.

Using κ = e^{−2E_b} and p = 1/2, the eigenvalues given by (16.6.2) are λ_{0,1} = ½(1 ± e^{−2E_b}). Substituting these values into (16.6.3), the channel capacity is equal to the von Neumann entropy as given by

C = S(ρ̂) = H_b(½(1 + e^{−2E_b})),    (16.6.5)

where the binary entropy function H_b(p) is given in (16.6.4). The channel capacity for antipodal coherent-state modulation depends on the mean number of photons E_b per bit, which expresses the pairwise nonorthogonality of the signal states described by the inner product κ and the corresponding quantum uncertainty.

Figure 16.9(a) plots the von Neumann entropy for antipodal coherent-state modulation as a function of the prior probability p for several values of E_b. The curves are scaled versions of the binary entropy function H_b(p), with the entropy being a symmetric function of the prior probability p. The maximum value of S(ρ̂) depends on the signal level E_b and can be less than one bit, as is evident in Figure 16.9(a). The reduction is a consequence of the pairwise nonorthogonality of coherent states for small E_b, which increases the quantum uncertainty. For large E_b, the entropy (16.6.5) reduces to H_b(1/2), which is equal to one bit. This is the classical result obtained for orthogonal signal states that have no quantum uncertainty.
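A direct numerical check of (16.6.5) (a sketch of ours, not from the text) reproduces the behavior described above:

```python
import numpy as np

def Hb(p):
    """Binary entropy function of (16.6.4), in bits."""
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

for Eb in (0.1, 0.5, 1.0):
    print(Eb, Hb(0.5 * (1 + np.exp(-2 * Eb))))   # Eb = 1 gives about 0.987 bits
```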

On–Off-Keyed Coherent-State Modulation

For ideal on–off-keyed coherent-state modulation, no signal is sent for a space, and so E_0 = 0. The resulting information channel is asymmetric. This leads to the inner product κ depending on the prior probability p_1 for a mark because the mean number of photons per bit is E_b = p_0 E_0 + p_1 E_1 = p_1 E_1, where E_1 is the mean number of photons for a mark (cf. (14.2.16)). Therefore, for on–off keying, the squared euclidean distance between the Glauber numbers that specify the two on–off-keyed coherent states depends on the prior probability p_1 for a mark and is given by

d² = (√E_1 − √E_0)² = (√(E_b/p_1) − 0)² = E_b/p_1.    (16.6.6)

Figure 16.10 (a) The hard-decision capacity of an ideal quantum-lightwave channel as a function of the mean number of photons E_b per bit for several coherent-state modulation formats. Also shown is the unconstrained capacity C given in (16.6.1) and the capacity for on–off-keyed coherent-state modulation using an equiprobable prior. (b) The same curves on a logarithmic scale.

Using κ = e^{−d²/2} (cf. (15.3.40b)) with d² = E_b/p_1 gives κ = e^{−E_b/2p_1}. Substituting this expression into (16.6.2) gives the eigenvalues used in (16.6.3). Figure 16.9(b) plots the resulting von Neumann entropy for on–off-keyed coherent-state modulation as a function of the prior probability p_1 for a mark for several values of the mean number of photons per bit E_b. In contrast to antipodal coherent-state modulation, the dependence of the inner product on the prior probability p_1 leads to an asymmetric information channel that is similar to the conventional asymmetric photon-counting channel shown in Figure 14.6.

The von Neumann entropy for the optimal prior, which need not be equiprobable, is plotted for several binary modulation formats in Figure 16.10. Referring to that figure, the capacity of antipodal coherent-state modulation is always larger than the capacity of on–off coherent-state modulation for any prior and for any signal level. This statement differs from the semiclassical case with no additive noise shown in Figure 14.7. For the semiclassical case in a small-signal regime, the channel capacity for an information channel using photon counting and an optimal prior is larger than the capacity of an information channel that uses semiclassical shot-noise-limited homodyne demodulation. The difference between the semiclassical case and the quantum-optics case is attributable to the performance of the optimal detection operator given in (16.2.18) in a small-signal regime being better than that of semiclassical homodyne demodulation/detection (cf. Table 16.1).
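Because κ = e^{−E_b/2p_1} depends on the prior, the entropy-maximizing prior must be found numerically. A minimal grid-search sketch (ours; the grid and the example value of E_b are assumptions):

```python
import numpy as np

def entropy(p, kappa):
    """S(p, kappa) built from the eigenvalues (16.6.2), in bits."""
    root = np.sqrt(1 + 4 * p * (kappa**2 * (1 - p) + p - 1))
    lams = (0.5 * (1 + root), 0.5 * (1 - root))
    return -sum(lam * np.log2(lam) for lam in lams if lam > 0)

Eb = 0.2
p_grid = np.linspace(0.01, 0.99, 981)
S_vals = [entropy(p1, np.exp(-Eb / (2 * p1))) for p1 in p_grid]
best = int(np.argmax(S_vals))
print(p_grid[best], S_vals[best])   # the optimal prior for a mark need not be 1/2
```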

Small-Signal Regime

In a small-signal regime, E_b is much smaller than 1, and S(ρ̂) given in (16.6.5) can be expanded in a power series. Using the first two terms of this expansion gives

S(ρ̂) ≈ E_b(1 − log E_b)    (E_b ≪ 1),    (16.6.7)

which is asked for as an end-of-chapter exercise. This expression is equal to the small-signal limit of the entropy of a Poisson probability mass function (cf. (14.2.17)), which,


in turn, is the small-signal limit of the entropy of a Gordon distribution. Therefore, the capacity of antipodal coherent-state modulation in a small-signal region approaches the capacity of a Poisson channel.
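The expansion (16.6.7) can be checked numerically. In the sketch below (ours), log is taken as the natural logarithm, so both quantities are in nats; this reading of "log" is an assumption:

```python
import numpy as np

def S_nats(Eb):
    """Entropy of (16.6.5) with eigenvalues (1 +- exp(-2 Eb))/2, in nats."""
    lams = (0.5 * (1 - np.exp(-2 * Eb)), 0.5 * (1 + np.exp(-2 * Eb)))
    return -sum(lam * np.log(lam) for lam in lams if lam > 0)

Eb = 0.01
print(S_nats(Eb), Eb * (1 - np.log(Eb)))   # ~0.0555 vs ~0.0561: close for small Eb
```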

Large-Signal Regime

In a large-signal regime, the coherent states are nearly pairwise orthogonal. The maximum von Neumann entropy given in (16.6.5) approaches one bit and is achieved by an equiprobable prior, as is evident in Figure 16.9(a). As an example, using (16.6.5) with a mean signal E_b of one photon per bit, the entropy S(ρ̂) is 0.987 bits. For mean signal levels significantly larger than one photon per bit, the pairwise nonorthogonality of the coherent states is not evident. This leads to a von Neumann entropy that is essentially equal to the classical Shannon entropy of one bit. For this modulation format, a semiclassical analysis of the information channel that is based on wave optics and transition probabilities is appropriate when the mean signal level is somewhat larger than one photon per bit. This statement, derived for a quantum-lightwave information channel based on density matrices, mirrors several previous statements for other aspects of lightwave communications. In summary, a semiclassical analysis is appropriate when the mean number of signal photons per bit is somewhat larger than one.

The two limiting cases of the channel capacity provide illuminating examples of the remarkable dual properties of coherent states. In the limit of large mean signal levels, the capacity of an ideal channel based on coherent states is appropriately modeled using classical information theory based on continuous wave optics and transition probability densities. In the limit of small mean signal levels, the capacity of an ideal channel based on coherent states is appropriately modeled using classical information theory based on discrete photon optics with transition probability mass functions. These statements are derived from the common mathematical formalism of quantum optics.

16.6.2 Capacity using Component-Symbol-State Detection

When the state-detection process uses component-symbol states, the classical information capacity can be determined using semiclassical methods by replacing the probability of a bit error p for semiclassical detection with the probability of a bit error p for quantum-optics detection. The capacity of an ideal channel for a binary modulation format using hard decisions depends on whether the information channel is symmetric or asymmetric. While the probability of a detection error p depends on the signal model, the form of the resulting classical information channel does not. For a discrete symmetric information channel, the hard-decision capacity is given by (14.3.10),

C = 1 − H_b(p) = 1 + p log p + (1 − p) log(1 − p).    (16.6.8)

Substituting the two expressions for the probability of a detection error p, one for semiclassical shot-noise-limited homodyne demodulation and the other for antipodal coherent-state modulation (cf. Table 16.1), into (16.6.8) leads to the capacities of these two symmetric modulation formats using hard decisions, as plotted in Figure 16.11.

Figure 16.11 (a) Capacity for binary semiclassical and coherent-state modulation formats. The plot includes the capacity C given in (16.6.1) as well as the von Neumann entropy bound for the soft-decision detection of antipodal coherent-state modulation (cf. (16.6.5)). (b) The same curves on a log–log plot.

For an asymmetric information channel, the calculation of the capacity requires an optimization over the prior. Figure 16.11 also plots the hard-decision capacity for an asymmetric information channel that uses photon counting with a nonequiprobable prior that maximizes the mutual information (cf. Section 14.2.4 and Figure 14.4).

The figure suggests three regimes of interest. In the large-signal regime, for which E_b is much larger than one photon per bit, the most significant term in (16.6.1) is the wave-optics term C_w = log(1 + E_b), which is equal to the wave-optics capacity per symbol with a noise source equivalent to one photon (N_0 = 1). It is also the quantum-noise limit for the joint detection of both signal components (cf. Section 15.5.3). Most current lightwave communications systems operate in this regime. In this large-signal regime, the difference between C_w and the capacity for the other binary modulation formats shown in Figure 16.11 is the potential improvement that can be obtained by using a nonbinary modulation format with a more complex method of detection such as block-symbol detection.

For the intermediate-signal regime defined by 0.1 ≲ E_b ≲ 1, antipodal coherent-state modulation with an optimal hard-decision detection operator (cf. (16.2.18)) and an equiprobable prior achieves the largest capacity of any of the binary modulation formats that have been considered. The difference shown in Figure 16.11 between this hard-decision capacity and the capacity based on the von Neumann entropy given in (16.6.5) is the potential improvement that can be obtained using a more complex detection operator such as a joint-detection operator (cf. Section 16.4.3). In this intermediate-signal regime, the discrete-energy property of a lightwave signal becomes evident. The capacity using photon counting is achieved using a nonequiprobable prior and exceeds the capacity for semiclassical shot-noise-limited homodyne demodulation.

The trend for which the discrete-energy property of a lightwave conveys a larger portion of the information becomes more pronounced in a small-signal regime defined


by E_b ≲ 0.1. In this regime, the hard-decision detection capacity for photon counting with an optimal nonequiprobable prior exceeds the hard-decision detection capacity for the other channels considered. This includes antipodal coherent-state modulation with an equiprobable prior (cf. Figure 16.11(b)). These statements suggest that in a small-signal regime, hard-decision detection favors asymmetric "particle-like" formats such as photon counting over symmetric "wave-like" formats such as antipodal coherent-state modulation because hard-decision detection is itself more "particle-like." The same effect occurs in the small-signal regime of a classical information channel using hard-decision detection (cf. Figure 14.7).

These statements are based on hard-decision detection. The soft-decision capacity based on the von Neumann entropy given in (16.6.5) and shown in Figure 16.11 shows that, for any signal level, antipodal coherent-state modulation with an optimal method of state detection yields a higher capacity than any of the hard-decision detection channels considered in this section.
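The hard-decision capacities plotted in Figure 16.11 follow from (16.6.8) once the detection-error probability is specified. The sketch below (ours) uses the standard minimum-error (Helstrom) probability for two equiprobable pure states with inner product κ = e^{−2E_b}; that this matches the text's (16.2.13) is an assumption:

```python
import numpy as np

def Hb(p):
    return 0.0 if p in (0.0, 1.0) else -p * np.log2(p) - (1 - p) * np.log2(1 - p)

for Eb in (0.1, 0.5, 1.0):
    kappa = np.exp(-2 * Eb)
    pe = 0.5 * (1 - np.sqrt(1 - kappa**2))   # assumed Helstrom minimum-error probability
    print(Eb, 1 - Hb(pe))                    # hard-decision capacity (16.6.8), bits
```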

16.6.3 Capacity Using Block-Symbol-State Detection

When the state-detection process uses block-symbol states, the classical information capacity is not the same as that of an information channel that uses component-symbol states. Specifically, Section 16.4.3 shows that the information channel using block-symbol states is different from the information channel using component-symbol states. This section determines how this difference affects the hard-decision capacity of the information channel.

The example discussed in Section 16.4.3 has an information channel defined by a state-preparation process and a state-detection process that use block-symbol states. Two bits are encoded using four block-symbol states, with each block-symbol state composed of three binary component-symbol states that are antipodal coherent states. The hard-decision block-symbol detection capacity of this symmetric information channel is determined by setting L = 4 in (14.3.9b). For a block-symbol state composed of three binary component-symbol states, the capacity C_3 is

C_3 = log_2 4 + Σ_{i=1}^{4} p(i|1) log p(i|1)
    = 2 + p(1|1) log p(1|1) + 3p(2|1) log p(2|1),    (16.6.9)

where p(2|1) = p(3|1) = p(4|1) for a symmetric information channel. The subscript 3 on C denotes that the capacity is defined for block-symbol states composed of three binary component-symbol states using the block-symbol-state detection operator for the set of four such block-symbol states described in Section 16.4.3.

Single-Letter Capacity using Block-Symbol States

The single-letter capacity of an information channel that uses three-letter block-symbol-state detection is given by C = C_3/3. The conditional probabilities p(i|1) for i = 1, ..., 4 for the block-symbol-state detection operator required to determine the capacity can be expressed in terms of the pairwise block-symbol-state transition probabilities given in (16.4.10).

Suppose that a statebook is constructed such that the block-symbol states all have the same pairwise inner product κ_blk, equal to the square of the component-symbol-state inner product κ_sym. This means that κ_blk = κ_sym² = e^{−4E_b}, where κ_sym = e^{−2E_b} is the pairwise inner product between two component-symbol coherent states (cf. (15.3.40b)). Substituting κ_blk = e^{−4E_b} into (16.4.10) gives the transition probabilities p(i|1) for i = 1, ..., 4. Substituting the transition probabilities p(i|1) into (16.6.9) and scaling the result by 1/3 gives the single-letter capacity using block-symbol-state detection. This single-letter capacity is plotted in Figure 16.12 as a function of the mean number of photons E_b per binary component-symbol coherent state.

Figure 16.12 (a) The single-letter capacity C for two information channels. The first channel uses component-symbol-state detection. The second channel uses block-symbol-state detection. The von Neumann entropy bound (cf. (16.6.5)) is also shown. (b) An expanded view of the shaded region in (a) near E_b = 0 on a linear scale.
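The arithmetic of (16.6.9) is easy to script. In the sketch below (ours), the transition probabilities p(1|1) and p(2|1) would come from (16.4.10), which is not reproduced in this section; the values shown are placeholders for illustration only:

```python
import numpy as np

def C3(p11, p21):
    """C3 = 2 + p(1|1) log2 p(1|1) + 3 p(2|1) log2 p(2|1), per (16.6.9)."""
    assert abs(p11 + 3 * p21 - 1) < 1e-9, "probabilities must sum to one"
    term = lambda p: p * np.log2(p) if p > 0 else 0.0
    return 2 + term(p11) + 3 * term(p21)

print(C3(0.97, 0.01) / 3)   # single-letter capacity C3/3 for placeholder p(i|1)
```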

Comparison with the Channel Capacity Using Component-Symbol-State Detection

The channel capacity of the information channel defined using component-symbol states and component-symbol-state detection is determined using the probability of a detection error given in (16.2.13). The corresponding capacity of the resulting binary symmetric information channel is given in (16.6.8) and plotted in Figure 16.12.

The information channel using component-symbol states does not construct a statebook as part of the state-preparation process. The resulting information channel can be depicted as a black box with the encoder external to the information channel (cf. Section 16.4.2). The information channel using block-symbol states does construct a statebook as part of the state-preparation process. The encoding of this statebook is part of the information channel (cf. Section 16.4.3). This difference does not occur for a classical memoryless channel described by a product distribution, for which the information-theoretic channel for either method of signaling is the same and can be depicted as a black box with the encoder external to the information channel.


The difference in the capacity for the two methods of signaling is evident for small E_b and a large probability p_e of a detection error. In this regime, using block-symbol states leads to a larger single-letter capacity than using component-symbol states. This increase is shown in Figure 16.12(b) for E_b ≲ 0.15.

The advantage of using block-symbol states for a quantum-lightwave channel decreases in a large-signal regime because the component coherent-state symbols used in our example become nearly pairwise orthogonal, leading to the probability p_e of a symbol-state detection error approaching zero (cf. (16.2.11)) because the channel adds no statistical uncertainty. In this regime, the capacity of an information channel using block-symbol states is reduced by one bit because only half of the eight possible block-symbol states are used in the statebook. The corresponding single-letter capacity is reduced by one-third, as is shown in Figure 16.12(a). Neither method of signaling approaches the von Neumann entropy bound for antipodal coherent-state modulation (cf. (16.6.5)), which is also shown in Figure 16.12. This difference is the theoretical improvement obtainable by using a different statebook.

16.7 The Gaussian Quantum-Lightwave Information Channel

A gaussian quantum-lightwave information channel is a basic model of a lightwave communication system. This section discusses such a channel and outlines the derivation of the single-letter capacity for three versions of a gaussian information channel. The channel capacity for each channel requires a separate proof. The common features of the channel capacity for these three gaussian channels are the topic of this section.

16.7.1 Gaussian Channels

A gaussian quantum-lightwave channel preserves the characteristics of a gaussian signal state. Referring to Section 15.6.5, a gaussian signal state is completely described by a real covariance matrix S (cf. (15.6.25)). For a gaussian channel, the covariance matrix of the signal at the channel input and the covariance matrix of the signal at the channel output are related by a channel matrix H_s (cf. Section 8.1.3). Similarly, for an external noise process, such as thermal noise or lightwave amplifier noise, that is modeled as a gaussian state N, the input and output covariance matrices are related by a channel matrix H_n for the noise. The total covariance matrix (cf. (14.4.6)) describing the gaussian signal state at the channel output can be written as

R = H_s S H_sᵀ + H_n N H_nᵀ.    (16.7.1)

The channel matrices H_n and H_s need not be equal because the channel may affect the signal and noise in different ways. Moreover, thermal noise within quantum optics is treated differently than classical additive noise.
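A short numerical sketch (ours; the parameter values are assumptions) shows how (16.7.1) propagates covariances for the phase-insensitive case developed below:

```python
import numpy as np

eta, N0 = 0.6, 2.0                      # assumed coupling parameter and thermal photons
Hs = np.sqrt(eta) * np.eye(2)           # channel signal gain |hs|^2 = eta
Hn = np.sqrt(1 - eta) * np.eye(2)       # channel noise gain |hn|^2 = 1 - eta
S = 0.25 * np.eye(2)                    # vacuum-state covariance, sigma_s^2 = 1/4
N = (N0 / 2 + 0.25) * np.eye(2)         # thermal-state covariance, 2 sigma_n^2 = N0 + 1/2
R = Hs @ S @ Hs.T + Hn @ N @ Hn.T       # total output covariance, per (16.7.1)
print(2 * R[0, 0])                      # equals (1 - eta) N0 + 1/2, matching (16.7.3)
```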

16.7.2 Phase-Insensitive Gaussian Channels

A phase-insensitive gaussian quantum-lightwave channel corresponds to a classical passband lightwave communication channel corrupted by circularly symmetric gaussian noise. Depending on the noise model, the interaction between the signal and the noise may be treated as a coupling process involving a set of external modes or as a simple additive process for which the noise is superimposed on the signal. Because the channel is phase-insensitive, the effect of the channel can be described by diagonal channel matrices H_s and H_n of the form H_s = h_s I and H_n = h_n I, where h_s and h_n are scalars.

When there is no signal at the channel input, the channel input state is a vacuum state (cf. (15.3.31)) described by a density matrix ρ̂_vac = |0⟩⟨0|. Using (15.6.27), the variance of the vacuum state is 2σ_s² = 1/2, or σ_s² = 1/4 (cf. (15.3.17b)), where σ_s² corresponds to the variance of a circularly symmetric gaussian state measured using homodyne state detection of one signal component (cf. Figure 15.13). For a channel input described by the vacuum state ρ̂_vac, the transformation describing a phase-insensitive gaussian channel simply changes the variance of that input state. Accordingly, the variance 2σ_r² of the circularly symmetric gaussian signal state at the channel output, given a vacuum state at the channel input, can be written as

2σ_r² = 2σ_s²|h_s|² + 2σ_n²|h_n|² = ½|h_s|² + 2σ_n²|h_n|²,    (16.7.2)

where, because 2σ_s² = 1/2, the variance of the channel input vacuum state is equivalent to half of a photon. The term |h_s|² is the channel signal gain. The term |h_n|² is the channel noise gain, which depends on the noise model. Because a displacement of the vacuum state does not affect the von Neumann entropy of that state as expressed by the covariance, this expression is also valid for a coherent state. This statement in quantum optics is equivalent to the classical statement that adding a bias to a circularly symmetric gaussian distribution does not affect the Shannon entropy.

Expression (16.7.2) is now used to describe three phase-insensitive gaussian channels: a thermal-noise channel, a phase-insensitive lightwave amplifier channel, and a classical additive-noise channel. The thermal-noise channel and the phase-insensitive lightwave amplifier channel are modeled as a coupling process between a signal state and a set of external noise states. The classical additive-noise channel is different. It is modeled by randomly displacing the signal state in phase space by a circularly symmetric gaussian random variable. This noise model corresponds to classical additive noise.

Thermal-Noise Channel

A thermal-noise channel model couples the channel input state with a set of external states in thermal equilibrium. This produces a mixed signal state at the channel output. The density matrix describing this thermal-noise state in a photon-number-state representation is given by (15.6.18), with the mean number N_0 of thermally generated photons given by (6.1.8). The variance of a thermal-noise state at the channel output is then that of a circularly symmetric gaussian signal state with 2σ_n² = N_0 + 1/2. This variance differs from the classical variance by the fundamental quantum uncertainty of half of a photon.

When there is a vacuum state with no signal at the channel input, the thermal-noise channel combines the external thermal-noise state with the channel input vacuum state. The coupling process is modeled with a channel signal gain |h_s|² equal to η and a channel noise gain |h_n|² equal to 1 − η, where the parameter η determines the amount of thermal noise that is coupled to the signal. Substituting these expressions into (16.7.2), the variance of the circularly symmetric gaussian state at the channel output, with a vacuum state at the channel input, is

2σ_r² = ½η + (1 − η)(N_0 + ½) = (1 − η)N_0 + ½.    (16.7.3)

When the circularly symmetric gaussian signal state at the channel output is expressed in a photon-number-state representation, the diagonal density matrix ρ̂_N0 describing that state is given by (15.6.18). The diagonal elements of that density matrix comprise a Gordon probability distribution with mean (1 − η)N_0. Therefore, for a thermal-noise channel described by the channel transformation T̂_th, the von Neumann entropy S(T̂_th ρ̂_vac) of the circularly symmetric gaussian signal state at the channel output, given a vacuum state ρ̂_vac at the channel input, is equal to the von Neumann entropy of a Gordon probability distribution (cf. (6.1.6)) at the channel output. This expression is given by

S(T̂_th ρ̂_vac) = g((1 − η)N_0),    (16.7.4)

where g(x) is the Gordon function (cf. (14.2.3)).

Phase-Insensitive Lightwave Amplifier Channel

A phase-insensitive lightwave channel that includes a phase-insensitive lightwave amplifier consists of a coupling process that combines, then amplifies, both a signal state and a lightwave amplifier noise state. With lightwave amplifier gain G (cf. Section 7.7), the channel signal gain |h_s|² equals G, and the channel noise gain |h_n|² is G − 1 (cf. (7.7.7)). Excluding quantum noise, the phase-insensitive lightwave amplifier noise term is N_amp (cf. (7.7.5)). The variance of the corresponding circularly symmetric gaussian noise state at the channel input is 2σ_n² = N_amp + 1/2, which has the same form as the thermal-noise state with N_0 replaced by N_amp. Using (16.7.2), the variance 2σ_r² of the circularly symmetric gaussian signal state at the channel output, given a vacuum state with no signal at the channel input, is

2σ_r² = ½|h_s|² + 2σ_n²|h_n|²
      = ½G + (N_amp + ½)(G − 1) = (N_amp + 1)(G − 1) + ½.    (16.7.5)

The mean number of photons at the amplifier channel output is given by (N_amp + 1)(G − 1) = N_sp in accordance with (7.7.7). The phase-insensitive lightwave amplifier channel is described by the transformation T̂_amp. Following the same reasoning used for the thermal-noise channel, the von Neumann entropy S(T̂_amp ρ̂_vac) of the circularly symmetric gaussian signal state at the amplifier channel output, given a vacuum state ρ̂_vac with no signal at the amplifier channel input, is

S(T̂_amp ρ̂_vac) = g((N_amp + 1)(G − 1)) = g(N_sp),    (16.7.6)

where g(·) is the Gordon function (cf. (14.2.3)). Quantum-limited performance is achieved for N_amp = 0. This statement is the quantum-lightwave equivalent of (7.7.10), which states that N_sp ≈ G for a high-gain phase-insensitive lightwave amplifier. For this case, S(T̂_amp ρ̂_vac) ≈ g(G). This means that the entropy at the channel output of the lightwave amplifier channel, given no signal at the channel input, is the entropy of a Gordon distribution with a mean equal to the amplifier gain G.

Classical Additive-Noise Channel

The quantum-lightwave equivalent of a classical additive-noise channel is a channel with signal gain |h_s|² equal to 1 and noise gain |h_n|² also equal to 1. When both the channel signal gain |h_s|² and the channel noise gain |h_n|² are set equal to 1, the resulting channel model can no longer be described as a coupling process between a signal state and external noise states. Instead, when |h_s|² = |h_n|² = 1, the resulting channel model describes classical noise that is superimposed on the signal. Because the classical noise is added directly to the signal instead of being coupled from an external noise state, the variance of the noise for the circularly symmetric gaussian signal state at the channel output is defined as 2σ_n² ≐ N_0. This expression excludes a factor of half of a photon. This contrasts with coupling the noise from a separate mode with its own quantum noise, as was the case for the two other phase-insensitive channels.

This channel corresponds to a classical additive-noise channel. Within quantum optics, a classical additive-noise channel can be viewed as randomly displacing the signal state at the channel output in phase space (cf. Figure 15.3) by a circularly symmetric gaussian random variable with variance 2σ² = N_0, without adding additional quantum uncertainty. The variance 2σ_r² of the circularly symmetric gaussian signal state at the channel output, given a vacuum state with no signal at the channel input, is

2σ_r² = N_0 + 1/2.    (16.7.7)

The mean number of photons N0 at the channel output is the mean number of photons added by the classical additive-noise source. Therefore, for a phase-insensitive additive-noise channel described by the transformation T̂add, the von Neumann entropy S(T̂add ρ̂vac) of the circularly symmetric gaussian signal state at the channel output, given a vacuum state ρ̂vac with no signal at the channel input, is

$$S\bigl(\hat{T}_{add}\,\hat{\rho}_{vac}\bigr) = g(N_0), \tag{16.7.8}$$

where g(·) is the Gordon function (cf. (14.2.3)).
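This phase-space picture can be checked with a small semiclassical Monte Carlo sketch (an illustration under the stated model, not from the text): the vacuum state contributes a circularly symmetric uncertainty with total variance 1/2, and the classical noise displaces the state by a circularly symmetric gaussian random variable with total variance N0.

```python
import numpy as np

rng = np.random.default_rng(0)
N0, trials = 1.5, 1_000_000   # illustrative mean photon number of the classical noise

# Vacuum-state uncertainty: circular gaussian with total variance 1/2 (1/4 per quadrature)
vacuum = (rng.normal(size=trials) + 1j * rng.normal(size=trials)) * 0.5

# Classical additive noise: random displacement in phase space with total variance N0
displacement = (rng.normal(size=trials) + 1j * rng.normal(size=trials)) * np.sqrt(N0 / 2)

out = vacuum + displacement
print(np.var(out.real) + np.var(out.imag))   # approximately N0 + 1/2, as in (16.7.7)
```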

16.7.3 Capacity for a Phase-Insensitive Gaussian Channel

The expression for the classical capacity of a phase-insensitive gaussian channel is described in this section, but a full proof is not provided.¹⁷ For a block-symbol state consisting of K channel uses, the general expression for the capacity is given by (16.5.20) and repeated here:

$$C_K = \max_{\{p(s),\,\hat{\rho}_s\}} \chi\bigl(\hat{T}^{\otimes K}\bigr),$$

where

$$\chi\bigl(\hat{T}^{\otimes K}\bigr) = S\bigl(\hat{T}^{\otimes K}\hat{\rho}\bigr) - \sum_s p(s)\,S\bigl(\hat{T}^{\otimes K}\hat{\rho}_s\bigr) \tag{16.7.9}$$

is the generalization of (16.5.19). The proof of the capacity of the phase-insensitive gaussian channel is outlined below.
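For orientation, consider the simplest single-use (K = 1) case for an ideal channel and an ensemble of two pure signal states: each term S(T̂ ρ̂s) vanishes because pure states have zero entropy, so χ reduces to the entropy of the average density matrix. The sketch below (illustrative, not from the text) evaluates this for two states with real inner product κ, using the 2 × 2 form of the average density matrix given in Problem 7 at the end of this chapter; for antipodal coherent states, κ = e^(−2Eb).

```python
import numpy as np

def chi_two_pure_states(kappa, p=0.5):
    # Average density matrix of two pure states with real inner product kappa
    # (2x2 form from Problem 7); chi = S(rho_avg) since pure states have zero entropy
    c = np.sqrt(p * (1 - p)) * kappa
    rho = np.array([[p, c], [c, 1.0 - p]])
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-(evals * np.log(evals)).sum())   # von Neumann entropy in nats

Eb = 0.25                                    # mean photons per bit (illustrative)
print(chi_two_pure_states(np.exp(-2 * Eb)))  # Holevo information, nats per channel use
```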

Additivity Property of the Entropy

The first step of the full proof shows that, for a phase-insensitive gaussian quantum-lightwave channel, the entropy is additive. For the classical case, the capacity of a phase-insensitive gaussian channel is achieved using a circularly symmetric gaussian probability distribution (cf. Section 14.3). This is a product distribution for which the entropy is additive (cf. Section 6.6.1). By analogy, the capacity of a phase-insensitive gaussian quantum-lightwave channel can be achieved using a statebook composed of block-symbol states that are circularly symmetric gaussian product states. Applying the mean-energy constraint given in (16.5.21) to a block-symbol state leads to the mean number of photons Etotal in the block-symbol state being K times the mean number E of photons in a component-symbol state. Because the capacity is additive in the entropy when the block-symbol state is a product state, the maximum von Neumann entropy Smax of the first term S(T̂^⊗K ρ̂) in (16.7.9) is K times the entropy of each component-symbol state, so that

$$S_{max}\bigl(\hat{T}^{\otimes K}\hat{\rho}\bigr) = K\,S\bigl(\hat{T}\hat{\rho}_E\bigr), \tag{16.7.10}$$

where ρ̂E is the density matrix describing a component-symbol state of the block-symbol product state at the channel input. For a phase-insensitive gaussian channel described by the channel transformation T̂, the maximization is achieved when each component-symbol state used to form a block-symbol state in the statebook at the channel input is a circularly symmetric gaussian symbol state with a mean number of photons E and variance 2σE² = E + 1/2 (cf. (15.6.26)).

17 For details of the proof see Giovannetti, García-Patrón Sánchez, Cerf, and Holevo (2014).


Table 16.3 Expressions for the entropy at the channel output for three phase-insensitive gaussian channels. The single-letter capacity C is S(T̂ρ̂E) − S(T̂ρ̂vac).

Channel model          | Expected signal level | Expected noise level | Total output entropy S(T̂ρ̂E) | Output entropy S(T̂ρ̂vac) for vacuum state ρ̂vac
Thermal noise (T̂th)   | ηE                    | (1 − η)N0            | g(ηE + (1 − η)N0)            | g((1 − η)N0)   (16.7.4)
Amplifying (T̂amp)     | GE                    | (G − 1)Namp          | g(GE + (Namp + 1)(G − 1))    | g((Namp + 1)(G − 1))   (16.7.6)
Additive noise (T̂add) | E                     | N0                   | g(E + N0)                    | g(N0)   (16.7.8)

When the channel is a phase-insensitive gaussian channel, the entropy at the channel output S(T̂ρ̂E) is determined from the variance 2σr² of the circularly symmetric gaussian signal state at the channel output. Using (16.7.2) for a nonzero channel input, replace the variance of 1/2 for a vacuum state by the variance 2σE² = E + 1/2 of the channel input state. This replacement gives the variance 2σr² of the channel output as

$$2\sigma_r^2 = (E + 1/2)|h_s|^2 + 2\sigma_n^2|h_n|^2. \tag{16.7.11}$$

The channel output variance 2σr² depends on the channel gain of the signal |hs|², the channel gain of the noise |hn|², and the variance 2σn² of the noise. The total von Neumann entropy S(T̂ρ̂E) for a component-symbol state at the channel output given a circularly symmetric gaussian component-symbol state at the channel input is listed in Table 16.3 for each of the three phase-insensitive gaussian channels. That table also lists the von Neumann entropy at the channel output given a vacuum state at the channel input described by ρ̂vac.
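The entries of Table 16.3 can be evaluated numerically. The sketch below (illustrative parameter values, with g(x) = (1 + x)log(1 + x) − x log x as assumed in the earlier sketch) computes the single-letter capacity C = S(T̂ρ̂E) − S(T̂ρ̂vac) for each row.

```python
import math

def gordon(x):
    # Gordon function g(x) in nats, as assumed in the earlier sketch
    return 0.0 if x <= 0 else (1 + x) * math.log(1 + x) - x * math.log(x)

E = 5.0                                    # mean signal photons per symbol (illustrative)
eta, N0, G, Namp = 0.8, 0.1, 20.0, 0.0     # illustrative channel parameters

# (signal term, noise term) taken from the entropy columns of Table 16.3
rows = {
    "thermal noise":  (eta * E, (1 - eta) * N0),
    "amplifying":     (G * E, (Namp + 1) * (G - 1)),
    "additive noise": (E, N0),
}
for name, (sig, noise) in rows.items():
    C = gordon(sig + noise) - gordon(noise)
    print(f"{name:14s} C = {C:.3f} nats per channel use")
```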

Minimum Entropy at the Channel Output

The second step of the proof addresses the second term on the right side of (16.7.9), which appears with a minus sign. Maximizing the capacity requires minimizing this term. For a phase-insensitive gaussian channel, the minimum von Neumann entropy at the channel output is K times the entropy when the channel input is a vacuum state.¹⁸ This statement appears self-evident because a vacuum state at the channel input, having zero photons, would seem to produce the minimum entropy at the channel output. However, the unique dependences admitted within quantum optics might lead to a signal state at the channel output with less entropy than K times the entropy of a vacuum state. The proof shows that no such state exists. Indeed, the minimum entropy at the channel output is equal to K times the entropy when the channel input is a vacuum state. This implies that the minimum of the second term of (16.7.9) is given by

$$\min\left\{ \sum_s p(s)\,S\bigl(\hat{T}^{\otimes K}\hat{\rho}_s\bigr) \right\} = K\,S\bigl(\hat{T}\hat{\rho}_{vac}\bigr), \tag{16.7.12}$$

where the minimization is constrained over states that have a finite mean energy (cf. (16.5.21)).

18 The proof of this reasonable conjecture was the key step in the 2014 proof.


Single-Letter Capacity

Combining (16.7.12) with (16.7.10), the capacity for K uses of the channel given in (16.7.9) can be bounded as

$$C_K \le K\left( S\bigl(\hat{T}\hat{\rho}_E\bigr) - S\bigl(\hat{T}\hat{\rho}_{vac}\bigr) \right), \tag{16.7.13a}$$

with the single-letter capacity C = C_K/K given as

$$C \le S\bigl(\hat{T}\hat{\rho}_E\bigr) - S\bigl(\hat{T}\hat{\rho}_{vac}\bigr). \tag{16.7.13b}$$

The equality can be achieved using a circularly symmetric gaussian prior on the set of Glauber numbers α that describe a set of input coherent-symbol states,¹⁹ subject to the mean-energy constraint given in (16.5.21). The optimal prior is a circularly symmetric gaussian distribution for which the probability distribution of the squared magnitude |α|² = E is an exponential probability density function. Expressing the optimal prior in a photon-number-state representation gives a Gordon distribution (cf. (6.3.11)). Expressions for each term on the right side of (16.7.13b) for the three phase-insensitive gaussian channels considered in this section are provided in Table 16.3.

19 See Holevo and Werner (2001).
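The structure of the optimal prior can be seen in a short Monte Carlo sketch (illustrative, not from the text): drawing Glauber numbers from a circularly symmetric gaussian prior with mean photon number E, then drawing the photon count of each resulting coherent state from a Poisson distribution, yields the Gordon distribution, here assumed to have the form p(n) = E^n/(1 + E)^(n+1).

```python
import numpy as np

rng = np.random.default_rng(1)
E, trials = 2.0, 200_000         # mean photon number (illustrative)

# Circularly symmetric gaussian prior on Glauber numbers with mean |alpha|^2 = E
alpha = (rng.normal(size=trials) + 1j * rng.normal(size=trials)) * np.sqrt(E / 2)

# A coherent state has Poisson photon statistics with mean |alpha|^2; averaging
# over the prior (the Poisson transform) gives the Gordon distribution
n = rng.poisson(np.abs(alpha) ** 2)

k = np.arange(6)
empirical = np.bincount(n, minlength=6)[:6] / trials
gordon_pmf = E ** k / (1 + E) ** (k + 1)
print(np.round(empirical, 4))
print(np.round(gordon_pmf, 4))
```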

Phase-Insensitive Quantum-Lightwave Channels and Semiclassical Channels

The channel capacity derived using quantum optics is compared in this section with the channel capacity derived using photon optics and wave optics. The channel capacity derived using photon optics in Chapter 14 treats the Gordon distribution as a composite distribution that incorporates both quantum uncertainty and statistical uncertainty. The resulting expression for the capacity given in (14.2.5) has the same form as the capacity of the classical additive-noise phase-insensitive gaussian channel given in the third row of Table 16.3. This means that the bandlimited capacity (cf. Section 14.4.4) and the spectral rate efficiency (cf. Section 14.5.2) derived from photon optics can be directly applied to the classical additive-noise phase-insensitive quantum-lightwave gaussian channel considered in this section.

Both for the phase-insensitive gaussian quantum channel studied in this section and for the semiclassical phase-insensitive channel studied in Chapter 14, the classical information capacity depends only on the signal energy and the noise energy. As a consequence of the phase-insensitive nature of the channel, the classical phase is uniformly distributed.

The classical wave-optics channel capacity is the limiting form of the expression listed in the last row of Table 16.3 when the discrete nature of a lightwave is not evident. This occurs as the spacing between the discrete energy levels goes to zero (cf. Section 6.2.1). For this case, the capacity given by C = g(E + N0) − g(N0) (cf. (14.2.5)), where g(x) is the Gordon function, approaches the capacity C = He(E + N0) − He(N0) for wave optics (cf. (14.2.20b)), where He(x) ≐ 1 + log x is the entropy function of an exponential distribution.


Accordingly, the wave-optics channel capacity for a phase-insensitive gaussian channel can be determined by replacing the Gordon function g(x) by He(x). In the small-signal limit, the function g(x) approaches Hp(x), where Hp(x) ≐ x(1 − log x) is the small-signal limit of the entropy of a Poisson distribution. The resulting photon-optics channel capacity for a phase-insensitive gaussian channel can be determined by replacing g(x) by Hp(x). Modifying the signal level and the noise level leads to the capacity for the other two channels considered in this section. Therefore, one might hope that all three channel capacities can be derived from a common underlying framework. However, that underlying framework is currently unknown. This means that the single-letter capacity, the bandlimited capacity, and the spectral rate efficiency for each instance of a phase-insensitive gaussian quantum channel must be proved separately.
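These two limits can be checked numerically. In the sketch below (illustrative, using natural logarithms so that He(x) = 1 + log x), g(x) approaches He(x) for large x and Hp(x) for small x.

```python
import numpy as np

g  = lambda x: (1 + x) * np.log(1 + x) - x * np.log(x)   # Gordon function (nats)
He = lambda x: 1 + np.log(x)                             # exponential entropy 1 + log x
Hp = lambda x: x * (1 - np.log(x))                       # small-signal Poisson limit

for x in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(f"x = {x:7.2f}   g = {g(x):8.4f}   He = {He(x):8.4f}   Hp = {Hp(x):8.4f}")
# g(x) tends to He(x) as x grows (wave optics) and to Hp(x) as x -> 0 (photon optics)
```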

General Quantum-Lightwave Channels

The correspondence between a phase-insensitive gaussian quantum-lightwave information channel, a classical wave-optics channel based on a gaussian probability density function in a large-signal regime, and a photon-optics channel based on a Poisson probability distribution in a small-signal regime is insightful, but incomplete. The correspondence provides insight because many concepts in classical information theory based on transition probabilities can be directly applied to a phase-insensitive quantumlightwave gaussian channel. Moreover, this understanding leads to a classical framework based on the Poisson transform that connects the channel capacity based on waves with the channel capacity based on particles. The understanding is incomplete because quantum optics admits a much larger class of channels, and methods of state preparation and state detection, than do the classical methods of encoding/modulation and demodulation/detection used in photon optics and wave optics. Understanding the unique properties of this larger class of signals, the relationship of these properties to classical random processes, and the practical situations for which these unique properties can be used to efficiently convey classical information suggests future research directions for quantum information theory.

16.8 References

Quantum detection theory is developed in the works by Yuen, Kennedy, and Lax (1975), Helstrom (1976), and Holevo (2011). Quantum receivers for coherent-state modulation are presented in Osaki, Ban, and Hirota (1996), with quantum quadrature phase-shift keying discussed in Bondurant (1993) and in Becerra, Fan, Baumgartner, Polyakov, Goldhar, Kosloski, and Migdall (2011). Suboptimal binary receivers are detailed in Kennedy (1973a) and in Takeoka and Sasaki (2008). Implementations have been reported in Wittmann, Andersen, Takeoka, Sych, and Leuchs (2010) and in Becerra, Fan, Baumgartner, Goldhar, Kosloski, and Migdall (2013). The optimality of distance-based detection operators was derived in Ban, Kurokawa, Momose, and Hirota (1997) and in Sasaki, Kato, Izutsu, and Hirota (1998), and generalized in Eldar and Forney (2001). Numerical methods for optimal state-detection operators are detailed in Eldar, Megretski, and Verghese (2003).


Background material on quantum information theory can be found in Nielsen and Chuang (2000) and in Vlatko (2006). Properties of the von Neumann entropy are presented in Wehrl (1978). The effect of the detection operator on the capacity is discussed in Sasaki, Kato, Izutsu, and Hirota (1997, 1998). The classical information capacity of an ideal quantum-lightwave information channel is discussed in Lebedev and Levitin (1966), in Yuen and Ozawa (1993), in Caves and Drummond (1994), and in Weedbrook, Pirandola, García-Patrón, Cerf, Ralph, Shapiro, and Lloyd (2012). The capacity of a lossy gaussian quantum-lightwave information channel is discussed in Giovannetti, Guha, Lloyd, Maccone, Shapiro, and Yuen (2004) and in Shapiro (2012). Quantum limits on the entropy of bandlimited radiation are discussed in Franceschetti (2017).

16.9 Historical Notes

Quantum detection theory was developed by Helstrom (1976) and by Holevo (1982, 2011). The Holevo information was proposed independently in Forney (1963) and in Gordon (1964) as the bound on the classical information that can be conveyed in a quantum channel. This bound was proved by Holevo (1973); it was subsequently reproved and extended to derive the channel capacity in the papers by Hausladen, Jozsa, Schumacher, Westmoreland, and Wootters (1996) on the noiseless product-state channel, and by Schumacher and Westmoreland (1997) and Holevo (1998a) on the noisy product-state channel.
The capacity of the quantum-lightwave equivalent of a classical additive white gaussian noise channel was first conjectured in Gordon (1964). This work was based on superimposing quantum gaussian noise on a classical signal. The capacity for this specific channel was given in Holevo (1998b). At that time, the mathematical formalism of a general phase-insensitive gaussian channel was not fully developed. For this channel, the minimum-output-entropy conjecture is required to derive the capacity. The nontrivial proof of this conjecture was published over 30 years later in Giovannetti, García-Patrón Sánchez, Cerf, and Holevo (2014). This proof, in turn, led to the formal proof of Gordon's initial conjecture, which was extended to several other phase-insensitive gaussian channels.

16.10 Problems

1 Probability of error for quantum and classical systems
Show that, for an ideal channel with an orthogonal signal constellation of quantum signal states, the probability of a detection error using a set of optimal sampling eigenstates is zero.

2 Probability of error for orthogonal states
This problem compares classical orthogonal signals to quantum orthogonal signal states. Examples of orthogonal quantum-lightwave signals are polarization states or nonoverlapping temporal states. Derive the large-signal limit for the probability of a detection error for L-level orthogonal state modulation when all pairwise distances between the symbols are equal and d² = 2E, where E is the mean number of photons per symbol.


3 Optimal orientation of a binary sampling basis
(a) Referring to Figure 16.2(a), define ζ ≐ π/2 − θ. Derive an expression for the probability of a correct decision pc for binary pure-signal-state modulation in terms of ζ and the generalized angle φ1 shown in Figure 16.2(a).
(b) Determine the maximum probability of a correct decision pc by differentiating this expression with respect to φ1 and setting the resulting expression equal to zero.
(c) Show that the resulting probability of error pe is given by the same expression as (16.2.12).

4 Detection operators for quantum on–off keying
(a) Following the same steps as in the previous problem, and using κ = e^(−Eb), determine the optimal sampling eigenstates and the corresponding detection operators for quantum on–off-keyed modulation.
(b) Sketch the signal states and the sampling eigenstates for Eb = 1/2.

5 Homodyne detection operators
Starting with (16.2.21), write the form of the component-symbol-state detection operator using homodyne demodulation for the following modulation formats.
(a) A symmetric L-ary pulse-amplitude modulation (PAM) format with the mean number E of photons per symbol.
(b) A symmetric L-ary quadrature-amplitude modulation (QAM) format with a mean number E of photons per symbol in which each signal component is separately demodulated to baseband using homodyne demodulation.

6 Methods of detection
Referring to Table 16.1, compare the error performance of classical homodyne detection with that of a displacement receiver. Determine the range of the mean number of photons per bit Eb for which one detection technique outperforms the other method of detection.

7 Von Neumann entropy
Suppose the density matrix of an ensemble of two pure signal states is given as

$$\hat{\rho} = \begin{bmatrix} p & \sqrt{p(1-p)}\,\kappa \\ \sqrt{p(1-p)}\,\kappa & 1-p \end{bmatrix},$$

where p is the prior, and where κ ≐ ⟨ψ0|ψ1⟩ is the inner product between the two pure signal states, with κ real.
(a) Determine an expression for the von Neumann entropy of this density matrix.
(b) Compare this result with the von Neumann entropy of the density matrix given in (15.4.16). How are they related? Why?


8 Von Neumann entropy
Using the relationship between a density matrix and a probability distribution for a set of orthogonal signal states, show that when the signal states are pairwise orthogonal, the Holevo information χ given in (16.5.9) and repeated below,

$$\chi = S\Bigl(\sum_s p(s)\hat{\rho}_s\Bigr) - \sum_s p(s)\,S(\hat{\rho}_s),$$

is equal to the Shannon entropy H(s).

9 Von Neumann entropy
A signal state is given by

$$|\psi\rangle = \tfrac{1}{2}|0\rangle + \tfrac{1}{2}\bigl(\cos\theta\,|0\rangle + \sin\theta\,|1\rangle\bigr).$$

(a) Determine the corresponding density matrix ρ̂.
(b) Determine the eigenvalues of ρ̂.
(c) Derive an expression for the von Neumann entropy as a function of θ.
(d) Determine the value of θ that maximizes the entropy.
(e) Plot the von Neumann entropy for 0 < θ < 2π and demonstrate that the result determined in the previous step is correct.

10 Small-signal expansion of the von Neumann entropy
The von Neumann entropy of an antipodal coherent state is given by (16.6.5) and is repeated here:

$$S(\hat{\rho}) = H_b\bigl(\tfrac{1}{2}(1 + e^{-2E_b})\bigr) = -\tfrac{1}{2}\bigl(1 - e^{-2E_b}\bigr)\log\bigl(\tfrac{1}{2}(1 - e^{-2E_b})\bigr) - \tfrac{1}{2}\bigl(1 + e^{-2E_b}\bigr)\log\bigl(\tfrac{1}{2}(1 + e^{-2E_b})\bigr).$$

Using the power-series expansions log(1 + x) = x − x²/2 + O(x³) and eˣ = 1 + x + x²/2 + O(x³), show that the small-signal limit of the von Neumann entropy is given by

$$S(\hat{\rho}) \approx E_b(1 - \log E_b),$$

which is (16.6.7). This expression is the small-signal limit of the entropy of a Poisson distribution discussed in Problem 14.5.

11 Coherent states versus wave-optics waveforms
Classical wave-optics waveforms based on signal vectors can be interpreted as the large-signal limit of a coherent state. Chapter 14 showed that no classical wave-optics modulation format can achieve the single-letter photon-optics capacity in a small-signal limit, as shown in Figure 14.4. Reconcile this statement with (16.6.7), which shows that coherent states can achieve the capacity in a small-signal regime.

12 Codeword detection
This problem works through the steps used to derive the probability of a block-symbol state detection error given in Section 16.4.3.


(a) Starting from the Gram matrix K given in (16.4.9), derive the matrix A and its inverse A⁻¹.
(b) Using these expressions, show that the matrix M can be written as

$$\mathsf{M} = \frac{1}{4}\begin{bmatrix} a & b & b & b \\ b & a & b & b \\ b & b & a & b \\ b & b & b & a \end{bmatrix},$$

where a = 3√(1 − κblk²) + √(3κblk² + 1) and b = √(3κblk² + 1) − √(1 − κblk²).
(c) Using the result from part (b), derive (16.4.10).

Bibliography

Abramowitz, M. and I. A. Stegun. Handbook of Mathematical Functions. Dover, New York, NY, 1965. Adler, R., D. Coppersmith, and M. Hassner. Algorithms for sliding block codes – an application of symbolic dynamics to information theory. IEEE Transactions on Information Theory, 29(1):5– 22, 1983. Agazzi, O. E., M. R. Hueda, H. S. Carrer, and D. E. Crivelli. Maximum-likelihood sequence estimation in dispersive optical channels. Journal of Lightwave Technology, 23(2):749–63, 2005. Agrawal, G. P. Semiconductor Lasers. Van Nostrand Reinhold, New York, NY, 1993. Agrawal, G. P. Nonlinear Fiber Optics. Academic Press, San Diego, CA, 2008. Agrawal, G. S. Quantum Optics. Cambridge University Press, Cambridge, 2013. Agrell, E. and M. Karlsson. Power-efficient modulation formats in coherent transmission systems. Journal of Lightwave Technology, 27(22):5115–26, 2009. Agrell, E., A. Alvarado, and F. R. Kschischang. Implications of information theory in optical fibre communications. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 374(2062):20140438, 2016. Ahmed, S., T. Ratnarajah, M. Sellathurai, and N. Cowan. Reduced-complexity iterative equalization for severe time-dispersive MIMO channels. IEEE Transactions on Vehicular Technology, 57(1):594–600, 2008. Al-Dhahir, N. and A. H. Sayed. The finite-length multi-input multi-output MMSE-DFE. IEEE Transactions on Signal Processing, 48(10):2921–36, 2000. Alferov, Zh. I., V. M. Andreev, E. L. Portnoi, and M. K. Turkan. AlAs–GaAs heterojunction injection lasers with a low room-temperature threshold. Soviet Physics Semiconductors, 3:460–3, 1970. Ali´c, N., G. C. Papen, R. E. Saperstein, L. B. Milstein, and Y. Fainman. Signal statistics and maximum likelihood sequence estimation in intensity modulated fiber optic links containing a single optical preamplifier. Optics Express, 13(12):4568–79, 2005. Arikan, E. A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels. IEEE Transactions on Information Theory, 55(7):3051–3073, 2009. Arimoto, S. An algorithm for computing the capacity of arbitrary discrete memoryless channels. IEEE Transactions on Information Theory, 18(1):14–20, 1972. Austin, M. E. Decision-feedback equalization for digital communications over dispersive channels. Technical Report 437, MIT Lincoln Laboratories, 1967. Azana, J. and M. A. Muriel. Real-time optical spectrum analysis based on the time-space duality in chirped fiber gratings. IEEE Journal of Quantum Electronics, 36(5):517–26, 2000. Bahl, L., J. Cocke, F. Jelinek, and J. Raviv. Optimal decoding of linear codes for minimizing symbol error rate. IEEE Transactions on Information Theory, 20(2):284–7, 1974.


Baker, K. R. On the WMC density as an inverse Gaussian probability density. IEEE Transactions on Communications, 44(1):15–17, 1996. Ballato, J. and P. Dragic. Materials development for next generation optical fiber. Materials, 7(6):4411–30, 2014. Ban, M., K. Kurokawa, R. Momose, and O. Hirota. Optimum measurements for discrimination among symmetric quantum states and parameter estimation. International Journal of Theoretical Physics, 36(6):1269–88, 1997. Barry, J. R. and E. A. Lee. Performance of coherent optical receivers. Proceedings of the IEEE, 78(8):1369–94, 1990. Becerra, F. E., J. Fan, G. Baumgartner, S. V. Polyakov, J. Goldhar, J. T. Kosloski, and A. Migdall. M -ary-state phase-shift-keying discrimination below the homodyne limit. Physical Review A, 84:062324, 2011. Becerra, F. E., J. Fan, G. Baumgartner, J. Goldhar, J. T. Kosloski, and A. Migdall. Experimental demonstration of a receiver beating the standard quantum limit for multiple nonorthogonal state discrimination. Nature Photonics, 7(2):147–52, 2013. Becker, P. C., N. A. Olsson, and J. R. Simpson. Erbium-Doped Fiber Amplifiers. Academic Press, San Diego, CA, 1999. Beling, A. and J. C. Campbell. InP-based high-speed photodetectors. Journal of Lightwave Technology, 27(1–4):343–55, 2009. Bell, A. G. On the production and reproduction of speech by light. American Journal of Science, 20(118):305–24, 1880. Bellanger, M. A simple comparison of constant modulus and Wiener criteria for equalization with complex signals. Digital Signal Processing, 14(5):429–37, 2004. Benedetto, S. and E. Biglieri. Principles of Digital Transmission: With Wireless Applications. Kluwer Academic/Plenum Press, New York, NY, 1999. Berdague, S. and P. Facq. Mode-division multiplexing in optical fibers. Applied Optics, 21:1950–5, 1982. Berger, T. and D. W. Tufts. Optimal pulse amplitude modulation part I: Transmitter–receiver design and bounds from information theory. IEEE Transactions on Information Theory, 13(4):196–208, 1967. Berrou, C. Error-correction coding method with at least two systematic convolutional codings in parallel, corresponding iterative decoding method, decoding module and decoder, August 29, 1995. URL www.google.com/patents/US5446747. US Patent 5,446,747. Berrou, C., A. Glavieux, and P. Thitimajshima. Near Shannon limit error-correcting coding and decoding: Turbo-codes. 1. In Conference Record, IEEE International Conference on Communications, 1993. ICC 93. Geneva. Technical Program, volume 2, pages 1064–70, 1993. Betti, S., G. De Marchis, and E. Iannone. Coherent Optical Communications Systems. WileyInterscience, New York, NY, 1995. Beygi, L., E. Agrell, J. M. Kahn, and M. Karlsson. Rate-adaptive coded modulation for fiber-optic communications. Journal of Lightwave Technology, 32(2):333–43, 2013. Blachman, N. M. A comparison of the informational capacities of amplitude- and phasemodulation communication systems. Proceedings of the Institute of Radio Engineers, 41(6):748–59, 1953. Blahut, R. E. Computation of channel capacity and rate-distortion functions. IEEE Transactions on Information Theory, 18(4):460–73, 1972. Blahut, R. E. Principles and Practice of Information Theory. Addison-Wesley, Reading, MA, 1988.


Blahut, R. E. Digital Transmission of Information. Addison-Wesley, Reading, MA, 1990. Blahut, R. E. Algebraic Codes for Data Transmission. Cambridge University Press, Cambridge, 2003. Blahut, R. E. Modem Theory: An Introduction to Telecommunications. Cambridge University Press, Cambridge, 2010. Blahut, R. E. Principles of Information Theory. Cambridge University Press, Cambridge, 2019. Blow, K. J. and D. Wood. Theoretical description of transient stimulated raman scattering in optical fibers. IEEE Journal of Quantum Electronics, 25(12):2665–73, 1989. Bode, H. W. Network Analysis and Feedback Amplifier Design. The Bell Telephone Laboratories series. Van Nostrand Reinhold, New York, NY, 1950. Bondurant, R. S. Near-quantum optimum receivers for the phase-quadrature coherent-state channel. Optics Letters, 18(22):1896–8, 1993. Born, M. and E. Wolf. Principles of Optics: Electromagnetic Theory of Propagation, Interference and Diffraction of Light. Pergamon Press, Oxford, 1980. Bosco, G., P. Poggiolini, and M. Visintin. Performance analysis of MLSE receivers based on the square-root metric. Journal of Lightwave Technology, 26(14):2098–109, 2008. Bowen, J. On the capacity of a noiseless photon channel. IEEE Transactions on Information Theory, 13(2):230–6, 1967. Boyd, R. W. Nonlinear Optics. Elsevier Science, Amsterdam, 2013. Brandwood, D. H. A complex gradient operator and its application in adaptive array theory. Microwaves, Optics and Antennas, IEE Proceedings H, 130(1):11–16, 1983. Braunstein, S. L. Homodyne statistics. Physical Review A, 42(1):474–81, 1990. Breiling, M. A logarithmic upper bound on the minimum distance of turbo codes. IEEE Transactions on Information Theory, 50(8):1692–710, 2004. Bromage, J. Raman amplification for fiber communications systems. Journal of Lightwave Technology, 22(1):79–93, 2004. Butler, B. K. and P. H. Siegel. Bounds on the minimum distance of punctured quasi-cyclic LDPC codes. IEEE Transactions on Information Theory, 59(7):4584–97, 2013. Butler, B. K. and P. H. Siegel. Error floor approximation for LDPC codes in the awgn channel. IEEE Transactions on Information Theory, 60(12):7416–41, 2014. Bužek, V. and P. L. Knight. Quantum interference, superposition states of light, and nonclassical effects. In E. Wolf, editor, Progress in Optics, volume 34, pages 1–158. Elsevier, Amsterdam, 1995. Campbell, J. C. Advances in photodetectors. In I. P. Kaminow, T. Li, and A. E Willner, editors, Optical Fiber Telecommunications V A: Components and Subsystems, pages 221–68. Elsevier, Amsterdam, 2008. Campbell, J. C. Recent advances in avalanche photodiodes. Journal of Lightwave Technology, 34(2):278–85, 2016. Campbell, J. C., S. Demiguel, F. Ma, A. Beck, X. Guo, S. Wang, X. Zheng, X. Li, J. D. Beck, M. A. Kinch, A. Huntington, L. A. Coldren, J. Decobert, and N. Tscherptner. Recent advances in avalanche photodiodes. IEEE Journal of Selected Topics in Quantum Electronics, 10(4):777– 87, 2004. Campbell, N. The study of discontinuous phenomena. Proceedings of the Cambridge Philosophical Society, 15:117–36, 1909a. Campbell, N. Discontinuities in light emission. Proceedings of the Cambridge Philosophical Society, 15:310–28, 1909b.


Carena, A., V. Curri, G. Bosco, P. Poggiolini, and F. Forghieri. Modeling of the impact of nonlinear propagation effects in uncompensated optical coherent transmission links. Journal of Lightwave Technology, 30:1524–39, 2012. Carlson, A. P., P. B. Crilly, and J. C. Rutledge. Communication Systems: An Introduction to Signals and Noise in Electrical Communication. McGraw-Hill, Dubuque, IA, 2001. Carson, J. R., S. P. Mead, and S. A. Schelkunoff. Hyper-frequency wave guides – mathematical theory. Bell System Technical Journal, 15:310–33, 1936. Caves, C. M. Quantum limits on noise in linear amplifiers. Physical Review D, 26(8):1817–39, 1982. Caves, C. M. and P. D. Drummond. Quantum limits on bosonic communication rates. Reviews of Modern Physics, 66:481–537, 1994. Chen, J., J. L. Habif, Z. Dutton, R. Lazarus, and S. Guha. Optical codeword demodulation with error rates below the standard quantum limit using a conditional nulling receiver. Nature Photonics, 6:374–9, 2012. Chew, W. C. Waves and Fields in Inhomogeneous Media. Van Nostrand Reinhold, New York, NY, 1990. Chiang, T.-K., N. Kagi, T. K. Fong, M. E. Marhic, and L. G. Kazovsky. Cross-phase modulation in dispersive fibers: Theoretical and experimental investigation of the impact of modulation frequency. IEEE Photonics Technology Letters, 6:733–6, 1994. Chraplyvy, A. R. Limitations on lightwave communications imposed by optical-fiber nonlinearities. Journal of Lightwave Technology, 8:1548–57, 1990. Chraplyvy, A. R., R. W. Tkach, and K. L. Walker. Optical fiber for wavelength division multiplexing, August 29, 1995. URL www.google.com/patents/US5327516. US Patent 5,327,516. Chraplyvy, A. R., D. Marcuse, and P. S. Henry. Carrier-induced phase noise in angle-modulated optical-fiber systems. Journal of Lightwave Technology, 2:6–10, 1984. Chuang, S. L. Physics of Photonic Devices. Wiley, New York, NY, 2012. Coleman, J. J. The development of the semiconductor laser diode after the first demonstration in 1962. Semiconductor Science and Technology, 27(9):090207, 2012. Conradi, J. The distribution of gains in uniformly multiplying avalanche photodiodes: Experimental. Transactions on Electron Devices, 19(6):713–18, 1972. Costello, D. J. An introduction to low-density parity check codes. Presented at the IEEE Information Theory Society Summer School, Northwestern University, August 2009. Cover, T. M. and J. A. Thomas. Elements of Information Theory. John Wiley & Sons, New York, NY, 1991, 2006. Dar, R., M. Shtaif, and M. Feder. New bounds on the capacity of the nonlinear fiber-optic channel. Optics Letters, 39(2):398–401, 2014. Darrigol, O. A History of Optics from Greek Antiquity to the Nineteenth Century. Oxford University Press, Oxford, 2012. Davies, E. B. and J. T. Lewis. An operational approach to quantum probability. Communications in Mathematical Physics, 17(3):239–60, 1970. Desset, C., B. Macq, and L. Vandendorpe. Computing the word-, symbol-, and bit-error rates for block error-correcting codes. IEEE Transactions on Communications, 52(6):910–21, 2004. Desurvire, E., J. R. Simpson, and P. C. Becker. High-gain erbium-doped traveling-wave fiber amplifier. Optics Letters, 12(11):888–90, 1987. Diament, P. Wave Transmission and Fiber Optics. Macmillan, New York, NY, 1990. Dirac, P. A. M. The Principles of Quantum Mechanics. Clarendon Press, Oxford, 1981.


Djordjevic, I., W. Ryan, and B. Vasic. Coding for Optical Channels. Springer, New York, NY, 2010. Dolinar, S. J. An optimum receiver for the binary coherent state quantum channel. MIT Research Laboratory of Electronics Quarterly Progress Report, 111:115–20, 1973. Einarsson, G. Principles of Lightwave Communications. John Wiley and Sons, New York, NY, 1996. Eiselt, M. Limits on WDM systems due to four-wave mixing: A statistical approach. Journal of Lightwave Technology, 17(11):2261–7, 1999. Eldar, Y. C. von Neumann measurement is optimal for detecting linearly independent mixed quantum states. Physical Review A, 68:52303, 2003. Eldar, Y. C. and G. D. Forney. On quantum detection and the square-root measurement. IEEE Transactions on Information Theory, 47(3):858–72, 2001. Eldar, Y. C., A. Megretski, and G. C. Verghese. Designing optimal quantum detectors via semidefinite programming. IEEE Transactions on Information Theory, 49(4):1007–12, 2003. Epworth, R. E. The phenomenon of modal noise in analog and digital optical fiber systems. In Proceedings of the Fourth European Conference on Optical Communication, 1978, page 492, 1978. Falconer, D. and J. Salz. Optimal reception of digital data over the Gaussian channel with unknown delay and phase jitter. IEEE Transactions on Information Theory, 23(1):117–26, 1977. Fehenberger, T., A. Alvarado, G. Böcherer, and N. Hanik. On probabilistic shaping of quadrature amplitude modulation for the nonlinear fiber channel. Journal of Lightwave Technology, 34(21):5063–73, 2016. Fock, V. Konfigurationsraum und zweite Quantelung. Zeitschrift für Physik, 75(9):622–47, 1932. Forestieri, E. and G. Prati. Exact analytical evaluation of second-order PMD impact on the outage probability for a compensated system. Journal of Lightwave Technology, 22(4):988–96, 2004. Forghieri, R. W. and A. R. Chraplyvy. Fiber nonlinearities and their impact on transmission systems. In I. P. Kaminow and T. L. Koch, editors, Optical Fiber Telecommmunications IIIA. Academic Press, San Diego, CA, 1997. Forney, G., R. Gallager, G. Lang, F. Longstaff, and S. Qureshi. Efficient modulation for bandlimited channels. IEEE Journal on Selected Areas in Communications, 2(5):632–47, 1984. Forney, G. D. The concepts of state and entropy in quantum mechanics. Master’s thesis, Massachusetts Institute of Technology, 1963. Forney, G. D. Concatenated Codes. MIT University Press, Cambridge, MA, 1966. Forney, G. D. Maximum-likelihood sequence estimation of digital sequences in the presence of intersymbol interference. IEEE Transactions on Information Theory, 18(3):363–78, 1972. Forney, G. D. Viterbi algorithm. Proceedings of the IEEE, 61(3):268–78, 1973. Forney, G. D. and D. J. Costello. Channel coding: The road to channel capacity. Proceedings of the IEEE, 95:1150–77, 2007. Forney, G. D. and G. Ungerboeck. Modulation and coding for linear Gaussian channels. IEEE Transactions on Information Theory, 44(6):2384–415, 1998. Foschini, G. J. and C. D. Poole. Statistical-theory of polarization dispersion in single-mode fibers. IEEE Journal of Lightwave Technology, 9(11):1439–56, 1991. Franceschetti, M. Quantum limits on the entropy of bandlimited radiation. Journal of Statistical Physics, 169(2):374–94, 2017. Friedman, B. Principles and Techniques of Applied Mathematics. Wiley, New York, NY, 1956.


Gabor, D. Communication theory and physics. Transactions of the IRE Professional Group on Information Theory, 1(1):48–59, 1953. Gagliardi, R. M. and S. Karp. Optical Communications. Wiley, New York, NY, 1976. Gallager, R. G. Low-density parity-check codes. IRE Transactions on Information Theory, 8(1):21–8, 1962. Gallager, R. G. Stochastic Processes: Theory for Applications. Cambridge University Press, Cambridge, 2013. Garello, R., P. Pierleoni, and S. Benedetto. Computing the free distance of turbo codes and serially concatenated codes with interleavers: Algorithms and applications. IEEE Journal on Selected Areas in Communications, 19(5):800–12, 2001. Geist, J. M. Capacity and cutoff rate for dense M -ary PSK constellations. In Military Communications Conference, 1990, volume 2, pages 768–70, 1990. George, D. Matched filters for interfering signals. IEEE Transactions on Information Theory, 11(1):153–4, 1965. Gibbs, J. W. Elementary Principles in Statistical Mechanics: Developed with Especial Reference to the Rational Foundation of Thermodynamics. Yale University Press, New Haven, CT, 1914. Giles, C. R., E. Desurvire, and J. R. Simpson. Transient gain and cross talk in erbium-doped fiber amplifiers. Optics Letters, 14(16):880–2, 1989. Giovannetti, V., S. Guha, S. Lloyd, L. Maccone, J. H. Shapiro, and H. P. Yuen. Classical capacity of the lossy bosonic channel: The exact solution. Physical Review Letters, 92(2):027902, 2004. Giovannetti, V., R. García-Patrón Sánchez, N. J. Cerf, and A. S. Holevo. Ultimate classical communication rates of quantum optical channels. Nature Photonics, 8(10):796–800, 2014. Glauber, R. J. Coherent and incoherent states of the radiation field. Physical Review A, 131(6):2766–88, 1963. Gloge, D. Weakly guiding fibers. Applied Optics, 10(10):2252–8, 1971a. Gloge, D. Dispersion in weakly guiding fibers. Applied Optics, 10(11):2442–5, 1971b. Gloge, D. and E. A. J. Marcatili. Multimode theory of graded-core fibers. Bell System Technical Journal, 52(9):1563–78, 1973. Gnauck, A. H. and P. J. Winzer. Optical phase-shift-keyed transmission. Journal of Lightwave Technology, 23(1): 115–30, 2005. Godard, D. Self-recovering equalization and carrier tracking in two-dimensional data communication systems. IEEE Transactions on Communications, 28(11):1867–75, 1980. Goldfarb, G., G. Li, and M. G. Taylor. Orthogonal wavelength-division multiplexing using coherent detection. IEEE Photonics Technology Letters, 19(24):2015–17, 2007. Goodman, J. W. Speckle Phenomena in Optics: Theory and Applications. Roberts and Company, Greenwood Village, CO, 2007. Goodman, J. W. Statistical Optics. Wiley, New York, NY, second edition, 2015. Gordon, B. E. and L. P. Bolgiano. Information capacity of a photoelectric detector. Proceedings of the IEEE, 53(11):1745–46, 1965. Gordon, J. P. Quantum effects in communication systems. Proceedings of the Institute of Radio Engineers, 50(9):1898–908, 1962. Gordon, J. P. Noise at optical frequencies; information theory. In P. A. Miles, editor, Proceedings of the International School of Physics “Enrico Fermi”, Course XXXI, pages 156–81. Academic Press, New York, NY, 1964. Gordon, J. P. and H. Kogelnik. PMD fundamentals: Polarization mode dispersion in optical fibers. Proceedings of the National Academy of Sciences of the United States of America, 97:4541–50, 2000.


Gordon, J. P. and W. H. Louisell. Simultaneous measurement of noncommuting observables. In P. L. Kelley, B. Lax, and P. E. Tannewald, editors, Physics of Quantum Electronics, page 43. McGraw Hill, New York, NY, 1966. Gordon, J. P. and L. F. Mollenauer. Phase noise in photonic communications systems using linear amplifiers. Optics Letters, 15(23):1351–53, 1990. Gowan, B. Ciena 20: The founding of Ciena, 2012. URL www.ciena.com/insights/ articles/Ciena-20-The-Founding-of-Ciena_prx.html . Gowar, J. Optical Communication Systems. Prentice Hall International, Englewood Cliffs, NJ, 1984. Griffiths, R. B. Consistent Quantum Theory. Cambridge University Press, Cambridge, 2002. Gursoy, M. C. On the low-SNR capacity of phase-shift keying with hard-decision detection. In IEEE International Symposium on Information Theory, 2007. ISIT 2007. pages 166–70. IEEE, 2007. Hagfors, T. Information capacity and quantum effects in propagation circuits. Technical Report 344, MIT Lincoln Laboratories, 1964. Hajek, B. Random Processes for Engineers. Cambridge University Press, Cambridge, 2015. Hall, M. J. W. Gaussian noise and quantum-optical communication. Physical Review A, 50(4):3295–303, 1994. Hall, R. N., G. E. Fenner, J. D. Kingsley, T. J. Soltys, and R. O. Carlson. Coherent light emission from Ga–As junctions. Physics Review Letters, 9(9):366–8, 1962. Hamkins, J. Performance of low-density parity-check coded modulation. Technical Report IPN Progress Report 42-184, Jet Propulsion Laboratory, 2011. Han, Y. and G. Li. Coherent optical communication using polarization multiple-input–multipleoutput. Optics Express, 13:7527–34, 2005. Harrington, R. F. Time-Harmonic Electromagnetic Fields. McGraw-Hill, New York, NY, 1961. Hasegawa, A. and T. Nyu. Eigenvalue communication. Journal of Lightwave Technology, 11(3):395–9, 1993. Haus, H. A. Noise figure definition valid from RF to optical frequencies. IEEE Journal of Selected Topics in Quantum Electronics, 6(2):240–7, 2000a. Haus, H. A. Electromagnetic Noise and Quantum Optical Measurements. Springer-Verlag, Berlin, 2000b. Haus, H. A., C. H. Townes, and B. M. Oliver. Comments on “Noise in photoelectric mixing”. Proceedings of the Institute of Radio Engineers, 50(6):1544–6, 1962. Hausladen, P., R. Jozsa, B. Schumacher, M. Westmoreland, and W. K. Wootters. Classical information capacity of a quantum channel. Physical Review A, 54(3):1869–76, 1996. Hayashi, T., T. Taru, O. Shimakawa, T. Sasaki, and E. Sasaoka. Design and fabrication of ultra-low crosstalk and low-loss multi-core fiber. Optics Express, 19(17):16576–92, 2011. Haykin, S. Adaptive Filter Theory. Prentice Hall, Englewood Cliffs, NJ, 1996. Haykin, S. Communication Systems. Wiley, New York, NY, 2001. Hecht, J. City of Light: The Story of Fiber Optics. Oxford University Press, Oxford, 2004. Helstrom, C. W. Statistical Theory of Signal Detection. Pergamon Press, Oxford, 1968. Helstrom, C. W. Quantum Detection and Estimation Theory. Academic Press, New York, NY, 1976. Helstrom, C. W. Probability and Stochastic Processes for Engineers. Maxwell Macmillan International, New York, NY, 1991. Henry, C. H. Phase noise in semiconductor lasers. Journal of Lightwave Technology, 4(3):298– 311, 1986.


Henry, C. H. and R. F. Kazarinov. Quantum noise in photonics. Reviews of Modern Physics, 68(3):801–53, 1996. Hepp, K. The classical limit for quantum mechanical correlation functions. Communications in Mathematical Physics, 35(4):265–77, 1974. Hill, K. O., D. C. Johnson, B. S. Kawasaki, and R. I. MacDonald. CW three-wave mixing in single-mode optical fibers. Journal of Applied Physics, 49(10):5098–106, 1978. Ho, K. P. Phase-Modulated Optical Communication Systems. Springer, New York, NY, 2005. Ho, K. P. and J. M. Kahn. Channel capacity of WDM systems using constant-intensity modulation formats. In Optical Fiber Communications Conference. (OFC). Postconference Technical Digest, pages 731–3. Optical Society of America, 2002. Ho, K. P. and J. M. Kahn. Statistics of group delays in multimode fiber with strong mode coupling. Journal of Lightwave Technology, 29:3119–28, 2011. Holevo, A. S. Statistical problems in quantum physics. In Proceedings Soviet–Japanese Symposium on Probability and Statistics, volume 1, pages 22–40, 1968. Holevo, A. S. Some estimates of the information transmitted by quantum communication channel. Problems in Information Transmission, 9(3):177–83, 1973. Holevo, A. S. Probabilistic and Statistical Aspects of Quantum Theory. North-Holland, Amsterdam, 1982. Holevo, A. S. The capacity of the quantum channel with general signal states. IEEE Transactions on Information Theory, 44(1):269–73, 1998a. Holevo, A. S. Quantum coding theorems. Russian Mathematical Surveys, 53(6):1295–331, 1998b. Holevo, A. S. Probabilistic and Statistical Aspects of Quantum Theory. Edizioni della Normale Basel, second edition, 2011. Holevo, A. S. and R. F. Werner. Evaluating capacities of bosonic Gaussian channels. Physical Review A, 63:032312, 2001. Holevo, A. S., M. Sohma, and O. Hirota. Capacity of quantum gaussian channels. Physical Review A, 59:1820–8, 1999. Holonyak, N. and S. F. Bevacqua. Coherent (visible) light emission from Ga(As1−x Px ) junctions. Applied Physics Letters, 1(4):82–3, 1962. Horn, R. A. and C. R. Johnson. Topics in Matrix Analysis. Cambridge University Press, Cambridge, 1994. Horodecki, R., P. Horodecki, M. Horodecki, and K. Horodecki. Quantum entanglement. Reviews of Modern Physics, 81:865–942, 2009. Hranilovic, S. and F. R. Kschischang. Capacity bounds for power- and band-limited optical intensity channels corrupted by Gaussian noise. IEEE Transactions on Information Theory, 50(5):784–95, 2004. Hueda, M. R., D. E. Crivelli, H. S. Carrer, and O. E. Agazzi. Parametric estimation of IM/DD optical channels using new closed-form approximations of the signal PDF. Journal of Lightwave Technology, 25(3):957–75, 2007. Hui, R., K. R. Demarest, and C. T. Allen. Cross-phase modulation in multispan WDM optical fiber systems. Journal of Lightwave Technology, 17(6):1018, 1999. Humblet, P. A. Design of optical matched filters. In Global Telecommunications Conference, 1991, volume 2, pages 1246–50, 1991. Humblet, P. A. and M. Azizoglu. On the bit error rate of lightwave systems with optical amplifiers. Journal of Lightwave Technology, 9(11):1576–82, 1991. Husimi, K. Some formal properties of the density matrix. Proceedings of the PhysicoMathematical Society of Japan, 22:264–314, 1940.


Huttner, B., C. Geiser, and N. Gisin. Polarization-induced distortions in optical fiber networks with polarization-mode dispersion and polarization-dependent losses. IEEE Journal of Selected Topics in Quantum Electronics, 6(2):317–29, 2000. Immink, K. A. S. Runlength-limited sequences. Proceedings of the IEEE, 78(11):1745–59, 1990. Inoue, K. Suppression technique for fiber four-wave mixing using optical multi-/demultiplexers and a delay line. Journal of Lightwave Technology, 11(3):455–61, 1993. Inoue, K. and N. Shibata. Theoretical evaluation of intermodulation distortion due to 4wave mixing in optical fibers for coherent phase-shift-keying frequency-division-multiplexing transmission. Optics Letters, 14(11):584–6, 1989. Ip, E. and J. M. Kahn. Digital equalization of chromatic dispersion and polarization mode dispersion. Journal of Lightwave Technology, 25(8):2033–43, 2007. Ip, E. and J. M. Kahn. Compensation of dispersion and nonlinear impairments using digital backpropagation. Journal of Lightwave Technology, 26(20):3416–25, 2008. Ippen, E. P. and R. H. Stolen. Stimulated Brillouin-scattering in optical fibers. Applied Physics Letters, 21(11):539–41, 1972. Jacobs, I. and E. Berlekamp. A lower bound to the distribution of computation for sequential decoding. IEEE Transactions on Information Theory, 13(2):167–74, 1967. Jacobsen, G. Noise in Digital Optical Transmission Systems. Artech House, Boston, MA, 1994. Jaynes, E. T. Information theory and statistical mechanics. Physical Review, 106(4):620–30, 1957a. Jaynes, E. T. Information theory and statistical mechanics 2. Physical Review, 108(2):171–90, 1957b. Jelonek, Z. A comparison of transmission systems. In W. Jackson, editor, Communication Theory, page 43. Academic Press Inc., New York, NY, 1953. Jinguji, K., N. Takato, A. Sugita, and M. Kawachi. Mach–Zehnder interferometer type optical waveguide coupler with wavelength-flattened coupling ratio. Electronics Letters, 26:1326–7, 1990. Johnson, C. R., P. Schniter, T. J. Endres, J. D. Behm, D. R. Brown, and R. A. Casas. Blind equalization using the constant modulus criterion: A review. Proceedings of The IEEE, 86(10):1927–50, 1998. Johnson, N. L., A. W. Kemp, and S. Kotz. Univariate Discrete Distributions. Wiley, New York, NY, 2005. Kabal, P. and S. Pasupathy. Partial-response signaling. IEEE Transactions on Communications, 23(9):921–34, 1975. Kahn, J. M. and K. P. Ho. Spectral efficiency limits and modulation/detection techniques for DWDM systems. IEEE Journal of Selected Topics in Quantum Electronics, 10(2):259–72, 2004. Kailath, T. Linear Least-Squares Estimation. Dowden, Hutchinson & Ross, Stroudsburg, PN, 1977. Kanwal, R. P. Generalized Functions: Theory and Applications. Birkhäuser, Boston, MA, 2004. Kao, K. C. and G. A. Hockham. Dielectric-fibre surface waveguides for optical frequencies. Proceedings of the IEE, 113(7):1151–8, 1966. Kapron, F. P., D. B. Keck, and R. D. Mauer. Radiation losses in glass optical waveguides. Applied Physics Letters, 17:423–5, 1970. Karlsson, M. Probability density functions of the differential group delay in optical fiber communication systems. Journal of Lightwave Technology, 19(3):324–31, 2001.


Karlsson, M. and H. Sunnerud. Effects of nonlinearities on PMD-induced system impairments. Journal of Lightwave Technology, 24(11):4127–37, 2006. Karp, S., R. M. Gagliardi, S. E. Moran, and L. B. Stotts. Optical Channels: Fibers, Clouds, Water, and the Atmosphere. Springer, New York, NY, 2013. Katz, M. and S. Shamai. On the capacity-achieving distribution of the discrete-time noncoherent and partially coherent AWGN channels. IEEE Transactions on Information Theory, 50(10):2257–70, 2004. Kawakami, S. and J. Nishizawa. Propagation loss in a distributed beam waveguide. Proceedings of the IEEE, 53:2148–9, 1965. Kay, S. M. Fundamentals of Statistical Signal Processing: Estimation Theory. Prentice Hall, Upper Saddle River, NJ, 1993. Kazovsky, L. G., S. Benedetto, and A. E. Willner. Optical Fiber Communication Systems. Artech House, Boston, MA, 1996. Kennedy, R. S. A near-optimum receiver for the binary coherent state quantum channel. Technical report, MIT Research Laboratory of Electronics Quarterly Progress Report, 1973a. Kennedy, R. S. On the optimum quantum receiver for the M-ary linearly independent pure state problem. Technical report, MIT Research Laboratory of Electronics Quarterly Progress Report, 1973b. Kikuchi, K. Polarization-demultiplexing algorithm in the digital coherent receiver. In Digest of the LEOS Summer Topical Meetings, 2008, page MC2.2, 2008. Killey, R. Dispersion and nonlinearity compensation using electronic predistortion techniques. In The IEE Seminar on Optical Fibre Communications and Electronic Signal Processing, 2005, pages 2/1–2/6, 2005. Klauder, J. R. and E.C.G. Sudarshan. Fundamentals of Quantum Optics. Benjamin, New York, NY, 1968. Klotzkin, D. J. Introduction to Semiconductor Lasers for Optical Communications: An Applied Approach. Springer, New York, NY, 2013. Koetter, R., A. C. Singer, and M. Tüchler. Turbo equalization. IEEE Signal Processing Magazine, 21(1):67–80, 2004. Kogelnik, H. Theory of dielectric waveguides. In Integrated Optics, pages 13–81. Springer, Berlin, 1975. Kolmogorov, A. N. Sur l’interpolation et extrapolation des suites stationaries. Comptes rendus de l’Académie des Sciences, 208:2043–5, 1939. English translation in Kailath (1977). Kong, J. A. Electromagnetic Wave Theory. Wiley, New York, NY, 1990. Kramer, G., A. Ashikhmin, A. J. van Wijngaarden, and W. Xing. Spectral efficiency of coded phase-shift keying for fiber-optic communication. Journal of Lightwave Technology, 21(10):2438–45, 2003. Kramer, G., M. I. Yousefi, and F. R. Kschischang. Upper bound on the capacity of a cascade of nonlinear and noisy channels. In 2015 IEEE Information Theory Workshop (ITW), 2015. Kretzmer, E. Generalization of a technique for binary data communication. IEEE Transactions on Communication Technology, 14(1):67–8, 1966. Kreutz-Delgado, K. The complex gradient operator and the CR-calculus, 2009. URL https:// arxiv.org/pdf/0906.4835.pdf. Kschischang, F. R. and S. Pasupathy. Optimal nonuniform signaling for Gaussian channels. IEEE Transactions on Information Theory, 39(3):913–29, 1993. Kudeki, E. and D. C. Munson. Analog Signals and Systems. Pearson Prentice-Hall, Upper Saddle River, NJ, 2009.


Kwiat, P. G. and L. Hardy. The mystery of the quantum cakes. American Journal of Physics, 68(1):33–6, 2000. Lachs, G. Theoretical aspects of mixtures of thermal and coherent radiation. Physical Review, 138(4B):B1012, 1965. Lai, Y. and H. A. Haus. Characteristic functions and quantum measurements of optical observables. Quantum Optics, 1(2):99–115, 1989. Landau, L. D., E. M. Lifshitz, and L. P. Pitaevski˘ı. Electrodynamics of Continuous Media. Pergamon, Oxford, 1984. Lapidoth, A. On phase noise channels at high SNR. In Proceedings of 2002 IEEE Information Theory Workshop, pages 1–4. IEEE, 2002. Lax, M. Classical noise. V. Noise in self-sustained oscillators. Physical Review, 160(2):290–307, 1967. Lebedev, D. S. and L. B. Levitin. Information transmission by electromagnetic field. Information and Control, 9(1):1–22, 1966. Lee, H. W. Theory and application of the quantum phase-space distribution functions. Physics Reports, 259(3):147–211, 1995. Lender, A. The duobinary technique for high-speed data transmission. Transactions of the American Institute of Electrical Engineers, Part I: Communication and Electronics, 82(2):214–18, 1963. Lender, A. Correlative digital communication techniques. IEEE Transactions on Communication Technology, 12(4):128–35, 1964. Lenz, G., B. J. Eggleton, C. R. Giles, C. K. Madsen, and R. E. Slusher. Dispersive properties of optical filters for WDM systems. IEEE Journal of Quantum Electronics, 34(8):1390–402, 1998. Leonhardt, U. A. Essential Quantum Optics. Cambridge University Press, Cambridge, 2010. Leonhardt, U. A. and H. Paul. Measuring the quantum state of light. Progress in Quantum Electronics, 19(2):89–130, 1995. Lin, S. and D. J. Costello. Error Control Coding: Fundamentals and Applications. Pearson Prentice-Hall, Upper Saddle River, NJ, 2004. Lindsey, W. C. and M. K. Simon. Telecommunication Systems Engineering. Prentice-Hall, Englewood Cliffs, NJ, 1973. Liu, J. M. Photonic Devices. Cambridge University Press, Cambridge, 2005. Liu, X., X. Wei, R. E. Slusher, and C. J. McKinstrie. Improving transmission performance in differential phase-shift-keyed systems by use of lumped nonlinear phase-shift compensation. Optics Letters, 27(18):1616–18, 2002. Loudon, R. The Quantum Theory of Light. Oxford University Press, Oxford, 2000. Lucky, R. W. Automatic equalization for digital communication. The Bell System Technical Journal, 44(4):547–88, 1965. MacKay, D. J. C. and R. M. Neal. Near Shannon limit performance of low density parity check codes. Electronics Letters, 32:1645–6, 1996. Mandel, L. Fluctuation of photon beams and their correlations. Proceedings of the Physical Society, 71:1037–48, 1958. Mandel, L. Fluctuations of photon beams: The distribution of the photo-electrons. Proceedings of the Physical Society, 74(3):233–43, 1959. Mandel, L. and E. Wolf. Photon statistics and classical fields. Physical Review, 149:1033–7, 1966. Mandel, L. and E. Wolf. Optical Coherence and Quantum Optics. Cambridge University Press, Cambridge, 1995.


Marcuse, D. Light Transmission Optics. Van Nostrand Reinhold, New York, NY, 1972. Marcuse, D. Theory of Dielectric Optical Waveguides. Academic Press, New York, NY, 1974. Marcuse, D. Pulse distortion in single-mode fibers. Applied Optics, 19(10):1653–60, 1980. Marcuse, D. RMS width of pulses in nonlinear dispersive fibers. Journal of Lightwave Technology, 10(1):17–21, 1992. Marcuse, D., A. R. Chraplyvy, and R. W. Tkach. Dependence of cross-phase modulation on channel number in fiber WDM systems. Journal of Lightwave Technology, 12(5):885–90, 1994. Martinez, A. Spectral efficiency of optical direct detection. Journal of the Optical Society of America B, 24(4):739–49, 2007. McCusker, K. T. and P. G. Kwiat. Efficient optical quantum state engineering. Physical Review Letters, 103:163602, 2009. McIntyre, R. J. Multiplication noise in uniform avalanche diodes. IEEE Transactions on Electron Devices, 13(1):164–8, 1966. McIntyre, R. J. The distribution of gains in uniformly multiplying avalanche photodiodes: Theory. IEEE Transactions on Electron Devices, 19(6):702–13, 1972. Mears, R. J., L. Reekie, I. M. Jauncey, and D. N. Payne. Low-noise erbium-doped fibre amplifier operating at 1.54 mm. Electronics Letters, 23(19):1026–8, 1987. Mecozzi, A. Probability density functions of the nonlinear phase noise. Optics Letters, 29(7):673– 5, 1994a. Mecozzi, A. Limits to long-haul coherent transmission set by the Kerr nonlinearity and noise of the in-line amplifiers. Journal of Lightwave Technology, 12(11):1993–2000, 1994b. Mecozzi, A. and M. Shtaif. On the capacity of intensity modulated systems using optical amplifiers. IEEE Photonics Technology Letters, 13(9):1029–31, 2001. Mecozzi, A., C. Antonelli, and M. Shtaif. Nonlinear propagation in multi-mode fibers in the strong coupling regime. Optics Express, 20(11):11673–8, 2012. Menyuk, C. R. Application of multiple-length-scale methods to the study of optical fiber transmission. Journal of Engineering Mathematics, 36(1–2):113–36, 1999. Menyuk, C. R. and B. S. Marks. Interaction of polarization mode dispersion and nonlinearity in optical fiber transmission systems. Journal of Lightwave Technology, 24(7):2806–26, 2006. Messerschmitt, D. G. Minimum MSE equalization of digital fiber optic systems. IEEE Transactions on Communications, 26(7):1110–18, 1978. Milonni, P. W. Lasers. Wiley, New York, NY, 1988. Milonni, P. W. The Quantum Vacuum: An Introduction to Quantum Electrodynamics. Academic Press, New York, NY, 1994. Mitra, P. P. and J. B. Stark. Nonlinear limits to the information capacity of optical fibre communications. Nature, 411(6841):1027–30, 2001. Monsen, P. Feedback equalization for fading dispersive channels. IEEE Transactions on Information Theory, 17(1):56–64, 1971. Nathan, M., W. Dumke, G. Burns, F. Dill, and G. Lasher. Stimulated emission of radiation from GaAs P–N junctions. Applied Physics Letters, 1(3):62–4, 1962. Newton, I. and W. Innys. Opticks: Or, A Treatise of the Reflections, Refractions, Inflections and Colours of Light. William Innys at the West-End of St. Paul’s, London, 1730. Nielsen, M. A. and I. L. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, Cambridge, 2000. North, D. O. An analysis of the factors which determine signal/noise discrimination in pulsed carrier systems. Technical Report PTR-6C, Radio Corporation of America, 1943.


Neumark, M. A. On the representation of additive operator set functions. Comptes Rendus (Doklady) Acad. Sci. URSS, 41(1):359–61, 1943. Nyquist, H. Certain topics in telegraph transmission theory. Transactions of the American Institute of Electrical Engineers, 47(2):617–44, 1928. Okamoto, K. Fundamentals of Optical Waveguides. Academic Press, Burlington, MA, 2006. Okoshi, T. Optical Fibers. Academic Press, New York, NY, 1982. Oliver, B. M. Signal-to-noise ratios in photoelectric mixing. Proceedings of the Institute of Radio Engineers, 49(12):1960–1, 1961. Oliver, B. M. Thermal and quantum noise. Proceedings of the IEEE, 53(5):436–54, 1965. Oppenheim, A. V. and A. S. Willsky. Signals and Systems. Pearson Education, London, 2013. Osaki, M., M. Ban, and O. Hirota. Derivation and physical interpretation of the optimum detection operators for coherent-state signals. Physical Review A, 54(2):1691–701, 1996. Ourjoumtsev, A., R. Tualle-Brouri, J. Laurat, and P. Grangier. Generating optical Schrödinger kittens for quantum information processing. Science, 312(5770):83–6, 2006. Papen, G. C. and R. E. Blahut. The channel capacity from particles to waves. IEEE Transactions on Information Theory, to be published, 2019. Payne, D. N. and W. A. Gambling. Zero material dispersion in optical fibres. Electronics Letters, 11(8):176–8, 1975. Pearl, J. Reverend Bayes on inference engines: A distributed hierarchical approach. In Proceedings of the Second National Conference on Artificial Intelligence, 1983, pages 133–63, 1982. Personick, S. D. An image band interpretation of optical heterodyne noise. Bell System Technical Journal, 50:213–16, 1971. Personick, S. D. Receiver design for digital fibre optic communication systems. I. Bell System Technical Journal, 52(6):843–74, 1973a. Personick, S. D. Baseband linearity and equalization in fiber optic digital communication systems. Bell System Technical Journal, 52(7):1175–94, 1973b. Personick, S. D., P. Balaban, J. H. Bobsin, and P. R. Kumar. A detailed comparison of four approaches to the calculation of the sensitivity of optical fiber system receivers. IEEE Transactions on Communications, 25(5):541–8, 1977. Peřina, J. Quantum Statistics of Linear and Nonlinear Optical Phenomena. Springer Netherlands, Amsterdam, 2012. Petermann, K. Laser Diode Modulation and Noise. Kluwer, Dordrecht, 1988. Poggiolini, P. The GN model of non-linear propagation in uncompensated coherent optical systems. Journal of Lightwave Technology, 30(24):3857–79, 2012. Poggiolini, P., G. Bosco, Y. Benlachtar, S. J. Savory, P. Bayvel, R. I. Killey, and J. Prat. Long-haul 10 Gbit/s linear and non-linear IMDD transmission over uncompensated standard fiber using a sqrt-metric MLSE receiver. Optics Express, 16(17):12919–36, 2008. Polyanskiy, Y., H. V. Poor, and S. Verdu. Channel coding rate in the finite blocklength regime. IEEE Transactions on Information Theory, 56(5):2307–59, 2010. Poole, C. D. and R. E. Wagner. Phenomenological approach to polarisation dispersion in long single-mode fibres. Electronics Letters, 22(19):1029–30, 1986. Poole, S. B., D. N. Payne, and M. E. Fermann. Fabrication of low-loss optical fibers containing rare-earth ions. Electronics Letters, 21(17):737–8, 1985. Potasek, M. J. and G. P. Agrawal. Self-amplitude-modulation of optical pulses in nonlinear dispersive fibers. Physical Review A, 36:3862–7, 1987.


Prabhu, V. K. Error rate considerations for digital phase-modulation systems. IEEE Transactions on Communication Technology, 17(1):33–42, 1969. Prasad, S., M. O. Scully, and W. Martienssen. A quantum description of the beam splitter. Optics Communications, 62:139–45, 1987. Price, E. Claude E. Shannon, an oral history. IEEE History Center at Stevens Institute of Technology, Castle Point on Hudson, Hoboken, NJ, 1982. Prilepsky, J. E., S. A. Derevyanko, and S. K. Turitsyn. Nonlinear spectral management: Linearization of the lossless fiber channel. Optics Express, 21(20):24344–67, 2013. Proakis, J. G. Digital Communications. McGraw-Hill, Boston, MA, 2001. Proakis, J. G. and M. Salehi. Digital Communications. McGraw-Hill, Boston, MA, 2007. Quist, T. M., R. H. Rediker, R. J. Keyes, W. E. Krag, B. Lax, A. L. McWhorter, and H. J. Zeigler. Semiconductor maser of GaAs. Applied Physics Letters, 1(4):91–2, 1962. Randel, S., R. Ryf, A. Sierra, P. J. Winzer, A. H. Gnauck, C. A. Bolle, R.-J. Essiambre, D. W. Peckham, A. McCurdy, and R. Lingle. 6 × 56-Gb/s mode-division multiplexed transmission over 33-km few-mode fiber enabled by 6 × 6 MIMO equalization. Optics Express, 19(17):16697–707, 2011. Rashleigh, S. C. and R. Ulrich. Polarization mode dispersion in single-mode fibers. Optics Letters, 3(2):60–2, 1978. Reed, I. On a moment theorem for complex Gaussian processes. IRE Transactions on Information Theory, 8(3):194–5, 1962. Reed, I. S. and G. Solomon. Polynomial codes over certain finite fields. Journal of The Society for Industrial and Applied Mathematics, 8(2):300–4, 1960. Rice, S. O. Mathematical analysis of random noise – parts I and II. Bell System Technical Journal, 23:282–332, 1944. Rice, S. O. Mathematical analysis of random noise – part III. Bell System Technical Journal, 24:46–108, 1945. Richardson, D. J., J. M. Fini, and L. E. Nelson. Space-division multiplexing in optical fibres. Nature Photonics, 7(5):354–62, 2013. Richardson, T. Error-floors of LDPC codes. In Proceedings of the 41st Annual Allerton Conference, Monticello, IL, 2003, pages 1426–35, 2003. Richardson, T. and R. Urbanke. Modern Coding Theory. Cambridge University Press, Cambridge, 2008. Risken, H. Statistical properties of laser light. In E. Wolf, editor, Progress in Optics Volume VIII, pages 241–94. North-Holland, Amsterdam, 1970. Ross, M. Laser Receivers: Devices, Techniques, Systems. Wiley, New York, NY, 1966. Roudas, I., A. Vgenis, C. S. Petrou, D. Toumpakaris, J. Hurley, M. Sauer, J. Downie, Y. Mauro, and S. Raghavan. Optimal polarization demultiplexing for coherent optical communications systems. Journal of Lightwave Technology, 28(7):1121–34, 2010. Ryan, W. E. and S. Lin. Channel Codes: Classical and Modern. Cambridge University Press, Cambridge, 2009. Ryf, R., S. Randel, A. H. Gnauck, C. Bolle, A. Sierra, S. Mumtaz, M. Esmaeelpour, E. C. Burrows, R. J. Essiambre, P. J. Winzer, D. W. Peckham, A. H. McCurdy, and R. Lingle. Mode-division multiplexing over 96 km of few-mode fiber using coherent 6 × 6 MIMO processing. Journal of Lightwave Technology, 30(4):521–31, 2012. Sakurai, J. J. and J. Napolitano. Modern Quantum Mechanics. Addison-Wesley, New York, NY, 2011.


Saleh, B. E. A. Photoelectron Statistics, with Applications to Spectroscopy and Optical Communication, volume 6. Springer-Verlag, Berlin, 1978. Saleh, B. E. A. and M. I. Irshid. Coherence and intersymbol interference in digital fiber optic communication-systems. IEEE Journal of Quantum Electronics, 18(6):944–51, 1982a. Saleh, B. E. A. and M. I. Irshid. Collett–Wolf equivalence theorem and propagation of a pulse in a single-mode optical fiber. Optics Letters, 7(7):342–3, 1982b. Saleh, B. E. A. and M. C. Teich. Fundamentals of Photonics. Wiley, New York, NY, 1991. Salz, J. Coherent lightwave communications. AT&T Technical Journal, 64(10):2153–209, 1985. Sasaki, M., K. Kato, M. Izutsu, and O. Hirota. A demonstration of superadditivity in the classical capacity of a quantum channel. Physics Letters A, 236(1–2):1–4, 1997. Sasaki, M., K. Kato, M. Izutsu, and O. Hirota. Quantum channels showing superadditivity in classical capacity. Physical Review A, 58(1):146–58, 1998. Sato, Y. A method of self-recovering equalization for multilevel amplitude-modulation systems. IEEE Transactions on Communications, 23(6):679–82, 1975. Sayed, A. H. Adaptive Filters. Wiley Interscience, Hoboken, NJ, 2008. Schrödinger, E. Die gegenwärtige Situation in der Quantenmechanik. Naturwissenschaften, 23(48):807–12, 1935. Schumacher, B. and M. D. Westmoreland. Sending classical information via noisy quantum channels. Physical Review A, 56(1):131–8, 1997. Schumacher, B. and M. D. Westmoreland. Quantum Processes, Systems, and Information. Cambridge University Press, Cambridge, 2010. Schwartz, L. Théorie des Distributions. Hermann et Cie, Paris, 1950. Seimetz, M. High-Order Modulation for Optical Fiber Transmission. Springer-Verlag, Berlin, 2009. Shannon, C. E. The mathematical theory of communication. The Bell System Technical Journal, 27:379–423 and 623–56, 1948. Shannon, C. E. and W. Weaver. The Mathematical Theory of Communication. University of Illinois Press, Urbana, IL, 1949. Shannon, C. E. and W. Weaver. The Mathematical Theory of Communication. Reprinted with corrections by R. Blahut and B. Hayak. University of Illinois Press, Urbana, IL, 1998. Shapiro, J. and S. Wagner. Phase and amplitude uncertainties in heterodyne detection. IEEE Journal of Quantum Electronics, 20(7):803–13, 1984. Shapiro, J. H. Classical capacities of bosonic channels. In L. Cohen, H. V. Poor, and M. O. Scully, editors, Classical, Semi-classical and Quantum Noise, pages 157–68. Springer, 2012. Shen, Y. R. The Principles of Nonlinear Optics. Wiley-Interscience, New York, NY, 2002. Shibata, N., R. Braun, and R. Waarts. Phase-mismatch dependence of efficiency of wave generation through four-wave mixing in a single-mode optical fiber. IEEE Journal of Quantum Electronics, 23(7):1205–10, 1987. Shimada, S. Coherent Lightwave Communications Technology, volume 1. Chapman and Hall, London, 1995. Shtaif, M. Analytical description of cross-phase modulation in dispersive optical fibers. Optics Letters, 23:1191–3, 1998. Shurcliff, W. A. Polarized Light: Production and Use. Harvard University Press, Cambridge, MA, 1962. Simon, M. K. Probability Distributions Involving Gaussian Random Variables: A Handbook for Engineers and Scientists. Springer, New York, NY, 2007. Slepian, D. On bandwidth. Proceedings of the IEEE, 64(3):292–300, 1976.


Slepian, D. and H. O. Pollak. Prolate spheroidal wave functions, Fourier analysis and uncertainty. I. Bell System Technical Journal, 40:43–64, 1961. Slusher, R. E., L. W. Hollberg, B. Yurke, J. C. Mertz, and J. F. Valley. Observation of squeezed states generated by four-wave mixing in an optical cavity. Physical Review Letters, 55:2409–12, 1985. Snyder, A. W. Coupled-mode theory for optical fibers. Journal of the Optical Society of America, 62:1267–77, 1972. Snyder, A. W. and J. D. Love. Optical Waveguide Theory. Chapman and Hall, London, 1983. Snyder, D. L. Random Point Processes. Wiley Interscience, New York, NY, 1975. Snyder, D. L. and M. I. Miller. Random Point Processes in Time and Space. Springer, New York, NY, 2012. Splett, A., Ch. Kurtzke, and K. Petermann. Ultimate transmission capacity of amplified optical fiber communication systems taking into account fiber nonlinearities. In 19th European Conference on Optical Communication ECOC ’93, Montreux, Switzerland, volume 2, pages 41–4, 1993. Stark, H. and J. W. Woods. Probability, Random Processes, and Estimation Theory for Engineers. Prentice-Hall, Englewood Cliffs, NJ, 1994. Stark, J. B., P. Mitra, and A. Sengupta. Information capacity of nonlinear wavelength division multiplexing fiber optic transmission line. Optical Fiber Technology: Materials, Devices and Systems, 7(4):275–88, 2001. Stern, T. E. Some quantum effects in information channels. IRE Transactions on Information Theory, 6(9):435–40, 1960. Stolen, R. H. Nonlinear properties of optical fibers. In S. E. Miller and A. G. Chynoweth, editors, Optical Fiber Communications, page 130. Academic Press Inc., New York, NY, 1979. Stolen, R. H. and C. Lin. Self-phase-modulation in silica optical fibers. Physical Review A, 17:1448–53, 1978. Strichartz, R. S. A Guide to Distribution Theory and Fourier Transforms. World Scientific, Singapore, 2003. Stuart, H. R. Dispersive multiplexing in multimode optical fiber. Science, 289:281–3, 2000. Sudarshan, E. C. G. Equivalence of semiclassical and quantum mechanical descriptions of statistical light beams. Physical Review Letters, 10:277–9, 1963. Sun, H., K.-T. Wu, and K. Roberts. Real-time measurements of a 40 Gb/s coherent system. Optics Express, 16(2):873–9, 2008. Swindell, W. Polarized Light. Dowden, Hutchinson & Ross, Stroudsburg, PN, 1975. Taghavi, M. H., G. C. Papen, and P. H. Siegel. On the multiuser capacity of WDM in a nonlinear optical fiber: Coherent communication. IEEE Transactions on Information Theory, 52(11):5008–22, 2006. Takahasi, H. Information Theory of Quantum-Mechanical Channels, pages 227–310. Academic Press, New York, NY, 1965. Takeoka, M. and M. Sasaki. Discrimination of the binary coherent signal: Gaussian-operation limit and simple nongaussian near-optimal receivers. Physical Review A, 78(2):022320, 2008. Tang, J. T. K. and K. B. Letaief. The use of WMC distribution for performance evaluation of APD optical communication systems. IEEE Transactions on Communications, 46(2):279–85, 1998. Tanner, R. M. A recursive approach to low complexity codes. IEEE Transactions on Information Theory, 27(5):533–47, 1981. Teich, M. C. and B. E. A. Saleh. Squeezed state of light. Quantum Optics, 1:153–91, 1989.


Temprana, E., E. Myslivets, B. P. P. Kuo, L. Liu, V. Ataie, N. Alić, and S. Radić. Overcoming Kerr-induced capacity limit in optical fiber transmission. Science, 348(6242):1445–8, 2015. Tikhonov, V. I. Phase-lock automatic frequency control application in the presence of noise. Automatika i Telemekhanika, 21(3):209–14, 1960. Tonguz, O. K. and L. G. Kazovsky. Theory of direct-detection lightwave receivers using optical amplifiers. Journal of Lightwave Technology, 9(2):174–81, 1991. Treichler, J. and M. Larimore. New processing techniques based on the constant modulus adaptive algorithm. IEEE Transactions on Acoustics, Speech, and Signal Processing, 33(2):420–31, 1985. Treichler, J. R. and B. G. Agee. A new approach to multipath correction of constant modulus signals. IEEE Transactions on Acoustics, Speech and Signal Processing, 31:459–72, 1983. Tse, D. and P. Viswanath. Fundamentals of Wireless Communication. Cambridge University Press, Cambridge, 2005. Tufts, D. W. Nyquist’s problem; the joint optimization of transmitter and receiver in pulse amplitude modulation. Proceedings of the IEEE, 53(3):248–59, 1965. Turitsyn, K. S., S. A. Derevyanko, I. V. Yurkevich, and S. K. Turitsyn. Information capacity of optical fiber channels with zero average dispersion. Physical Review Letters, 91(20):203901, 2003. Turitsyna, E. G. and S. K. Turitsyn. Digital signal processing based on inverse scattering transform. Optics Letters, 38(20):4186–8, 2013. Ungerboeck, G. Nonlinear equalization of binary signals in Gaussian noise. IEEE Transactions on Communication Technology, 19(6):1128–37, 1971. Ungerboeck, G. Channel coding with multilevel/phase signals. IEEE Transactions on Information Theory, 28(1):55–67, 1982. Urkowitz, H. Energy detection of unknown deterministic signals. Proceedings of the IEEE, 55(4):523–31, 1967. van der Pol, B. LXXXVIII. On “relaxation-oscillations”. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11):978–92, 1926. Venghaus, H. and N. Grote. Fibre Optic Communication: Key Devices. Springer International Publishing, New York, NY, 2017. Verdeyen, J. T. Laser Electronics. Prentice Hall, Englewood Cliffs, NJ, 1995. Verdu, S. Spectral efficiency in the wideband regime. IEEE Transactions on Information Theory, 48(6):1319–43, 2002. Verdu, S. and T.-S. Han. A general formula for channel capacity. IEEE Transactions on Information Theory, 40(4):1147–57, 1994. Viterbi, A. J. Principles of Coherent Communication. McGraw-Hill, New York, NY, 1966. Viterbi, A. J. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13(2):260–9, 1967. Vedral, V. Introduction to Quantum Information Science. Oxford University Press, Oxford, 2006. von Neumann, J. Mathematische Grundlagen der Quantenmechanik [Mathematical Foundations of Quantum Mechanics]. Springer, Berlin, 1932. An English translation by Robert T. Beyer was published in 1955 by Princeton University Press and reprinted in 1996. Waarts, R. G. and R. P. Braun. System limitations due to 4-wave-mixing in single-mode optical fibers. Electronics Letters, 22(16):873–5, 1986. Waks, E., E. Diamanti, and Y. Yamamoto. Generation of photon number states. New Journal of Physics, 8(1):4, 2006.


Webb, P. P., R. J. McIntyre, and J. Conradi. Properties of avalanche photodiodes. RCA Review, 35(2):234–78, 1974. Weedbrook, C., S. Pirandola, R. García-Patrón, N. J. Cerf, T. C. Ralph, J. H. Shapiro, and S. Lloyd. Gaussian quantum information. Review of Modern Physics, 84:621–69, May 2012. Wehrl, A. General properties of entropy. Reviews of Modern Physics, 50(2):221–60, 1978. Werts, A. Propagation de la lumière cohérente dans les fibres optiques. L’Onde Electrique, 46:967–80, 1966. Widmer, A. X. and P. A. Franaszek. A DC-balanced, partitioned-block, 8B/10B transmission code. IBM Journal of Research and Development, 27(5):440–51, 1983. Widrow, B. and M. E. Hoff. Adaptive switching circuits. In 1960 IRE WESCON Convention Record, pages 96–104, 1960. Wiener, N. Extrapolation, Interpolation, and Smoothing of Stationary Time Series, with Engineering Applications. MIT Press, Cambridge, MA, 1949. Originally issued as a classified National Defense Research Report in February 1942. Wiener, N. and E. Hopf. Über eine Klasse singulärer Integralgleichungen [On a class of singular integral equations]. Berichte der Preußischen Akadamie der Wissenschafen, 31:696–706, 1931. Wigner, E. On the quantum correction for thermodynamic equilibrium. Physical Review, 40:749– 59, 1932. Wilson, S. G. Digital Modulation and Coding. Prentice-Hall, Upper Saddle River, NJ, 1996. Winter, M., C.-A. Bunge, D. Setti, and K. Petermann. A statistical treatment of cross-polarization modulation in DWDM systems. Journal of Lightwave Technology, 27(17):3739–51, 2009. Winter, M., D. Setti, and K. Petermann. Cross-polarization modulation in polarization-division multiplex transmission. IEEE Photonics Technology Letters, 22(8):538–40, 2010. Winters, J. H. and S. Kasturia. Constrained maximum-likelihood detection for high-speed fiberoptic systems. In IEEE Global Telecommunications Conference. GLOBECOM ’91, pages 1574–9, 1991. Winzer, P. J. and R. J. Essiambre. Advanced modulation formats for high-capacity optical transport networks. Journal of Lightwave Technology, 24(12):4711–28, 2006. Winzer, P. J., A. H. Gnauck, C. R. Doerr, M. Magarini, and L. L. Buhl. Spectrally efficient long-haul optical networking using 112-Gb/s polarization-multiplexed 16-QAM. Journal of Lightwave Technology, 28(4):547–56, 2010. Wittmann, C., U. L. Andersen, M. Takeoka, D. Sych, and G. Leuchs. Demonstration of coherent-state discrimination using a displacement-controlled photon-number-resolving detector. Physical Review Letters, 104(10):100505, 2010. Wolf, E. and C. L. Mehta. Determination of the statistical properties of light from photoelectric measurements. Physical Review Letters, 13:705–7, 1964. Woodward, P. Theory of radar information. Transactions of the IRE Professional Group on Information Theory, 1(1):108–13, 1953. Wozencraft, J. M. and I. M. Jacobs. Principles of Communication Engineering. Waveland, Prospect Heights, IL, 1965. Wyner, A. D. Bounds on communication with polyphase coding. Bell System Technical Journal, 45(4):423–560, 1966. Yamamoto, Y. and H. A. Haus. Preparation, measurement and information capacity of optical quantum states. Reviews of Modern Physics, 58(4):1001–20, 1986. Yariv, A. Quantum Electronics. Wiley, New York, NY, third edition, 1989. Yonenaga, K. and S. Kuwano. Dispersion-tolerant optical transmission system using duobinary transmitter and binary receiver. Journal of Lightwave Technology, 15(8):1530–7, 1997.


Yoshitani, R. On the detectability limit of coherent optical signals in thermal radiation. Journal of Statistical Physics, 2(4):347–78, 1970. Yousefi, M. I. and F. R. Kschischang. Information transmission using the nonlinear Fourier transform, part I: Mathematical tools. IEEE Transactions on Information Theory, 60(7):4312–28, 2014a. Yousefi, M. I. and F. R. Kschischang. Information transmission using the nonlinear Fourier transform, part II: Numerical methods. IEEE Transactions on Information Theory, 60(7):4329–45, 2014b. Yousefi, M. I. and F. R. Kschischang. Information transmission using the nonlinear Fourier transform, part III: Spectrum modulation. IEEE Transactions on Information Theory, 60(7):4346–69, 2014c. Yu, T., W. M. Reimer, V. S. Grigoryan, and C. R. Menyuk. A mean field approach for simulating wavelength-division multiplexed systems. IEEE Photonics Technology Letters, 12:443–5, 2000. Yuen, H. P. and M. Ozawa. Ultimate information carrying limit of quantum systems. Physical Review Letters, 70:363–6, 1993. Yuen, H. P. and J. H. Shapiro. Generation and detection of two-photon coherent states in degenerate four-wave mixing. Optics Letters, 4(10):334–6, 1979. Yuen, H. P., R. S. Kennedy, and M. Lax. Optimum testing of multiple hypotheses in quantum detection theory. IEEE Transactions on Information Theory, 21(2):125–34, 1975. Yuen, H. P. and V. W. S. Chan. Noise in homodyne and heterodyne detection. Optics Letters, 8(3):177–9, 1983. Zador, R. S. Bell Telephone Laboratories technical memorandum, 1965. Zehavi, E. and J. Wolf. On the performance evaluation of trellis codes. IEEE Transactions on Information Theory, 33(2):196–202, 1987. Zeng, H. H., L. Tong, and C. R. Johnson. Relationships between the constant modulus and Wiener receivers. IEEE Transactions on Information Theory, 44(4):1523–38, 1998. Zheng, L. and D. N. C. Tse. Diversity and multiplexing: A fundamental tradeoff in multiple-antenna channels. IEEE Transactions on Information Theory, 49:1073–96, 2003. Ziemer, R. E. and W. H. Tranter. Principles of Communications: Systems, Modulation, and Noise. John Wiley, New York, NY, 2015. Zurek, W., editor. Complexity, Entropy, and the Physics of Information. Addison-Wesley, Redwood City, CA, 1980. Zweck, J. and C. R. Menyuk. Validity of the additive white Gaussian noise model for quasilinear long-haul return-to-zero optical fiber communications systems. Journal of Lightwave Technology, 27(16):3324–35, 2009. Zyskind, J. L., J. A. Nagel, and H. D. Kidorf. Erbium-doped fiber amplifiers for optical communications. In T. L. Koch, editor, Optical Fiber Telecommunications IIIB, pages 13–68. Academic Press, San Diego, CA, 1997.

Index

Entries in bold are primary references, usually a definition. Absorption, 112, 298 coefficient, 113 cross section, 112, 316 infrared, 113 internal, 316 ultraviolet, 113 Adaptive bit loading, 712 Adaptive estimation, 599 Add–compare–select, 540 Additive-noise-limited channel, 19, 688 Additivity property of entropy, 279, 674, 862 product state, 786 Algebraic block code, 611, 612 binary, 617, 640 critical rate, 611, 635 generator matrix, 614 irregular, 647 matrix description, 613 performance, 623 properties, 613 Reed–Solomon, see Reed–Solomon code regular, 647 weight profile, 628, 635 Algorithm back-propagation, 564, 724 Bahl, 644 split-step Fourier method, 229 Viterbi, 540, 565, 626 Aliasing, 46, 399 Alphabet, 1, 609 q-ary, 609 binary, 609 Amplified spontaneous emission, 334, 337 Amplifier, see Lightwave amplifier Amplify-and-forward, 5 Amplitude peak, xxv root-mean-squared, xxv, 39, 324 Amplitude modulation, 12 Amplitude-shift keying, 415, 467 Analytic signal, 32, 101 Ancilla embedding, 103, 810 Ancilla state, 789, 810

image mode, 795 unoccupied, 792 Angular frequency, 8, 30, 235, 518, 570 Anharmonic oscillator, 202 Anisotropic material, 81, 96 Annihilation operator, 770 Anomalous dispersion, 161 Antenna, 18, 298 Anti-Stokes scattering, 205 Antipodal modulation, 414, 466, 477 error rate, 423, 476 quantum optics, 775, 831 Approximation gaussian, 453, 506 paraxial, 92, 120 Arrival process photodetection, 309 photoelectron, 258, 285, 294 photon, 237, 257, 311 Arrival rate constant, 259, 286, 428, 441 photoelectron, xxvi, 258, 270, 312, 692 constant, 331, 519 photon, 9, 122, 257, 265, 309, 311 time-varying, 257 Associative property, 29 Asymptotic coding gain, 625, 659 energy efficiency, 474, 492, 689 Attenuation, 155 coefficient, 114 material absorption, 112 optical fiber, 112 Autocorrelation function, 68, 72, 106, 294 additive white gaussian noise, 344 circularly symmetric gaussian noise, 267 complex-baseband noise, 76 electrical, 98, 109 fourth-order, 267 frequency, 199 laser, 345 lightwave noise power, 267 lightwave power, 98


noise, 74, 279 optical, 109 width, 70 Autocovariance function, see Covariance function Avalanche photodiode, 290, 306, 515 absorption region, 306 characteristic function, 333 current loop, 307 gain, 306 multiplication region, 306 probability mass function, 332 Axial field component, 88, 124, 139 Axis fast optic, 97, 183 optic, 97 polarization, 96 principal, 81 slow optic, 97, 183 Back propagation, 564, 724 Bahl algorithm, 644 Balanced modulator, 330 Balanced photodetection, 9, 19, 308, 378, 494, 574 Bandgap, 303, 319 Bandlimited channel, 18, 21, 238, 271, 398, 522, 659 gaussian noise, 271, 274 gaussian random process, see Random process, gaussian, bandlimited information channel, 676, 706, 710 lightwave signal, 277, 294 lightwave source, 325 noise, 75 random process, see Random process, bandlimited signal, 12, 24, 34 system, 26, 74 waveform, 45, 109, 723 wavelength-multiplex channel, 726 Bandwidth, 12, 34 3-dB, 34, 355 effective, 70 half-power, 34, 74 ideal, 34 maximum, 45, 281 noise circularly symmetric gaussian, 271 noise-equivalent baseband, 74 complex-baseband, 77, 280 passband, 74, 77 root-mean-squared, 34, 100 Baseband noise-equivalent bandwidth, 74 signal, 3, 12, 24 power, 11

waveform, 3, 45, 398 Basis, 41 change of, 48, 102, 745 countable, 39, 751 finite field, 613 Fourier, 45 gaussian distribution, 59 mode, 88 Nyquist, 45 orthonormal, 41, 50, 88, 100, 277 polarization mode, 96 principal polarization state, 370 spatial modes, 147 square-root, 840 uncountable, 751 Bayes’ rule, 55, 413, 545, 640, 651 Beamsplitter, 301, 789 Beer–Lambert law, 113 Belief propagation, 665 Bending loss, 151 Berrou code, 642 consensus, 644 constituent decoder, 644 prior, 644 error rate floor, 646 minimum Hamming distance, 646, 655 parallel concatenation, 646 Bessel differential equation, 134, 151 Bessel function expansion large-argument, 168 small-argument, 153 first kind, 132, 134, 406 identity, 141 modified, 62, 104, 518, 638 recursion relations, 136 Bias, 15, 513 current, 324, 327 forward voltage, 304 reverse voltage, 304, 305, 352 Binary asymmetric channel, 436, 439, 441 quantum, 831, 868 Binary block code, 617, 640 Binary entropy function, 636, 675, 701, 865 Binary phase-shift keying, 466, 475, 625 error rate, 476 error rate floor, 485 with phase noise, 484 Binary symmetric channel, 434, 439, 456, 701 capacity, 705 crossover probability, 434 quantum, 831 Binomial coefficient, 590, 624 probability mass function, 260 Biorthogonal modulation, 499


Birefringence, 81, 370 circular, 97, 183 fiber, 370 linear, 97 optical fiber, 183 intensity-dependent, 218 linear, 185 Bit, 1, 240 energy, 22, 423, 466 error, 6, 431 error rate, 6, 15, 25, 477 Bivariate gaussian random variable, 106 probability density function, 54 gaussian, 58, 104, 251, 253 maximum-entropy, 252 Block code, see Algebraic block code detection, 401, 495, 573, 849 quantum-lightwave, 869 signal, 255, 373, 491, 554, 724, 732 synchronization, 573 transfer function, 373 Block-symbol state, 742, 778, 815, 819, 846, 848 construction, 850 detection, 849, 851, 852 detection operator, 816 entangled state, 781, 819, 861 partial trace, 858 product state, 781, 819 Blocklength, 614 Blue-shift, 213, 404 Boltzmann constant, 240 entropy, 673 factor, 206, 245 probability density function, 250, 293 Bose–Einstein probability mass function, 244, 293 Bound Shannon bandlimited channel, 21, 711 single-letter capacity, 697, 699 spectral rate efficiency, 22, 719 union, 479, 541, 840, 842 von Neumann entropy, 868 Boundary cladding/casing, 137 core/cladding, 127, 133, 138 planar, 115 Boundary conditions, 80, 88, 115, 121, 127, 151 step-index fiber, 138 Bounded-distance decoder, 622 Bra, 48, 751 Bra–ket notation, 750 Bragg diffraction grating, 207 Branch (mode)

characteristic equation, 130, 142 Branch (trellis), 537, 538, 626 metric, 539, 632 state-dependent, 560 label, 538, 656 trellis-coded modulation, 658 Brillouin scattering, 206 stimulated, 313 Burst error, 608 Campbell’s theorem, 286, 289, 296, 428, 458, 569 derivation, 290 modified form, 290 Campbell–Baker–Hausdorff identity, 772 Canonical commutation relationship, 764, 766 Canonical quantization, 751, 761 Capacity, see Channel capacity Carrier, 12 coherent, 6, 13, 214, 560 frequency, 12, 31 noncoherent, 6, 13, 25, 172, 387, 392, 393, 467, 556, 566 wavelength autocorrelation function, 179 partially coherent, 471, 507, 510, 569 Causal impulse response, 73 Causality, 29 Central chi probability density function, 65 one degree of freedom, 65 Central chi random variable, 65 Central chi-square probability density function, 63, 372, 376 N degrees of freedom, 64, 274 Central chi-square random variable, 63 Central limit theorem, 66, 216, 256, 279, 286, 371, 553, 726 Centroid, 34 Centrosymmetric material, 203 Chain rule entropy, 674, 731 Channel, 1, 17 additive white gaussian noise, 271, 281, 425, 530 discrete-time, 400 additive-noise-limited, 19, 688 bandlimited, 18, 21, 238, 271, 398, 522, 659, 671, 676, 710 binary asymmetric, 436, 439, 441 quantum, 831, 868 binary symmetric, 434, 439, 456, 701 “black box”, 672, 691, 843 capacity, see Channel capacity classical ideal, 854, 856 code, see Code, channel coherence timewidth, 590 continuous-time memoryless, 374


discrete-time, 17, 18, 398, 411, 494, 671, 698 additive bandlimited gaussian noise channel, 634 additive white gaussian noise, 412 electrical, 397, 411 matrix, 602 electrical, see Electrical channel gain, 709 noise, 872 signal, 872 “gray box”, 691, 843 impulse-response matrix, 373 information, see Information channel interference-limited, 552 lightwave, see Lightwave channel memoryless, 672 Kerr, 729 continuous-time, 374 discrete-time mimo, 707 photon-noise-limited, 446 space-time separable, 495 multi-input multi-output, 15, 372, 602 continuous-time, 374 mode-multiplex, 372 multiplex, see Multiplex channel noiseless product-state, 858 noisy product-state, 860 nonlinear, 23, 559, 722 interference-limited, 726 Kerr, 722 phase-insensitive, 253, 799, 871 photon-noise-limited, 265, 402 photon-optics, see Photon optics physical, 4, 17, 361 Poisson, 265, 311, 401, 692 product, 706 quantum-lightwave gaussian, 871 reverse, 529 signal-dependent-noise, 20 space-multiplex, 15, 369 space-time separable, 376, 494 state, 536 state estimation, 590 state vector, 536 transfer-function matrix, 370, 373 transition matrix, 636, 672 transition probability matrix, 636, 672, 679, 817 wave-optics, see Wave optics, channel waveform, 17, 397 bandlimited, 18 electrical, 17, 398 lightwave, 17 Z , 693 Channel capacity, 20, 237, 363, 669 additive-noise-limited channel, 688, 697


classical limit, 695 large-signal limit, 695 bandlimited, 21 discrete-time memoryless channel, 679 equal-energy signaling, 711 photon-optics, 21, 713 wave-optics, 21, 676, 710 binary-symmetric channel, 636, 705 classical, 859 continuous information channel, 710 dispersionless nonlinear lightwave channel, 724 ideal classical channel, 854 ideal photon-optics channel, 685 homodyne demodulation, 694 large-signal limit, 686 small-signal limit, 687 small-signal limit per photon, 687 information-theoretic, 21, 670 intensity modulation, 701 multi-user detection, 728 noiseless product-state channel, 858 noisy product-state channel, 860 nonlinear interference-limited channel, 726 phase modulation, 638, 703 Poisson channel, 692, 731, 867 product channel, 706 quantum-lightwave channel additive-noise, 871, 876 block-symbol state, 869 classical, 843 component-symbol state, 867 phase-insensitive, 876 thermal-noise, 876 single-carrier nonlinear lightwave channel, 723 single-letter, 20, 676 photon-optics, 244, 680 wave-optics, 684, 696 Channel matrix, 373, 401, 707, 871 frequency domain, 373 nonhermitian, 189 normal, 495, 554 polarization, 370 time domain, 373 Characteristic equation, 128, 158 hybrid mode, 141 linearly polarized mode, 153 TE mode, 128 TM mode, 129 Characteristic function, 56, 275 conditional, 288 exponential probability density function, 282 noncentral chi-square probability density function, 64, 275 Poisson probability distribution, 260, 264 Poisson transform, 263 probability mass function, 56


quantum, 799 Check equations, 614, 649 Gallager code, 649 Tanner graph, 647 Check matrix, 614 regular, 617, 646 systematic, 616 Checkword, 642 Chi probability density function central, 65 noncentral, 65 Chi-square probability density function central, 63, 372, 376, 731 N degrees of freedom, 64, 274 two degrees of freedom, 64, 273 noncentral, 105, 347, 360, 805 N degrees of freedom, 64, 274, 284 two degrees of freedom, 63, 255, 273, 340, 561 Chirp, 213, 404 Chromatic dispersion, 161 Circular polarization, 94 Circularly symmetric complex gaussian random variable, 60 complex gaussian random vector, 60 gaussian noise, 256, 271, 334, 390, 481, 573, 575, 724, 748, 872 gaussian prior, 697, 699, 877 gaussian probability density function, 58, 78, 254, 872 multivariate, 60, 707 gaussian random process, 255, 343, 574 bandlimited, 277, 283 gaussian random variable, 58, 78, 251, 254, 281, 483, 518, 553, 602, 696 with a bias, 255 gaussian random vector, 106 gaussian signal state, 748, 806, 873 gaussian smoothing function, 802 Classical information, 679, 736, 815 optics, 6 signal state, 741 Clock phase, 473 estimation, 587 synchronization, 573, 586 Closure property, 774 Coaxial cable, 124 Code algebraic, see Algebraic block code Berrou, see Berrou code binary block, 617, 640 channel, 2, 15, 534, 535, 608 check matrix, 614 composite, see Composite code constrained, 660 convolutional, see Convolutional code

critical rate, 611, 635 cutoff rate, 611, 635 cyclic, 613, 620 data modulation, 609 data transmission, 608 DC-balanced, 663 DC-free, 663 dicode, 663 dual, 614 duobinary, 663 modified, 663 error control, 608 Gallager, see Gallager code graphical description, 625 Gray, 451, 477, 515 Hamming, 615 heft, 616, 647 inner, 609 LDPC, see Gallager code linear, 613 low-density parity-check, see Gallager code maximum-distance, 613, 620 modulation, 659 Morse, 15 outer, 609, 612, 655 partial-response, 534, 663 precode, 534 punctured, 643 rate, 2, 534, 609 Reed–Solomon, see Reed–Solomon code repetition, 615, 666 runlength-limited, 660 source, 15 spectral-notched, 663 stream, 628, 662 turbo, see Berrou code Ungerboeck, 656 Codebit, 609, 617 convolutional code, 631 energy, 623, 646 Gallager code, 652 Codebook, 609, 780, 847 quantum, 847 Coded-modulation, xx, 3, 15, 22, 238, 413, 464, 534, 685, 741 Codeframe, 630 Codeword, 2, 5, 15, 413, 609 block error rate hard-decision, 624 soft-decision, 625 error rate, 475, 610 sensed, 610 Coding gain, 475, 610, 659 asymptotic, 625, 659 Coherence, 6 function, 97, 105


temporal, 68 interval, 70 partial, 13, 388, 470, 556, 569 quantum, 7, 782, 784, 805 region, 98, 374, 376 spatial, 98, 374 temporal, 97 time, 78 timewidth, 70, 98, 293, 294 Coherent carrier, 6, 12, 13, 214, 560 Coherent state, 737, 739, 761, 762, 772 P distribution, 797 binary detection, 831 binary error rate, 831 density matrix for a pure signal state, 802 displacement operator, 772, 836 in-phase component operator, 761 minimum-uncertainty, 749, 767, 802 nonorthogonality, 774, 775, 848, 867 operator, 762, 789 photon-number-state representation, 741, 773 quadrature component operator, 761 state detection, 789 symmetric, 840 Colored noise, 74 Commutation relationship, 810 Commutative property, 29 Commutator, 51, 102, 763 Commuting operator, 792 Complementary error function, 57, 437, 456 Complex-conjugate transpose, 42, 60, 107, 351, 751 coherent-state operator, 762 dispersion relationship, 165 gradient, 596, 606 random variable, 53 Complex-baseband impulse response, 40, 364 noise autocorrelation function, 76 bandwidth, 271 equivalent bandwidth, 77 power density spectrum, 76 probability density function, 250 random process, 75 signal, 12, 39, 53, 77, 101, 307, 362, 383 energy, 44 transfer function, 40 waveform, 399 Complex envelope field, 88, 144, 387 power density spectrum, 179, 198 signal, 39, 144, 162, 361, 389, 724 spectrum, 364 vector form, 218 Component in-phase, 39, 313

907

quadrature, 39, 313 Component maximum posterior, 544 Component-symbol state, 742, 759, 776, 778, 780, 815, 819, 844, 846, 869 marginalization to, 858 Componentwise maximum posterior, 544 Componentwise maximum-posterior decoding, 639 Composite code, 611, 639, 641 Berrou, 642 Gallager, 641 low-density parity-check, see Gallager code turbo, see Berrou code Composite signal space, 742, 759 state, 742, 813 Concave function, 719 Conditional entropy, 673, 854 probability, 54, 291, 545 density function, 54, 431 of a detection error, 437, 507 Conjugate symmetric signal, 32, 269 Conjugate variables, 767 Consensus, 644 Constant Boltzmann, 240 Euler, 153, 731 free-space permeability, 80 free-space permittivity, 80 Planck’s, 8 reduced Planck’s, 8 Constant-modulus objective function, 598 Constellation, see Signal constellation Constitutive relationships, 81, 83, 150 Constrained code, 660 Constraint length, 536, 543, 551 of a constituent code, 642 of a convolutional code, 629 of intersymbol interference, 536 Continuous information channel, 18, 634 gaussian, 698 memoryless, 634 Continuous-time channel memoryless, 374 multi-input multi-output channel, 374 system, 28 waveform, 17, 398, 414 Controlled local oscillator, 576 Convolution, 28 associative property, 29 commutative property, 29 distributive property, 29 Fourier transform of, 31 Convolutional code, 611, 628 binary, 629

908

Index

codebit, 631 codeframe, 630 constraint length, 629 dataframe, 630 decoding sequential, 633 Viterbi, 629 encoder, 629, 630 generator polynomial, 630 performance, 634 trellis, 632 weight profile, 635 Correlation, 55 coefficient random variable, 58 signal, 466 pseudocovariance function, 268 matrix, 107 Correspondence principle, 762, 771, 792 Costas loop, 582 Counting soldiers, 667 Coupled-mode theory, 146, 299, 371 Coupler 180-degree hybrid, 300 3-dB, 301, 351 90-degree hybrid, 301 directional, 300 symmetric, 300 lightwave, 298 multimode interference, 301 Coupling amplitude–phase, 405 coefficient, 147 efficiency, 117, 150 fiber, 150 matrix, 299 mode, see Mode coupling Covariance, 56, 58 function, 68, 346 matrix, 59, 749 gaussian state, 806 noise, 255, 495 real, 57, 252, 593 signal, 707, 733 spatiotemporal, 375 Cramér-Rao bound, 593 Creation operator, 770 Critical angle, 116, 130 rate, 611, 635, 636 Cross -coherence function, 98 -phase modulation, 224, 227, 553 interference, 226 -polarization

modulation, 218 -power density, 89 Cross section, 315 absorption, 112, 316 scattering, 114 stimulated emission, 355 Crossover probability, 434 Crosstalk, 16, 368, 371, 373 Cumulative probability distribution function, 53 Curl, 79, 83 Cutoff condition, 130, 153, 158 hybrid mode, 143 linearly polarized mode, 137 Cutoff rate, 611, 635 figure, 636 phase-shift keying, 638 Cyclic code, 613, 620 Dark current, 306, 312, 358 noise, 331 Dataframe, 572, 630 Dataword, 2, 15, 609, 617 dBm, 25 Decision region, 430, 477, 478, 817 complex gaussian noise, 480 real gaussian noise, 436 Decision subspace region classical, 497, 818 quantum-optics, 820, 839 Decision-feedback demodulation, 419, 533 combined with a linear equalizer, 533 error propagation, 533 Decoder, 5, 18, 361, 410 bounded-distance, 622 data modulation, 609 data transmission, 609 hard-decision, 545, 610 incomplete, 622 iterative, 611 message passing, 648 sequential, 611 soft-decision, 545, 610, 625 spherical, 611 Viterbi, 633 Decoding, 5, 523, 610, 691 error, 622, 622, 624 failure, 622 hard-decision, 698 maximum-likelihood, 629, 639 sequential, 629, 633, 637 sphere, 628 spherical, 621 trellis-based, 632 turbo, 643 Decoherence, 783 Delay

Index

group, see Group delay ray, 156 Delay spread, 176 control, 180 distribution, 177 maximum, 178 minimum, 177 ray optics, 156, 191 root-mean-squared value, 182 step-index fiber maximum, 156, 179 wavelength-dependent, 181, 214 Delay-line interferometer, 301 Delay-locked loop, 587, 606 Demodulation, 4 decision-feedback, 533 combined with a linear equalizer, 533 error propagation, 533 direct photodetection, 20 energy, 500, 509 envelope, 14, 24, 388 hard-decision, 5, 18 heterodyne, 13, 379, 381 quantum-lightwave, 792 homodyne, 13, 380, 383, 694 quantum-lightwave, 790 to a complex-baseband waveform, 380, 574 to a real-baseband waveform, 380 to quantum complex-baseband, 795 noncoherent, 500, 512 noncoherent binary, 518 partial-response decision-feedback, 419 partial-response maximum-likelihood, 664 phase-asynchronous, 14, 362, 388, 463, 500 phase-sensitive, 790 phase-synchronous, 13, 524 quantum-lightwave, 747 soft-decision, 5, 18 Demodulator, 307, 361 complex-baseband, 379 direct-photodetection, see Direct photodetection dual-polarization, 493 heterodyne, 307, 381 homodyne, 309 phase-asynchronous, 14, 500, 502 phase-synchronous, 379, 476 quantum heterodyne, 792 Density function, see Probability density function Density matrix, 679, 680, 746, 777 average, 780, 847 block-symbol state, 780, 858 component-symbol state, 780, 858 basis, 781 coherent ensemble, 783 marginalized, 779, 859 mixed signal state, see Mixed signal state

noncoherent state, 783 photon-number-state representation additive noise, 803 bias plus noise, 804 thermal noise, 872 product state, 779, 786, 857 pure signal state, see Pure signal state signal constellation, 780, 780 signal plus noise, 838 statistical ensemble, 783 vacuum state, 872 Density operator, 746 Depletion region, 305 Detection, 5, 8 hard-decision, 5, 18, 418, 610, 698, 700 photon-optics signal model, 692 maximum-likelihood, 433 maximum-posterior, 433 maximum-posterior sequence, 544 multi-user, 373, 728 optimal, 477 quantum, 747, 816, 846 coherent states, 831 sequence, 542 signal state, see Signal state, detection single-user, 373, 726 soft-decision, 5, 18, 418, 610, 698 symbol-by-symbol, 5, 528, 545 Detection filter, 5, 390, 423 additive noise, 426 integrating, 425, 458 intersymbol interference, 556 linear, 424 matched, see Matched filter minimum error, 529 Detection operator, 816, 820 antipodal coherent state modulation, 831 block-symbol state, 816, 849 component-symbol state, 816, 849, 867 displacement receiver, 835, 880 homodyne demodulation, 791, 835, 880 optimal, 832 relationship to matched filter, 834 orthogonal pure signal state, 829 quantum on–off keying, 834, 880 suboptimal, 835 table of, 832 Detection statistic, 395, 412, 419, 424, 525 complex sample, 477 photon-counting, 692 sufficient, 399, 412, 419, 420, 622 Detection threshold, 5, 433 binary symmetric channel, 439 dependence on prior probability, 439 direct-photodetection, 433 optimal, 436

909

910

Index

photon counting, 402 ideal, 441 nonideal, 442 Determinant, 48 Dielectric material, 80, 84, 110, 151, 166, 202 waveguide, 87, 124 cylindrical, 132 Differential delay, 191 group delay, 214 per unit length, 185 polarization-dependent, 185, 196 phase shift, 184 transit time, 156 Differential entropy, 240, 673, 696 conditional, 724 Differential equation Bessel, 151 Helmholtz, 84 lightwave amplification, 316 nonlinear pulse propagation, 217 nonlinear Schrödinger, 202 phase-locked loop, 578 Differential-phase modulation, 469 binary, 469 bit error rate, 503 multilevel, 504 Differential-phase-shift keying (DPSK), see Differential-phase modulation Diffraction, 298 grating, 207 Digitize, 1 Dipole, 80 Dirac impulse, 28, 89, 345 signal-state representation, 752 Direct photodetection, xxv, 9, 14, 310, 519 demodulation, 20 with intensity modulation, see Intensity modulation Directional coupler, 300 symmetric, 300 Discrete information channel, 18, 610, 672, 679, 691 memoryless, 634, 679 Discrete-time additive white gaussian noise channel, 400, 412 channel, 17, 18, 398, 411, 494, 671, 698 channel matrix, 602 direct-photodetection channel, 401, 412 electrical channel, 397, 411 Poisson channel, 401, 412 sequence, 398 Discriminator, 311 Dispersion, 4, 126 anomalous, 161

chromatic, 161 -controlled optical fiber, 229 group-velocity, 161, 167 intermodal, 155, 163, 169, 176, 194 intramodal, 155, 169, 173, 194 length, 213, 216, 226, 235, 404, 563 linear, 4, 126, 155, 175 material, 82, 155, 163 coefficient, 164, 175, 180 zero wavelength, 164, 180 multiple sources, 181 narrowband, 159 nonlinear, 155 normal, 161 polarization-mode, 94, 97, 155, 182 ray optics, 156 -shifted optical fiber, 181 slab waveguide, 157 slope, 194 transit time, 156 waveguide, 194 coefficient, 174, 180 Dispersion relationship, 6, 86, 159, 167, 191, 364 complex, 165 graded-index optical fiber, 169 normalized, 158, 159 plane-wave, 86 power series approximation, 160 waveguide mode, 157 Dispersive material, 82 Displacement operator, 772, 836 Displacement receiver, 445, 488 binary error rate, 446 detection operator, 835, 880 Distance euclidean, 44, 478, 539 maximum, 850 minimum, 414 squared, 44, 478, 774, 840 Hamming, 613 minimum, 613, 616 metric, 478 Distortion, 4, 155 intensity-dependent, 16 linear, 4 mode-dependent group delay, 175 multiple sources, 181 polarization-dependent delay, 187 polarization-dependent loss, 189 wavelength-dependent group delay, 179 nonlinear, 4, 216 single pulse, 219 Distributive property, 29 Divergence, 79 Diversity, 608 polarization, 491

Index

Doping, 303 Doppler shift, 207 Down-conversion, 304, 379 Dual code, 614 space, 614 Dual-polarization keying, 491, 493 Dyadic product, 43 Effective area, 211, 232 length, 213, 229, 234, 235, 563 power, 223, 233 Effective timewidth, 34 Eigenbasis, 49, 51, 741, 753, 758, 831 countable, 753 uncountable, 753 Eigenfunction, 33, 86, 88, 278 Eigenmode, 33, 86, 88, 121 Eigenstate, 742, 752, 754, 755 common set for observables, 748 homodyne demodulation, 835 of coherent-state operator, 768 photon-number operator, 754 polarization measurement, 756 sampling, 821, 836, 879 sampling operator, 823, 828, 834 Eigenvalue, 33, 47, 50, 86, 278 Eigenvector, 47 of a covariance matrix, 59 orthogonal, 185 Eikonal equation, 92, 105, 119 Elastic scattering, 114 Electric field, 79 complex representation, 84 cylindrical coordinates, 138 Electric flux density, 79 Electrical bit energy, 423, 466 channel, 17, 361 discrete-time, 397, 411 intermodal-dispersion-limited, 392 intramodal-dispersion-limited, 393 nonlinear, 556 relationship to lightwave channel, 362 optical conversion, 9, 362 pulse, xxvii, 100, 391 energy, 365, 382, 423, 426, 466 signal-to-noise ratio (SNR), 75, 108, 268, 343 relationship to OSNR, 268 waveform, xxvii, 108 classification, 377 direct photodetection, 296 noiseless, 416 noisy, 430 phase-synchronous demodulation, 307

Electro-optic effect, 329 Electromagnetic field, 79 complex-baseband, 87 wave, 83 Electron–hole pair, 303 Electrostrictive force, 207 Encoder, 2, 18, 241, 242, 361, 410 Berrou code, 642 classical, 844 convolutional code, 629, 630 inner, 612 optimal, 670 outer, 612 state, 658 systematic, 617 trellis description, 632 Energy codebit, 623, 646 complex-baseband signal, 44 direct-photodetected lightwave, 336 electrical bit, 22, 423, 427, 466 electrical pulse, 365, 382, 423, 426, 466 level, 244 passband signal, 12, 25, 44 photon, 8 quanta, 203 redistribution, 4, 146 between subchannels, 373 linear, 155, 202, 395 multimode fiber, 155 nonlinear, 216, 221 thermal, 112, 206, 243 vacuum-state, 246, 770, 771 zero-point, 246, 771 Energy demodulation, 500, 509 Energy efficiency, 474 intensity modulation, 505 PSK, 474 QAM, 474 Ensemble, 68 Entangled state, see Signal state, entangled Entropy, 239, 673 additivity property, 279, 674, 862 Boltzmann, 673 chain rule, 674, 731 classical, 673 conditional, 673, 854 differential, 240, 673, 696 conditional, 724 Gordon distribution, 685 joint, 673 noise, 682, 698, 731 Poisson distribution, 687, 692, 732 product distribution, 674 Shannon, 673, 785

911

912

Index

table of common distributions, 240 units, 240, 696 von Neumann, see von Neumann entropy, 880 Envelope, 12, 14 complex field, 88, 144, 387 complex signal, 12, 39, 144, 162, 361 vector, 218 demodulation, 14, 24, 388 passband signal, 162 root-mean-squared amplitude, 364 Envelope method, 700, 729 Equalization, 522, 527 linear, 527 with additive noise, 530 matched filter, 528 nonlinear, 559 optical fiber dispersion, 181 zero-forcing, 528 Equalizer constant modulus, 598 minimum mean-squared error, 529 nonlinear minimum-error, 541 zero-forcing, 528, 533 multi-input multi-output channel, 554 Equivalence class, 496 Erbium, 313 -doped lightwave amplifier, see Lightwave amplifier Ergodic process, 69 Error bit, 6, 431 block, 6 burst, 608 codeword, 6 decoding, 622 frame, 6 function, 57, 63, 347 complementary, 57, 437, 456 message, 6, 676 propagation, 533 random, 608 symbol, 5, 21, 451, 456 Error rate antipodal modulation, 476 based on the union bound, 481 binary coherent state, 831 binary orthogonal modulation, 476 binary phase-shift keying, 476 with phase noise, 485 binary pure signal state, 830 bit, 6, 15, 25, 477 bit error vs. symbol error, 451 block, 6 codeword, 6, 15, 610 differential-phase-shift keying (DPSK), 503

equal-energy phase-asynchronous modulation, 503 floor, 485 frame, 6, 589 gaussian approximation, 506 heterodyne binary phase-shift keying, 487 homodyne binary phase-shift keying, 487 shot-noise limit, 487, 832 with matched local oscillator, 832 message, 6, 15, 676 multilevel modulation, 447 on–off keying, 438 quantum, 834 shot-noise limit, 506 with additive white gaussian lightwave noise, 453, 504 phase-shift keying, 483, 504 photon counting, 441, 832 binary keying, 441 multilevel keying, 451 with a matched local oscillator, 832 with noise photons, 441 quadrature amplitude modulation (QAM), 482 quantum antipodal modulation, 832 quantum on–off keying, 832 sequence detection, 540 shot-noise-limited, 487 symbol, 451 Estimation adaptive, 599 carrier phase, 13, 574 decision-directed, 581 error, 577 maximum-likelihood, 574 channel matrix, 602 channel state, 590 clock phase, 587 detection filter, 595 impulse response, 591 joint carrier and clock phase, 588 nonlinear phase, 463 polarization-state, 553, 600, 606 spatial mode, 601 stochastic gradient decent, 600, 603 Étendue, 117 Euclidean distance, 44, 478, 539 maximum, 850 minimum, 414 squared, 44, 478, 774, 840 Euler’s constant, 153, 731 Evanescent field, 131 mode, 131, 148 Evidence extrinsic, 547, 641, 649 intrinsic, 545, 547, 641, 649, 655

Index

Evolution equation for pulse timewidth, 222, 232 Excess noise factor, 332, 515 Expectation, 53, 270 quantum, 754, 777 Expected number of noise photons, 241, 490, 684, 695, 803, 838 of photons, 241 of signal photons, 241, 739, 805 Expected value, 53 Exponential probability density function, 65, 103, 250, 273, 376, 702 random variable, 65 Extinction ratio, 327, 516 Extrinsic evidence, 545, 547, 641, 649 Berrou code, 645 Gallager code, 651 Eye diagram, 525 Fabry–Pérot resonator, 357 Fano inequality, 675 Fast optic axis, 97, 183 Fermat’s principle, 92 Few-mode fiber, 369, 372, 601 Fiber, see Optical fiber Field electric, 79 electromagnetic, 79 finite, see Finite field magnetic, 79 monochromatic, 80, 83, 125, 151 complex, 125 narrowband, 87 plane-wave, 85 Filter all-pass, 364 causal, 29 detection, 390, see Detection filter estimation, 595 image-rejecting, 578 linear shift-invariant, 29 loop, 577, 580, 604 lorentzian, 295 noise-suppressing, 271, 293, 354, 381, 382, 396, 559 transversal, 531, 559 Wiener, 591, 598 Finite field, 612, 618 basis, 613 characteristic-two, 613 orthogonal vectors, 613 First moment, 53 Fluctuation–dissipation theorem, 335 Flux density, 79 Fock state, see Photon-number state Forward error correction, 619

913

Forward–backward algorithm, see Bahl algorithm Four-dimensional signal constellation, 491 Four-wave mixing, 205, 225, 228, 234 interference, 228 interchannel, 228, 553 intersymbol, 229 intrachannel, 229 Fourier series, 45, 355, 406, 516 cosine, 104 Fourier transform, 30, 38 convolution property, 31 differentiation property, 31 inverse, 30, 217, 261, 284, 570 modulation property, 31 properties, 30, 99 scaling property, 30 sign convention, xxv spatial, 30 temporal, 30, 125, 162 Frame (data), 5 error rate, 6 synchronization, 573, 588, 590 Frame (trellis), 537, 634 Fraunhofer diffraction, 402 Free-space impedance, 108, 140 permeability, 80 permittivity, 80 phase velocity, 86 wavelength, 84 wavenumber, 80, 84, 121, 165, 209 Frequency, 8 angular, 8, 30, 235, 518, 570 carrier, 12, 31 instantaneous, 212 mixer, 307, 578 modulation, 463 noise, 579 normalized, 128, 151, 157 spatial, 30, 120 translation, 3, 12 Frequency-shift keying, 468 coherent, 469 noncoherent, 469 orthogonality condition, 469 phase noise floor, 520 Full-rank matrix, 48, 189, 614 Function Bessel first kind, 132, 134, 406 identity, 141 modified, 134 coherence, 97 complementary error, 57, 437, 456 concave, 719 error, 57, 63, 347

914

Index

gamma, 64, 66, 233, 274 generalized, 28 Gordon, see Gordon function Hermite polynomial, 145, 154 Laguerre polynomial, 66 log-likelihood, 477, 542, 583 lorentzian, 345, 356 modified Bessel, 62, 104, 518, 638 objective, 241, 530 pseudocovariance, 268 radial prolate spheroidal, 280 rect, 36 signum, 29 sinc, 37 smoothing, 801 triangular, 72 unit-step, 29 Gain homogeneous, 320 inhomogeneous, 320 internal photodetector, see Photodetector, internal gain lightwave amplifier, see Lightwave amplifier, gain modulation, 320 noise, 307 per unit length, 315 saturation, 316 small-signal per unit length, 316 Gallager code, 641, 646 check equations, 649 codebit, 652 irregular, 647 posterior probability ratio, 649 regular, 647 Gallager induction, 668 Galois field, 612, 618 Gamma function, 64, 66, 233, 274 probability density function, 65, 405 random variable, 65 Gaussian noise model, 727 probability density function, see Probability density function, gaussian pulse, 37 generalized, 233 quantum-lightwave channel, 871 random process, see Random process, gaussian random variable, see Random variable, gaussian signal state, see Signal state, gaussian Generalized angle, 826 function, 28 likelihood function, 583 likelihood principle, 583

measurement, 787, 792 rotation, 48, 753, 755, 823 Generator matrix, 614, 630 systematic, 617 polynomial, 630 Geometric probability mass function, 242 Geometrical optics, 6, 91, 119 wavefront, 92 Germanium, 111 Girth, 627, 647, 655 Glauber number, 739, 762, 768, 791, 802 state, 739, 761 Gordon distribution, 242, 266, 292, 681, 697, 803, 804, 838, 873 entropy function, 242 variance, 266 Gordon formula, 22, 714 Gordon function, 242, 683, 696, 716, 863, 873, 877 Graded-index optical fiber, 110, 169 cladding, 121 maximum launch angle, 120, 150 mode, 144 mode-dependent group delay, 169 numerical aperture, 120 ray propagation, 119 ray trajectories, 120 Gradient complex, 596, 606 spatial, 305 Gradient descent method stochastic, 600, 603 Gram matrix, 49, 825 Gram–Schmidt procedure, 100 Graph edge, 627 Tanner, 625, 647 trellis, 625 Gray code, 451, 477, 515 Grazing angle, 116 Group delay figure, 174 mode-dependent, 155, 157, 161, 167, 171, 172, 349, 364, 366, 367 mode-group, 168, 170, 177 optimal index profile, 178 polarization-dependent, 184, 185, 368 relationship to group-velocity dispersion, 173 spread, 167 subcarrier, 224 wavelength-dependent, 155, 170, 173, 179, 199, 366, 367 weighted average, 176 Group index, 163, 170, 192, 195

Index

Group velocity, 161, 162 dispersion coefficient, 161, 173, 235, 364, 534, 569 total, 174, 180 dispersion compensation, 181 polarization-dependent, 183 Guard band, 409 interval, 537, 591 Guided mode, see Mode, guided Hadamard’s inequality, 48 Half-power bandwidth, 74 Hamiltonian, 762, 771 Hamming code, 615 distance, 613 minimum, 613, 616 sphere, 621 weight, 613, 622 Hard-decision decoder, 610 decoding, 698 detection, 18, 608, 610, 698, 700 Hardlimiter, 582 Harmonic oscillator, 202 classical, 82 quantum, 761 Hartley–Shannon theorem, 21 Heft, 616, 647 Heisenberg representation, 752, 792 uncertainty relationship, 36, 767 Helmholtz equation, 80, 84 cylindrical coordinates, 139 scalar, 84, 105, 126, 133 boundary conditions, 127 vector, 80, 84, 121, 126, 138 boundary conditions, 121 Hermite polynomial, 145, 154, 809 Hermite-gaussian mode, 145 Hermitian matrix, 47, 49, 185, 752 Heterodyne demodulation, 13, 307, 379, 381 operator, 793 including an image mode, 794 quantum-lightwave, 792 shot-noise-limited, 385, 795, 796 Hilbert space, 43, 752 transform, 32, 101 Hole, 303 Holevo information, 857, 862, 881 Homodyne demodulation, 13, 307, 380, 383 detection operator, 791, 835, 880 quantum optics to complex-baseband, 795
quantum-lightwave, 790 shot-noise-limited, 487, 796, 831, 842, 866 to a complex-baseband waveform, 380, 574 to a real-baseband waveform, 380 Homogeneity, 27 Homogeneous material, 81 system, 28 Husimi distribution, 801 Hybrid mode, 124, 138, 142 characteristic equation, 141 cutoff conditions, 143 Hypersphere, 491 Hypothesis testing, 418, 430 quantum-optics, 820 sequential, 633 Identity matrix, 48 operator, 774 Image frequency, 386 mode, 308, 386, 487 quantum-optics, 794 vacuum-state operator, 794 Image rejecting filter, 578 Impact ionization, 306, 332 Impairment, 4, 155 absorption, 112 attenuation, 112 cross-phase modulation, 205, 226 interchannel interference, 553 four-wave mixing, 205, 228 interchannel interference, 228, 553 intersymbol interference, 229 group-velocity dispersion, 161, 235, 569 interchannel interference, see Interference, interchannel interference, see Interference nonlinear, 202 polarization-dependent loss, 189, 371, 553 polarization-mode dispersion, 182 scattering, see Scattering self-phase modulation, 204 Impedance, 85 free-space, 108, 140 material, 210 Impulse Dirac, 28, 89, 345 Kronecker, 28, 41, 89, 278, 498, 777 sifting property, 28 Impulse response, 27, 28, 101 causal, 73, 74 channel matrix, 373 complex-baseband, 40, 364 estimation, 591
noncoherent lightwave channel, 393 passband, 40 ray optics, 191 right-sided, 29 shift-invariant, 28 symmetric, 561 time-varying, 28 In-phase component operator, 761, 797 noise component, 75 signal component, 39, 313 signal representation, 761 Incomplete decoder, 622 Independent increment, 257 random matrix, 375 random variable, 55, 388 random vector, 67 Index of refraction, 80, 83, 86, 170, 195 angularly dependent, 81 complex, 165 fiber core, 151 graded-index fiber, 110 group, 163 inhomogeneous medium, 91 power-law profile, see Power-law index profile profile, parabolic, 119, 145, 153 silica glass, 192 step-index fiber, 110 time-varying, 328 Inelastic scattering, 114, 205 Inequality Hadamard’s, 48 Schwarz, 43, 426 timewidth–bandwidth, 36, 767 Information, 1 classical, 679 Holevo, 857, 862, 881 mutual, 674, 690 rate, see Information rate relationship to entropy, 673 side, 726 Information channel, 18, 361, 363, 410, 411 additive-noise-limited, 688 bandlimited, 706, 710 photon-optics, 688 binary asymmetric, 436, 439 quantum, 831, 868 binary symmetric, 434, 439, 456, 701 quantum, 831 classical, 679, 707 continuous, 18, 634, 672, 676 capacity, 710 gaussian, 698 memoryless, 634 discrete, 18, 610, 672, 679, 691
memoryless, 634, 671, 679, 848 gray box, 691 hard-decision, 672, 698 ideal photon, 685 bandlimited, 21, 713 Laguerre, 690 memoryless, 411, 610 multi-input multi-output, 707 probability model, 411 quantum-lightwave, 679 additive-noise-limited, 871, 876 block detection, 869 ideal, 862 phase-insensitive, 876 thermal-noise, 876 soft-decision, 672, 698 spectral rate efficiency, 718 waveform, 18, 672, 676, 679, 710 bandlimited, 676, 710 nonlinear, 671 Information rate, 2, 11, 474, 534, 573, 611, 710 bandlimited channel, 711 function of signal power, 734 maximum, 21, 433, 669 relationship to arrival rate, 714 relationship to bandwidth, 22, 715 relationship to channel capacity, 669 relationship to codeword error rate, 678 saturation with bandwidth, 716 using photon counting, 693 Information theory, 18, 240 quantum, 878 Inhomogeneous material, 81 Injection current, 323 Inner code, 609 Inner product, 42, 49, 100, 102, 147, 466, 595, 708 coherent state, 850 pure signal state, 781, 830, 863 signal state, 753, 830 Intensity, 9, 87 autocoherence function, 98 modulation, 15, 467 approximate forms using Q, 453 binary modulation error rate, 438 channel capacity, 701 error rate, 453 error rate with additive lightwave noise, 504 gaussian approximation, 506 intersymbol interference, 523 multilevel, 447, 451, 515 multiple fiber segments, 321 on–off keying, 467 shot-noise-limited error rate, 506 threshold, 433 noise, 346 probability density function, 347, 359
Interchannel interference, see Interference, interchannel Interference, 4, 155 interchannel, 4, 16, 368, 373, 522, 556, 568 cross-phase modulation, 553 estimation, 555 four-wave mixing, 228, 553 gain modulation, 320 nonlinear, 16, 205 intermodal, 556 intersymbol, 395, 523, 569 equalization, 522 four-wave mixing, 229 nonlinear, 229 postdetection, 531, 533 power penalty, 515 predetection, 531 modal, 392 multipath, 302 nonlinear, 181, 202, 223, 559, 726 equalization, 559 Interferometer delay-line, 301 Mach–Zehnder, 329 Intermediate frequency, 13, 308, 379, 577 Intermodal dispersion, 155, 163, 169, 176 electrical channel, 392 lightwave channel, 366 Internal gain noise, 331 Interpolation, 46, 399 Interstitial region, 622, 628 Intersymbol interference, see Interference, intersymbol Intramodal dispersion, 155, 169, 173, 194 electrical channel, 393 lightwave channel, 367 Intrinsic evidence, 545, 547, 641, 649, 655 Berrou code, 645 Gallager code, 651 linewidth, 179, 326, 345 region, 305 Inverse Fourier transform, 30, 217, 261, 284, 570 Poisson transform, 262, 698 Ionization coefficient, 306 impact, 306 ratio, 332, 359 Irregular algebraic block code, 647 Isotropic material, 81, 91, 203 Isserlis theorem, 54, 267, 297 Jacobian matrix, 103 Jitter, 349 Joint
detection classical, 495 quantum-lightwave, 846 estimation, 588 probability, 413 probability density function, 54 signal space, 742 Jointly orthogonal modulation, 474, 492 Jones representation of polarization, 94, 183 vector, 94, 108, 186 Kerr nonlinear fiber coefficient, 210, 235 nonlinear lightwave channel, 722 mimo, 728 multiplex, 726 single-carrier, 723 nonlinearity, 204, 229, 553 glass, 208 optical fiber, 208 Ket, 48, 751 Keying antipodal, 466 quantum coherent-state, 775 ASK, 467 BPSK, 466, 625, 832 DPSK, 469 dual-polarization, 491 FSK, 468 MSK, 473 on–off, 467 OQPSK, 473 polarization-switched, 492 PSK, 472 pulse-position, 689 QAM, 472 QPSK, 472, 483 offset, 473 quantum-lightwave antipodal, 831 on–off, 832 phase shift, 831 Kramers–Kronig transform, 32, 83, 102, 165 Kronecker impulse, 28, 41, 89, 278, 498, 777, 809 product, 51, 779, 813 Lagrange multiplier, 241 Laguerre information channel, 690 polynomial, 66, 805 generalized, 66 probability mass function, 66, 277, 444, 690, 805 random variable, 66, 276 Laplace recursion formula, 48
Laser diode, 326, 356, 357 extinction ratio, 327 Fabry–Pérot resonator, 357 free spectral range, 357 intensity noise, 346, 358, 397 linewidth intrinsic, 326, 345 modulated, 569 modulation response, 327 phase noise power density spectrum, 347 probability density function, 511 relaxation oscillation frequency, 328 spectral width, see Intrinsic linewidth threshold current, 326, 356 Length dispersion, 213, 216, 226, 235, 404, 563 effective, 213, 229, 234, 235, 563 nonlinear, 213, 230, 235, 563 walk-off, 213, 214, 216, 234 Letter, 1, 411 Lifetime charge carrier, 325 upper-state, 246, 314, 318 Light-emitting diode (LED), 323 modulation response, 325 noise statistics, 355 spectrum, 323 Lightwave coupler, 298 envelope, 143 ergodic, 98 monochromatic, 94, 123, 143, 739 noise, see Noise partially polarized, 94 power, see Power, lightwave unpolarized, 94 Lightwave amplifier, 312 Brillouin, 313 distributed, 322 doped-fiber, 313, 317, 325, 396 gain, 312, 315 internal, 316 saturation, 316 small-signal, 316 gain site, 312 noise, 337 noise figure, 338 parametric, 313 phase-insensitive, 313, 388, 796, 813 quantum-noise limit, 336, 339, 796 phase-sensitive, 313 pump, 313 Raman, 313 semiconductor, 318 Lightwave channel, xix, 4, 17, 361
impulse response, 393 intermodal dispersion, 366 intramodal dispersion, 367 linear, 110, 363 mode-multiplex, 372 nonlinear, 201 polarization-multiplex, 370 quantum, 862 relationship to electrical channel, 362 single-input single-output, 363 space-multiplex, 369 table of models, 393 transfer function, 364 Lightwave field, 143 complex, 98 complex envelope, 88, 144 random, 68, 97 spatially random, 98, 374 temporally random, 97 vacuum state, 249 Lightwave signal, 8, 364 bandlimited, 277, 294 direct-photodetected energy, xxv expected number of photons, 241, 348 mean number of signal photons, 259 narrowband, 159 power, 4, 25, 87, 105 spectrum, 159 Lightwave source, 10, 323, 355 bandlimited, 325 coherence properties, 13 coupling efficiency, 118 étendue, 117 for a multimode fiber, 176 laser diode, see Laser diode light-emitting diode, see Light-emitting diode numerical aperture, 118, 151 spontaneous emission, 256 Likelihood function, 432 generalized, 583 log, 477, 542, 583 ratio, 433, 461 log, 433, 652 Linear causal media, 82 polarization, see Polarization, linear polarizer, 301 shift-invariant filter, 29 time-invariant system, 33, 84, 363 Linearity, 28 Linearly independent set, 41, 613, 647 Linearly polarized mode, 133 cutoff condition, 137 dispersion relationship, 159 Linewidth, 179, 569
intrinsic, 326, 345 Local oscillator, 13, 19, 308, 379, 384, 408, 446, 577 controlled, 576 Log-likelihood function, 477, 542, 583 ratio, 433, 652 Loop filter, 577, 580, 604 Lorentzian filter, 295 function, 345, 356 pulse, 38, 101 Low-density parity-check code, see Gallager code Lowering operator, 771 Mach–Zehnder interferometer, 329 Magnetic field, 79 complex representation, 84 flux density, 79 Manakov equation, 218 Manakov–PMD equation, 218 Marginal probability density function, 54 Marginalization, 54, 104, 544, 639, 677, 779 block, 546 quantum-lightwave product state, 858 Mark, 1 Marker detection, 589 Matched filter, 390, 408, 423, 424, 426, 854 bank, 390, 498, 542 coherent state, 834 optical, 421 whitened, 427, 458 Material anisotropic, 81, 96 centrosymmetric, 203, 208 circular birefringent, 97, 183 crystalline, 114 dielectric, 80, 84, 110, 151, 166, 202 dispersion, 155, 163 zero-wavelength, 164, 180 dispersion coefficient, 164, 175, 180 silica glass, 192 dispersive, 82 homogeneous, 81 impedance, 107, 210 inhomogeneous, 81 isotropic, 81, 91 linear-birefringent, 97 noncrystalline, 112, 114 nondispersive, 82 polarization, 80, 148 Matrix channel, 401, 707, 871 commuting, 47, 51 conjugate, 46 conjugate transpose, 46
covariance, see Covariance matrix, 252 density, 746, 777 exponentiation, 49, 102, 772 full-rank, 48, 189, 614 Gram, 49, 825 hermitian, 47, 49, 60, 185, 752, 825 identity, 48 logarithm, 102 nonhermitian, 49 nonnegative-definite, 47, 189, 371, 776, 819, 825 normal, 51, 190 outer product, 51 polarization-dependent loss, 189, 371, 553 positive-definite, 47, 819 product, 47 projection, 50, 753, 783 pseudocovariance, 60, 107 rank, 48 rank-one, 747, 782 singular-value decomposition, 49, 190, 554 square, 46 symmetric, 47, 59 trace, see Trace transformation, 42, 50 transpose, 46 unitary, 48, 185, 196, 370 Maximum-distance code, 613, 620 Maximum-entropy distribution, 241, 803 continuous energy, 250 discrete energy, 242 photon-optics, 242, 684 thermal equilibrium, 244 wave-optics, 697 with a mean constraint, 242 without a mean constraint, 292 Maximum-entropy principle, 240 Maximum-likelihood decoding, 629, 639 detection, 433 sequence detection, 535, 536, 542 Maximum-posterior decision rule, 431 detection, 433 sequence, 544 sequence detection, 522, 535, 544 Maxwell’s equations, 6, 79 complex representation, 84 Maxwellian probability density function, 188 random variable, 65 Mean, 53 Mean-squared value frequency, 34 timewidth, 34, 222 Measurement operator, 742, 778 Memoryless
additive white gaussian noise, 583 channel, 748 continuous-time channel, 374 discrete-time mimo channel, 707 Kerr channel, 729 photon-noise-limited channel, 446 space-time separable channel, 495 system, 30 Meridional ray, 116 Message error rate, 6, 15, 676 passing decoder, 648 Mimo channel, see Multi-input multi-output channel Minimal trellis, 631 Minimum distance, 414, 435 between sequences, 540 classical signal vs. quantum-lightwave signal, 842 codeword segment, 634 for a code, 611 for trellis-coded modulation, 658 quantum-lightwave coherent state, 842 sequence, 541 signal constellation, 464 four-dimensional, 492 Minimum Hamming distance, 613, 616 relationship to euclidean distance, 625 weight, 613, 634 Minimum mean-squared error, 529 orthogonality condition, 597 Minimum phase, 165 minimum separation, 524, 567 Minimum-distance sequence detection, 538 Minimum-shift keying, 473 Mixed signal state, 746, 778, 808, 821, 837, 872 representation, 782 Mixer, see Frequency mixer, 576 Mixing, 313 efficiency, 408 four-wave, see Four-wave mixing gain, 379, 382, 385, 578 mathematical identity, 307 noise–noise, 396 signal–carrier, 12 signal–local oscillator, 576 signal–noise, 337, 362, 396, 403, 435, 453, 460, 507 signal–signal, 201 square-law photodetection, 19 subchannel–subchannel, 726 Mobility, 352 Modal noise, see Noise, modal Mode, 3, 4, 6, 88, 117, 121 azimuthally symmetric, 141 basis, 88 confinement factor, 153
coupling, 146, 176, 298 random, 375 cutoff condition, 130, 137, 153, 158 decay rate in the cladding, 128, 129, 158 delay for a mode-group, 168, 170, 177 -dependent group delay, 155, 157, 167, 171, 172, 349, 366, 367 estimation of, 601 evanescent, 148 even-transverse electric, 128 graded-index optical fiber, 144 group delay, 161 guided, 88, 111, 121, 123 relationship to ray theory, 130 hermite-gaussian, 145 hybrid, 124, 138, 142 image, 308, 386, 487 linearly polarized, 133 multiplex channel, 16, 369, 372, 555 odd-transverse electric, 128 of a linear system, 33 orthogonal, 88, 132, 137, 146, 152 orthogonality condition, 89, 90, 131, 148 partition noise, 348 power, 180 propagation constant, 127 radiation, 89, 131, 146 spatial frequency within the core, 128, 158 spatiotemporal, 239, 244, 250, 739, 761, 769 step-index fiber, 132 suppression ratio, 348 transverse electric (TE), 124, 151, 189 transverse electromagnetic (TEM), 85, 124 transverse magnetic (TM), 124, 151, 189 unguided, 88, 121, 143, 146, 162 well-guided, 130, 161, 211 Mode-group, 168, 173, 369 delay, 168, 170, 177 density, 192 graded-index fiber, 169 index, 168, 192 maximum value, 169 uniform distribution, 177 Modem, 5, 15, 361, 815 quantum, 742, 815, 848 Modified Bessel function, 62, 104, 134, 518, 638 Modified duobinary code, 663 Modulation, 3, 12 amplitude, 12 antipodal, 466, 477 binary, 463 biorthogonal, 499 code, 609, 659 coded, xx, 3, 22, 238, 413, 464, 534, 685, 741 direct current, 324 frequency, 463, 468
index, 198, 406 intensity, see Intensity modulation jointly orthogonal, 474, 492 layer, 1 multisymbol, 463 on–off, 414 orthogonal, 466 phase-synchronous, 12 quadrature amplitude, 472 quantum antipodal, 775, 831 symmetric, 703 trellis-coded, 656 unequal energy, 467 Modulator, 8, 361 amplitude/phase, 330 balanced, 330 complex-baseband, 330, 378 dual-polarization, 493 electro-absorption, 329 electro-optic, 329 lightwave-power response, 331 phase-asynchronous, 327 phase-synchronous, 378 Modulus, 32 Moment nth, 53 first, 53 Momentum operator, 765, 766 representation, 765 Monochromatic field, 80, 83, 125, 151 complex, 125 lightwave, 94, 123, 143, 739 plane wave, 93 Morse code, 15 Multi-input multi-output channel, 15, 368, 372, 602 capacity, 707 continuous-time, 374 equalization polarization modes, 553, 600 spatial modes, 601 zero-forcing, 554 frequency response, 373 impulse response, 373 mode-multiplex, 372 transfer function, 373 Multi-user detection, 373, 728 Multilevel modulation error rate, 447 phase, 472 quadrature amplitude, 472 Multimode interference coupler, 301 Multipath interference, 302 Multiplex channel, 368 mode-division, 372, 555
polarization-division, 370 space-division, 371, 555 Multiplexing, 15 frequency-division, 16 mode-division, 16, 369 polarization-division, 16 space-division, 15, 369 time-division, 16, 368 wavelength-division, 16, 368, 568 Multivariate gaussian probability density function, 57, 707 Mutual coherence function, 98 Mutual information, 674, 690 binary symmetric channel, 731 classical channel, 845, 856 function of the prior, 693 quantum channel, 857 Z channel, 693 Narrowband dispersion, 159 field, 87 signal, 12, 44 Nat, 240 Nearest neighbor, 414, 520, 541, 658 Negative binomial probability mass function, 276, 295, 325, 444 Node, 537, 552, 627 Noise, 155 additive, 19, 457 additive white gaussian, 21, 26, 73, 279, 287, 385 autocorrelation function, 74 bandlimited, 75 circularly symmetric gaussian, 254, 334, 481, 573, 748, 872 autocorrelation function, 267 bandwidth, 271 colored, 74 complex-baseband autocorrelation function, 76 equivalent bandwidth, 77 correlated, 105 electrical power density spectrum phase-synchronous demodulation, 381 energy per mode, 247 entropy, 682, 698, 731 equivalent bandwidth, 74, 461 figure, 78, 461 lightwave amplifier, 338 segment of an optical fiber, 354 gaussian bandlimited, 271, 274 in-phase component, 75 lightwave power, 267, 346 power density spectrum, 267 probability density function, 347, 359
modal, 376, 402 mode-partition, 348 nongaussian, 478, 530, 723 nonlinear phase, see Nonlinear phase noise optical thermal, 256 passband, see Passband, noise phase, see Phase noise phase-insensitive, 253 photon, 7, 122, 257 power density spectrum, 269 probability mass function, 260 power, 73 passband, 77 power density spectrum, 22 pseudothermal, 256, 334 quadrature component, 76 quantum, 248, 334, 490, 794, 803, 838, 839 relative intensity, 346 shot, 9, 239, 305, 457 counting, 386, 486, 758 signal-dependent, 4, 20, 261, 429, 454, 458, 505 signal-independent, 4, 19, 448, 457, 458 signal-spontaneous emission, 337, 461 spontaneous emission, 256, 337 -suppressing filter, 271, 293, 354, 381, 396, 559 thermal, 78, 248, 360, 396, 403, 461 white, 73, 249, 271, 424, 528 Noisy channel coding theorem, 677 quantum optics, 860 Noncentral chi probability density function, 65 random variable, 65 Noncentral chi-square probability density function N degrees of freedom, 64, 274, 284 characteristic function, 64 two degrees of freedom, 340, 347, 561 random variable, 63 Nonclassical signal state, 741, 749, 782 Noncoherent carrier, 6, 12, 25, 172, 387, 392, 393, 467, 556, 566 direct photodetection, 387 wavelength autocorrelation function, 179 signal state, 783 Noncoherent demodulation, 500, 512 Noncrystalline material, 112 Nondispersive material, 82 Nongaussian noise, 478, 530, 723 random process, 279 Nonlinear channel, see Channel, nonlinear distortion, 4, 202, 216, 226 fiber coefficient, 340 index coefficient, 210
interference, 16, 181, 202, 223, 559 interchannel, 205 interchannel cross-phase modulation, 553 interchannel four-wave mixing, 228, 553 intersymbol, 229 length, 213, 230, 235, 563 lightwave propagation, 210 governing equation, 217 nondispersive channel, 219 numerical methods, 229 vector channel, 218 weakly dispersive channel, 222 phase estimation, 463 phase modulation cross, 205, 553 self, 204 phase noise, 222, 339, 341, 723 probability density function, 340 variance, 341 phase shift, 212, 220, 234 Nonlinear Schrödinger equation, 217, 553, 723 multichannel, 223 numerical solution, 229, 727 traveling timeframe, 218 vector channel, 218 Nonlinearity Brillouin scattering, 206 stimulated, 207, 313 Kerr, 204, 229, 553 glass, 208 optical fiber, 208 Raman scattering, 205 stimulated, 206, 313 Nonnegative-definite matrix, see Matrix, nonnegative-definite Nonorthogonal signal, 499, 818 signal state, 774, 778, 786, 824, 840 waveform, 391 Normal dispersion regime, 161 Normal matrix, 51, 190 Normal ordering, 763 Normalized dispersion relationship, 158, 159 frequency, 128, 151, 152, 157 index difference, 117, 121, 129, 132 mode-group index, 170, 176, 181 propagation constant, 157 Neumark extension, 789 Numerical aperture, 116, 129, 150, 376 graded-index optical fiber, 117, 120 step-index fiber, 117 Nyquist pulse, 229, 388, 399, 416, 455, 465, 526, 566, 585 joint, 466, 469 joint property, 474
rate, 46 Nyquist–Shannon sampling theorem, 45, 399, 672 Objective function, 241, 530 augmented, 241 constant-modulus, 598 mean-squared error, 596 Observable operator, 742, 752, 755, 762, 782 matrix representation, 751 property, 754 Offset QPSK (OQPSK), 473 Operator, 751 annihilation, 770 coherent-state, 762, 789 commuting, 758, 792 convolution, 56 creation, 770 density, 746 detection, 816, see Detection operator determinant, 48 differential, self-adjoint, 88 displacement, 772, 836 generalized measurement, 787 Hamiltonian, 762, 771 heterodyne demodulation, 793 including an image mode, 794 identity, 774 image mode vacuum-state, 794 in-phase component, 761, 797 lowering, 771 measurement, 742, 778 momentum, 765, 766 noncommuting, 792 observable, 742, 752, 755, 762, 782 matrix representation, 751 photon-number state, 754, 762, 764, 790 position, 765, 766 projection, 753, 777, 778, 789 quadrature component, 761, 797 raising, 770, 772 self-adjoint, 752, 768 trace, 47, 777, 829 Optic axis, 97 fast, 97, 183 slow, 97, 183, 187 Optical cavity, 323 coupler, 313 electrical conversion, 8, 310, 362 filter, 302 isolator, 302 pump, 256, 313 signal-to-interference ratio (OSIR), 552, 727 signal-to-noise ratio (OSNR), 75, 108, 268, 341, 343
relationship to SNR, 268 signal-to-noise-plus-interference ratio (OSNIR), 552, 553, 571 thermal noise, see Noise, optical thermal Optical fiber, 110 absorption, 112 attenuation, 112, 114 birefringence, 183, 185 intensity-dependent, 218 characteristic equation hybrid mode, 141 cladding, 110 core, 110 cutoff condition hybrid mode, 143 linearly polarized mode, 137 dispersion-controlled, 180, 229 dispersion relationship, 170 effective area, 211 effective length, 213 equalization, 181 few-mode, 369, 372, 601 graded-index, see Graded-index optical fiber intermodal dispersion, 366 intramodal dispersion, 366 mode, power, 180 mode-confinement factor, 153 mode-group, 168, 173 density, 192 mode-group index, 168, 192 multimode, 111, 119, 155, 169, 194, 365 impulse response per mode, 365 mode-dependent group delay, 175 nonlinear propagation governing equation, 217 nondispersive channel, 219 numerical methods, 229 vector channel, 218 weakly dispersive channel, 222 parabolic index of refraction, 119 segment, 4, 321 single-mode, 111, 137, 151 attenuation, 11 complex-baseband transfer function, 364 dispersion, 173, 194 impulse response, 364 nonlinear coefficient, 210 span, 4 step-index, see Step-index optical fiber transfer function complex-baseband, 406 weakly guiding, 133 Optical-matched filter, 421 Orthogonal quantum signal states, 745, 747, 777, 823, 860, 879
signals, 44, 390, 391, 425, 469, 498, 879 classical vs. quantum-optics, 774 equal-energy, 501 error rate, 476 noncoherent, 501 waveforms, 469 Orthogonality condition frequency-shift keying, 469 minimum mean-squared error, 597 mode, 89, 90, 131, 148 Orthonormal basis, 41, 50, 88, 100, 277 basis states, 759 OSNR, see Optical signal-to-noise ratio Outer code, 609, 612 Outer product, 43, 47, 759, 777 basis functions, 759 matrix, 51 orthonormal basis vector, 50 P distribution, 797, 802 classical signal in noise, 798 Parabolic index profile, 119, 153 Parallel concatenation, 642 Parametric amplification, 313 Paraxial approximation, 92, 120 Pareto index, 54 probability density function, 106, 634 random variable, see Random variable, Pareto Parseval’s relationship, 31, 44, 70, 73 Partial-fraction expansion, 296 Partial-response code, 534, 663 decision-feedback demodulation, 419 dicode, 419, 663 duobinary, 418, 663 maximum-likelihood demodulation, 664 maximum-likelihood detection, 419 pulse, 418 Partial trace, 779, 812, 858 Partially coherent carrier, 13, 388, 470, 556, 569 Passband impulse response, 40 noise, 75 bandwidth, 271 power density spectrum, 76 probability density function, 250 noise-equivalent bandwidth, 74, 77 noise process, 75 signal, 3, 12, 39 energy, 44 envelope, 162 waveform, 3, 710 Path metric, 539, 542 summed, 540
Permeability, 80 Permittivity, 80, 82 Phase ambiguity, 583 asynchronous demodulation, 14, 500 comparator, 576 estimation, see Estimation, phase -insensitive channel, 253, 799, 871 lightwave amplifier, 313, 388 noise, 253 -matching, 149, 203, 233 -mismatch, 228 -modulation, see Modulation, phase probability density function, 104 reference, 13 -synchronous demodulation, 379 heterodyne, 381 velocity, 86, 92, 161, 162 polarization-dependent, 184 Phase-locked loop, 576 acquisition step, 577 Costas, 582 damping coefficient, 580 first-order, 578, 580 loop filter, 577, 580, 604 natural frequency, 580 nonlinear analysis, 605 second-order, 580, 604 tracking step, 577 transfer function, 579 first-order, 604 second-order, 604 transfer function for phase error, 604 Phase noise, 14, 484, 574 laser diode probability density function, 511 nonlinear, 339, 502 nonlinear probability density function, 340 probability density function, 63 Phase-shift keying, 472 binary, 466 error rate heterodyne BPSK, 487 homodyne binary phase-shift keying, 487 multilevel, 472 QPSK, 483 offset, 473 Phase space, 739, 774 Photocharge, xxvi, 259, 382 Photocurrent, xxvi, 9, 259, 352 Photodetected lightwave energy, xxv lightwave power, xxvi Photodetection, 8, 238 balanced, see Balanced photodetection direct, see Direct photodetection
event, 257, 305 square-law, see Square-law photodetection Photodetector, 8 avalanche photodiode, 306 dark current, 312, 331 internal gain, 290, 306, 331 expected value, 359 PIN photodiode, 305 polarization-insensitive, 396 responsivity, 9 square-law, 100, 108, 249 Photodiode, 304 avalanche, 304 PIN, 305, 352 Photoelectron, xxvi, 257 arrival process, 258, 285, 294 arrival rate, xxvi, 258, 270, 692 constant, 331, 519 counting process, 258, 312 mean number, 259 primary, 257, 288, 311, 331 secondary, 257, 288, 291 Photon, 7, 205 arrival process, 237, 257, 311 arrival rate, 9, 122, 257, 309, 311, 401, 421, 440, 685 time-varying, 257 counting process, 312 mean number, 241, 348 in a coherence interval, 352 per bit, 503, 514, 690, 775, 831 per pulse, 440, 451 -noise-limited channel, 265, 402 number, 239, 311, 737 probability mass function, 338 energy, xxv, 8 lifetime, 326 momentum, 8 noise, see Noise, photon Photon counting, 257, 402, 456, 689 binary, 440 classical, 832 conventional, 831, 836 multilevel, 451 random process, 257 threshold, 402 with a matched local oscillator, 832 with noise photons, 441 Photon optics, 6 bandlimited capacity, 21, 713 narrowband, 714 description of shot noise, 309 detection, 422, 428 discrete-energy property, 422 discrete-time channel, 401 memoryless, 682, 692
photon noise, 237, 257 relationship to quantum optics, 736, 773 relationship to wave optics, 250, 259 signal model, 7, 237, 246, 737 signal propagation, 122 single-letter capacity, 244, 680 spectral-rate efficiency, 719 treatment of photodetection, 238 Photon-number state, 741 basis, 764 operator, 754, 762, 764, 790 representation, 741, 838 of a classical state, 802 of a coherent state, 741, 773 of additive noise, 803 with zero photons, 771 Physical channel, 4, 17, 361 layer, 1, 15 Pilot signal, 574 PIN photodiode, 305, 352 Planck’s constant, 8 reduced, 8 Plane wave, 85, 107 in slab waveguide, 130, 196 monochromatic, 93 Poincaré sphere, 95, 186, 188, 370, 601 latitude, 95 longitude, 95 transition along, 96, 601 Point process, 122, 331, 439, 740 maximum entropy, 439 Poisson channel, 265, 311, 401, 692 counting process, 9, 257, 260, 312 probability mass function, 260, 773 mixed, 262 summation formula, 37 transform, 239, 262, 684, 737, 799 Dirac impulse, 265 exponential probability density function, 265 gamma probability density function, 275 inverse, 262, 698 noncentral chi-square probability density function, 690 relationship to Gordon distribution, 266 Polarization (field), 3, 80, 93, 368 alignment, 495, 553, 600 axes, 190 beamsplitter, 301, 493, 744 circular, 94 combiner, 301 cross modulation, 218 decorrelation length, 187, 218 -dependent group delay, 184, 368 -dependent group velocity, 183
-dependent loss, 189, 199, 371, 553 ellipse, 108 elliptical, 93 Jones representation, 94, 183 linear, 94, 183 mode dispersion, 94, 97, 155, 182 first-order, 199 governing equation, 184 Jones representation, 183 linear-birefringent material, 185 second-order, 183 Stokes representation, 186 vector, 186, 196 modulation, 493 multiplex channel, 16, 370, 491 optic axis fast, 97, 183 slow, 97, 183, 187 orthogonal, 96 principal axes, 97, 183, 185, 190 principal states, 183, 188 basis, 370 interference, 190 sign convention, xxv slab waveguide, 130 state estimation, 553, 600, 606 Stokes representation, 94, 186 Polarization (material), 80 nonlinear, 203, 208 Polarizer, linear, 301 Position operator, 765, 766 representation, 765 vector, 85 Positive-operator-valued measure, 788 Posterior probability, 55, 412, 544, 547 component maximum, 544 componentwise maximum, 544 density function, 432 marginal, 545 ratio, 432, 461, 547, 643 Gallager code, 649 senseword, 649 sequence maximum, 544 Potential, 304 Power baseband signal, 11 effective bandwidth, 70, 293 lightwave, 4, 25, 87, 105 autocorrelation function, 98 photodetected, xxvi, 4 penalty, 607 discrete lightwave amplifier, 322 intersymbol interference, 515 reactive, 131 splitter, 301, 386
Power density spectrum, 69, 105, 267, 269 arrival rate, 269 complex-baseband noise, 76 constant signal plus noise, 268 direct-photodetected electrical signal, 271, 395 electrical per unit resistance, xxvi, 73 phase-synchronous demodulation, 381 relative intensity noise, 346 shot noise, 385 filtered electrical signal, 287 filtered shot noise, 287 lightwave noise power, 267 lightwave signal, 70 noise, 22, 247 one-sided, 69 passband noise, 76 quantum noise, 248 random binary waveform, 72 spontaneous emission, 257, 336 direct-photodetected, xxvi, 561 lightwave power, 267 thermal noise, 250 two-sided, 69, 279, 293 noise, 73 wavelength, 70 Power-law index profile, 145, 169, 171, 192 parameter, 145, 169, 176, 182 optimal, 177, 182, 196 Poynting vector, 86 complex, 87 Pre-equalization, 525, 534 Prefilter, 534 Primary photoelectron, 257, 288, 311, 331 Primitive element, 620 Principal axis, 81 polarization axes, 97, 183, 185, 190 polarization states, 183, 188 basis, 370 interference, 190 Principle of stationary phase, 367 Prior probability, 18, 237, 412, 436, 672, 681 Berrou decoder, 644 capacity-achieving, 673 equiprobable, 433, 699 ratio, 548 unequal, 461 Probability density function, 53 bivariate, 54, 295 gaussian, 58, 104, 251, 253 maximum-entropy, 252 Boltzmann, 250, 293 central chi, 65 central chi-square, 63, 372, 376, 731 two degrees of freedom, 273
complex gaussian, 60 circularly symmetric, 60 complex-baseband noise, 250 conditional, 54, 413, 431 exponential, 64, 65, 103, 250, 273, 376, 702 first-order, 68 gamma, 65, 376, 405 gaussian, 57 bivariate, 253 circularly symmetric, 58, 78, 254, 872 multivariate, 57, 60, 707 spherically symmetric, 188 inverse gaussian, 360 joint, 54 marginal, 54 maxwellian, 65, 188, 200 noncentral chi, 65 noncentral chi-square, 63, 105, 340, 347, 360, 561, 805 two degrees of freedom, 63, 255, 273 Pareto, 54, 106, 634, 636 passband noise, 250 phase, 63, 104 posterior, 432 ratio, 461 prior, see Prior probability Rayleigh, 61, 62, 65, 103, 254, 273 ricean, 61, 62, 103, 273, 502, 507, 509, 724 second-order, 68 table of, for noise processes, 283 Probability distribution function, 53 posterior, 412 prior, 412 table for wave-optics and photon optics, 284 Probability mass function, 53 binomial, 260 Bose–Einstein, 244, 293 geometric, 242 Gordon, see Gordon distribution inverse gaussian, 332 Laguerre, 66, 277, 444, 690, 805 maximum entropy, 241 negative binomial, 295 Poisson, 260, 325, 773 Product channel, 706 capacity, 706 mimo, 707 distribution, 55, 253, 254, 554, 682, 742, 848 bandlimited gaussian random process, 278, 706 gaussian, 59, 253 mimo channel, 707 output posterior, 643 subblock, 678 state, 742, 760 density matrix, 779, 786, 857
expectation, 791 generation with a coupler, 790 Projection, 43, 100 matrix, 50, 753, 783 operator, 753, 777, 789 Propagation constant, 85, 124, 157 along optic axes, 184 frequency-dependent, 157 mode, 127 mode-dependent, 157 normalized, 157 vector, 85 Pseudocovariance function, 268 matrix, 60, 107 Pseudorandom sequence, 516, 594 Pseudothermal noise, 256 Pulse, 3 -amplitude modulation, 463, 517 multilevel, 464, 471 multilevel error rate, 477 chirp, 38 electrical, xxvii gaussian, 37, 100 generalized, 233 lorentzian, 38, 101 non-return-to-zero, 467 inverse, 468 Nyquist, 229, 388, 399, 416, 455, 465, 526, 566, 585 joint, 469 joint property, 474 partial-response, 418 -position modulation, 689 quadratic phase, 38 raised-cosine, 468 rectangular, 36 sinc, 37 symmetric, 532 target, xxvii, 416, 465, 525 transmit, 415 triangular, 101 Puncturing, 643 Pure signal state, 746, 760, 776, 778 binary modulation error rate, 830 coherent density matrix, 802 density matrix, 811 representation, 782 von Neumann entropy, 785
Q, 438 for an avalanche photodiode, 515 for intensity modulation, 453 for multiple fiber amplifiers, 462
Quadratic-phase function, 367 Quadrature -component operator, 761, 797 -noise component, 76 -signal component, 39, 313 -signal representation, 761, 762 Quadrature phase-shift keying, 472, 483 offset, 473 Quadrature-amplitude modulation, 464, 472 Quanta, 203 Quantization, 1 Quantum coherence, 7, 782, 784, 805 correlation, 7 decoherence, 783 efficiency, xxvi, 258, 305, 338 entropy, see von Neumann entropy expectation, 754, 777 information theory, 878 noise, see Noise, quantum, 249 noise regime, 249 phase, 782, 805 state, 750 uncertainty, 7, 262, 683, 685, 743, 776, 820, 824 Quantum optics, 6 channel capacity, 843 heterodyne demodulation, 792 homodyne demodulation, 790 information channel, 843 modulation antipodal, 775, 831, 832 on–off keying, 832 using coherent states, 831 signal model, 6, 7, 246, 446, 736 statistical, 743, 776 Quantum wave function, 738, 745, 753, 765 continuous, 755 in-phase component, 761 in-phase representation, 765 momentum representation, 765 position representation, 765 quadrature component representation, 762 quadrature representation, 765 Quarter-square multiplier, 308 Quasi-linear approximation, 723 Quasi-probability distribution, 773, 797 P distribution, 797, 802 Husimi, 801 Wigner, 799, 802, 805 gaussian state, 806 Radial prolate spheroidal function, 280 Radiation mode, 89, 131, 146 Radiative-recombination lifetime, 319, 325 Raised-cosine pulse, 468 Raising operator, 770, 772
Raman scattering, 205 stimulated, 313 Random process, 52, 67 bandlimited, 251 circularly symmetric gaussian, 78, 254, 343, 574 bandlimited, 277, 283 complex-baseband noise, 75 ergodic, 69 gaussian, 68 bandlimited, 239, 273, 280, 284 circularly symmetric, 255 decorrelated, 278 product distribution, 278, 706 independent increment property, 257 nongaussian, 279 notation, xxv passband, 252 photon counting, 9, 257, 260, 312 spatiotemporal, 284 stationary strict sense, 68 wide sense, 68 Wiener, 343 Random telegraph signal, 70 Random variable, 52 nth moment, 53 additivity property, 253, 261 bivariate gaussian, 58, 255 central chi, 65 central chi-square, 63 complex, 53 complex gaussian, 60 correlation, 55 expectation, 53 exponential, 65 first moment, 53 gamma, 65 gaussian, 57 bivariate, 58, 251, 255 circularly symmetric, 58, 78, 281 multivariate, 57, 284 spherically symmetric, 188 uncorrelated, 59 independent, 55 Laguerre, 66, 276 maxwellian, 65 mean, 53 noncentral chi, 65 noncentral chi-square, 63 notation, xxv Pareto, 666 phase, 511 Poisson, 260 Rayleigh, 62, 391, 507 realization, 53, 239 ricean, 62, 391, 507
root-mean-squared value, 54 uncorrelated, 56 uniform, 70 variance, 54 Random vector complex gaussian, 60 Random walk, 343 Rate code, see Code, rate information, see Information rate Rate equations, 317, 328 Ray, 92 meridional, 116, 120 skew, 116 tracing, 91 trajectory, 92, 120, 150 transit time, 156 impulse response, 191 unguided, 117 Ray optics, 6, 110 dispersion, 156 governing equation, 92, 105 Rayleigh probability density function, 61, 62, 103, 254, 273 random variable, 62, 391, 507 scattering, 114, 166, 193, 205 Reach, 4, 21 Reactive power, 131 Real-baseband signal, 11, 77, 307, 362, 383, 386, 489, 694 waveform, 310, 380, 411, 581 Realization channel, 709 random variable, 53, 239 sample function, 52 Receiver displacement, 835, 880 frequency response, 287 impulse response, 292 lightwave, 304 phase-asynchronous, 14, 463, 500 phase-sensitive, 790 phase-synchronous, 13, 524 photon-counting, 311, 440, 456 multilevel, 451 polarization-sensitive, 368 quantum-lightwave, 837 sensitivity, 11, 25 suboptimal, 561 Recode-and-forward, 5 Reconfigurable add–drop multiplexer, 552 Rectangular pulse, 36 Rectifier, 14 Red-shift, 213, 404 Reduced Planck’s constant, 8 Reed–Solomon code, 619
primitive, 620 primitive element, 620 Reflection, 115 law of, 115 Refraction, 115 law of, 115 Regeneration, 5 Regular algebraic block code, 647 check matrix, 617, 646 Relative intensity noise (RIN), 346, 358, 397 shot-noise limit, 346, 359 Relative permittivity, 82 Relaxation oscillation frequency, 328, 359, 397 Remodulate-and-forward, 5 Representation Dirac, 752 Heisenberg, 752, 792 in-phase component, 765 in-phase signal component representation, 761 interaction, 752 momentum, 765 position, 765 quadrature component, 765 quadrature signal component, 761, 762 Schrödinger, 752 Resonator, 323 Fabry–Pérot, 357 free spectral range, 357 laser diode, 326 mode structure, 327 photon lifetime, 326 Responsivity, xxvi, 9, 259, 305 Reverse channel, 529 Ricean probability density function, 61, 62, 103, 273, 502, 507, 509, 724 Ricean random variable, 62, 391, 507 RIN, see Relative intensity noise ROADM, see Reconfigurable add–drop multiplexer Robust, 427, 519, 609 Root-mean-squared bandwidth, 34, 100 position, 767 timewidth, 34, 100 Root-mean-squared amplitude, xxv, 39, 54, 324 Rotation generalized, 48, 753, 755 joint probability distribution, 107 polarization axes, 187, 369 signal constellation, 576 Runlength-limited code, 660 Sample, 1, 45, 399 complex, 390 function, 52, 68 instantaneous, 420
Sampling, 1 basis, symmetric, 830 eigenstate, 821, 836, 879 rate, 46, 235, 281, 692 theorem, 18, 44, 45, 281, 399, 672 Saturation power, 317 Scalar, 28 Scalar Helmholtz equation, 84, 126 boundary conditions, 127 Scattering, 113 anti-Stokes, 205 Brillouin, 206 stimulated, 207, 663 cross section, 114 direction-dependence, 114 elastic, 114 inelastic, 114, 205 linear, 113 Raman, 205 stimulated, 206 Rayleigh, 114, 166, 205 Stokes, 205 Schrödinger representation, 752 Schur–Horn theorem, 785 Schwarz inequality, 43, 426 Second law of thermodynamics, 243 Self-adjoint differential operator, 88 operator, 752, 768 transformation, 49, 742, 752 Self-phase modulation, 204, 219, 223, 224 Sellmeier formula, 192 Semiclassical signal model, 7, 680, 748, 815 Semiconductor, 304 bandgap, 303, 319 conduction band, 303, 315 diode, 323 electron–hole recombination, 313 intrinsic, 305 -laser diode, see Laser diode lightwave amplifier, see Lightwave amplifier radiative recombination time, 319 valence band, 303, 315 Sensebit, 649 Senseword, 5, 610, 626 corrupted, 621 hard-valued, 5, 610 memoryless, 418 pictorial representation, 621 soft-valued, 5, 610 symbol, 412 uncorrectable, 622 Separation of variables, 126, 133, 139, 145, 151 Sequence detection, 522, 535 constraint length, 537 error rate, 540
maximum-likelihood, 535, 542 maximum-posterior, 522, 544 minimum-distance, 538 trellis, 537 Sequential decoding, 629, 633 complexity, 639 hypothesis testing, 633 Shannon bound bandlimited channel, 21, 711 single-letter capacity, 697, 699 spectral rate efficiency, 22, 719 entropy, 673, 785 first coding theorem, 676 second coding theorem, 677 Shaping, gain, 472, 699 Shift-invariant system, 28 Shot noise, see Noise, shot Shot-noise-limited error rate, 487 heterodyne demodulation, 385, 795, 796 homodyne demodulation, 487, 796, 831, 842, 866 relative intensity noise, 346, 359 system, 286, 346, 359, 429, 455, 457, 559, 767 Side information, 726 Sideband, 205 Sifting property of an impulse, 28, 345 Signal, 27, 41 analytic, 32, 101 bandlimited, 12, 24, 34 baseband, 3, 12, 24 complex envelope, 39, 144, 162, 361 spectrum, 364 vector, 218 complex-baseband, 12, 39, 53, 77, 101, 307, 362, 383 conjugate symmetric, 32, 269 dual-polarization, 493 envelope, 12, 14 passband, 162 impairment, see Impairment lightwave, see Lightwave signal narrowband, 12, 44 noise, see Noise nonorthogonal, see Nonorthogonal signal orthogonal, see Orthogonal signal passband, 3, 12, 39 periodic, 12 polarization-multiplex, 491 quantum-lightwave, 7 real-baseband, 11, 77, 307, 362, 383, 386, 489, 694 rectified, 14 regeneration, 5 right-sided, 29
spectrum, 30 square-integrable, 30 timewidth-limited, 34 vector, 42, 788, 881 basis, 100 euclidean distance, 44 waveform, 3 Signal constellation, 414, 480 antipodal, 414, 466 complex, 464 four-dimensional, 491 matrix, 497, 818, 851 quantum, 825, 832 on–off intensity modulation, 414, 467 phase-shift modulation binary, 466 eight-level, 472 multilevel, 472 pulse-amplitude modulation, 471 quadrature-amplitude modulation, 464, 497 nonsquare, 472 quantum-lightwave, 780 average state, 780, 788, 847 coherent-state components, 819 nonorthogonal components, 815 pure-state components, 826, 844 symmetric, 840 real, 414 signal space, 496 Signal model, 6 geometrical-optics, 6, 119 photon-optics, 7, 11, 237, 246, 737 relationship to quantum optics, 736 quantum-optics, xxv, 6, 7, 246, 446, 736 ray-optics, 6, 110 semiclassical, 7, 680, 748, 815 wave-optics, xxv, 6, 20, 83, 115, 122, 239, 361, 737 relationship to quantum optics, 741 Signal space, 41 basis, 41, 759 composite, 742, 759 definition of distance, 44 for a classical signal constellation, 496 for a quantum-lightwave signal constellation, 742 generalized measurement, 789 higher-dimensional, 536 Hilbert, 752 infinite-dimensional, 736 Signal state, 736, 738, 750 block-symbol, 742, 778, 815, 819, 846, 848 entangled state, 781 product state, 781, 819 cat, 833 classical, 741, 800 coherent, see Coherent state, 762
component-symbol, 742, 759, 776, 778, 780, 815, 819, 844, 846, 869 marginalization to, 858 composite, 742, 813 detection, 738, 739, 747, 815, 816, 818, 820, 843 block-symbol, 846 coherent-state, 789 component-symbol state, 845 entangled, 760, 811, 819, 861 Fock, see Photon-number state gaussian, 748, 800, 806 circularly symmetric, 748, 749, 806, 807, 873 Glauber, 739, 761 image quantum-optics, 794 measurement, 742, 776 mixed, see Mixed signal state, 778, 837 nonclassical, 741, 749, 782 noncoherent, 783 nongaussian, 749, 800 nonorthogonal, 774, 778, 786, 824, 840 orthogonal, 745, 747, 777, 823, 860, 879 photon-number, see Photon-number state preparation, 738, 741, 815, 843 product, see Product state pure, see Pure signal state, 760, 776, 778 sampling, 823, 834 symmetric, 840 vacuum, see Vacuum state Signal state, detection, 738 Signal-spontaneous emission noise, 337, 461 Signal-to-noise ratio, 21, 22, 26, 75, 108 electrical (SNR), 75, 108, 343 optical (OSNR), 75, 108, 268, 341, 343 relationship between SNR and OSNR, 339 sample, 420, 424, 426, 435 effective, 424, 438 matched filter, 426, 435 Signum function, 29 Silica glass, 83, 111, 112, 166, 203, 210 material dispersion curve, 164 Sinc pulse, 37 Single -input single-output channel, 363 -letter channel capacity, see Channel capacity, single-letter -mode optical fiber, see Optical fiber -user detection, 373, 726 Singular value, 49, 554, 555 Singular-value decomposition, 49, 190, 554 Skew ray, 116 Slab waveguide, 125 dispersion, 157 power density, 130 TE characteristic equation, 128 TE mode pattern, 131 TM characteristic equation, 129
Slow optic axis, 97, 183 Smoothing function, 801 Snell’s law, 115 SNR, see Signal-to-noise ratio Soft-decision decoder, 610 decoding, 625 detection, 18, 418, 608, 610, 698 Solid angle, 117 Space -invariant system, 28 -multiplex channel, 15, 369, 371, 555 Space–time separable channel, 376, 494 Span, 41 Spatial Fourier transform, 30 coherence function, 98 region, 374 frequency, 30, 120, 127 Spatiotemporal covariance matrix, 375 Spatiotemporal mode, 239, 244, 250, 739, 761, 769 Speckle, 374 Spectral -notched codes, 663 shaping, 664 susceptibility, 83, 165 width, see Linewidth Spectral rate efficiency, 156, 474, 475, 718 arbitrary modulation, 718 for several modulation formats, 720 maximum, 22 phase modulation, 720 photon-optics, 719 wave-optics, 718 wideband regime, 719 Spectrum, 30 Speed of light material, 83 vacuum, 83 Spherical decoding, 621 Split-step Fourier method, 229, 564 Spontaneous emission, 246, 256 amplified, 334, 337 noise factor, 335 nonlinear mixing with signal, 339 power density spectrum, 257, 336 phase-synchronous demodulation, 381 Spontaneous–spontaneous emission noise, 337 Square-integrable signal, 30 Square-law photodetection, 19 Square matrix, 46 Square-root detection basis, 840, 851 Standard deviation, 54 State, see Signal state State preparation, see Signal state, preparation
Statebook, 780, 847, 875 Stationarity strict sense, 68 wide sense, 68 Stationary point, 530 random process, 68 Statistic, 52 sufficient detection, 399, 419, 420, 622 Statistical quantum optics, 743, 776 uncertainty, 7, 262, 683, 743, 776, 820 Step-index optical fiber, 110, 132, 133, 179 multiple-layer, 180 Stimulated Brillouin scattering, 207, 313, 663 Raman scattering, 206, 313 emission, 246, 312 cross section, 315, 355 laser diode, 326 Stochastic differential equation, 347 gradient descent method, 600, 603 process, 52 Stokes parameters, 94 representation of polarization, 94, 186 scattering, 205 vector, 108 Subcarrier, 16, 201, 225, 368 interchannel interference, 551 nonlinear interference, 224 walk-off length, 215 Subchannel, 15, 551 channel capacity, 706 degrees of freedom, 368 energy redistribution, 373 independent, 285, 706 multimode fiber, 369 polarization, 368 time, 368 uncorrelated, 709 wavelength, 368, 553 Subcode, 640 Submatrix, 48, 615 Sufficient detection statistic, 399, 412, 419, 420, 622 Sum–product algorithm, see Bahl algorithm Superposition, 27 integral, 28 Surviving path, 540 Susceptibility, 81 nonlinear coefficient, 203 spectral, 83, 165 temporal, 82, 165 tensor, 81 Symbol, 1
clock, 586 energy, 365, 382, 423, 426, 466, 471, 514 error, 5, 21, 451, 456 mapping, 451 pulse, 3 state, 451 Symbol state block-symbol, see Signal state, block-symbol component-symbol, see Signal state, component-symbol Symbol-by-symbol detection, see Detection, symbol-by-symbol Symmetric matrix, 47 Symptom, 648 Synchronization carrier phase, 574 block, 573 clock phase, 573, 586 frame (data), 573, 588, 590 marker, 588 detection, 589 preamble, 588 Syncword, 588 Syndrome, 622, 648 System, 1, 27 additive, 28 bandlimited, 26, 74 causal, 29 continuous time, 28 homogeneous, 28 linear, 27, 28, 560 linear time-invariant, 33, 84, 363 memoryless, 30 shift-invariant, 28 shot-noise-limited, 286, 429, 455, 457, 559, 767 space-invariant, 28 spatially local, 30 time-invariant, 28 Tangential field component, 139 Tanner graph, 625, 647 girth, 627 Target pulse, xxvii, 416, 465, 525 Temporal coherence, 97 coherence function, 98 defocusing, 367 Fourier transform, 30, 125, 162 susceptibility, 165 Tensor product, 43, 759 Theorem Campbell’s, see Campbell’s theorem central limit, see Central limit theorem fluctuation–dissipation, 335 Hartley–Shannon, 21 Isserlis, 54, 267, 297
noisy channel coding, 677 quantum optics, 860 sampling, see Sampling theorem Wiener–Khintchine, 69 Thermal energy, xx, 112, 206, 243 equilibrium, 206, 243, 244 maximum-entropy distribution, 250 noise, 78, 237, 248, 250, 358, 360, 396, 403, 461 channel, 872 floor, 249 optical, 256 power density spectrum, 249, 250, 396 noise regime, 248 radiation, 113 Thermodynamics, 243 second law of, 243 Three-dB coupler, 301 Threshold detection, see Detection threshold lasing, see Laser diode, threshold current Time-division multiplexing, 16, 368 Time-invariant system, 28 Timeframe, traveling, 167, 218, 220, 224 Timewidth, 34, 82 effective, 34 -limited signal, 34 root-mean-squared, 34, 100 Timewidth–bandwidth inequality, 36, 767 product, 35, 688, 707, 711 Total harmonic distortion, 355 Total internal reflection, 116 Trace, 47, 776, 777, 829 outer product, 102 partal, 779, 812, 858 Training sequence, 372, 574, 590 Transfer function, 33, 73 baseband noise-equivalent bandwidth, 75 block, 373 channel matrix, 373 complex-baseband, 40, 364, 371, 406 lightwave noise-suppressing filter, 293 electrical, 395 matrix, 370 multi-input multi-output channel, 373 passband, 40 noise-equivalent bandwidth, 75 phase-locked loop, 579 first-order, 604 second-order, 604 polarization-dependent, 370 receiver, 287 Transform Fourier, see Fourier transform Hilbert, 32, 101
Kramers–Kronig transform, 32, 83, 102, 165 Poisson, 262, 799 Transformation matrix, 42, 50 self-adjoint, 49, 742, 752 unitary, 48, 370 Transit-time dispersion, 156 Transition probability, 636, 672, 691, 700, 729, 848 Transmittance, 114, 321, 364 Transparency region, 112 Transversal filter, 531, 559 Transverse electric mode (TE), 124, 151 characteristic equation, 128, 158 even, 128 mode pattern, 131 odd, 128 step-index fiber, 143 electromagnetic mode (TEM), 85, 124 field component, 88 magnetic mode (TM), 124, 151 characteristic equation, 129 step-index fiber, 143 Trellis, 537 branch, 537, 538, 626 code, 656 -coded modulation, 656 branch, 658 for a convolution code, 632 graph, 625 minimal, 631 node, 537 surviving path, 540 Triangular pulse, 101 Turbo code, see Berrou code Turning point, 809 Uncertainty quantum, 7, 262, 685, 743, 750, 820, 824 statistical, 7, 262, 685, 743, 820 Ungerboeck code, 656 Unguided mode, see Mode, unguided, 162 Union bound, 479, 541, 840, 842 quantum, 842 superfluous term, 479, 541 Unit-step function, 29 Unitary matrix, 48, 185, 196, 370 transformation, 48, 370, 753 Up-conversion, 378 Upper-state lifetime, 246, 314, 318 V parameter, 128, 151, 157 Vπ, 329 Vacuum state, 246, 248, 771, 772, 794, 800 energy, 246, 764, 770, 771
fluctuations, 249, 487 operator, image mode, 794 Wigner distribution, 800 Variance, 54 Variational calculus, 241, 252, 429, 530 Vector field, 84, 89, 126 axial component, 88, 124, 139 normalized, 88 tangential component, 139 transverse component, 88 Helmholtz equation, 126 notation, xxv Vector space, 41 Velocity group, see Group velocity phase, see Phase velocity Viterbi algorithm, 540, 565, 626 decoding, 629 von Neumann entropy, 784, 881 antipodal coherent-state modulation, 865 nonorthogonal signal states, 785, 864 orthogonal signal states, 864 pure signal state, 785 Voronoi regions, 478, 497 Walk-off length, 213, 214, 216, 234 time, 215 Water filling, 706, 735 spatial subchannel, 706 Wave equation, 83, 108, 114 optical fiber, 132 Wave optics, 6, 115 bandlimited capacity, 710 relationship to photon optics, 250, 259 relationship to quantum optics, 741 signal model, 6, 20, 83, 122, 239, 361, 737 signal rate, 735 single-letter capacity, 696, 698 spectral-rate efficiency, 718 Waveform, 3 bandlimited, 45, 109, 723 baseband, 3, 45, 398 channel, 17, 397 bandlimited, 18 electrical, 17, 398 lightwave, 17 complex-baseband, 399 continuous-time, 398, 414 data-modulated, 214, 573 information channel, 18, 679 bandlimited, 676, 710 nonlinear, 671 modulated, 179
noncoherent, 500 nonorthogonal, 391 passband, 3, 39, 710 power, 467 pulse, 570 random binary, 71 real-baseband, 310, 380, 411, 581 Wavefront, geometrical, 92 Waveguide, 6, 88, 121 dielectric, 87, 124 cylindrical, 132 dispersion, see Dispersion dispersion coefficient, 180 optical fiber, see Optical fiber slab, 110, 125 Wavelength, 7, 85 -dependent group delay, 155, 170, 173, 179, 199, see group delay, wavelength-dependent -division multiplexing, 16, 368, 568 bandlimited channel, 726 free-space, 84 Wavenumber, 8, 84, 86 free-space, 80, 84, 91, 121, 165, 209 Wavevector, 85 local, 92 Weakly guiding fiber, 133 Weight, Hamming, 613, 622
minimum, 613, 634 Weight profile, 628, 635 Well-guided mode, 130, 161, 211 White noise, 73, 249, 271, 424, 528 Whitened matched filter, 427, 458 Wideband regime photon-optics capacity, 715 wave-optics capacity, 719 Wiener filter, 591, 598 process, 343 Wiener–Hopf equations, 598 Wiener–Khintchine theorem, 69 Wigner distribution, 799, 802, 805 gaussian state, 806 relationship to other distributions, 802 vacuum state, 800 Z channel, 693 mutual information, 693 Zero-forcing equalizer, 528, 533 multi-input multi-output channel, 554 polarization alignment, 554 Zero-material dispersion wavelength, 164 Zero-point energy, 246, 771