High-Speed CMOS Circuits for Optical Receivers [illustrated] 079237388X, 9780792373889


English Pages 124 [132] Year 2001


HIGH-SPEED CMOS CIRCUITS FOR OPTICAL RECEIVERS

Jafar Savoj Transpectrum Technologies, Inc.

Behzad Razavi University of California, Los Angeles

KLUWER ACADEMIC PUBLISHERS NEW YORK, BOSTON, DORDRECHT, LONDON, MOSCOW

eBook ISBN: 0-306-47576-6
Print ISBN: 0-7923-7388-X

©2002 Kluwer Academic Publishers, New York, Boston, Dordrecht, London, Moscow
Print ©2001 Kluwer Academic Publishers, Dordrecht

All rights reserved. No part of this eBook may be reproduced or transmitted in any form or by any means, electronic, mechanical, recording, or otherwise, without written consent from the Publisher.

Created in the United States of America

Visit Kluwer Online at: http://kluweronline.com
and Kluwer's eBookstore at: http://ebooks.kluweronline.com

Contents

List of Figures
List of Tables
Preface

1. INTRODUCTION
   1.1 Overview of the Fiber Optic Network
   1.2 Overview of Fiber Optic Transceivers
   1.3 Overview of Topics

2. TIAS AND LIMITERS
   2.1 TIAs
   2.2 Limiters

3. CLOCK AND DATA RECOVERY ARCHITECTURES
   3.1 Open-Loop CDR Architectures
   3.2 Phase-Locking CDR Architectures
       3.2.1 Full-Rate and Half-Rate Architectures
       3.2.2 Oscillators
             3.2.2.1 General Theory
             3.2.2.2 Ring Oscillators
             3.2.2.3 LC Oscillators
             3.2.2.4 PLL Jitter Calculation
       3.2.3 Phase Detectors
             3.2.3.1 Linear Phase Detectors
             3.2.3.2 Binary Phase Detectors
       3.2.4 Frequency Detectors
             3.2.4.1 Referenced Frequency Detectors
             3.2.4.2 Referenceless Frequency Detectors
       3.2.5 Decision Circuits

4. A CMOS INTERFACE FOR DETECTION OF 1.2-GB/S RZ DATA
   4.1 Introduction
   4.2 Matched Filtering
   4.3 Architecture
   4.4 Building Blocks
       4.4.1 Low-Noise Wideband Amplifier
       4.4.2 Integrate-and-Dump Circuit
       4.4.3 Demultiplexer
       4.4.4 Clock Buffer
   4.5 Experimental Results
   4.6 Conclusion

5. A 10-GB/S LINEAR HALF-RATE CMOS CDR CIRCUIT
   5.1 Architecture
   5.2 Building Blocks
       5.2.1 VCO
       5.2.2 Phase Detector
       5.2.3 Charge Pump and Loop Filter
   5.3 Experimental Results
   5.4 Conclusion

6. A 10-GB/S CMOS CDR CIRCUIT WITH WIDE CAPTURE RANGE
   6.1 Introduction
   6.2 Architecture
   6.3 Building Blocks
       6.3.1 VCO
       6.3.2 Phase and Frequency Detector
       6.3.3 Charge Pump
       6.3.4 Output Buffers
   6.4 Loop Characterization
   6.5 Experimental Results
   6.6 Conclusion

7. CONCLUSION

REFERENCES

Index

List of Figures

1.1 Volume of data transported over the Internet.
1.2 SONET architectures.
1.3 Light propagation in single-mode and multi-mode fibers.
1.4 SONET OC-192 interfaces.
1.5 Role of the framer and the mapper in processing the data.
1.6 Fiber optic transceiver.
1.7 (a) Four-to-one and (b) two-to-one multiplexer.
1.8 Current-steering (a) multiplexer and (b) latch.
1.9 The clock multiplying unit.
2.1 Common-gate TIA.
2.2 Feedback TIA and its realizations.
2.3 Simple limiter.
2.4 (a) Cherry-Hooper amplifier, (b) Gilbert gain cell.
2.5 (a) Inductive peaking, (b) simple inductor model, (c) more complete inductor model.
2.6 Instability resulting from feedback through supply line.
3.1 Detector with peak value sampling.
3.2 Edge detection of the random data.
3.3 Edge detection using an XOR gate.
3.4 Spectral line clock and data recovery.
3.5 Generic phase-locking CDR circuit.
3.6 Jitter transfer mask.
3.7 Jitter tolerance mask.
3.8 (a) Full-rate and (b) half-rate data recovery.
3.9 Effect of non-ideal duty cycle.
3.10 Negative feedback system.
3.11 Three-stage ring oscillator.
3.12 Delay interpolation.
3.13 Substrate loss vs. sheet resistance.
3.14 (a) Decaying impulse response. (b) Oscillatory impulse response of the tank.
3.15 Compensation of the tank loss in an LC oscillator.
3.16 (a) pn junction, (b) MOS varactor.
3.17 Block diagram of a quadrature oscillator.
3.18 The quadrature LC oscillator.
3.19 Modified tuning mechanism for a quadrature oscillator.
3.20 Tuning a quadrature oscillator by changing the coupling coefficient.
3.21 Doubling the oscillation frequency by means of a quadrature oscillator.
3.22 Multi-phase coupled oscillator.
3.23 Ring oscillator incorporating common-source stages with inductive loads.
3.24 XOR gate operating with periodic data.
3.25 Beat frequency at XOR output for inputs with different frequencies.
3.26 (a) Phase/frequency detector. Circuit response with (b) (c) A leading B.
3.27 Hogge phase detector.
3.28 Problem of triwave.
3.29 Modified Hogge phase detector.
3.30 (a) Alexander phase detector. Operation of the circuit with (b) late clock and (c) early clock.
3.31 (a) CDR circuit using a D flipflop phase detector, (b) PD characteristic, (c) addition of skews in and
3.32 (a) Pottbacker phase/frequency detector. Samples generated by for (b) early clock and (c) late clock.
3.33 Linearized early-late detector.
3.34 Frequency acquisition using a similar VCO.
3.35 Dual loop frequency acquisition.
3.36 Quadricorrelator.
3.37 Quadricorrelator operating on random data.
3.38 Phasor diagram of the clock and data signals.
3.39 Characteristic of the frequency detector in Fig. 3.32(a).
3.40 CDR architecture with wide capture range.
3.41 Detection with integration over one bit prior to sampling.
4.1 Role of the interface circuit and pseudo-differential input signals.
4.2 Output SNR for a system (a) without and (b) with a matched filter.
4.3 Matched filter for rectangular pulse.
4.4 Single matched filter.
4.5 Interleaved matched filters.
4.6 Interface architecture.
4.7 (a) Seven-stage amplifier, (b) the first common-gate stage, (c) the following common-source stages.
4.8 Amplifier's transfer function.
4.9 (a) Overall input-referred noise and (b) output eye of the amplifier.
4.10 Stacked inductor.
4.11 (a) High-speed integrate-and-dump circuit, (b) corresponding waveforms.
4.12 (a) Addition of hold phase, (b) corresponding waveforms.
4.13 Demultiplexer.
4.14 Clock buffers.
4.15 Die photograph.
4.16 Eye diagram of the output.
4.17 Addition of matched filtering to optical receivers.
5.1 Generic CDR architecture.
5.2 Half-rate CDR architecture.
5.3 Effect of non-ideal duty cycle.
5.4 (a) Three-stage ring oscillator, (b) implementation of each stage, (c) transistor-level schematic.
5.5 Small-signal (a) gain and (b) phase response of each delay stage.
5.6 VCO gain partitioning: (a) fine control and (b) coarse control.
5.7 (a) Phase detector, (b) operation of the circuit.
5.8 Symmetric XOR gate.
5.9 Determination of PD gain.
5.10 Charge pump and loop filter.
5.11 Lock acquisition.
5.12 Chip photograph.
5.13 (a) Spectrum of the recovered clock, (b) recovered clock in the time domain.
5.14 Measured jitter transfer characteristic.
5.15 (a) Recovered demultiplexed data, (b) recovered full-rate data.
6.1 CDR architecture.
6.2 (a) Four-stage LC-tuned ring oscillator, (b) implementation of each stage, (c) simple model of the load.
6.3 Distributed inductor model.
6.4 Signal arrangements (a) minimizing differential capacitance, (b) equalizing the length of the traces.
6.5 In-phase and quadrature samples of positive and negative data edges with early and late clock signals.
6.6 Phase Detector.
6.7 Timing diagram in the PFD for slow and fast clock signals.
6.8 Phase and Frequency Detector.
6.9 Modified multiplexer.
6.10 Charge Pump.
6.11 Output Buffer.
6.12 (a) Linearized small-signal model of the loop, (b) simple loop filter, (c) VCO noise shaping.
6.13 Chip photograph.
6.14 (a) VCO tuning range, (b) phase noise over tuning range.
6.15 (a) Spectrum of the recovered clock, (b) recovered clock in the time domain.
6.16 Measured jitter transfer characteristic.
6.17 Measured jitter tolerance characteristic at 5 Gb/s.
6.18 Recovered clock and data.

List of Tables

4.1 Simulated gain, power, and noise distribution.

Preface

With the exponential growth of the number of Internet nodes, the volume of data transported on the backbone has grown at a similar pace. The load of the global Internet backbone will soon increase to tens of terabits per second. This indicates that the backbone bandwidth requirements will increase by a factor of 50 to 100 every seven years. Transporting such high volumes of data requires suitable media with low loss and high bandwidth. Among the available transmission media, optical fibers achieve the best performance in terms of loss and bandwidth. High-speed data can be transported over hundreds of kilometers of single-mode fiber without significant loss in signal integrity. These fibers progressively benefit from reduction of cost and improvement of performance. Meanwhile, the electronic interfaces used in an optical network are not capable of exploiting the ultimate bandwidth of the fiber, limiting the throughput of the network.

Different solutions at both the system and the circuit levels have been proposed to increase the data rate of the backbone. System-level solutions are based on wavelength-division multiplexing (WDM), which uses different colors of light to transmit several sequences simultaneously. In parallel, a great deal of effort has been put into increasing the operating rate of the electronic transceivers using highly developed fabrication processes and novel circuit techniques.

The design of the clock and data recovery (CDR) circuit is the most challenging part of building a high-speed optical transceiver because of the complexity of this block. In this book, the design and experimental results of two CDR circuits are described. Both circuits achieve a high operating speed by employing the concept of "half rate", meaning that the clock frequency is half the data rate. Furthermore, broadband circuit techniques including wideband amplification and high-speed matched filtering are described in this book.

The two CDR circuits benefit from two major techniques for phase detection, namely linear and binary. The design of the linear phase detector is based on a new technique whose simplicity allows high speed and low power consumption. The new binary phase/frequency detector provides a wide capture range and a phase error signal that is only revalidated at data transitions. Furthermore, the design of the CDR circuits involves two major types of voltage-controlled oscillators: ring and LC-tuned. The ring oscillator described in this work achieves a wide tuning range and low power consumption. The LC oscillator benefits from a new topology that provides multiple phases with low jitter.

Chapter 1 INTRODUCTION

The volume of the data transported over the Internet backbone has increased with the exponential growth of the number of Internet users. As shown in Fig. 1.1, the load on the global Internet backbone will be as high as 11 Tb/s by the year 2005. This means that the bandwidth requirements will increase by a factor of 50 to 100 every seven years. Among the available transmission media, optical fibers achieve the highest bandwidth and the lowest loss. These characteristics make them an attractive medium for transmission of data over long distances.

Despite the unique transmission capabilities of optical fibers, the data needs to be regenerated after a few tens of miles. Data gets distorted as it travels through the fiber, mostly because of the fiber dispersion. This distortion leads to the closure of the data eye. The signal amplitude is also reduced by the loss along the fiber. Restoration of the original data at the receiver side with an acceptable bit-error rate (BER) is possible only if the signal maintains the required signal-to-noise ratio (SNR). The data must therefore be regenerated midway to prevent degradation of its SNR.

Extensive research has been performed on techniques for regeneration of the data in the optical domain. However, most of these techniques are still under study, and the majority of commercial systems employ electronic interfaces for regeneration of the data. As a result, the optical pulses are first converted into electric current, regenerated and processed in the electrical domain, and then converted back into optical pulses. The complexity of the procedure that takes place in the regenerator introduces latency. Furthermore, the maximum data rate is determined by the speed of the electronic interface. The operating speed of the backbone can be increased either by designing faster electronic interfaces to handle a higher data rate, or by using a number of parallel regenerators and wavelength-division multiplexing (WDM) to combine a number of high-speed optical data streams on one fiber channel.

Throughout this book, various approaches for increasing the operating rate of the regenerators are addressed. These approaches introduce innovations at both the system and circuit levels. Special attention has been paid to reducing the complexity of the circuits, so that a number of transceivers can be placed on one chip if parallelism is used. In this work, we have targeted a data rate of 10 Gb/s. With operating rates increasing to 10 Gb/s and costs staying the same, new applications can be introduced that will become more attractive as the cost of transport per bit decreases.

The majority of the backbone optical communication systems are based on the SONET standard. Short for Synchronous Optical Network, SONET was proposed by Bellcore in the mid-1980s and is now an ANSI standard. SONET defines a hierarchy that allows data streams of different rates to be multiplexed. It specifies optical carrier (OC) levels that are integer multiples of 51.84 Mb/s. This standard has allowed different communication carriers to interconnect their existing fiber optic systems [1]. The SONET OC-192 standard has been specified for 10 Gb/s optical communication.

SONET recommends two types of architectures for use in metropolitan and long-haul areas: ring and point-to-point (Fig. 1.2) [2]. An OC-192 ring replaces multiple pre-existing OC-48 rings operating at lower speeds. Furthermore, it allows a larger number of nodes to be placed on the ring and provides the capability to process more added and dropped traffic at each node. For this reason, the complexity of the network is drastically reduced. Point-to-point architecture allows flexible routing between different nodes, and point-to-point services require connections on a per-customer basis. Other services provided by the OC-192 standard include video conferencing and ATM-based services like LAN interconnections.
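As a quick check of the SONET hierarchy described above, the line rate of any optical carrier level follows directly from the base rate. The short Python sketch below assumes the standard OC-1 rate of 51.84 Mb/s; the constant and function names are illustrative only.

```python
OC1_RATE_MBPS = 51.84  # SONET base (OC-1) line rate in Mb/s


def oc_rate_mbps(level: int) -> float:
    """Return the line rate in Mb/s of SONET optical carrier level OC-<level>."""
    if level < 1:
        raise ValueError("OC level must be a positive integer")
    return level * OC1_RATE_MBPS


if __name__ == "__main__":
    for level in (1, 3, 12, 48, 192):
        print(f"OC-{level}: {oc_rate_mbps(level):.2f} Mb/s")
```

OC-192 works out to 9953.28 Mb/s, which is the rate loosely called 10 Gb/s in this book, and it equals four OC-48 streams of 2488.32 Mb/s each.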

1. Overview of the Fiber Optic Network

The fiber used in the construction of a network is either single-mode or multi-mode. Single-mode fiber is mostly used with a coherent light source that produces a pure spectrum. Multi-mode fiber, on the other hand, is used with optical sources that are not coherent or not spectrally pure [3]. As shown in Fig. 1.3, single-mode fibers are designed to have a very small core that limits the modes of propagation, and an index of refraction profile that allows light to remain in the core. In a multi-mode fiber, light that enters the fiber core at one end continuously bounces off the interface of the core and cladding until it exits the fiber at the other end. This effect can result in the dispersion of the pulses entering the fiber. The light that travels directly through the core of the fiber reaches the other end sooner than the light that continuously bounces off the interface. Because of this effect, the pulse gets wider as it travels through the fiber. This pulse spreading limits the maximum length of a fiber link formed using multi-mode fibers. Dispersion is usually not a limiting factor with single-mode fibers. For these fibers, the length is limited by the attenuation of the signal. Therefore, long hauls are formed using single-mode fibers.

Figure 1.4 depicts the interfaces for the OC-192 optical systems [2]. The main optical path consists of the fiber and the optical line amplifiers. Both the transmitter and the receiver equipment employ optical amplifiers. On the transmitting side, the transmitter is followed by a booster amplifier. At the receiving end, a preamplifier and an optical filter process the optical pulse before it reaches the receiver. The parameters should be chosen to meet the required overall BER.

The data transmitted over the fiber is encoded in nonreturn-to-zero (NRZ) format. Therefore, the data stream does not carry any information about the clock signal, and its spectrum contains no spectral component at the frequency equal to the data rate. The only measure of the clock signal that can be derived from the data sequence is the minimum spacing between consecutive zero crossings of the data. This measure can be extracted through nonlinear circuit techniques such as edge detection, which detects the timing information contained in the transitions between nonidentical adjacent bits. As a result, the edge-detected signal contains a tone at the data rate.

An NRZ stream can contain long sequences of ones and zeros with no transitions in between. If the number of transitions is too low, synchronization at the receiver end becomes very difficult. For example, if the receiver contains a phase-locked loop (PLL), the frequency of the oscillator can drift during these long sequences of identical bits such that recovery of the data is no longer possible. To overcome this difficulty, high-speed communication systems encode the data such that the maximum length of a continuous sequence of ones or zeros is limited. A widely accepted technique is 8B/10B encoding [4], which has been used for some of the systems operating at 2.5 Gb/s. It generates an encoded stream at 3.125 Gb/s. Using this coding technique, an eight-bit data byte is converted into ten bits. As a result, the minimum and maximum numbers of consecutive zeros or ones are one and five, respectively. In addition to providing a higher transition density, this type of encoding limits the low-frequency content of the data stream such that the sequence has no dc component on average. As a result, the optical modules can be ac coupled. Finally, this encoding scheme detects many signaling errors. A newer encoding scheme is 64B/66B [5], in which two additional bits are added to every 64 bits. If this encoding scheme is applied to a 10-Gb/s stream, it results in a high-speed sequence of approximately 10.3 Gb/s.
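The line rates quoted for the two block codes follow from the ratio of coded bits to data bits. The Python sketch below reproduces that arithmetic; the function names are illustrative, and the code models only the rate overhead, not the actual 8B/10B or 64B/66B code tables.

```python
def line_rate_gbps(payload_gbps: float, data_bits: int, coded_bits: int) -> float:
    """Line rate after a block code that maps data_bits onto coded_bits."""
    if not 0 < data_bits <= coded_bits:
        raise ValueError("expected 0 < data_bits <= coded_bits")
    return payload_gbps * coded_bits / data_bits


def overhead_percent(data_bits: int, coded_bits: int) -> float:
    """Extra bandwidth required by the code, as a percentage of the payload rate."""
    return 100.0 * (coded_bits - data_bits) / data_bits


if __name__ == "__main__":
    for name, payload, d, c in (("8B/10B", 2.5, 8, 10), ("64B/66B", 10.0, 64, 66)):
        rate = line_rate_gbps(payload, d, c)
        print(f"{name}: {payload} Gb/s payload -> {rate} Gb/s line rate "
              f"({overhead_percent(d, c):.3f}% overhead)")
```

The comparison shows why 64B/66B is attractive at 10 Gb/s: its overhead is about 3%, against 25% for 8B/10B, at the cost of a weaker bound on run length.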

2. Overview of Fiber Optic Transceivers

In the fiber optic system, the receive and transmit modules contain electronic blocks, each of which consists of several analog and digital integrated circuits (ICs). The analog circuits detect, retime, serialize, and deserialize the data. Once the analog circuits have lowered the data rate, the digital circuits process the data according to the standard's recommendations. The digital ICs are generally referred to as Framers and Mappers and are controlled by a microprocessor (Fig. 1.5).

The available commercial systems use digital ICs fabricated in a mainstream CMOS process and high-speed analog front-end ICs fabricated in a silicon bipolar or III-V process. The processes used for the implementation of the analog circuits should have a transit frequency, $f_T$, that is several times higher than the data rate. Recent developments in CMOS technology have introduced processes with ever smaller minimum feature sizes. Since the $f_T$ of the technology is inversely proportional to the square of the minimum feature size, scaling has yielded processes with an $f_T$ of a few tens of gigahertz. This has made it possible to integrate the analog and the digital sections on the same chip. Some of the advantages of this integration are the following:

Cost: A significant portion of the cost of electronic systems comes from packaging, circuit board design, and chip fabrication. Integration of multiple chips into a single one reduces the number of packages and decreases the circuit board area. Furthermore, CMOS processes have a lower fabrication cost than other processes because they require fewer masks.

Power Dissipation: If multiple chips are integrated into a single circuit, the power-hungry output buffers driving the terminations can be eliminated. Also, CMOS devices display the desired performance at a smaller current density compared to other processes.

Time to market: The turn-around time of CMOS processes is shorter than that of other processes. There are numerous foundries providing a digital CMOS process. As a result, institutions employing this technology benefit from a sound backup foundry and a shorter pre-process waiting period. Due to the huge momentum of the digital market, the CMOS process develops faster than other processes. The migration from one CMOS process generation to the next has taken place in only two years.

Newly developed SiGe BiCMOS processes are another good alternative for development of high-speed integrated circuits. These processes offer very fast bipolar devices suitable for building analog front-ends and dense CMOS devices for the digital portion. A modified BiCMOS process that does not have the trench isolation [6] has a fabrication cost and turn-around time that are not much different from those of a pure CMOS process. The number of fabrication masks for this BiCMOS process is slightly higher than that of a CMOS process. The drawbacks of the BiCMOS process are the small number of supplying foundries and the fact that the scaling of their CMOS devices is usually not as aggressive as that of the fastest available CMOS processes. The digital circuits fabricated using these BiCMOS processes cannot operate as fast as circuits fabricated in a pure CMOS process.

Benefiting from such capabilities, CMOS technology is well suited to the implementation of systems that employ parallelism. A number of transceivers are placed on one chip to handle the incoming high-speed sequences. These signals are carried over either a bundle of fibers or a single fiber that uses wavelength-division multiplexing (WDM). The complexity and the power dissipation of the transceivers are critical as they determine the number of transceivers that can be placed on one chip.

Figure 1.6 depicts a fiber optic transceiver consisting of a transmitter and a receiver. In the transmitter, parallel sequences of data at lower rates are combined in a multiplexer to generate a single high-speed serial signal. Multiplexing of data is performed in multiple steps, with a gradual increase in the rate of the merging sequences.
The multiplexer therefore operates with a number of clock signals whose frequency doubles as the multiplexing advances to the next level. Figure 1.7 depicts a conceptual topology of 4-to-1 and 2-to-1 multiplexers. As shown in Fig. 1.7(b), multiplexing at any level is performed by a combination of five latches and a multiplexer. Four of the latches retime the data. The fifth latch skews the data in one of the signal paths by one half of a clock period. As a result, the multiplexer samples both of the sequences starting from the middle of the data period.

Figure 1.8 depicts the structure of the current-steering latch and multiplexer used for high-speed applications. This structure allows for a reduced voltage swing that is well defined. Reducing the output swing increases the operating rate of the circuit. Furthermore, simulations indicate that switching of the current can be performed at a speed higher than the switching of the voltage. As a result, these architectures provide the maximum operation speed for a given technology. The structures of the latch and the multiplexer are quite similar, except that the second input of the multiplexer is replaced by a cross-coupled pair in the latch.

The incoming data sequences experience phase shifts with respect to the transmitter clock signal. To cancel these time delays, these signals are written into a FIFO and passed to its output. The depth of the FIFO is typically chosen to be between 4 and 6. The FIFO is driven by the transmitter clock signal.

The clock multiplying unit (CMU) generates the clock signals used in the multiplexer. This circuit operates by phase-locking the internal voltage-controlled oscillator (VCO) to an external reference. As shown in Fig. 1.9, the oscillator should produce a clock signal at a frequency equal to the data rate. If the transmitter VCO oscillates at 10 GHz and the crystal reference generator produces a signal at 156.25 MHz, the transmitter PLL should incorporate a 64:1 divider. This frequency division is performed in 6 steps, each step reducing the frequency by a factor of two. As a result, a group of clock signals with frequencies ranging from 10 GHz down to 156.25 MHz is provided. Frequency division is performed by placing two latches in a negative feedback loop. At high frequencies, the frequency divider can oscillate at the natural frequency defined by the ring oscillator consisting of the two latches. This undesirable oscillation is suppressed if the divider is driven by a strong external tone. In reality, the signal amplitude required for switching the latches in the frequency divider is reduced as the frequency of the input clock approaches twice the self-resonance frequency of the divider. In that region, the divider operates in an injection-locked mode rather than a static mode.

The CMU takes advantage of the fact that it operates with a periodic clock signal rather than a random data sequence. This simplifies the design of the phase/frequency detector and the charge pump used inside the loop. The loop filter should be designed to guarantee the stability of the loop and suppress the jitter introduced by the phase noise of the oscillator.

It may seem that the multiplexer that produces a maximum data rate of 10 Gb/s requires clock signals of frequencies not exceeding 5 GHz, considering that the data rate at the output of the multiplexer is twice the frequency of the clock signal that drives the circuit. However, it is critical to retime the output of the last multiplexer with a flipflop operating at a frequency equal to the data rate, for two reasons. First, in the presence of mismatches the multiplexer will exhibit different delays from each of its inputs to the output. This inherent mismatch increases the static ISI on the output eye diagram of the multiplexer. Second, the two input data sequences experience timing mismatch on their path to the multiplexer, leading to static bimodal jitter.
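The divider arithmetic in the CMU discussion can be checked with a few lines of Python. This is a sketch of the frequency plan only, assuming a chain of ideal divide-by-two stages; the function name is illustrative.

```python
def divider_chain(f_vco_hz: int, f_ref_hz: int) -> list[int]:
    """Clock frequencies available along a chain of divide-by-two stages.

    Returns the VCO frequency followed by the output of each stage, ending
    at the reference frequency. Raises ValueError if the overall ratio is
    not a power of two.
    """
    ratio, remainder = divmod(f_vco_hz, f_ref_hz)
    if remainder != 0 or ratio < 1 or ratio & (ratio - 1) != 0:
        raise ValueError("VCO/reference ratio must be a power of two")
    stages = ratio.bit_length() - 1  # log2(ratio) for a power of two
    return [f_vco_hz // 2**k for k in range(stages + 1)]


if __name__ == "__main__":
    clocks = divider_chain(10_000_000_000, 156_250_000)
    print(f"{len(clocks) - 1} divide-by-two stages")
    for f in clocks:
        print(f"{f / 1e6:10.2f} MHz")
```

For a 10-GHz VCO and a 156.25-MHz reference, the chain has six stages and yields clocks at 10 GHz, 5 GHz, 2.5 GHz, 1.25 GHz, 625 MHz, 312.5 MHz, and 156.25 MHz, which are the frequencies available to the successive multiplexer levels.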

The high-speed signal is ultimately modulated onto the intensity of the light emitted by the laser diode. The laser driver isolates the multiplexer from the diode and boosts the signal level to the operational range of the diode.

In the receiver, the received light is converted to an electric current by the photodetector. The detector is followed by a low-noise, high-bandwidth transimpedance amplifier that converts the current into a voltage with a swing large enough for the subsequent blocks. The sensitivity of the photodetector can be increased by widening its light reception window. However, this increases the parasitic capacitance of the photodetector, complicating the design of the high-bandwidth receiver. This means that the receiver sensitivity trades off with its bandwidth. The limiting amplifier increases the voltage swings while isolating the synchronous stages that follow it from the transimpedance amplifier. Without this isolation, the clock feedthrough of the synchronous circuits to the sensitive transimpedance amplifier can heavily corrupt the data signal.

The core of the receiver is the clock and data recovery (CDR) circuit. In this block, a clock signal is generated such that its rising/falling edges fall in the middle of the data eye. This means that if the clock signal is used to retime the data, the sampling occurs at the optimum point, improving the SNR of the receiver. CDR circuits for NRZ data can be categorized into two main groups: open-loop CDRs with high-Q filtering, and CDRs employing phase-locked loops (PLLs). We provide a short overview of the first technique in Chapter 3. However, the emphasis of the work presented in this book is on systems that utilize phase locking.

The retimed high-speed data is subsequently split into parallel sequences of lower speed by the demultiplexer. Similar to multiplexing, demultiplexing is performed in multiple steps using clock signals of different frequencies. The frequency divider generates these signals.

The design of the CDR circuit is the most complicated part of implementing an optical transceiver. It entails many challenges that will be addressed in the following chapters.
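The multistep multiplexing in the transmitter and the matching demultiplexing in the receiver, both described in this section, can be sketched behaviorally. The Python below is a bit-level model only, with no clocks, latches, or timing, and every function name in it is hypothetical. Each pass through the loop corresponds to one level of 2-to-1 multiplexing (or 1-to-2 demultiplexing), at which the bit rate doubles (or halves).

```python
def mux2(a, b):
    """2-to-1 multiplexer: interleave two equal-rate streams bit by bit."""
    if len(a) != len(b):
        raise ValueError("input streams must have equal length")
    out = []
    for bit_a, bit_b in zip(a, b):
        out.extend((bit_a, bit_b))
    return out


def demux2(stream):
    """1-to-2 demultiplexer: split a stream into its even and odd bits."""
    return stream[0::2], stream[1::2]


def mux_tree(lanes):
    """Merge 2**k parallel lanes into one serial stream, one level at a time."""
    lanes = [list(lane) for lane in lanes]
    if not lanes or len(lanes) & (len(lanes) - 1) != 0:
        raise ValueError("number of lanes must be a power of two")
    while len(lanes) > 1:
        lanes = [mux2(lanes[i], lanes[i + 1]) for i in range(0, len(lanes), 2)]
    return lanes[0]


def demux_tree(stream, n_lanes):
    """Split a serial stream back into n_lanes parallel lanes (power of two)."""
    if n_lanes < 1 or n_lanes & (n_lanes - 1) != 0:
        raise ValueError("number of lanes must be a power of two")
    lanes = [list(stream)]
    while len(lanes) < n_lanes:
        lanes = [half for lane in lanes for half in demux2(lane)]
    return lanes


if __name__ == "__main__":
    parallel = [[1, 0], [0, 0], [1, 1], [0, 1]]
    serial = mux_tree(parallel)
    print("serial:   ", serial)
    print("recovered:", demux_tree(serial, 4))
```

With this tree arrangement the demultiplexer returns the lanes in their original order, so the two functions are inverses of each other.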

3. Overview of Topics

Chapter 2 describes the high-speed front-end circuits of the optical receivers, covering the design of transimpedance amplifiers and voltage limiters. Chapter 3 provides an overview of existing CDR architectures and describes the implementation of their building blocks. Chapter 4 describes techniques for optimizing detection in a high-speed receiver, focusing on wideband amplification and matched filtering. An interface built in a CMOS process using these techniques is introduced. The remainder of this book concentrates on designing high-speed CDR circuits operating at 10 Gb/s. Chapter 5 describes a CDR circuit incorporating a half-rate linear phase detector and a ring oscillator. Chapter 6 covers the design of a CDR circuit that uses a half-rate binary phase/frequency detector and a multi-phase LC oscillator. Chapter 7 concludes this book.

Chapter 2 TIAS AND LIMITERS

1. TIAs

Transimpedance amplifiers play a critical role in optical receivers. Trade-offs between noise, speed, gain, and supply voltage present many challenges in TIA design. As TIAs experience a tighter performance envelope with technology scaling at the device level and speed scaling at the system level, it becomes necessary to design the cascade of the TIA, the limiter, and the decision circuit concurrently. The TIA bandwidth is typically chosen to be equal to 0.7 times the bit rate - a reasonable compromise between the total integrated noise and the intersymbol interference (ISI) resulting from limited bandwidth. Shown in Fig. 2.1, the common-gate (or common-base) topology is a candidate for TIAs as it provides a relatively low input impedance,

14

HIGH-SPEED CMOS CIRCUITS FOR OPTICAL RECEIVERS

a broad band, and a well-behaved time response. However, its input-referred noise current, $I_{n,in}$, is relatively high. This is because $\overline{I_{n,in}^2}$ per unit bandwidth at low frequencies is given by

$$\overline{I_{n,in}^2} = 4kT\left(\gamma g_{m2} + \frac{1}{R_D}\right) \qquad (2.1)$$

where $\gamma$ denotes the excess noise coefficient of $M_2$, a technology-dependent quantity. Interestingly, the noise currents of $M_2$ and $R_D$ are directly referred to the input with a unity factor. Furthermore, for a given supply voltage, $g_{m2}$ and $1/R_D$ trade with each other because the minimum drain-source voltage of $M_2$ plus the voltage drop across $R_D$ cannot exceed the available headroom. In other words, the two terms in (2.1) are inevitably large. It can also be shown that the noise contributed by the remaining devices rises as the frequency increases and the photodiode capacitance, $C_D$, shunts the input.

A TIA configuration that achieves more relaxed noise-headroom trade-offs is the shunt-shunt feedback topology. Shown in Fig. 2.2(a) as feedback around a voltage amplifier $A_1$ with gain $A$, the circuit exhibits a –3-dB bandwidth of $A/(2\pi R_F C_D)$ (if the poles of $A_1$ are neglected) and an input-referred noise current per unit bandwidth equal to

$$\overline{I_{n,in}^2} = \frac{4kT}{R_F} + \frac{\overline{V_{n,A}^2}}{R_F^2} \qquad (2.2)$$

where $\overline{V_{n,A}^2}$ denotes the input-referred noise voltage of $A_1$ (see Note 1). The key point here is that $R_F$ does not carry significant dc current and can therefore be maximized so as to reduce both terms in (2.2). This is in contrast to the behavior represented by (2.1).

Actual implementations of the feedback TIA suffer from voltage headroom, stability, and overshoot problems. Considering the example shown in Fig. 2.2(b), we recognize that the gate-source voltage of the source follower significantly constrains the dc drop across the load resistor, thereby limiting the open-loop gain and raising the noise contributed by the amplifier devices. Furthermore, the three poles at the input node, the drain of the input transistor, and the output node degrade the phase margin and hence the step response. Figure 2.2(c) suggests a modification that isolates the feedback path from the input capacitance of the subsequent stage. Finally, Fig. 2.2(d) eliminates the source follower from the feedback loop to allow a greater drop across the load resistor [7].

It is possible to choose the pole at node X in Figs. 2.2(c) or (d) so as to increase the bandwidth of the TIA. In fact, if the magnitude of this pole is equal to $2A/(R_F C_D)$, the TIA exhibits a slightly underdamped step response but a bandwidth of $\sqrt{2}A/(2\pi R_F C_D)$, i.e., 40% greater than that for an ideal core amplifier.
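The bandwidth claim for the feedback TIA can be checked numerically. The sketch below is not taken from the text: it assumes an inverting core amplifier of gain A that is either ideal or has a single pole, a feedback resistor R_F, and a photodiode capacitance C_D, and it uses illustrative element values. Under those assumptions the ideal-amplifier bandwidth is (1 + A)/(2π R_F C_D), which approaches A/(2π R_F C_D) for large A, and placing the amplifier pole at 2A/(R_F C_D) should raise the bandwidth by a factor of roughly 1.4.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K

def transimpedance_mag(f_hz, gain, r_f, c_d, amp_pole_hz=None):
    """|V_out / I_in| of a shunt-shunt feedback TIA at frequency f_hz."""
    s = 2j * math.pi * f_hz
    # Core amplifier: ideal (frequency independent) or single-pole (assumed model).
    a = gain if amp_pole_hz is None else gain / (1 + s / (2 * math.pi * amp_pole_hz))
    # KCL at the input node gives Z_T(s) = -a R_F / (1 + a + s R_F C_D).
    return abs(a * r_f / (1 + a + s * r_f * c_d))

def bandwidth_3db(gain, r_f, c_d, amp_pole_hz=None):
    """-3 dB frequency found by bisection on a logarithmic frequency scale.

    Assumes the magnitude falls monotonically through the -3 dB level,
    which holds for the well-damped cases sketched here.
    """
    target = transimpedance_mag(0.0, gain, r_f, c_d, amp_pole_hz) / math.sqrt(2)
    lo, hi = 1.0, 1e13
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if transimpedance_mag(mid, gain, r_f, c_d, amp_pole_hz) > target:
            lo = mid
        else:
            hi = mid
    return lo

def input_noise_density(r_f, v_n_amp, temp_k=300.0):
    """Input-referred noise current density in A^2/Hz: 4kT/R_F + V_n^2/R_F^2."""
    return 4 * K_BOLTZMANN * temp_k / r_f + (v_n_amp / r_f) ** 2

if __name__ == "__main__":
    A, R_F, C_D = 20.0, 1e3, 0.5e-12               # illustrative values only
    bw_ideal = bandwidth_3db(A, R_F, C_D)
    pole_hz = 2 * A / (R_F * C_D) / (2 * math.pi)  # pole magnitude 2A/(R_F C_D)
    bw_peaked = bandwidth_3db(A, R_F, C_D, pole_hz)
    print(f"ideal core amplifier : {bw_ideal / 1e9:.2f} GHz")
    print(f"pole at 2A/(R_F C_D) : {bw_peaked / 1e9:.2f} GHz")
    print(f"ratio                : {bw_peaked / bw_ideal:.2f}")
```

The `input_noise_density` helper also shows why a large R_F is attractive: both noise terms shrink as R_F grows, at the cost of bandwidth.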


2. Limiters

The voltage swing produced by TIAs at the minimum light level is usually inadequate to drive the CDR circuit, necessitating further amplification. Used to boost the binary swings, limiters typically consist of a cascade of differential pairs with enough bandwidth and a relatively linear phase response so as to amplify the signal with negligible ISI. The high small-signal gain requires low-frequency negative feedback to prohibit the offset voltages of the differential pairs from saturating the latter stages. Interestingly, limiter design must cope with difficulties at both the low corner and the high corner of the passband. Consider the limiter topology shown in Fig. 2.3, where the feedback network suppresses the offset of the last three stages. Since some optical standards require

that the low end of the passband fall around a few tens of kilohertz, the values of $R_1$ and $C_1$ must be very large. More specifically, with a small-signal gain of $A$ per stage, the low corner frequency is given by $A^3/(2\pi R_1 C_1)$, demanding an $R_1 C_1$ product on the order of 1 ms if $A$ is around 5. For this reason, the capacitors are usually placed off chip, raising the number of package pins and also the possibility of crosstalk from other bond wires. New circuit topologies may resolve this issue.

At the upper end of the passband, high-speed amplification techniques must provide a well-behaved magnitude and phase response for both small and large signals. Shown in Fig. 2.4, configurations such as the Cherry-Hooper amplifier [8, 9] and the Gilbert gain cell [10] have been used, but their utility becomes more limited as the supply voltage falls. In particular, the stacked voltage drops in Fig. 2.4(a) and


the cascode in Fig. 2.4(b) both constrain the voltage headroom and mandate level-shift circuits between the stages.

An attractive solution for low-voltage broadband amplifiers is inductive peaking. Owing to the extensive work on monolithic inductors in RF design, this method can now be realized with accurate modeling and prediction of the performance in optical communication circuits as well. Interestingly, inductor quality factors (Q's) as low as 3 to 4 prove adequate for increasing the bandwidth, allowing the use of simple, compact spiral structures. Figure 2.5(a) shows a limiting stage incorporating inductive peaking. It can be shown that an ideal inductor increases the bandwidth by approximately 82% if a 7.5% overshoot in the step response is acceptable. With the finite Q and parasitic capacitance of the inductors included, the enhancement is around 50%, still quite a significant factor.

An interesting difficulty in modeling the inductors in the above circuit arises from the narrowband nature of the definition of the Q, an issue rarely encountered in RF design. Figure 2.5(b) depicts a rough model where the parallel resistance yields the correct Q at about 3/4 of the –3-dB bandwidth. The approximation is reasonable because the inductor manifests itself only near the high end of the band. Alternatively, a more complete model such as that in Fig. 2.5(c) can be used. Here, the model includes a resistor for the effective series resistance, two resistors for the resistance seen by the electric coupling to the substrate, a resistor for the resistance seen by


the magnetic coupling to the substrate, and the capacitors approximate the parasitic capacitances. While the values of some of the components in this model do vary with frequency, the overall model can be fitted to measured data over a broader range than the parallel tank of Fig. 2.5(b) can. The high gain provided by several stages in a limiter may lead to oscillation or at least considerable peaking and ISI. Illustrated in Fig. 2.6, this phenomenon occurs if the mismatches in the differential stages

create both substantial current switching from the supply and a finite supply rejection, allowing a component to travel from the output stage through the supply and back to the input stage. With a finite bond wire inductance, the gain around the loop may exceed unity, leading to high-frequency oscillation. The issue of course becomes much more severe if a single-ended TIA shares the same supply lines with the limiter. For this reason, separate supply lines, careful bypassing, symmetric layout, and accurate package modeling are essential.
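The low-corner requirement discussed earlier in this section lends itself to a quick calculation. The sketch below assumes that the corner is set by the loop gain of the stages enclosed by the offset-cancellation loop divided by 2π times the feedback time constant, i.e. A^n/(2π R_1 C_1) for n identical stages of gain A. The function names and the symbols R_1 and C_1 are illustrative labels, not taken from the figures.

```python
import math

def limiter_low_corner_hz(stage_gain, stages_in_loop, rc_seconds):
    """Approximate low corner of a limiter with low-frequency offset feedback.

    Assumes loop gain A**n >> 1, so (1 + A**n) is approximated by A**n.
    """
    loop_gain = stage_gain ** stages_in_loop
    return loop_gain / (2 * math.pi * rc_seconds)

def required_rc_seconds(stage_gain, stages_in_loop, corner_hz):
    """R1*C1 product needed to place the low corner at corner_hz."""
    return stage_gain ** stages_in_loop / (2 * math.pi * corner_hz)

if __name__ == "__main__":
    # Numbers quoted in the text: gain of about 5 per stage, RC on the order of 1 ms.
    corner = limiter_low_corner_hz(stage_gain=5, stages_in_loop=3, rc_seconds=1e-3)
    print(f"low corner is about {corner / 1e3:.1f} kHz")
    rc = required_rc_seconds(stage_gain=5, stages_in_loop=3, corner_hz=20e3)
    print(f"RC needed for a 20 kHz corner is about {rc * 1e3:.2f} ms")
```

With a gain of 5 per stage and three stages in the loop, a 1-ms time constant lands near 20 kHz, which is consistent with the "few tens of kilohertz" figure and explains why the capacitors end up off chip.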

Notes

1 The input-referred noise current of $A_1$ is neglected for simplicity.

Chapter 3 CLOCK AND DATA RECOVERY ARCHITECTURES

If a single pulse is to be detected in the presence of additive noise and intersymbol interference (ISI), the signal-to-noise ratio (SNR) depends on the choice of the sampling instant. If sampling is synchronized such that the peak value of the pulse is sensed, the output SNR is high (Fig. 3.1).

Synchronized sampling requires two conditions to be simultaneously satisfied. First, a clock signal should be generated such that its frequency is equal to the data rate. Second, the clock signal should sample the


data at its peak point. Satisfying these two conditions is commonly referred to as clock and data recovery. Clock and data recovery (CDR) architectures fall into two major groups: open-loop CDRs and phase-locking CDRs. We briefly describe the former in this chapter; the focus of this book, however, is on the latter.

1. Open-Loop CDR Architectures

The spectrum of an NRZ sequence does not carry a spectral line at the data rate. However, the information about the frequency of the data can be extracted from the spacing between its transitions. These transitions appear as the rising and falling edges of the data signal. If a high-speed data sequence is passed through a differentiator, the resulting signal will carry positive and negative pulses for the rising and falling edges of the data signal, respectively. This differentiated signal does not provide a strong spectral line at the frequency of the data because the polarity of these pulses is random. As shown in Fig. 3.2, the randomness of the polarity can be removed by passing this signal through a rectifier. The resulting signal can be decomposed into a periodic waveform with a fundamental frequency equal to the data rate, and a random, transition-dependent signal with a zero dc average value [11].


Differentiation and rectification of a random sequence with finite rise and fall times is equivalent to edge detection of the signal. Edge detection can be performed by a logical XOR operation. Figure 3.3 depicts the structure of an edge detector that consists of an XOR gate operating on the data and its delayed replica. Theoretical derivations indicate that the highest degree of harmonic suppression can be achieved if these two waveforms are spaced half a bit period from each other.
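The effect of edge detection on the spectrum can be illustrated with a small simulation. The following sketch assumes an ideal rectangular NRZ waveform with logic levels 0 and 1, a delay of half a bit period, and a pseudo-random bit pattern. All names and parameter values are illustrative. It evaluates the Fourier component at the bit rate before and after the XOR operation.

```python
import cmath
import random

def nrz_waveform(bits, samples_per_bit):
    """Ideal rectangular NRZ waveform with logic levels 0 and 1."""
    return [b for b in bits for _ in range(samples_per_bit)]

def edge_detect(wave, delay_samples):
    """XOR of the waveform with a replica delayed by delay_samples."""
    return [v ^ wave[max(n - delay_samples, 0)] for n, v in enumerate(wave)]

def tone_at_bit_rate(wave, samples_per_bit):
    """Magnitude of the Fourier component at the bit rate, per sample."""
    acc = sum(v * cmath.exp(-2j * cmath.pi * n / samples_per_bit)
              for n, v in enumerate(wave))
    return abs(acc) / len(wave)

if __name__ == "__main__":
    rng = random.Random(1)                      # fixed seed for repeatability
    bits = [rng.randint(0, 1) for _ in range(2000)]
    spb = 16                                    # samples per bit (assumed)
    data = nrz_waveform(bits, spb)
    edges = edge_detect(data, spb // 2)         # half-bit delay
    print("tone in NRZ data     :", tone_at_bit_rate(data, spb))
    print("tone after XOR stage :", tone_at_bit_rate(edges, spb))
```

For the ideal NRZ waveform the component at the bit rate vanishes to within rounding error, whereas every transition contributes an identical pulse to the XOR output, so the contributions add coherently and a clear spectral line appears.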

The clock signal can be recovered from the edge-detected waveform by passing the signal through a band-pass filter tuned to the clock frequency. Shown in Fig. 3.4, the recovered clock signal is fed to a phase aligner to ensure that the output clock signal samples the data at its optimum point in the decision circuit.

In order to reduce jitter on the recovered clock signal, the filter should have a very high selectivity to suppress the unwanted data-dependent signal that results in amplitude and phase modulation. Integration of highly selective band-pass filters operating at very high frequencies is not practical using available fabrication processes. This limitation calls for the use of external components such as SAW filters. These filters, however, suffer from high loss and a relatively low speed of operation that limits their applicability to 10-Gb/s operation.

2. Phase-Locking CDR Architectures

A second approach to clock and data recovery is by synchronizing the random data to a clock signal generated by a voltage-controlled oscillator


(VCO) in a phase-locked loop (PLL). The idea is that during each data transition, the location of the transition with respect to the clock edge is detected. If the data leads the clock, the clock is sped up. If the data lags the clock, the clock is slowed down. If the zero crossings of the data and the clock coincide, the clock frequency is kept constant to ensure phase lock.

Figure 3.5 shows a generic CDR circuit. The VCO generates a clock signal. The phase and the frequency of this signal are compared to those of the incoming data in the phase detector, generating an error signal that is passed through the charge pump and the low-pass filter to set the voltage required by the VCO to oscillate at the frequency of interest. Phase locking of the clock to the data means that their phases differ by a small constant offset. This means that the derivatives of their phases, i.e., their frequencies, are identical. The generated clock signal is also used to retime the data in the decision circuit. As the incoming data is regenerated in this block, its additive noise and ISI are suppressed while the amplitude is significantly magnified.

Some of the design issues of CDR circuits are as follows:

Speed: The throughput of a high-speed receiver is determined by the maximum operating rate of the CDR circuit. The CDR circuit consists of blocks such as phase detectors and digital latches utilizing positive feedback for regeneration at high speed. As the data rate increases, the regeneration time of the latches becomes comparable to the data period, thus limiting the maximum operating rate of a latch [12]. Another critical issue is sustaining the integrity of the clock signal generated by the VCO, which operates at very high frequencies. As a result, the available low-cost fabrication processes such as CMOS technology can only marginally handle data rates as high as 10 Gb/s. This limitation can be overcome by innovations at both the architecture and the circuit levels. Later in this book, a number of approaches for increasing the speed of the system are described.


Jitter: Jitter can be interpreted as the random perturbations of the zero crossings of a signal with respect to a reference point. Jitter can be measured as the rms value of the difference between the interval separating two consecutive zero crossings of the signal and a constant time period (cycle jitter), or as the rms value of the difference between two consecutive samples of such intervals (cycle-to-cycle jitter). These two values have been shown to be closely related [13]. To define a measure for the purity of the signals in a transceiver, the SONET standard specifies three measures for the maximum allowable jitter in the system [2]:

Jitter generation is a specification for the maximum allowable jitter generated by the system, mainly because of the electronic noise of the VCO and the ripple on its control line. This specification concerns the closed-loop behavior of the system, since the PLL provides some degree of VCO noise cancellation. The SONET OC-192 standard specifies 10 ps as the maximum peak-to-peak jitter on the clock and data signals in the phase-locked condition. Other standards define a limit for the rms jitter as well. However, the rms jitter requirements for OC-192 are smaller than the precision of most measuring equipment. Therefore, no rms requirement is defined for the OC-192 standard.

Jitter transfer deals with the closed-loop transfer function of the phase-locked system (Fig. 3.6). It is a measure of the suppression of the input jitter through the CDR circuit. The system requirements define the 3-dB bandwidth and the peaking in the transfer function. The loop bandwidth is chosen as a trade-off between external jitter suppression on one hand, and internal jitter suppression, capture range, and acquisition time on the other. The loop acts as a low-pass filter for the external jitter entering the system. Input jitter with a frequency higher than the loop bandwidth is significantly suppressed in this system. Therefore, reduction of the loop bandwidth strongly suppresses the jitter that enters the system at the input. At the same time, the loop acts as a high-pass filter for the jitter generated by the VCO. The suppression of the open-loop VCO jitter is significant within a frequency offset close to the loop bandwidth with respect to the center frequency. Therefore, a wider loop bandwidth removes a larger portion of the VCO jitter. The capture range of an unaided CDR circuit is close to the loop bandwidth. However, this limitation can be overcome by means of a frequency acquisition scheme. The acquisition time becomes shorter for a larger loop bandwidth. Since the nominal loop bandwidth is defined by the standard, an adaptive bandwidth mechanism can be employed to speed up the acquisition. During startup, the loop bandwidth is increased to provide faster acquisition. As the circuit acquires lock, the bandwidth is set back to the nominal value, suppressing the jitter entering the system. A fiber link consists of many cascaded regenerators. Jitter can accumulate along the link as the small jitter peaking of the individual regenerators adds up to a large sum. To alleviate this difficulty, the jitter peaking of each regenerator should be kept below 0.1 dB.

Jitter tolerance is a measure of the ability of the CDR circuit to track a jittered input data signal. Jitter on the input signal can be considered as phase modulation. The CDR must provide a clock signal that tracks this phase modulation in order to accurately retime jittered data. The jitter tolerance is defined as a mask that relates the maximum amount of phase modulation that can be corrected by the loop to the frequency offset with respect to the data rate (Fig. 3.7).

Power Dissipation: Until recently, low power dissipation was not considered a critical requirement for optical transceivers. One reason was that, in contrast to handheld wireless transceivers, optical transceivers do not run from a battery. Another reason was that high-quality transceivers could only be integrated in power-hungry III-V processes.
Development of bipolar technologies along with the introduction of deep-submicron CMOS processes has allowed circuit designers to build systems with significantly reduced power consumption. This aspect becomes more attractive when a number of transceivers are placed on a single chip in order to increase the operating rate of the system. The power dissipation and the integrability of the circuit in VLSI technologies determine the number of transceivers that can be placed on one chip. Lower power dissipation also eases packaging and eliminates heat sinking issues.

Supply Scaling: Supply scaling has been a distinguishing feature of the trend towards deep-submicron CMOS processes. While resulting in reduced power consumption, supply scaling limits the choice of circuit topologies for high-speed applications. In a CMOS technology running from a 1.8-V supply, circuit structures that require stacking of more than three devices must be discarded. Supply scaling also leads to a larger VCO gain for high-frequency oscillators. This is because a smaller voltage range must sweep the VCO frequency across the range of interest. If the VCO is used in a closed-loop application such as a CDR circuit, the higher VCO gain translates the noise on the control line of the VCO into a larger output jitter. In Chapter 5 we will describe a technique to reduce the VCO gain in a CMOS process.

Fabrication Technology: As CMOS technology continues to benefit from both scaling and the enormous momentum of the digital market, many high-speed integrated circuits that were once considered the exclusive domain of III-V or silicon bipolar technologies are likely to appear as CMOS implementations. However, issues such as technology development costs, computer-aided design (CAD) infrastructure, and fabrication turnaround time make it desirable to use a single mainstream digital CMOS process for all IC products.
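The two jitter measures defined under the Jitter item above can be made concrete with a short calculation on a list of zero-crossing times. This is only a sketch of those definitions: the function names, the choice of a user-supplied nominal period as the "constant time period", and the sample data are assumptions made for illustration.

```python
import math

def periods(crossings):
    """Intervals between consecutive zero crossings."""
    return [b - a for a, b in zip(crossings, crossings[1:])]

def cycle_jitter_rms(crossings, nominal_period):
    """rms deviation of each period from a constant nominal period."""
    p = periods(crossings)
    return math.sqrt(sum((x - nominal_period) ** 2 for x in p) / len(p))

def cycle_to_cycle_jitter_rms(crossings):
    """rms difference between consecutive periods."""
    p = periods(crossings)
    d = [b - a for a, b in zip(p, p[1:])]
    return math.sqrt(sum(x * x for x in d) / len(d))

if __name__ == "__main__":
    # Zero-crossing times in picoseconds; periods alternate 99 ps and 101 ps.
    crossings_ps = [0, 99, 200, 299, 400]
    print("cycle jitter (ps rms)         :", cycle_jitter_rms(crossings_ps, 100))
    print("cycle-to-cycle jitter (ps rms):", cycle_to_cycle_jitter_rms(crossings_ps))
```

In this deliberately simple pattern every period misses the nominal value by 1 ps while consecutive periods differ by 2 ps, which shows how the two measures can differ for the same waveform.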

2.1. Full-Rate and Half-Rate Architectures

Phase-locking CDR architectures can be categorized into two major groups, full-rate and half-rate. In a full-rate circuit the location of the data transition is compared to the falling (or rising) edge of the clock.


This comparison can be performed if the clock frequency equals the data rate (Fig. 3.8(a)). As a result, retiming of the data can be performed using flip-flops that operate on either the rising or the falling edge of the clock signal.

In a half-rate circuit, the location of the data transitions is compared to that of both the rising and falling edges of the clock (Fig. 3.8(b)). This results in a clock frequency equal to one half of the data rate. In this scenario, retiming of the data should be performed using flip-flops that operate on both the rising and falling edges of the clock. The biggest advantage of a half-rate approach is the reduction of the clocking frequency by a factor of two. Simulations indicate that circuits operating at a lower speed consume less power. In fact, as the speed of operation reaches the maximum operating frequency of a particular technology, the required power consumption grows exponentially. Another advantage of half-rate architectures is the reduction in complexity if the CDR circuit is followed by a demultiplexer, i.e., if the circuit is not required to generate a full-rate output. Since the half-rate clock samples every other bit on its rising or falling edges, the first level of demultiplexing is automatically performed. This technique also saves on hardware and power consumption since a frequency divider, which reduces the clock frequency by a factor of two, can be eliminated. A major concern in employing a half-rate clock, however, is the duty cycle mismatch if the system is designed to generate a full-rate output in the receiver or the transmitter. Since the spacing between the rising and


falling edges of the clock signal is different from half the clock period, the width of the data eye sampled by the rising edge is different from that sampled by the falling edge, resulting in bimodal jitter (Fig. 3.9).
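Both the built-in demultiplexing and the duty-cycle concern can be sketched in a few lines. The model below is a simplification made for illustration: it treats the rising edge as sampling the even-numbered bits and the falling edge as sampling the odd-numbered bits, and it takes the time available to each sampled bit to be the spacing between consecutive clock edges. The function names and numbers are assumptions.

```python
def half_rate_demux(bits):
    """Split a bit stream into the two half-rate streams.

    Even-indexed bits are taken to be sampled on the rising edge and
    odd-indexed bits on the falling edge (an arbitrary assignment).
    """
    return bits[0::2], bits[1::2]

def edge_spacings(bit_rate_hz, duty_cycle):
    """Rising-to-falling and falling-to-rising spacings of a half-rate clock."""
    clock_period = 2.0 / bit_rate_hz          # clock runs at half the data rate
    high_time = duty_cycle * clock_period
    return high_time, clock_period - high_time

if __name__ == "__main__":
    rising, falling = half_rate_demux([1, 0, 1, 1, 0, 0])
    print("rising-edge stream :", rising)
    print("falling-edge stream:", falling)
    t_high, t_low = edge_spacings(10e9, 0.45)  # 10 Gb/s data, 45% duty cycle
    print(f"edge spacings: {t_high * 1e12:.0f} ps and {t_low * 1e12:.0f} ps")
```

With an ideal 50% duty cycle both spacings equal one bit period; any deviation shortens one interval and lengthens the other, which is the origin of the bimodal behavior described above.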

The focus of the work presented in this book is the design of systems employing half-rate architectures. Although the CMOS technology used here performs marginally in a full-rate system, the resulting reduction of power consumption makes the half-rate approach a strong candidate. Furthermore, since the scaling of CMOS processes cannot keep up with the growing demand for systems operating at higher data rates, half-rate approaches are becoming more attractive for high-speed design in the near future. In Chapters 5 and 6 we describe two half-rate CDR circuits. However, in the remainder of this chapter, after a general review of the building blocks of the CDR circuit, some of the existing full-rate architectures are addressed by describing their phase and frequency detectors.

2.2. Oscillators

As an integral part of phase-locked loops, oscillators are used for clock generation in these systems. The design of the VCO directly impacts the jitter performance and the reproducibility of the CDR circuit. While LC topologies achieve a potentially lower jitter and higher center frequency, their limited tuning range makes it difficult to obtain a target frequency without design and fabrication iterations.

2.2.1 General Theory

An oscillator can be modeled as an amplifier in a unity-gain negative feedback system (Fig. 3.10) where:

$$\frac{V_{out}}{V_{in}}(s) = \frac{H(s)}{1 + H(s)}$$


If the amplifier itself experiences so much phase shift at high frequencies that the overall feedback becomes positive, an oscillation may occur. If this happens at a frequency of $\omega_{osc}$, the circuit amplifies its own noise components at $\omega_{osc}$ indefinitely [14]. The above statement can be formulated as follows. With $H(s)$ denoting the open-loop transfer function of the amplifier, oscillation at the frequency $\omega_{osc}$ can occur if these two conditions are simultaneously satisfied:

$$|H(j\omega_{osc})| \ge 1$$

$$\angle H(j\omega_{osc}) = 180^\circ$$

2.2.2 Ring Oscillators

A ring oscillator is formed by placing a number of gain stages in a loop. The poles of these stages should introduce the 180° phase shift required for oscillation [15]. Oscillation cannot occur if a single stage is placed in a unity-gain loop. This is because the single pole of the open-loop circuit can provide a maximum frequency-dependent phase shift of only 90°. If two stages are used in a loop, the loop contains two poles. Since each stage introduces up to 90° of frequency-dependent phase shift, the overall phase shift can reach 180°, but only at infinite frequency, where the loop gain vanishes. Therefore, the requirements of oscillation are not simultaneously satisfied. To achieve a greater phase shift around the loop, a third inverting stage can be added to the loop. The phase shift can then reach 180° at a frequency where the loop gain is still greater than or equal to unity.


As described in [15], if the transfer function of each stage is denoted as $-A_0/(1 + s/\omega_0)$, then the loop gain is given as:

$$H(s) = -\frac{A_0^3}{\left(1 + \dfrac{s}{\omega_0}\right)^3} \qquad (3.4)$$

In order to obtain the frequency of oscillation and the minimum required gain per stage, we let the frequency-dependent phase shift and the magnitude of the loop gain at the oscillation frequency equal 180° and unity, respectively. It follows that $\omega_{osc} = \sqrt{3}\,\omega_0$ and $A_0 = 2$, where $\omega_0$ is the 3-dB bandwidth and $A_0$ is the dc gain of each stage. These values are used for choosing the proper scaling for the active and passive elements forming the stages. The dc gain of each stage should be reduced to the minimum value required for oscillation in order to reduce the output jitter.

In practice, as the oscillation amplitude grows, the stages in the signal path experience nonlinearity and eventually saturation, causing the signal amplitude to be clipped at two boundaries. If the small-signal loop gain is greater than unity, the time allocation between the linear and nonlinear modes of operation in the circuit is such that the average loop gain is still equal to unity. If the time spacing between the zero crossings of the input and the output signals of each stage is $T_D$, the consecutive nodal voltages track each other within a time period of $T_D$, yielding a period of $6T_D$.

The oscillation frequency can be derived using both small-signal and large-signal circuit analysis. Using (3.4), the small-signal frequency can be given as $\sqrt{3}\,\omega_0/(2\pi)$, while the large-signal value is $1/(6T_D)$. These two values are not equal. The small-signal frequency depends on the output time constant of each stage, primarily given by the resistance and the capacitance of each stage. $T_D$ results from the large-signal slew rate of each stage that is related to its nonlinear current drive and capacitances. As a result, the oscillation begins with the small-signal frequency, but as the amplitude grows and the circuit becomes nonlinear, the frequency shifts to the large-signal frequency which is a larger value. The 3-dB bandwidth of each stage is mostly determined by the load resistance and total parasitic capacitance at the output node.
To increase the oscillation frequency of the ring oscillator, the load resistance should be reduced. However, in order to keep the output voltage swing constant, the bias current and hence the power consumption of the circuit should be increased.
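The small-signal oscillation conditions generalize readily to a ring of N identical single-pole stages. The sketch below assumes that the loop provides a net dc inversion (for example an odd number of inverting stages) so that each stage must contribute 180°/N of frequency-dependent phase shift. The function names and the stage-delay value are illustrative assumptions.

```python
import math

def ring_oscillator_conditions(n_stages):
    """Small-signal oscillation conditions for n identical single-pole stages.

    Returns (omega_osc / omega_0, minimum dc gain per stage).
    """
    if n_stages < 3:
        raise ValueError("one- or two-stage single-pole loops cannot reach "
                         "180 degrees of phase shift with non-zero loop gain")
    phase_per_stage = math.pi / n_stages            # each stage supplies 180/N degrees
    freq_ratio = math.tan(phase_per_stage)          # from arctan(w/w0) = 180/N
    min_gain = 1.0 / math.cos(phase_per_stage)      # from A0 / sqrt(1 + (w/w0)^2) = 1
    return freq_ratio, min_gain

def large_signal_frequency_hz(n_stages, stage_delay_s):
    """Large-signal estimate 1 / (2 N T_D)."""
    return 1.0 / (2 * n_stages * stage_delay_s)

if __name__ == "__main__":
    for n in (3, 4, 5):
        ratio, gain = ring_oscillator_conditions(n)
        print(f"N = {n}: w_osc/w_0 = {ratio:.3f}, minimum gain = {gain:.3f}")
    print(f"three stages, 20 ps per stage: "
          f"{large_signal_frequency_hz(3, 20e-12) / 1e9:.2f} GHz")
```

For three stages this reproduces the familiar pair of results, an oscillation frequency of √3 times the stage bandwidth and a minimum gain of 2 per stage, and it shows that adding stages lowers both the required gain and the oscillation frequency.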


Tuning. In a ring oscillator that consists of $n$ stages, the frequency of oscillation is $1/(2nT_D)$, where $T_D$ is the delay through each stage. In order to change the frequency of oscillation, either the effective number of stages or the delay of each stage must be altered. Figure 3.12 depicts a conceptual illustration of one such technique. This approach, commonly referred to as “delay interpolation”, consists of a fast path and a slow path in parallel [16]. The total delay is adjusted by increasing the gain of one path and decreasing that of the other, differentially. The total delay is hence a weighted sum of the delays of the two paths.

Timing Jitter. In the CDR circuit, the main source of timing jitter is the inherent thermal and shot noise of the active and passive devices that make up each delay stage of the VCO. 1/f noise is usually not of practical importance since it is rejected by the loop filter. Therefore, minimizing the impact of thermal and shot noise in the basic delay stage becomes the key to attaining low timing jitter. It can be shown that the thermal jitter improves with the square root of power consumption [17]. To design for low jitter, the overdrive voltage of the devices used in the delay stage should be maximized. For a fixed delay and fixed current, the small-signal gain of each stage should also be minimized. However, this gain must be large enough for oscillation to occur.

2.2.3 LC Oscillators

Monolithic LC oscillators are formed by a resonant tank that consists of a spiral inductor (L) and a variable capacitor (C) that resonate at a frequency of $1/(2\pi\sqrt{LC})$. The inductors and capacitors suffer from having a resistive component. This component is mostly dominated by the resistance of the metal wire used in the inductor and by eddy-current and displacement loss


through the substrate. Figure 3.13 depicts the substrate loss versus its sheet resistance. The eddy-current loss decreases as the sheet resistance of the substrate increases. As a result, in bulk processes with high sheet resistance the loss is dominated by displacement. The displacement loss is maximized when the substrate’s resistive impedance becomes comparable with the oxide’s capacitive impedance [18].

The tank can therefore be modeled as a parallel combination of an inductor, a capacitor, and a resistor (Fig. 3.14(a)). Because of this resistive element, the tank cannot sustain oscillation indefinitely if it is stimulated by a current impulse. However, if a negative resistance is placed in parallel with the tank, the combination can oscillate (Fig. 3.14(b)). This is the main idea behind the operation of cross-coupled LC oscillators.


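It can be shown that the resistance seen between the two drains of a cross-coupled differential pair equals $-2/g_m$, where $g_m$ is the transconductance of each transistor. Therefore, for the oscillation to occur, it is necessary that the magnitude of this negative resistance not exceed the equivalent parallel resistance of the tank seen between the same two nodes (Fig. 3.15).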
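As a numerical illustration, the resonance frequency and the start-up condition can be evaluated for a candidate tank. The sketch assumes the standard result that a cross-coupled pair presents a resistance of −2/g_m between its drains and compares its magnitude with the equivalent parallel tank resistance seen between the same two nodes. Function names and component values are hypothetical.

```python
import math

def resonance_hz(inductance_h, capacitance_f):
    """Resonance frequency of an LC tank, 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2 * math.pi * math.sqrt(inductance_h * capacitance_f))

def can_start_up(gm_siemens, tank_resistance_diff_ohm):
    """True if the negative resistance -2/gm can overcome the tank loss.

    tank_resistance_diff_ohm is the equivalent parallel resistance of the
    tank measured between the two drains (an assumed definition).
    """
    return 2.0 / gm_siemens <= tank_resistance_diff_ohm

if __name__ == "__main__":
    f0 = resonance_hz(1e-9, 1e-12)            # 1 nH and 1 pF, illustrative only
    print(f"resonance is about {f0 / 1e9:.2f} GHz")
    print("gm = 5 mS, 600-ohm tank :", can_start_up(5e-3, 600.0))
    print("gm = 2 mS, 600-ohm tank :", can_start_up(2e-3, 600.0))
```

In practice a margin is usually added on top of this condition so that oscillation starts reliably across process and temperature variations.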

The resulting LC oscillators can achieve a very high center frequency, as high as 25.9 GHz in a CMOS process [19]. This is in contrast to ring oscillators, where the frequency is limited by the minimum delay per stage and the minimum required number of stages.

Tuning. Only the inductor and capacitor values can be varied to tune the frequency of an LC oscillator. Other parameters such as bias currents and transistor transconductances have a negligible effect on the oscillation frequency. Since it is difficult to vary the value of a monolithic inductor, the tank’s capacitance is changed for tuning. The tuning range shrinks as the supply voltage gets lower. Also, to maximize the tuning range, constant capacitances in the tank must be minimized. The variable capacitor can be formed using either pn junctions or MOS varactors. The former is formed by a p+ diffusion in an N-well (Fig. 3.16(a)), whereas the latter is formed by placing an NMOS device in an N-well (Fig. 3.16(b)).

Phase noise. The phase noise is defined as a small random excess phase, representing variations in the period of a sinusoidal signal. The phase noise of LC oscillators usually depends on the quality factor of the tank (Q). The higher the Q, the sharper the resonance and the lower the phase noise skirts.
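The remark that constant capacitances must be minimized can be quantified with a simple model in which the tank capacitance is the sum of a fixed part and a varactor that swings between two limits. The names and values below are illustrative assumptions rather than data from the text.

```python
import math

def resonance_hz(inductance_h, capacitance_f):
    """Resonance frequency of an LC tank, 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2 * math.pi * math.sqrt(inductance_h * capacitance_f))

def tuning_range(inductance_h, c_fixed_f, c_var_min_f, c_var_max_f):
    """Return (f_min, f_max, fractional range about the mid frequency)."""
    f_max = resonance_hz(inductance_h, c_fixed_f + c_var_min_f)
    f_min = resonance_hz(inductance_h, c_fixed_f + c_var_max_f)
    fractional = (f_max - f_min) / ((f_max + f_min) / 2)
    return f_min, f_max, fractional

if __name__ == "__main__":
    # Same varactor (0.2 pF to 0.4 pF) with two different fixed capacitances.
    for c_fixed in (0.2e-12, 0.6e-12):
        f_min, f_max, frac = tuning_range(1e-9, c_fixed, 0.2e-12, 0.4e-12)
        print(f"C_fixed = {c_fixed * 1e12:.1f} pF: "
              f"{f_min / 1e9:.2f} to {f_max / 1e9:.2f} GHz "
              f"({frac * 100:.1f}% range)")
```

With the same varactor, tripling the fixed capacitance cuts the fractional tuning range roughly in half in this example, because the varactor then controls a smaller share of the total tank capacitance.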


Oscillator phase noise is generated primarily through two mechanisms, distinguished by the path into which the noise is injected. Noise in the signal path is shaped by the oscillator and generates the phase noise skirts, whereas the noise in the control path is translated to the region around the carrier. When a VCO is placed inside a CDR circuit, its phase noise characteristic is shaped by the loop. Phase noise within the loop bandwidth is suppressed, while the out-of-band noise remains unattenuated.

The phase noise of the oscillator due to thermal noise was formulated in [20]. Recently, research performed at UCLA led to a clearer representation of the formula, based on the physical parameters of the oscillator [21]. Thermal noise can be injected into the signal path from the tank, the tail current source, or the cross-coupled differential pair. The current noise injected by the resistor, representing the loss of the tank, directly contributes to the output phase noise. The noise of the current source affects the oscillator through more complicated mechanisms. As a result of the switching of the cross-coupled differential pair in the oscillator, noise is up- or down-converted in frequency. Low-frequency noise of the current source is up-converted to the vicinity of the oscillator carrier frequency. Also, noise of the current source at twice the carrier frequency is down-converted to the oscillator output frequency. The up and down conversions affect the oscillator phase noise through different mechanisms. Down conversion directly contributes to phase noise that is proportional to the noise factor of the devices. Up conversion, on the other hand, manifests itself as perturbations on the amplitude of the output signal. If the oscillator uses varactors as a means of tuning, amplitude variations can modulate the varactor capacitance and result


in phase noise. Known as AM-to-FM modulation, this effect becomes more significant when the gain of VCO is large, meaning that a change in the voltage across the varactor is translated into a larger amount of capacitance modulation. Noise of the differential pair is sampled during a time window when the switching occurs. The phase noise introduced by the pair is a constant, given by the noise bandwidth product of the devices. This indicates that the phase noise introduced by the pair is independent of the device sizes. Multi-Phase LC Oscillators. The architecture of a clock and data recovery circuit determines the number of clock phases required for the implementation of the system. This number is primarily determined by the ratio of the clock signal to the data rate. Furthermore, systems incorporating referenceless frequency detection require a higher number of clock phases. As an example, a minimum of two data samples must be obtained to derive the phase relationship between the clock and the data signals in a binary phase detector - one from the data transition instant and one from the previous bit. A single phase of a full-rate clock is sufficient to obtain these two samples in two flipflops operating on the opposite edges of the clock. However, if the circuit is designed to operate with a half-rate clock signal, two quadrature phases of the clock are required to obtain the same samples. The requirement for using several phases of the clock signal necessitates the use of a VCO, capable of producing multiple equally-spaced phases. In a ring oscillator, consisting of a number of stages in a loop, multiple phases can be taken from the output of the different stages. The number of the stages should be chosen to be equal to the number of required phases or an integer multiple of it. As the frequency of oscillation increases, the number of stages in the loop should be reduced. The ring oscillators, therefore, fail to operate reliably in modern high-speed systems. 
The design of LC oscillators capable of generating quadrature phases has gained considerable momentum in recent years. The signal generated by these oscillators is relatively pure, and their phase noise is only a few dB worse than that of a stand-alone LC oscillator at a similar frequency offset. A quadrature oscillator consists of two LC oscillators. The output of each oscillator is coupled to the input of the other with a given coupling coefficient (k). As shown in Fig. 3.17, each VCO can be modeled as a unity-gain feedback system with an open-loop gain of $H(j\omega)$. If both of the coupled oscillators resonate at the same frequency, the output phasors of the two

oscillators (X and Y) must satisfy the following equations [22]:
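The relations below are a sketch of one form these equations can take, not a transcription of [22]. They assume that each oscillator sums its own output with the coupled output of the other at its input, and that the coupling into the second oscillator is inverted (an assumption about the wiring; swapping which path is inverted only changes the sign of the quadrature relation).

```latex
\begin{align}
(X + kY)\,H(j\omega) &= X, \\
(Y - kX)\,H(j\omega) &= Y.
\end{align}
% Rearranged: X (1/H - 1) = kY and Y (1/H - 1) = -kX.
% Dividing one relation by the other eliminates both H and k:
\begin{equation}
\frac{X}{Y} = -\frac{Y}{X}
\quad\Longrightarrow\quad
X = \pm jY .
\end{equation}
```

Under this assumption the two phasors have equal magnitude and differ in phase by a quarter period, regardless of the value of k.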

The combination of these two equations indicates that $X^2 + Y^2 = 0$, i.e., $X = \pm jY$. The two signals are therefore 90° apart from each other. Figure 3.18 depicts a quadrature oscillator implemented based on the above analysis [23]. This structure consists of two LC oscillators and two differential pairs coupling the output of each oscillator to the input of the

other one. In this oscillator, tuning is performed by means of a control voltage. The current flowing through the device controlled by this voltage, and therefore the bias current of the oscillators, changes as the control voltage varies. Variation of the bias current changes the junction capacitance and hence the oscillation frequency of the VCO. This circuit can be modified to avoid the large variations in bias current that such tuning requires. As shown in Fig. 3.19, a tail current source can be added to the circuit to maintain a constant

bias current. The common-mode voltage, and hence the oscillation frequency, can be varied by changing the on-resistance of the transistor connected between node P and the supply [24]. As the control voltage increases, this transistor enters saturation and the voltage at node P experiences a sudden change, resulting in nonlinearity in the VCO characteristic. A second transistor, driven by a source-followed version of the control voltage, can be added to the circuit to provide an effective resistance between P and the supply that varies smoothly with the control voltage.

A different tuning mechanism for quadrature oscillators is to change the coupling coefficient between the two oscillators [22]. Figure 3.20 depicts the structure of this oscillator. The output phasor at node A is determined by the vector sum of the phasor of the stand-alone oscillator and the phasor of the coupling differential pair. As the coupling coefficient increases, the magnitude of the coupling phasor, which is 90° away from the stand-alone phasor, increases. This indicates that the angle

between the sum phasor and the stand-alone phasor increases, meaning that the quadrature oscillator resonates at a larger frequency offset with respect to the stand-alone oscillation frequency. As a result, the amount of coupling can be changed to cover a very wide tuning range. The coupling cannot be reduced indefinitely, because the two oscillators lose synchronization if it is too small. Phase noise, in turn, sets a limit on the maximum amount of coupling: as the frequency of oscillation deviates from the resonance frequency of the stand-alone oscillator, the Q of the tank at the frequency of oscillation drops and the phase noise is degraded.

The quadrature oscillator can also be used to generate a differential signal at twice the frequency [25]. As shown in Fig. 3.21, the fully differential topology allows the common-source nodes to be sensed as an output at twice the frequency. The common-mode node must be followed by proper buffering stages to ensure reasonable swings.

Another implementation of coupled oscillators is the circuit of [26]. Shown in Fig. 3.22, the circuit consists of a number of cross-coupled oscillators placed in a ring. The idea is to improve phase noise by providing a higher amount of noise filtering through several high-Q tanks. If n oscillators are cascaded in a loop, the output noise filtering goes up by a factor of $n^2$. However, since the number of noise sources increases proportionally with n, it can be assumed that the output noise power density is reduced by a factor of n. Meanwhile, the signal at the oscillation frequency is amplified by a factor of n, and its power scales up by a factor of $n^2$. As a result, it can be assumed that the phase

noise of this oscillator is improved compared to a stand-alone oscillator. The penalty for this improved phase noise performance is higher power consumption and larger chip area. In fact, if the power of a single oscillator is increased by a factor of n, a similar performance can be achieved from that oscillator. However, this circuit produces multiple phases, allowing the oscillator to be used as the clock generator for high-speed systems and to replace a ring oscillator with superior performance. Furthermore, as the number of LC oscillators in the loop increases, the effects of mismatches on the performance of the oscillator become less pronounced, because the oscillation frequency is determined by the average characteristics of the oscillators in the loop.

Multi-stage LC oscillators can be modified to produce only differential phases. In the circuit of Fig. 3.23 [27], four single-ended common-source amplifiers with inductive loads are placed in a loop. Each inductor resonates with the parasitic capacitances at its output node, and each stage sustains a phase shift of 180° between its input and output. Since the number of stages is even, the overall phase shift around the loop is zero, and therefore the circuit oscillates at the frequency where the tank has the maximum Q. The two pairs of differential signals taken from this oscillator are summed together to reduce the phase noise by 3 dB.

As the operating frequency of the oscillators approaches the $f_T$ of a CMOS process, the design of oscillators satisfying the requirements for the

tuning range and phase noise becomes very difficult. The quality of the inductors degrades as they operate at speeds close to their self-resonance frequency. Variable capacitors added at the output of the oscillator to provide tuning deteriorate the integrity of the output signal. An alternative solution for implementing oscillators at very high speeds is the distributed oscillator, formed by connecting the output of a distributed amplifier to its input. The design of these oscillators has advanced significantly in recent years [28, 29].

2.2.4 PLL Jitter Calculation

In design or measurement, it is often necessary to predict the output jitter of a PLL if the electronic noise in the VCO is the dominant source. We describe a simple approach that estimates the closed-loop jitter with reasonable accuracy. Using simulations or measurements, we first compute the relative phase noise of the free-running VCO due to the sources of white noise. The cycle-to-cycle jitter is then calculated from the phase noise with the

aid of the following equation:

$$\Delta T_{cc}^2 \approx \frac{4\pi}{\omega_0^3}\, S_\phi(\Delta\omega)\, \Delta\omega^2,$$

where $\omega_0 = 2\pi f_0$ denotes the oscillation frequency and $S_\phi(\Delta\omega)$ represents the relative phase noise power at an offset frequency of $\Delta\omega$ [13]. In the next step, we relate the jitter of the PLL to that of the free-running VCO. It has been shown that the closed-loop jitter can be viewed as if the VCO jitter rises with the square root of time and saturates at a time equal to the inverse of the loop bandwidth [30]. If the loop bandwidth is $f_u$ hertz, then the VCO produces a total of $f_0/(2\pi f_u)$ cycles in $1/(2\pi f_u)$ seconds. Thus, the total accumulated jitter due to the VCO is equal to

$$\Delta T_{PLL} \approx \frac{\Delta T_{cc}}{\sqrt{2}} \sqrt{\frac{f_0}{2\pi f_u}}.$$

2.3. Phase Detectors

In a CDR circuit, the phase detector is the key element for providing the phase lock between the VCO clock signal and the input data sequence.
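Returning briefly to the jitter estimate of Section 2.2.4, the two-step calculation can be sketched numerically. The sketch assumes the commonly used white-noise relation $\Delta T_{cc}^2 \approx (4\pi/\omega_0^3)\,S_\phi(\Delta\omega)\,\Delta\omega^2$ and a square-root accumulation of period jitter over $f_0/(2\pi f_u)$ cycles; both forms, the function names, and the example numbers (5 GHz, -100 dBc/Hz at 1-MHz offset, 1-MHz loop bandwidth) are assumptions made for illustration.

```python
import math


def cycle_to_cycle_jitter(f0_hz, phase_noise_dbc_hz, offset_hz):
    """Assumed relation: dT_cc^2 = (4*pi / w0^3) * S_phi(dw) * dw^2."""
    s_phi = 10.0 ** (phase_noise_dbc_hz / 10.0)  # dBc/Hz -> linear ratio per Hz
    w0 = 2.0 * math.pi * f0_hz
    dw = 2.0 * math.pi * offset_hz
    return math.sqrt(4.0 * math.pi / w0 ** 3 * s_phi * dw ** 2)


def closed_loop_jitter(f0_hz, loop_bw_hz, dt_cc):
    """Assumed accumulation: period jitter (dT_cc / sqrt(2)) grows with the
    square root of the f0 / (2*pi*f_u) cycles produced before the loop corrects."""
    cycles = f0_hz / (2.0 * math.pi * loop_bw_hz)
    return dt_cc / math.sqrt(2.0) * math.sqrt(cycles)


if __name__ == "__main__":
    dt_cc = cycle_to_cycle_jitter(5e9, -100.0, 1e6)
    dt_pll = closed_loop_jitter(5e9, 1e6, dt_cc)
    print("cycle-to-cycle jitter [s]:", dt_cc)
    print("closed-loop jitter    [s]:", dt_pll)
```

The estimate scales inversely with the square root of the loop bandwidth, which is why a wider loop suppresses more of the VCO jitter in this simple picture.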

The task of the phase detector is to provide information about the spacing between the zero crossings of the data and the clock. This information is used to set the control voltage of the VCO to the value required for the VCO to oscillate at the frequency of interest. When phase lock is achieved, this voltage stays constant and the phase detector output does not disturb it. A commonly used type of phase detector operating with periodic data is an XOR gate. As shown in Fig. 3.24, if two sequences with a phase difference of $\Delta\phi$ are applied to the inputs of the XOR gate, the output will carry pulses as wide as $\Delta\phi$. The dc value of the resulting signal is linearly proportional to the difference between the phases of the two input signals:

$$\overline{V_{out}} = K_{PD}\,\Delta\phi,$$

where $K_{PD}$ is the gain of the phase detector and $\Delta\phi$ is the input phase difference. Although this simple approach proves useful for applications where the two inputs have identical frequencies and different phases, it falls short in providing frequency error information as the two input frequencies grow apart. The reason is that if the two frequencies are not equal, the detector generates a beat signal with an average value of zero (Fig. 3.25). The beat signal can still provide useful information about the phase and frequency difference if the two frequencies are only slightly different. To improve the capture range of the phase detector, modern phase-locked systems incorporate additional means of frequency acquisition. A circuit that can detect both phase and frequency differences proves extremely useful because it significantly increases the acquisition range and lock speed of PLLs. The sequential phase/frequency detector (PFD)

provides a large capture range for periodic waveforms [31]. Figure 3.26(a) shows the implementation of this circuit and the corresponding waveforms when the two inputs have different frequencies and phases. If the frequency of input A is greater than that of input B, then the PFD

produces positive pulses at $Q_A$ while $Q_B$ remains zero (Fig. 3.26(b)). Conversely, if the frequency of A is lower than that of B, positive pulses appear at $Q_B$ while $Q_A$ remains zero. If the two frequencies are equal, the circuit generates pulses at either $Q_A$ or $Q_B$ with a width equal to the phase difference between the two inputs (Fig. 3.26(c)). Thus the average value of $Q_A - Q_B$ is an indication of the frequency or phase difference between A and B.

The sequential phase/frequency detector is a major block used for phase detection in frequency synthesizers and clock generators. Its compact and power-efficient structure makes it attractive for low-power applications. However, this circuit cannot be used to provide phase error information for random data because, in contrast to periodic data, a zero crossing at the end of each bit is not guaranteed. Consecutive ones and zeros are very likely to appear in a random sequence and reduce the transition density of the signal.

Binary data is usually transmitted in the “nonreturn-to-zero” (NRZ) format. Each bit has a duration of T and is equally likely to be one or zero. NRZ data has two properties that make the task of clock and data recovery difficult. First, the data may exhibit long sequences of consecutive ones or zeros. This means that in the absence of data transitions, the CDR circuit should not only continue to produce the clock, but also incur negligible drift in the clock frequency. Second, the spectrum of NRZ data has nulls at frequencies that are integer multiples of the bit rate. Due to the lack of a spectral component at the bit rate, a clock recovery circuit may lock to spurious signals or simply not lock at all. As mentioned previously, the NRZ data usually undergoes a nonlinear operation at the front end of the circuit so as to create a frequency component at the bit rate.

The phase detectors that operate with random data are categorized in two groups, linear and binary.
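The lack of a bit-rate component in NRZ data, and its reappearance after a nonlinear operation, can be checked with a small numerical experiment. The sketch below is illustrative only: the idealized edge detector (a half-bit pulse after every transition), the oversampling ratio, and all function names are assumptions, not the circuit used in this book.

```python
import cmath
import math
import random


def tone_at_bit_rate(samples, samples_per_bit):
    """Magnitude of the spectral component at the bit rate, per sample."""
    acc = sum(s * cmath.exp(-2j * math.pi * n / samples_per_bit)
              for n, s in enumerate(samples))
    return abs(acc) / len(samples)


def nrz_waveform(bits, samples_per_bit):
    """Rectangular NRZ pulses: +1 for a one, -1 for a zero."""
    return [1.0 if b else -1.0 for b in bits for _ in range(samples_per_bit)]


def edge_detect(bits, samples_per_bit):
    """Idealized differentiate-and-rectify: a half-bit pulse after each transition."""
    out = [0.0] * (len(bits) * samples_per_bit)
    half = samples_per_bit // 2
    for i in range(1, len(bits)):
        if bits[i] != bits[i - 1]:
            for n in range(i * samples_per_bit, i * samples_per_bit + half):
                out[n] = 1.0
    return out


rng = random.Random(1)  # fixed seed so the experiment is repeatable
bits = [rng.randint(0, 1) for _ in range(512)]
nrz_tone = tone_at_bit_rate(nrz_waveform(bits, 8), 8)
edge_tone = tone_at_bit_rate(edge_detect(bits, 8), 8)
print("NRZ component at bit rate:          ", nrz_tone)
print("edge-detected component at bit rate:", edge_tone)
```

In this idealized model the NRZ waveform contributes essentially nothing at the bit rate, whereas the edge pulses add coherently and produce a clear tone whose strength grows with the transition density.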
In a linear phase detector, similar to the XOR gate for the periodic signal, the phase error signal has a linear relationship with the phase difference, falling to zero in the phase-lock condition. In a binary phase detector, a binary (early or late) signal is generated in response to arbitrarily small phase differences between the clock and data.

2.3.1 Linear Phase Detectors

In a linear PD, like the one addressed in [32], phase error information is produced by taking the difference of two pulses, both of which are generated at any data transition. The width of one of the pulses is linearly proportional to the phase difference between the clock and data, whereas the other one has a constant pulsewidth. By using a differential

error signal, pattern dependency of the phase error is cancelled, since both pulses are present only when a data transition occurs. Figure 3.27 depicts the implementation of the Hogge phase detector. The input NRZ data ripples through two D flipflops. One of the flipflops samples its input on the rising edge of the clock and the other samples it on the falling edge. As shown in the waveforms, if the three signals, namely the input data and the two flipflop outputs (one of them labeled A), are applied to the two XOR gates, the resulting signals have the property required of a linear phase detector. One carries a pulse for every data transition with a width proportional to the phase difference between the clock and the data. The other carries pulses as wide as half the clock period. An important feature of the Hogge phase detector is the automatic retiming of the incoming sequence. In the locked condition, the zero crossings of the clock signal appear in the middle of a bit, meaning that the clock samples the bit at its optimum point.
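The behavior described above can be captured in a few lines. The following sketch is a behavioral model, not a gate-level one: it assumes ideal rectangular pulses, measures time in units of the bit period, and uses a hypothetical function name and argument convention.

```python
def hogge_average_error(bits, clock_offset_frac):
    """Average of (Error - Reference) over a bit pattern, in units of logic swing.

    clock_offset_frac is the time from a data edge to the next sampling clock
    edge as a fraction of the bit period T (0.5 corresponds to ideal lock).
    """
    transitions = sum(1 for prev, cur in zip(bits, bits[1:]) if prev != cur)
    error_width = clock_offset_frac   # proportional to the phase difference
    reference_width = 0.5             # half a clock period, fixed
    return transitions * (error_width - reference_width) / len(bits)


if __name__ == "__main__":
    dense = [0, 1, 0, 1, 0, 1, 0, 1]
    sparse = [0, 0, 0, 0, 1, 1, 1, 1]
    for offset in (0.4, 0.5, 0.6):
        print(offset, hogge_average_error(dense, offset),
              hogge_average_error(sparse, offset))
```

In this model the average output is zero in lock for any pattern, while away from lock its magnitude scales with the transition density, so the detector gain, but not the lock point, depends on the data.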

There is an important issue in the design of the Hogge linear phase detector. Among the three signals applied to the XORs, two of them (the flipflop outputs, including A) contain a clock-to-Q delay with respect to the clock edge, whereas the input data does not. This systematic difference in timing can cause a phase offset that degrades the quality of signal detection if it is large compared to the data period. In practice, an additional delay element is placed in the path of the data signal to the input of the XOR, such that a delay equal to the clock-to-Q delay of the latches is introduced on the signal path. Another problem of the Hogge phase detector is that the retiming delay through the flipflops leads to a half-period skew between the pulses at Error and those at Reference. Consequently, even in lock, a charge pump and loop filter driven by Error and Reference produce a positive ramp while Error is high and a negative ramp while Reference is high. The control line of the VCO therefore experiences a triwave with a positive net area, disturbing the VCO on every data transition (Fig. 3.28).

The Hogge phase detector can be modified to alleviate the residual error caused by the triwaves. As shown in Fig. 3.29, one extra latch and two additional XOR gates are added to the original Hogge PD [41]. The latches are driven by alternating phases of the clock signal, and two cascaded latches form a flipflop. The outputs of the XOR gates control the charge pump. Any pulse generated at the first output starts with a data edge and ends with a clock edge; it therefore carries information about the phase difference between the data and the clock. The other three outputs provide pulses as wide as one half of a clock period for every data transition as the signal ripples through the chain of latches. The resulting outputs control the charge pump in the following way: two of them control the first and second up-ramp, respectively, and the other two

control the first and second down-ramp. This indicates that each phase measurement persists for two clock periods and that the charge pump activities driven by the four outputs cancel each other, such that the triwave transient has a net area of zero. This effect significantly reduces the pattern-dependent jitter at the output of the CDR circuit.

2.3.2 Binary Phase Detectors

In a binary phase detector, a binary error signal is generated in response to small phase differences between the clock and the data. This binary error signal determines whether the clock phase is “early” or “late” with respect to the data phase. An inherent characteristic of a binary phase detector is the continuous generation of early and late pulses at its output, while the clock edge

repeatedly moves back and forth around the zero crossings of the data. This is in contrast to linear phase detectors, whose output goes to zero in phase lock. This characteristic of binary phase detectors inherently leads to higher charge pump activity, possibly increasing the clock jitter.

One of the most commonly used binary phase detectors is the circuit presented by Alexander [33], in which the zero crossings of the data are measured as early or late events when compared with the transitions of the clock signal. Similar to the Hogge phase detector, the structure of the Alexander phase detector allows for automatic retiming of the data. During any particular clock interval, this phase detector provides three binary samples of the data signal: the previous bit (A), a sample of the current bit at the zero crossing (B), and the current bit (C) (Fig. 3.30(a)). Figures 3.30(b) and (c) depict the values of these samples for the early and late clocks, respectively. The retimed data can be taken from A or C. The output is usually taken from A so as to get an additional retiming of the data pulse and further improve the data eye. The location of the clock edge with respect to the data edge can be determined based on the following rules.

If $A = B \neq C$, the clock is early.

If $A \neq B = C$, the clock is late.

If $A = B = C$, no data transition has occurred.
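These rules map directly onto two XOR operations. The sketch below is one way to encode them, assuming that an early clock takes sample B before the data transition (so B equals the previous bit A) and a late clock takes it after the transition (so B equals the current bit C); the function names are hypothetical.

```python
def alexander_pd(a, b, c):
    """a: previous bit, b: sample at the zero crossing, c: current bit.

    Returns (early, late) as 0/1 flags.
    """
    early = b ^ c   # B still equals the previous bit: clock edge came too soon
    late = a ^ b    # B already equals the current bit: clock edge came too late
    return early, late


def phase_error(a, b, c):
    """Late minus Early: +1 late, -1 early, 0 for no transition or a glitch."""
    early, late = alexander_pd(a, b, c)
    return late - early


if __name__ == "__main__":
    for a, b, c in [(0, 0, 1), (0, 1, 1), (1, 1, 1), (0, 1, 0)]:
        print((a, b, c), alexander_pd(a, b, c), phase_error(a, b, c))
```

The fourth combination, in which A equals C but differs from B, cannot arise from a clean transition; in this encoding it raises both flags, which cancel in the difference.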

Using the above observations, the three samples can be used to produce a phase error in a CDR circuit. The Early signal can be formed as $B \oplus C$ and the Late signal as $A \oplus B$. The desired phase error is obtained by subtracting the Early signal from the Late signal. To improve the performance of the phase detector, the three samples A, B, and C should all be available at the same time. For this reason, sample B is regenerated by the clock edge that produces the two samples A and C.

A simple master-slave D flipflop can serve as an NRZ phase detector if its clock input is driven by the data stream and its D input senses the VCO output [Fig. 3.31(a)]. Called a “bang-bang” phase detector, this topology exhibits a very nonlinear characteristic [Fig. 3.31(b)], applying large swings to the loop filter and possibly introducing substantial ripple on the oscillator control line.
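The consequence of such a characteristic can be seen in a minimal first-order model, in which the detector reports only the sign of the phase error and the loop corrects the clock phase by a fixed step on every update. The model, its step size, and its names are assumptions made for illustration; a real loop includes a filter and a VCO.

```python
def bang_bang_loop(initial_phase_ui, step_ui, num_updates):
    """Clock-minus-data phase (in unit intervals) after each correction."""
    phase = initial_phase_ui
    history = []
    for _ in range(num_updates):
        pd_out = 1 if phase > 0 else -1   # binary decision: sign only, no magnitude
        phase -= pd_out * step_ui         # fixed-size correction
        history.append(phase)
    return history


if __name__ == "__main__":
    trace = bang_bang_loop(initial_phase_ui=0.25, step_ui=0.015625, num_updates=40)
    print(trace[:5])
    print(trace[-6:])
```

The phase first slews toward lock at a constant rate and then toggles by one step on every update, which is the early/late dithering and control-line ripple described above.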

A critical drawback of this CDR architecture at high speeds results from timing skews. Since typical flipflops suffer from unequal data-to-output and clock-to-output delays, the loop locks such that the recovered clock and the input data sustain a finite, systematic phase offset that compensates for the delay difference. As illustrated in Fig. 3.31(c), the skews add, resulting in a significant deviation of the clock edge from the middle of the data bits.

Another implementation of a full-rate binary system is the circuit presented by Pottbacker [34]. This circuit is a digital implementation of the quadricorrelator, providing a capture range of 15%. Its operation is based on the bang-bang concept described above, with the difference that the PD operates on both the rising and falling edges of the clock. Similar to the previous circuit, its drawback is the lack of inherent data regeneration in its structure. As shown in Fig. 3.32(a), it consists of two phase detectors and a frequency detector. Each phase detector is formed using a double-edge-triggered flipflop (DETFF) with

clock signal applied to its input and the data signal applied to its trigger. Figures 3.32(b) and (c) depict the waveforms for the two cases when the clock is early or late. Utilizing both the rising and falling edges increases the correction rate of the CDR circuit by a factor of two. This can eventually result in a smaller output jitter, because the VCO phase will be corrected at a higher rate. Since the bang-bang nature of this phase detector creates significant ripple on the control line in the locked condition and hence produces large jitter at the VCO output, the latches forming the flipflops can be replaced by sample-and-hold circuits to modify the binary characteristic of the phase detector into a more linear behavior. In [35], the phase detector is formed as a master-slave sample-and-hold circuit (Fig. 3.33). The rising data transitions sample the instantaneous value of the VCO output. The circuit thus generates an output that is linearly proportional to the phase difference in the vicinity of the lock point.

2.4. Frequency Detectors

The fiber optic standards require operation at an exact data rate. Therefore, the oscillators should be guaranteed to oscillate at an exact frequency that equals the data rate or an integer fraction of it. The oscillators are designed with a large tuning range to account for process and temperature variations. On the other hand, CDR circuits provide a very narrow capture range. This range is primarily determined by two factors: loop bandwidth and phase detector topology. The loop bandwidth of the CDR circuit is defined by the standard and does not exceed a few megahertz. Linear phase detectors usually have a capture range of a fraction of one percent of the incoming data rate. This value can be as high as a few percent for a bang-bang phase detector. Therefore, the capture range of CDR circuits is much smaller than the tuning range of the oscillator. For this reason, the CDR circuit is unlikely to acquire lock to

the data when the circuit turns on and the VCO starts to oscillate at a frequency that is very different from the data rate. This limitation calls for an aided acquisition mechanism. Various frequency detection schemes have been introduced that operate with or without a reference signal. The idea is that as the circuit turns on, the frequency detector pushes the VCO frequency to a value close to the data rate. When the difference between the oscillation frequency and the data rate is small enough to fall in the capture range of the phase detector, the frequency detector is disabled and the phase detector takes over. Eventually in the phase-lock condition, the phases of clock and data signals are within a constant offset from each other, ensuring that the clock frequency equals the data rate. We describe a number of mechanisms for referenced and referenceless frequency acquisition. Similar to the phase detectors, the frequency detectors can operate with a full-rate or half-rate clock. We briefly review a number of full-rate frequency detection schemes in this section. In chapter 6, a new approach for half-rate frequency detection is described.

2.4.1 Referenced Frequency Detectors

A high-speed transceiver uses a reference clock in the transmitter to multiplex the low-speed sequences into a single high-speed signal. This reference clock can be used for frequency acquisition in a receiver built on the same chip, where it drives a frequency-locked loop (FLL) that brings the VCO frequency close to the data rate. Two approaches utilizing this concept are described in this section: in the first, phase lock is acquired while the frequency-locking circuit is still running, whereas in the second, the FLL is deactivated before the CDR loop takes over.

In the circuit described in [36], an additional reference PLL is used for aided acquisition. As shown in Fig. 3.34, the reference PLL locks to a frequency that is N times higher than the frequency of the reference clock. Since both the reference and the VCO signals are periodic, the phase/frequency detector of the reference PLL can be implemented as a simple block with a wide capture range. The two oscillators used in the reference PLL and the CDR circuit should be identical, such that they produce the same output frequency for identical control voltages. The control voltage of the VCO in the CDR circuit is decomposed into coarse and fine voltages. The control voltage generated by the reference PLL is routed to the CDR circuit as the coarse control voltage of the VCO. The fine control voltage is generated by the CDR circuit based on a comparison of the phase of the data with that of the clock signal.

Because of this decomposition, the VCO gain from the fine control to the output can be very small. A smaller VCO gain translates the ripple on the control line into a smaller amount of output jitter. Meanwhile, the coarse control guarantees phase lock over a very wide frequency range. The two oscillators used in this circuit should be spaced far apart from each other; otherwise, injection locking of the two VCOs can result in false lock. Theoretically, identical oscillators provide equal oscillation frequencies for equal control voltages. However, mismatches between the two VCOs can be significant precisely because they must be placed far apart. For this reason, the CDR circuit should use a means of narrowband frequency detection to achieve phase lock for small frequency mismatches.

On the other hand, the circuit described in [37] consists of a single VCO, a phase detector, and a frequency detector. The phase detector and the frequency detector are connected to the loop filter through a multiplexer (Fig. 3.35). When the circuit turns on, the multiplexer activates Loop I and the circuit locks to the reference clock. Then the multiplexer switches to the other mode and Loop II is activated. As the loop locks to the random data, the frequency detector is turned off, reducing the power consumption. The operating mode of the multiplexer is determined by a lock detector that measures the frequency difference between the reference clock and the VCO frequency. It can be formed as

a counter that counts the number of pulses on one of the signals when clocked by the other one.
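A counter-based lock detector of this kind can be sketched as follows. The window length, the multiplication factor between the reference and the VCO, the one-count tolerance, and the function names are all hypothetical; the sketch only illustrates the counting principle.

```python
def count_vco_cycles(vco_hz, ref_hz, window_ref_cycles):
    """Full VCO cycles observed during a window of reference-clock cycles."""
    return int(vco_hz * window_ref_cycles // ref_hz)


def is_locked(vco_count, window_ref_cycles, multiplication_factor, tolerance=1):
    """Declare frequency lock when the count is within tolerance of N * window."""
    expected = window_ref_cycles * multiplication_factor
    return abs(vco_count - expected) <= tolerance


if __name__ == "__main__":
    ref_hz = 155_520_000          # hypothetical reference clock
    for vco_hz in (9_953_280_000, 10_000_000_000):
        count = count_vco_cycles(vco_hz, ref_hz, window_ref_cycles=1000)
        print(vco_hz, count, is_locked(count, 1000, multiplication_factor=64))
```

A longer counting window resolves smaller frequency errors at the cost of a slower decision, so the window would be chosen so that the detectable error falls inside the capture range of the phase detector.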

2.4.2 Referenceless Frequency Detectors

Referenceless frequency detectors are attractive for various reasons. Elimination of an external reference signal helps further integration of the system. An external clock signal can degrade the performance of the circuit, mostly through substrate coupling. Furthermore, implementation of referenced acquisition schemes requires more circuitry and a larger chip area. In this section, a number of referenceless schemes are discussed. These techniques are all based on the concept of the quadricorrelator, originally described in [38]. Figure 3.36 depicts a simple quadricorrelator. The incoming passband signal is multiplied by the in-phase and quadrature signals produced by the oscillator to generate the corresponding in-phase and quadrature baseband components. The mixers produce the sum and the difference frequency products between the input signal and the local oscillator. Low-pass filters following the mixers suppress the sum and pass the difference frequency. The in-phase baseband component is differentiated and multiplied by the quadrature baseband component. If the input is a tone, the output of the quadricorrelator consists of a dc component proportional to the frequency difference between the input and the oscillator signals, and a ripple component at double the frequency difference. The dc component can be used as the error signal in the frequency tracking loop.
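The proportionality between the dc output and the frequency difference can be verified numerically. The sketch below assumes ideal low-pass filtering, so that the baseband components for a tone input are simply cos(Δω t) and sin(Δω t) with unit amplitude; the sign of the result depends on this convention, and the function name and sampling parameters are hypothetical.

```python
import math


def quadricorrelator_dc(delta_f_hz, duration_s=1e-3, num_samples=20000):
    """Average of d(I)/dt * Q for I = cos(dw*t), Q = sin(dw*t)."""
    dw = 2.0 * math.pi * delta_f_hz
    dt = duration_s / num_samples
    total = 0.0
    for n in range(num_samples):
        t = n * dt
        i_now = math.cos(dw * t)
        i_next = math.cos(dw * (t + dt))
        q_now = math.sin(dw * t)
        total += (i_next - i_now) / dt * q_now   # differentiate I, multiply by Q
    return total / num_samples


if __name__ == "__main__":
    for df in (-10e3, 10e3, 20e3):
        print(df, quadricorrelator_dc(df))
```

Since the product equals -Δω sin²(Δω t), its average is -Δω/2 under this convention and the remainder is a ripple at twice the difference frequency; reversing the sign of the frequency error reverses the sign of the dc term.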

The quadricorrelator, however, has a limited capture range [39]. The frequency difference between the incoming signal and the oscillator must fall within the passband of the filter; a signal well outside the passband is significantly suppressed by the filter. Although this approach primarily deals with periodic incoming signals, it can be modified to work with random data as well.

In [40] a quadricorrelator operating with random data is introduced. This circuit achieves a capture range of 12%. Since the operation of the quadricorrelator depends on the existence of a tone at the data rate, and random data does not contain a spectral component at this frequency, edge detection must be performed. In this circuit, edge detection is performed by differentiating and rectifying the data. The circuit uses stacking to integrate the tasks of differentiation, rectification, mixing, and low-pass filtering in one block. This results in a substantial power reduction since the circuits reuse

the same bias current. Stacking also yields a smaller chip area since routing between various stages can be eliminated.

The quadricorrelator can also be implemented using digital elements to produce a binary error signal. Two examples are the rotational frequency detector [41] and the circuit introduced by Pottbacker [34]. The operation of the rotational frequency detector can be described as follows. In the presence of a frequency difference between the data and the clock, the phase relationship changes with time at a rate proportional to the frequency difference, producing a beat frequency. A circular phasor diagram of the oscillator signal can be used to express the concept (Fig. 3.38). The diagram is split into four quadrants, A, B, C, and D. For simplicity, the phasor for the clock is assumed to be constant, serving as a reference, and the phasor for the data moves around the circle. The direction of this rotation determines whether the data rate is faster or slower than the clock frequency.

When the data rate is lower than the clock frequency, the data phasor rotates counterclockwise. The direction of rotation can be distinguished by marking the two consecutive quadrants where the phasor is detected. For example, as the phasor moves from B to C, the clock is found to be fast. A transition from the C to the B quadrant denotes a slow clock.

In the Pottbacker frequency detector, shown in Fig. 3.32, two beat signals at a frequency equal to the difference between the clock frequency and the data rate are generated at two outputs, one of them leading the other. The direction of the frequency difference can be determined from the relative spacing of these two signals: depending on which one leads, the clock is either slow or fast. The relative spacing of the two signals is extracted using a DETFF in which one signal samples the other.

CDR loops employing frequency detectors that operate with random data exhibit only a moderate capture range, limited to a fraction of the center frequency. This limitation can be explained with the aid of the characteristic plotted in Fig. 3.39 for the frequency detector of Fig. 3.32(a). We note that for a large difference between the data rate and the VCO frequency, the average output is close to zero, carrying

little information. Figure 3.40(a) shows a part of a dual-loop architecture that substantially increases the capture range [42]. The frequency detector used here is similar to that in Fig. 3.32(a). Here, a counter controlling the capacitor array sets the VCO frequency to the lowest value. Under this condition, the frequency error is very negative and the average output of the frequency detector is close to zero. Thus, the two comparators generate logical zeros, the output of the OR gate remains low, and the counter continues to (slowly) count up until the frequency detector output drops below a negative threshold. This is an indication that the output has reached a reliable level. Now the two flipflops begin to save each state before the next count is carried out. The counter still continues to count until the frequency detector output crosses zero and jumps from negative to positive. The two flipflops then record this change, disabling the counter and enabling the CDR loop (not shown in Fig. 3.40).

2.5. Decision Circuits

The synchronized sampling of the peak value of a pulse results in a high SNR. However, we can take advantage of the random nature of the noise by performing averaging over one bit period. Figure 3.41 shows an example where the input pulse is integrated from 0 to T and the sampling occurs at t = T. The noise components that vary significantly in a period of T tend to average out [14]. This idea leads to the concept of matched filters. For a pulse that is corrupted by additive white noise, there exists an optimum filter that maximizes the SNR at the sampling instant. Matched filters are extensively used in low-speed communication systems. For practical reasons, they have not previously been implemented in high-speed systems such as optical receivers. Before getting into the design of high-speed CDR circuits, we describe an approach for high-speed matched filtering in Chapter 4. Then, in Chapters 5 and 6, we describe two implementations of 10-Gb/s CMOS CDR circuits, the former using a ring oscillator and a linear half-rate phase detector and the latter benefiting from a multi-phase LC oscillator and a binary half-rate phase/frequency detector.
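
The averaging argument made for the decision circuit can be checked numerically. The following is a minimal sketch, not taken from the book: it assumes a discrete-time model in which each bit period holds `samples_per_bit` uncorrelated Gaussian noise samples, and it compares the noise seen by a single-point sampler with the noise left after integrating over the whole bit. All names and parameter values are illustrative.

```python
import random
import statistics


def integrate_and_dump(samples):
    """Average the samples of one bit period (discrete-time integrate-and-dump)."""
    return sum(samples) / len(samples)


def noise_reduction(samples_per_bit=64, bits=2000, sigma=1.0, seed=1):
    """Return (std of single-point noise samples, std of integrated noise)."""
    rng = random.Random(seed)
    single, integrated = [], []
    for _ in range(bits):
        noise = [rng.gauss(0.0, sigma) for _ in range(samples_per_bit)]
        single.append(noise[-1])                      # sample one instant only
        integrated.append(integrate_and_dump(noise))  # use the whole bit period
    return statistics.pstdev(single), statistics.pstdev(integrated)


if __name__ == "__main__":
    s, i = noise_reduction()
    print(f"single-point noise std: {s:.3f}")
    print(f"integrated noise std:   {i:.3f}")
    print(f"reduction factor:       {s / i:.2f}")
```

Under these assumptions the reduction factor should come out close to the square root of the number of samples per bit, which is the discrete-time counterpart of the statement that rapidly varying noise components tend to average out.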

Chapter 4

A CMOS INTERFACE FOR DETECTION OF 1.2-Gb/s RZ DATA

This chapter describes the design of an interface for cryogenic radar systems. This interface circuit incorporates an amplifier with 2-GHz bandwidth, interleaved matched filters, and a 1:2 demultiplexer. Fabricated in a CMOS technology, the interface achieves a sensitivity of while consuming 142 mW from a 3.3-V supply, and occupying an area of The low sensitivity, wide bandwidth, and small input-referred noise requirements of this circuit are similar to those of a fiber optic receiver. The solutions provided in this chapter become useful for the implementation of high-speed front-end circuits in an optical communication system.

1. Introduction

This interface is used in a cryogenic radar system (Fig. 4.1). The received radar signal is converted to a digital bit stream by means of a Josephson junction analog-to-digital converter. The resulting output is a pseudo-differential return-to-zero (RZ) signal with a bit rate of 1.2 Gb/s and an amplitude of The interface must convert this serial data into 8 parallel streams each having a peak-to-peak amplitude of 1 V. The principal challenge in this design is the combination of high speed and low signal levels in a moderate technology such as CMOS process. To appreciate this challenge, some of the important issues in the design of the interface, which consists of a pre-amplifier and a decision circuit, are described.

For bandwidth calculations, an RZ signal can be roughly considered as a nonreturn-to-zero (NRZ) signal with twice the bit rate. In order to suppress intersymbol interference (ISI) in a broadband system, the bandwidth should be at least equal to 70% of the bit rate, about 1.7 GHz in this system. Increasing the bandwidth beyond this point reduces the intersymbol interference but at the cost of higher power dissipation and, more importantly, higher total integrated thermal noise. In other words, there exists a trade-off between signal integrity and sensitivity. The bandwidth is chosen to be approximately equal to 2 GHz as a compromise between these two factors. To achieve acceptable ISI, the amplifier should have a transfer function with small ripples in magnitude and a linear phase across this bandwidth. To overcome the offset of the decision circuit and provide enough overdrive voltage, the amplifier must exhibit sufficient gain, on the order of 40 dB. Finally, to obtain a bit error rate (BER) of the input-referred noise density must be lower than This is an important concern because the bandwidth requirement does not allow a large gain in the first stage of this amplifier, making the noise of the following stages significant. Simulations indicate that it is difficult to simultaneously satisfy all of these requirements in a CMOS technology, even if power dissipation is not critical. As a result, a means of relaxing the trade-offs between these parameters must be introduced.
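
The 1.7-GHz figure follows directly from the two rules of thumb stated above (RZ treated as NRZ at twice the bit rate, and a bandwidth of 70% of the bit rate). A short sketch of that arithmetic is given below; the function name and its parameters are illustrative, not part of the original design flow.

```python
def required_bandwidth_hz(bit_rate_bps, is_rz=True, fraction=0.7):
    """Minimum bandwidth from the 70% rule of thumb.

    An RZ stream is treated as an NRZ stream with twice the bit rate,
    as suggested in the text.
    """
    equivalent_nrz_rate = 2.0 * bit_rate_bps if is_rz else bit_rate_bps
    return fraction * equivalent_nrz_rate


if __name__ == "__main__":
    bw = required_bandwidth_hz(1.2e9)
    print(f"minimum bandwidth: {bw / 1e9:.2f} GHz")
```

For 1.2-Gb/s RZ data the result is 1.68 GHz, which the text rounds to 1.7 GHz before settling on a 2-GHz design target.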

2. Matched Filtering

First, we assume that a stream of rectangular pulses with amplitude A and period experiences additive noise and subsequently goes through a low-pass system with bandwidth (Fig. 4.2(a)).

The signal-to-noise ratio (SNR) at the output of this system is equal to The value of the SNR ultimately determines the BER, based on the type of modulation. Also, is the noise bandwidth rather than the 3-dB bandwidth of the system. Next, the low-pass system is replaced by a matched filter, a circuit whose impulse response is similar to the input pulse shape but reversed in time and shifted by (Fig. 4.2(b)). In this case, the output SNR is given by [14]. To realize the advantage of matched filtering, the bandwidth of the first system is assumed to be roughly equal to the bit rate. This indicates that the SNR of the second system is twice that of the first system, which is an improvement of 3 dB. In reality, the noise bandwidth is typically higher than the bit rate and the improvement can be as high as a factor of that is, 5 dB. Matched filters are used extensively in low-speed communication systems, but this improvement makes them attractive for the gigahertz range as well. In the first step, the filter can be reduced to a more familiar form. For a square pulse, the matched filter can be implemented as an integrate-and-dump operation. As shown in Fig. 4.3, additive white noise in the system corrupts the input data. If the decision circuit samples the output of the integrator at the end of the bit period, this sample carries information about the signal not only at the sampling instant but also for the entire period. From another point of view, integration filters out high frequency components of noise, and the final level is somewhat cleaner than any single point on the original waveform.
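
The 3-dB and 5-dB figures quoted above can be reproduced with a small calculation. The sketch below assumes the standard textbook expressions for a rectangular pulse of amplitude A and duration T in white noise with one-sided density N0: an SNR of A²/(N0·B) for a low-pass system with noise bandwidth B, and 2·A²·T/N0 for the matched filter. These expressions are an assumption chosen to be consistent with the factor of two stated in the text; the symbols and function names are illustrative.

```python
import math


def snr_lowpass(amplitude, noise_density, noise_bandwidth):
    """SNR after a low-pass system with the given noise bandwidth (assumed model)."""
    return amplitude ** 2 / (noise_density * noise_bandwidth)


def snr_matched(amplitude, noise_density, bit_period):
    """SNR at the sampling instant of a matched filter (assumed model)."""
    return 2.0 * amplitude ** 2 * bit_period / noise_density


def improvement_db(noise_bandwidth, bit_period):
    """Matched-filter SNR improvement in dB; equals 10*log10(2*B*T)."""
    ratio = snr_matched(1.0, 1.0, bit_period) / snr_lowpass(1.0, 1.0, noise_bandwidth)
    return 10.0 * math.log10(ratio)


if __name__ == "__main__":
    T = 1.0  # normalized bit period
    print(f"B = 1/T    -> {improvement_db(1.0 / T, T):.2f} dB")
    print(f"B = 1.58/T -> {improvement_db(1.58 / T, T):.2f} dB")
```

With the noise bandwidth equal to the bit rate the improvement is about 3 dB, and a noise bandwidth roughly 1.6 times the bit rate yields about 5 dB under this model.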

At the end of the integration mode, the integrator must be quickly reset and integration of the next bit must begin. But this is difficult for two reasons. First, the final value must be held constant until the decision circuit has reliably sampled it. Second, the dump operation cannot be arbitrarily fast because larger devices required for quick discharge also add substantial capacitance to the circuit and lower the gain of the integrator. Fortunately, an idle zero exists between every two bits in an RZ sequence. This suggests that both hold and dump can be performed during this idle time (Fig. 4.4). However, the partitioning of this small period between these two operations is quite difficult, requiring additional clock edges that are sensitive to process and temperature. Furthermore, a total time of 416 ps is not sufficient for both. With these issues in mind, we can consider interleaving two matched filters to relax the timing constraints. Now each integrate-and-dump circuit operates in three phases: integrate, hold, and dump. As shown in Fig. 4.5, when one integrator enters the reset mode, the other begins to integrate. This allows one bit period for each of these operations. Furthermore, only quadrature phases of a clock with a frequency equal to half the bit rate are needed.
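
The timing numbers behind this argument are easy to tabulate. The sketch below assumes a 50% RZ duty cycle, so that the idle zero occupies half of each bit period, and computes the spacing of the quadrature phases of a half-rate clock. Function and key names are illustrative.

```python
def rz_timing_ps(bit_rate_bps=1.2e9, rz_duty=0.5):
    """Timing budget, in picoseconds, for an RZ stream and a half-rate quadrature clock."""
    bit_period = 1e12 / bit_rate_bps
    idle_time = (1.0 - rz_duty) * bit_period       # idle zero between two pulses
    clock_period = 2.0 * bit_period                # clock at half the bit rate
    quadrature_spacing = clock_period / 4.0        # spacing of the four clock phases
    return {
        "bit_period": bit_period,
        "idle_time": idle_time,
        "clock_period": clock_period,
        "quadrature_spacing": quadrature_spacing,
    }


if __name__ == "__main__":
    for name, value in rz_timing_ps().items():
        print(f"{name:>20s}: {value:8.1f} ps")
```

At 1.2 Gb/s the idle time is about 416 ps, matching the figure quoted above, and the quadrature phases of the 600-MHz clock are spaced by the same amount.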

3. Architecture

Figure 4.6 depicts the interface architecture [43]. The input signal is applied to a low-noise amplifier consisting of seven stages, generating an output swing of approximately 200 mV. The signal is then processed by two matched filters and subsequently sampled by two decision circuits. Each decision circuit is implemented as a master-slave D flipflop that extracts the zeros and ones while converting the RZ data into an NRZ stream. This signal is passed to the buffers, which are formed as open-drain differential pairs connected to the output pads. External quadrature clock signals at a frequency equal to half the bit rate of the input signal are used to maintain synchronization and provide the control commands in the circuit. Since the required clock frequency is half the bit rate, the quadrature phases can be easily generated by a divide-by-two circuit, but for this circuit they are provided externally.

When one matched filter resets, the other integrates. Each decision circuit begins to sample at the end of the integration mode, producing a logical level at the output.

4. Building Blocks

4.1. Low-Noise Wideband Amplifier

The wideband amplifier in this interface must boost the signal level with minimal ISI. This amplifier consists of seven stages: the first is a common-gate stage and the following six are common-source stages (Fig. 4.7(a)-(c)).

In order to achieve an overall bandwidth of 2 GHz, each stage must achieve a similar bandwidth. This is accomplished by incorporating inductive peaking in all stages. In addition, the common-source differential pairs use capacitive/resistive degeneration to increase the bandwidth and reduce the ripple in the passband. The common-gate input stage provides two useful properties. First, the input impedance can be set to over a relatively wide frequency band by proper choice of device dimensions and bias currents. Second, the noise figure of the stage is relatively independent of frequency and does not require tuning techniques.

Figure 4.8 shows the simulation results for the amplifier frequency response. The 3-dB bandwidth is about 2 GHz. The fabricated prototype of the amplifier exhibits a gain of 43 dB across a bandwidth of 2.1 GHz. The simulated results for the noise and the gain of the stages at 2 GHz are presented in Table 4.1. The voltage gain of the stages adds up to 45.7 dB. The noise of these stages adds up to an overall input-referred noise voltage of

The simulated input-referred noise voltage across the band is shown in Fig. 4.9(a). The noise varies from 3.2 to This is an optimistic estimate of the noise because SPICE assumes an excess noise coefficient of 2/3, whereas for short-channel devices it is considerably larger. The output of the amplifier with RZ data is presented in Fig. 4.9(b). The major contributor to the output ISI is the kickback noise caused by the switches in the matched filter. The eye opening is about 70% and the jitter is about 40 ps. The signal slews with a slope of 1 V/ns.

In order to integrate the inductors in a reasonable area, a stacked structure consisting of metal 2 and metal 3 has been used (Fig. 4.10). Since in this circuit the self-resonance frequency is more critical than the quality factor, the line width is only The values range from 11 nH to 17 nH.

4.2. Integrate-and-Dump Circuit

The speed and input-referred offset issues require a simple and compact topology for both the integrator and its reset mechanism. In the circuit of Fig. 4.11(a), and convert the input voltage to current and the result flows through the total capacitance at nodes X and Y. Simulations indicate that the parasitic capacitance already available at nodes X and Y is sufficient to allow fast integration with considerable voltage swings. The dimensions of the input transistors are chosen as a compromise between input-referred offset (around 10 mV) and speed. Each matched filter is biased at a total current of 1 mA and provides a voltage gain of approximately two at the end of the integrate phase. As shown in Fig. 4.11(b), the RZ bit stream amplified by the preamplifier is applied to the input and is integrated for one bit period. Switch which is an NMOS device, resets the integrator at the end of the integration phase. The dimensions of are chosen as a tradeoff between faster reset and less parasitic capacitance at the output.

However, common-mode feedback and a hold phase should be added to this circuit. As shown in Fig. 4.12(a), the common-mode feedback consists of two relatively large resistors, and which are implemented by small PMOS devices. The hold mode is controlled by switch which disables the differential pair at the end of the integration mode and freezes the output voltage. In reality, the output does experience a small degradation (Fig. 4.12(b)). The dip seen in the hold mode results from the relatively large capacitance from the source of and to ground.

Matched filtering along with interleaving provides a hold period, making the sampling less susceptible to jitter. Proper choice of device dimensions leads to a small input offset for the matched filter. However, the devices cannot be very large because of the limitations on speed and loading on the previous stage. The output voltage of the matched filter drops during the hold mode. To alleviate this problem, the decision circuit starts to sample the output of the matched filter at the start of the hold mode rather than at the end of it. The non-square waveform of the input slightly degrades the improvement in SNR because the corresponding matched filter is not exactly an integrator.
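
To get a feel for the trade-off between integrator gain and node capacitance, the stage can be idealized as a transconductance driving a capacitor. The sketch below assumes an ideal integrator (no output resistance, constant input during the integration window), for which the voltage gain at the end of the window is gm·T/C. The transconductance and integration time used here are hypothetical values chosen only for illustration; they are not given in the text.

```python
def integrator_gain(gm_siemens, t_int_seconds, cap_farads):
    """Voltage gain of an ideal gm-C integrator at the end of the integration window."""
    return gm_siemens * t_int_seconds / cap_farads


def required_capacitance(gm_siemens, t_int_seconds, target_gain):
    """Node capacitance that yields the target gain for an ideal gm-C integrator."""
    return gm_siemens * t_int_seconds / target_gain


if __name__ == "__main__":
    gm = 1e-3          # hypothetical transconductance, 1 mS
    t_int = 416.7e-12  # hypothetical integration window, half of a 1.2-Gb/s bit period
    cap = required_capacitance(gm, t_int, target_gain=2.0)
    print(f"capacitance for a gain of two: {cap * 1e15:.0f} fF")
```

With these assumed values a gain of two corresponds to roughly 200 fF per node, which suggests why device and wiring parasitics alone can serve as the integrating capacitance.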

4.3. Demultiplexer

The matched filter is followed by a master-slave D flipflop (Fig. 4.13). To achieve short set-up and hold times, the flipflops use current steering with 2 V of differential swing. Each latch consists of a pre-amplifier that senses the amplitude during half the period and a regenerative circuit that boosts the level of the signal. Each latch uses a bias current of 1 mA and load resistors of

4.4. Clock Buffer

The external clock is applied to the system in the form of sinusoids with small amplitudes. The clock buffer (Fig. 4.14) converts this signal to the sharp, rail-to-rail edges required for the switches in the matched filters. The inverters are sized to drive the load capacitance with short rise and fall times. Since the I and Q clock signals experience different loading, these inverters are large enough to maintain reasonable matching in the interface environment. In this design, rail-to-rail swings are preferred because of their capability to switch nodes with different dc levels. Resistor serves as a termination. This resistor also equalizes the mismatch between the common-mode levels, as well as the phases, of the signals applied to the inputs of the two gates.

5. Experimental Results

The interface has been fabricated in a CMOS technology. Figure 4.15 depicts the die photograph. The circuit occupies an area of The circuit was tested with a 3.3-V supply in a chip-on-board assembly.

The eye diagram of the output of one of the channels is depicted in Fig. 4.16. With 1.2-Gb/s RZ data, both channels exhibit a relatively open eye diagram. The measurement of the BER at these frequencies has been quite difficult because the input is RZ and the output is NRZ. The amplifier achieves a gain of 43 dB over a bandwidth of 2.1 GHz. A total power of 142 mW is dissipated in this interface, of which 80 mW is consumed by the pre-amplifier and 4 mW is consumed by each matched filter. The power penalty for the matched filters is therefore very small.
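
The power figures above can be turned into a simple breakdown. The sketch below uses only the numbers quoted in the text (142 mW total, 80 mW for the pre-amplifier, 4 mW for each of the two matched filters); the remainder is attributed collectively to the blocks that are not itemized. The function and key names are illustrative.

```python
def power_breakdown(total_mw=142.0, preamp_mw=80.0, filter_mw=4.0, n_filters=2):
    """Share of the total power taken by the pre-amplifier and the matched filters."""
    filters_mw = filter_mw * n_filters
    return {
        "filters_mw": filters_mw,
        "filters_pct": 100.0 * filters_mw / total_mw,
        "preamp_pct": 100.0 * preamp_mw / total_mw,
        "remaining_mw": total_mw - preamp_mw - filters_mw,
    }


if __name__ == "__main__":
    for name, value in power_breakdown().items():
        print(f"{name:>14s}: {value:6.1f}")
```

The two matched filters together account for 8 mW, under 6% of the total, while the pre-amplifier alone takes more than half.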

6. Conclusion

The concepts introduced in this work can be applied to other applications as well. For example, in a fiber optic receiver (Fig. 4.17), the current generated by the photodetector is amplified by a transimpedance amplifier. We can then interpose interleaved matched filters between the amplifier and the decision circuits to improve the SNR. Since the power consumption and complexity of the matched filters can be quite low, the boost in performance is obtained at minimal cost.

A CMOS interface for detection of 1.2-Gb/s RZ data incorporates wideband amplification, matched filtering, and demultiplexing. Low-noise amplifiers with bandwidths exceeding 2 GHz can be implemented in a CMOS process using inductors. These inductors can be integrated without any process modifications. Matched filters improve the overall SNR by approximately 3 dB. Matched filtering can be performed on high-speed data using a simple and compact topology.

Chapter 5

A 10-Gb/s LINEAR HALF-RATE CMOS CDR CIRCUIT

This chapter describes the design and experimental results of a 10-Gb/s CMOS phase-locked clock and data recovery circuit. The circuit incorporates an interpolating voltage-controlled oscillator and a half-rate phase detector. The phase detector provides a linear characteristic while retiming and demultiplexing the data with no systematic phase offset. Fabricated in a CMOS technology in an area of the circuit exhibits an rms jitter of 1 ps and a peak-to-peak jitter of 14.5 ps in the recovered clock and a bit error rate of with random data input of length The power dissipation is 72 mW from a 2.5-V supply.

The next section describes the CDR architecture and its design issues. The following sections present the design of the building blocks and the description of the experimental results.

1. Architecture

The choice of the CDR architecture is primarily determined by the speed and supply voltage limitations of the technology as well as the power dissipation and jitter requirements of the system. In a generic CDR circuit, shown in Fig. 5.1, the phase detector compares the phase of the incoming data to the phase of the clock generated by the voltage-controlled oscillator (VCO), producing an error that is proportional to the phase difference between its two inputs. The error is then applied to a charge pump and a low-pass filter so as to generate the oscillator control voltage. The clock signal also drives a decision circuit, thereby retiming the data and reducing its jitter.

If attempted in a CMOS technology, the architecture of Fig. 5.1 poses severe difficulties for 10-Gb/s operation. Although exploiting aggressive device scaling, the CMOS process used in this work provides marginal performance for such speeds. For example, even simple digital latches or three-stage ring oscillators fail to operate reliably at these rates. These issues make it desirable to employ a “half-rate” CDR architecture, where the VCO runs at a frequency equal to half of the input data rate.

The concept of a half-rate clock has been used in [44]-[47]. However, [44] and [45] incorporate a bang-bang phase detector (PD), possibly creating large ripple on the control line of the oscillator and hence high jitter. The circuit reported in [46] inherently has a smaller output jitter as a result of using a linear phase detector, but it fails to operate at speeds above 6 Gb/s in CMOS technology. The circuit of [47] benefits from a new linear phase detection scheme, but it may not operate properly with certain data patterns.

Another critical issue in the architecture of Fig. 5.1 relates to the inherently unequal propagation delays for the two inputs of the phase detector: most phase detectors that operate properly with random data (e.g., a D flipflop) are asymmetric with respect to the data and clock inputs, thereby introducing a systematic skew between the two in the phase-locked condition. Since it is difficult to replicate this skew in the decision circuit, the generic CDR architecture suffers from a limited phase margin unless the raw speed of the technology is much higher than the data rate. The problem of the skew demands that phase detection and data regeneration occur in the same circuit such that the clock still samples the data at the midpoint of each bit even in the presence of a finite skew. For example, the Hogge PD [32] automatically sets the clock phase to the optimum point in the data eye (but it fails to operate properly with a half-rate clock).

The above considerations lead to the CDR architecture shown in Fig. 5.2.
Here, a half-rate phase detector produces an error proportional to the phase difference between the 10-Gb/s data stream and the 5-GHz output of the VCO. Furthermore, the PD automatically retimes and demultiplexes the data, generating two 5-Gb/s sequences and Although the focus of this work is point-to-point communications, a full-rate retimed output, is also generated to provide flexibility in testing and exercise the ultimate speed of the technology. The VCO has both fine and coarse control lines, the latter allowing inclusion of a frequency-locked loop in future implementations.

In this chapter, a new approach to performing linear phase detection using a half-rate clock is described. Owing to its simplicity, this technique achieves both a high speed and low power dissipation while minimizing the ripple on the oscillator control voltage.

It is interesting to note that half-rate architectures do suffer from one drawback: the deviation of the clock duty cycle from 50% translates to bimodal jitter. As depicted in Fig. 5.3, since both clock edges sample the data waveform, the clock duty cycle distortion pushes both edges away from the midpoint of the bits. Typical duty cycle correction techniques used at lower speeds are difficult to apply here as they suffer from significant dynamic mismatches themselves. Thus special attention is paid to the symmetry in the layout to minimize bimodal jitter.

Another important aspect of CDR design is the leakage of data transitions to the oscillator. In Fig. 5.2, such leakage arises from (1) capacitive feedthrough from to CK in the phase detector, (2) capacitive feedthrough from and to CK through the multiplexer, and (3) coupling of to the oscillator through the substrate. To minimize these effects, the VCO is followed by an isolation buffer and all of the building blocks incorporate fully differential topologies.
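
The size of the bimodal jitter caused by duty-cycle distortion, discussed earlier in this section, can be estimated with a simple model. The sketch below assumes that the loop locks so that the average of the rising and falling sampling instants is centered in the bit, in which case each edge is displaced from the bit midpoint by half of the high-time error. This centering assumption and the example duty cycle are illustrative and are not taken from the text.

```python
def edge_offset_ps(clock_freq_hz, duty_cycle):
    """Displacement of each sampling edge from the bit midpoint, in picoseconds.

    Assumes the loop centers the average of the two sampling instants.
    """
    period_ps = 1e12 / clock_freq_hz
    high_time_error_ps = (duty_cycle - 0.5) * period_ps
    return abs(high_time_error_ps) / 2.0


if __name__ == "__main__":
    offset = edge_offset_ps(5e9, 0.52)  # hypothetical 52% duty cycle at 5 GHz
    print(f"each edge is off by {offset:.1f} ps; the two modes are {2 * offset:.1f} ps apart")
```

For a 5-GHz clock, a hypothetical 2% duty-cycle error displaces each sampling edge by about 2 ps out of a 100-ps bit period.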

2. Building Blocks

2.1. VCO

The design of the VCO directly impacts the jitter performance and the reproducibility of the CDR circuit. While LC topologies achieve a potentially lower jitter, their limited tuning range makes it difficult to obtain a target frequency without design and fabrication iterations. Since the circuit reported here was our first design in technology, a ring oscillator was chosen so as to provide a tuning range wide enough to encompass process and temperature variations.

A three-stage differential ring oscillator [Fig. 5.4(a)] driving a buffer operates no faster than 7 GHz in CMOS technology. The half-rate CDR architecture overcomes this limitation, requiring a frequency of only 5 GHz. As shown in Fig. 5.4(b), each stage consists of a fast and a slow path whose outputs are summed together. By steering the current between the fast and the slow paths, the amount of delay achieved through each stage and hence the VCO frequency can be adjusted. All three stages in the ring are loaded by identical buffers to achieve equal rise and fall times and hence improve the jitter performance.

Figure 5.4(c) shows the transistor implementation of each delay stage. The fast and slow paths are formed as differential circuits sharing their output nodes. The tuning is achieved by reducing the tail current of one and increasing that of the other differentially. Since the low supply voltage makes it difficult to stack differential pairs under and the current variation is performed through mirror arrangements driven by PMOS differential pairs. Figure 5.5 depicts the small-signal gain and phase response of each delay stage. While providing a phase shift of 60°, each stage achieves a gain of 5.5 dB at 5 GHz, yielding robust oscillation at the target frequency.
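
The statement that 60° of phase shift and 5.5 dB of gain per stage yield robust oscillation is an application of the Barkhausen criteria. The sketch below is a small-signal check under the usual assumption that an odd number of inverting stages contributes a net 180° at dc, so the frequency-dependent phase of the stages must supply the other 180°. It also uses the common large-signal estimate f = 1/(2·N·t_d) to obtain the per-stage delay; both relations are textbook approximations, and the function names are illustrative.

```python
def ring_loop_check(n_stages, stage_phase_deg, stage_gain_db):
    """Return (phase condition met, loop gain in dB) for a ring of inverting stages."""
    dc_inversion_deg = 180.0 if n_stages % 2 == 1 else 0.0
    total_phase_deg = n_stages * stage_phase_deg + dc_inversion_deg
    phase_ok = abs(total_phase_deg % 360.0) < 1e-9
    loop_gain_db = n_stages * stage_gain_db
    return phase_ok, loop_gain_db


def stage_delay_ps(freq_hz, n_stages):
    """Per-stage delay implied by f = 1 / (2 * N * t_d), in picoseconds."""
    return 1e12 / (2.0 * n_stages * freq_hz)


if __name__ == "__main__":
    phase_ok, gain_db = ring_loop_check(3, 60.0, 5.5)
    print(f"phase condition met: {phase_ok}, loop gain: {gain_db:.1f} dB")
    print(f"per-stage delay at 5 GHz: {stage_delay_ps(5e9, 3):.1f} ps")
```

With three stages the loop gain at 5 GHz is 16.5 dB, well above the 0 dB needed for start-up, and each stage must provide a delay of about 33 ps.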

A critical drawback of supply scaling in deep submicron technologies is the inevitable increase in the VCO gain for a given tuning range. To alleviate this difficulty, the control of the VCO is split between a coarse input and a fine input. The partitioning of the control allows a reduction of more than one order of magnitude in the VCO sensitivity. The idea is that the fine control is established by the phase detector and the coarse control is a provision for adding a frequency detection loop. The coarse control is provided externally in this prototype. The fine control provides a gain of 150 MHz/V and the coarse control provides 2.5 GHz/V. The tuning range is 2.7 GHz (Fig. 5.6).
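
The benefit of splitting the control can be quantified from the two gains quoted above. The sketch below computes the ratio of the coarse gain to the fine gain and the frequency disturbance that a given control-line ripple would cause through each input. The ripple amplitude used in the example is a hypothetical value for illustration.

```python
def sensitivity_ratio(coarse_gain_hz_per_v=2.5e9, fine_gain_hz_per_v=150e6):
    """Factor by which the fine input is less sensitive than the coarse input."""
    return coarse_gain_hz_per_v / fine_gain_hz_per_v


def frequency_disturbance_hz(gain_hz_per_v, ripple_v):
    """Frequency deviation produced by a ripple of the given amplitude on a control line."""
    return gain_hz_per_v * ripple_v


if __name__ == "__main__":
    print(f"sensitivity reduction: {sensitivity_ratio():.1f}x")
    ripple = 1e-3  # hypothetical 1-mV ripple
    print(f"fine input:   {frequency_disturbance_hz(150e6, ripple) / 1e3:.0f} kHz")
    print(f"coarse input: {frequency_disturbance_hz(2.5e9, ripple) / 1e6:.1f} MHz")
```

The ratio is about 17, consistent with the reduction of more than one order of magnitude stated in the text.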

2.2. Phase Detector

For linear phase comparison between data and a half-rate clock, each transition of the data must produce an “error” pulse whose width is equal to the phase difference. Furthermore, to avoid a dead zone in the characteristics, a “reference” pulse must be generated whose area is subtracted from that of the error pulse, thus creating a net value that falls to zero in lock.

The above observations lead to the PD topology shown in Fig. 5.7(a). The circuit consists of four latches and two XOR gates. The data is applied to the inputs of two sets of cascaded latches, each cascade constituting a flipflop that retimes the data. Since the flipflops are driven by a half-rate clock, the two output sequences and are the demultiplexed waveforms of the original input sequence if the clock samples the data in the middle of the bit period.

The operation of the PD can be described using the waveforms depicted in Fig. 5.7(b). The basic unit employed in the circuit is a latch whose output carries information about the zero crossings of both the data and the clock signal. The output of each latch tracks its input for half a clock period and holds the value for the other half, yielding the waveforms shown in Fig. 5.7(b) for points and The two waveforms differ because their corresponding latches operate on opposite clock edges. Produced as the Error signal is equal to ZERO for the portion of time that identical bits of and overlap and equal to the XOR of two consecutive bits for the rest. In other words, Error is equal to ONE only if a data transition has occurred.

It may seem that the Error signal uniquely represents the phase difference, but that would be true only if the data were periodic. The random nature of the data and the periodic behavior of the clock in fact make the average value of Error pattern-dependent. For this reason, a reference signal must also be generated whose average conveys this dependence. The two waveforms and contain the samples of the data at the rising and falling edges of the clock. Thus, contains pulses as wide as half the clock period for every data transition, serving as the reference signal.

While the two XOR operations provide both the Error and the Reference pulses for every data transition, the pulses in Error are only half as wide as those in Reference. This means that the amplitude of Error must be scaled up by a factor of two with respect to Reference so that the difference between their averages drops to zero when clock transitions are in the middle of the data eye. The phase error with respect to this point is then linearly proportional to the difference between the two averages.
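
The behavior described above can be reproduced with an idealized, zero-delay behavioral model of four latches and two XOR gates. The sketch below is an illustration rather than the transistor-level circuit: time is discretized into `samples_per_bit` steps per bit, the signal names `x1`, `x2`, `y1`, `y2` are hypothetical labels for the four latch outputs, and `clock_delay` is the distance, in samples, from a data transition to the next clock edge (half a bit period corresponds to lock).

```python
import random


def pseudo_random_bits(count, seed=1):
    """Generate a reproducible random bit sequence."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(count)]


def half_rate_pd_averages(bits, samples_per_bit, clock_delay):
    """Return the time averages of Error and Reference for the latch/XOR model."""
    n = samples_per_bit
    x1 = x2 = y1 = y2 = 0
    error_sum = reference_sum = 0
    total = len(bits) * n
    for t in range(total):
        data = bits[t // n]
        clock_high = ((t - clock_delay) // n) % 2 == 0  # half-rate clock
        if clock_high:
            x1 = data  # first latch of cascade 1 is transparent
            y2 = x2    # second latch of cascade 2 passes the held sample
        else:
            x2 = data  # first latch of cascade 2 is transparent
            y1 = x1    # second latch of cascade 1 passes the held sample
        error_sum += x1 ^ x2
        reference_sum += y1 ^ y2
    return error_sum / total, reference_sum / total


def pd_output(bits, samples_per_bit, clock_delay):
    """Net phase-detector output: Error weighted by two minus Reference."""
    error_avg, reference_avg = half_rate_pd_averages(bits, samples_per_bit, clock_delay)
    return 2.0 * error_avg - reference_avg


if __name__ == "__main__":
    bits = pseudo_random_bits(4000)
    for delay in (4, 8, 12):
        print(f"clock delay {delay:2d}/16 of a bit: output {pd_output(bits, 16, delay):+.3f}")
```

In this model each data transition produces an Error pulse as wide as the clock delay and a Reference pulse one bit period wide, so the weighted difference is zero when the clock edge falls in the middle of the bit and varies linearly, with opposite signs, on either side.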
In order to generate a full-rate output, the demultiplexed sequences are combined by a multiplexer that operates on the half-rate clock as well. This output can also be used for testing purposes in order to obtain the overall bit error rate (BER) of the receiver.

It is important to note that the XOR gates in Fig. 5.7 must be symmetric with respect to their two differential inputs. Otherwise, differences in propagation delays result in systematic phase offsets. Each of the XOR gates is implemented as shown in Fig. 5.8 [48]. The circuit avoids stacking stages while providing perfect symmetry between the two inputs. The output is single-ended but the single-ended Error and Reference signals produced by the two XOR gates in the phase detector are sensed with respect to each other, thus acting as a differential drive for the charge pump.

The operation of the XOR circuit is as follows. If the two logical inputs are not equal, then one of the input transistors on the left and one of the input transistors on the right turn on, thus turning off. If the two inputs are identical, one of the tail currents flows through Since the average current produced by the Error XOR gate is half of that generated by the Reference XOR gate, transistor is scaled differently, making the average output voltages equal for zero phase difference. Channel-length modulation of transistor reduces the precision of current scaling between the two XOR gates. This effect can be avoided by increasing the length of the device. The gain of the phase detector is determined by the value of the resistor and the tail current sources The voltage is generated on chip in order to track the variations over temperature and process. This voltage equals the output common-mode level of the latches preceding the XOR gate. It is generated using a differential pair that is a replica of the preamplifier section of the latch. Current source raises the common-mode level of the differential signal formed by the Error and Reference signals, making compatible with the input of the charge pump.

It is instructive to plot the input/output characteristic of the PD to ensure linearity and absence of a dead zone. This is accomplished by obtaining the average values of Error and Reference while the circuit operates at maximum speed. Figure 5.9 shows the simulated behavior as the phase difference varies from zero to one bit period. The Reference average exhibits a notch where the clock samples the metastable points of the data waveform. The Error and Reference signals cross at a phase difference approximately 55 ps from the metastable point, indicating that the systematic offset between the data and the clock is very small. The linear characteristic of the phase detector results in minimal charge pump activity and small ripple on the control line in the locked condition.

The choice of the logic family used for the XOR gates and the latches is determined by the speed and switching noise considerations. While rail-to-rail CMOS logic achieves relatively high speeds, it requires amplifying the data swings generated by the stage preceding the CDR circuit (typically a limiting amplifier). Furthermore, CMOS logic produces enormous switching noise in the substrate and on the supplies, disturbing the oscillator considerably. For these reasons, the building blocks incorporate current-steering logic. The phase detector incorporates an input buffer with on-chip resistive matching.

2.3. Charge Pump and Loop Filter

Figure 5.10 shows the implementation of the differential charge pump. The common-mode feedback (CMFB) circuit senses the output CM level by and providing correction through and Both the matching and channel-length modulation of in Fig. 5.10 impact the residual phase error in locked condition. Thus, their lengths and widths are relatively large to minimize these effects.

The design of the loop filter is based on a linear, time-invariant model of the loop and is performed in the continuous-time domain. The loop is in general a nonlinear, time-variant system and can only be assumed linear if the phase error is small. The time-invariant analysis is valid if the averaging behavior of the loop rather than its single-cycle performance is of interest, i.e., the loop can be analyzed by a continuous-time approximation if the loop bandwidth is small. Under this condition, the state of the CDR changes by only a small amount on each cycle of the input signal.

A low-pass jitter transfer function with a given bandwidth and a maximum gain in the passband is specified for a SONET system. The closed-loop transfer function of the CDR has a zero at a frequency lower than the first closed-loop pole. This results in jitter peaking that can never be eliminated. But the peaking can be reduced to negligible levels by overdamping the loop. As derived in [41], the closed-loop unity-gain bandwidth is approximated as:

where and are the gains of the VCO and PD, respectively, and denotes the conversion gain of the charge pump. Equation (5.1) can be used to determine the value of The amount of the jitter peaking in the closed-loop transfer function can be approximated as:

Equation (5.2) yields the required value of In order to obtain greater suppression of high-frequency jitter, a second capacitor is added in parallel with the series combination of and These components are added externally to achieve flexibility in defining the closed-loop characteristics of the circuit. Another advantage of linear PDs over their bang-bang counterparts is that their jitter transfer characteristic is independent of the jitter amplitude. It should also be mentioned that if the CDR is followed by a demultiplexer, the tight specifications for jitter peaking need not be satisfied because such specifications are defined for cascaded regenerators handling full-rate data.

Figure 5.11 depicts the simulated behavior of the CDR circuit at the transistor level. The voltage across the filter is initialized to a value relatively close to its value in phase lock. The loop goes through a transition of 350 ns before it locks. The ripple on the control line in phase lock is approximately 1 mV.
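
The loop-filter sizing procedure described above can be sketched numerically. The sketch below assumes the standard approximations for an overdamped second-order charge-pump loop with a series resistor R and capacitor C: a bandwidth of roughly K_PD * G_m * R * K_VCO and a peaking of roughly 1 + 1/(bandwidth * R * C). The symbol names, the function name, and all numerical values are hypothetical and are not taken from the design.

```python
import math


def size_loop_filter(bandwidth_hz, peaking_db, k_pd, g_m, k_vco):
    """Return (R, C) of a series R-C loop filter for a target bandwidth and peaking.

    Assumed approximations (overdamped second-order loop):
        omega_bw ~= k_pd * g_m * R * k_vco
        peaking  ~= 1 + 1 / (omega_bw * R * C)
    """
    omega_bw = 2 * math.pi * bandwidth_hz
    peaking_linear = 10 ** (peaking_db / 20)
    # The bandwidth relation fixes the resistor.
    r = omega_bw / (k_pd * g_m * k_vco)
    # The peaking relation then fixes the capacitor.
    c = 1 / (omega_bw * r * (peaking_linear - 1))
    return r, c


# Hypothetical loop parameters, for illustration only.
K_PD = 0.5                    # phase-detector gain, V/rad
G_M = 100e-6                  # charge-pump conversion gain, A/V
K_VCO = 2 * math.pi * 500e6   # VCO gain, rad/s per V

R, C = size_loop_filter(5e6, 0.1, K_PD, G_M, K_VCO)
print(f"R = {R:.1f} ohm, C = {C * 1e9:.2f} nF")
```

A smaller allowed peaking pushes C up quickly, which is consistent with placing the filter components off chip.
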

3. Experimental Results

The CDR circuit has been fabricated in a CMOS process. Figure 5.12 shows a photograph of the chip, which occupies an area of ESD protection diodes are included for all pads except the high-speed ones. Nonetheless, since all of these lines have a termination to they exhibit some tolerance to ESD. The circuit is tested in a chip-on-board assembly. In this prototype, the width of the poly resistors was not sufficient to guarantee the nominal sheet resistance. As a result, the fabricated resistor values deviated from their nominal value by 30%, and the VCO center frequency was proportionally lower than the simulated value at the nominal supply voltage (1.8 V). The supply was increased to 2.5 V to achieve reliable operation at 10 Gb/s. While such a high supply voltage creates hot-carrier effects in rail-to-rail CMOS circuits, it is less detrimental in this design because no transistor in the circuit experiences a gate-source or drain-source voltage of more than 1 V. This issue is nonetheless resolved in a second design [49] by proper choice of resistor dimensions. The circuit is brought close to lock with the aid of the VCO coarse control before phase locking takes over.

Figure 5.13(a) shows the spectrum of the clock in response to a 10-Gb/s data sequence of length . The effect of the noise shaping of the loop can be observed in this spectrum. The phase noise at a 1-MHz offset is approximately equal to -106 dBc/Hz. Figure 5.13(b) depicts the recovered clock in the time domain. The time-domain measurements using an oscilloscope overestimate the jitter, requiring specialized equipment, e.g., the Anritsu MP1777 jitter analyzer. The jitter performance of the CDR circuit is characterized by this analyzer. A random sequence of length produces 14.5 ps of peak-to-peak and 1 ps of rms jitter on the clock signal. These values are respectively reduced to 4.4 ps and 0.6 ps for a random sequence of length . SONET OC-192 specifies 10 ps as the maximum peak-to-peak jitter on the clock. Therefore, the measured results are relatively close to the specifications.

The measured jitter transfer characteristic of the CDR is shown in Fig. 5.14. The jitter peaking is 1.48 dB and the 3-dB bandwidth is 15 MHz. The loop bandwidth can be reduced to the SONET specifications, but the jitter analyzer must then generate large jitter, which drives the loop out of lock. The loop bandwidth can be reduced to the SONET specifications if a means of frequency detection is added to the loop (Chapter 6). The circuit is then much less susceptible to loss of lock due to the jitter generated by the analyzer.

Figure 5.15 depicts the retimed data. The demultiplexed data outputs are shown in Fig. 5.15(a). The difference between the waveforms results from systematic differences between the bond wires and traces on the test board. Figure 5.15(b) depicts the full-rate output. Using this output, the BER of the system can be measured. With a random sequence of , BER is less than However, a random sequence of results in a BER of This BER can be reduced if the bandwidth of the output buffer driving the 10-Gb/s data is increased. Furthermore, if the value of the linear resistors is adjusted to their nominal value, the increased operating speed of the back-end multiplexer results in an improved BER (Chapter 6). The CDR circuit exhibits a capture range of 6 MHz and a tracking range of 177 MHz. The total power consumed by the circuit excluding the output buffers is 72 mW from a 2.5-V supply. The VCO, the PD, and the clock and data buffers consume 20.7 mW, 33.2 mW and 18.1 mW, respectively.

4. Conclusion

CMOS technology holds great promise for optical communication circuits. The raw speed resulting from aggressive scaling, along with high levels of integration, provides high performance at low cost. A 10-Gb/s clock and data recovery circuit designed in CMOS technology performs phase locking, data regeneration, and demultiplexing with 1 ps of rms jitter.

Chapter 6 A 10-GB/S CMOS CDR CIRCUIT WITH WIDE CAPTURE RANGE

This chapter describes the design and experimental results of a 10-Gb/s phase-locked CDR circuit incorporating a multiphase LC oscillator and a half-rate phase/frequency detector with automatic data retiming. Fabricated in CMOS technology over an area of the circuit exhibits a capture range of 1.43 GHz, an rms jitter of 0.8 ps, and a peak-to-peak jitter of 9.9 ps with a PRBS of length The power dissipation is 91 mW from a 1.8-V supply. This circuit is the first 10-Gb/s CMOS CDR circuit to meet the specifications for jitter generation defined by SONET OC-192. The high integrability and low power dissipation of this CDR circuit demonstrate the capability of using a full CMOS process for implementation of high-performance SONET transceivers operating at 10 Gb/s.

1. Introduction

The majority of CDR circuits employ ring or LC oscillators to generate a clock signal. Ring oscillators have been predominantly used to implement systems operating at lower speeds, such as OC-3 and OC-12. They provide a wide tuning range and differential control that makes the circuit less susceptible to supply and substrate noise. Furthermore, they benefit from a compact layout, easing the routing of high-speed signals and yielding a smaller area. The output jitter of these oscillators is small enough to meet the OC-3 and OC-12 standard specifications.

As the data rate increases, the ring topology becomes an unattractive candidate for the oscillator implemented in a CDR circuit. The most important disadvantage is its limited signal integrity. Generation of a robust clock signal at a high frequency using a ring oscillator is difficult. The maximum oscillation frequency achieved from a ring oscillator depends on the number of stages and the minimum amount of delay achieved from each stage. As the number of stages is reduced, the phase shift introduced by each stage increases. However, achieving a significant phase shift from a single stage in the presence of process and temperature variations is difficult. Therefore, to achieve reliable operation, a minimum of three stages should be used in a ring oscillator. A higher number of stages limits the maximum frequency that can be achieved from a ring oscillator. The oscillation frequency of a ring oscillator also varies significantly over process. It heavily depends on the RC delay of each stage. A large portion of the output capacitance of each stage comes from the interconnect capacitances, which are hard to extract because of their versatile configurations. Also, the sheet resistance of the resistors varies significantly from one wafer to another.

LC oscillators in general provide a more precise center frequency because the value of the inductor can be estimated with high precision and the parasitic capacitances contribute only a small percentage of the output capacitance of each stage. This capacitance is mostly dominated by the diode or the varactor. Furthermore, since the inductor and the capacitor values determine the oscillation frequency, their product can be reduced to achieve reliable operation at high frequencies. However, the drawback of LC oscillators is a narrower tuning range and single-ended control. Meeting the specifications for jitter generation defined by SONET requires a VCO that inherently has a small phase noise. Since the loop bandwidth of the CDR circuit is small, the jitter produced by the oscillator accumulates in the loop. Therefore, the inherent jitter of the oscillator should be as low as possible, requiring the implementation of the CDR circuit using an LC oscillator with low phase noise.
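
The difference in center-frequency accuracy discussed above can be illustrated with first-order models. The sketch assumes the textbook expressions f = 1/(2 N t_d) for an N-stage ring with per-stage delay t_d (taken here as proportional to the load RC product) and f = 1/(2π√(LC)) for an LC tank. The function names and all component values, including the 30% shifts and the 20% parasitic share, are hypothetical choices for illustration.

```python
import math


def ring_frequency(n_stages, stage_delay_s):
    """First-order ring-oscillator frequency: one period is 2 * N stage delays."""
    return 1.0 / (2 * n_stages * stage_delay_s)


def lc_frequency(inductance_h, capacitance_f):
    """Resonance frequency of an ideal LC tank."""
    return 1.0 / (2 * math.pi * math.sqrt(inductance_h * capacitance_f))


# Ring oscillator: the stage delay tracks the load resistance, so a 30% rise
# in sheet resistance lengthens the delay by the same factor (assumed values).
ring_nominal = ring_frequency(4, 25e-12)
ring_shifted = ring_frequency(4, 25e-12 * 1.30)
ring_change = ring_shifted / ring_nominal - 1

# LC oscillator: only the parasitic share of the tank capacitance (assumed
# to be 20% of the total) moves by 30%; the varactor share stays put.
L_TANK, C_TANK = 1e-9, 1e-12
lc_nominal = lc_frequency(L_TANK, C_TANK)
lc_shifted = lc_frequency(L_TANK, C_TANK * (0.8 + 0.2 * 1.30))
lc_change = lc_shifted / lc_nominal - 1

print(f"ring: {ring_change:+.1%}, LC: {lc_change:+.1%}")
```

Under these assumptions the ring frequency moves by tens of percent while the LC frequency moves by a few percent, because the tank frequency depends only on the square root of the total capacitance.
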
The circuit introduced in this chapter uses an oscillator that benefits from the quality of LC tanks to achieve an improved jitter performance. Furthermore, the oscillator is formed by placing a number of stages in a loop, providing multiple phases over the tuning range, without the oscillation frequency being a strong function of the number of stages in the loop. The SONET standard recommends operation at an exact data rate. Therefore, the oscillators should be guaranteed to generate a clock signal at an exact frequency. The tuning range should be wide enough to cover the frequency variations over process and temperature. On the other hand, CDR circuits lacking a means of frequency acquisition provide a very narrow capture range. This range is primarily determined by two factors: loop bandwidth and phase detector topology, limiting the circuit’s capture range to a few megahertz. Therefore, the CDR circuit cannot acquire lock if the VCO starts at a frequency that is significantly different from the data rate. This limitation calls for an aided acquisition mechanism. Various frequency detection schemes have been introduced that operate with or without a reference signal. In this chapter, a new frequency acquisition scheme using a half-rate clock is described. This circuit benefits from full compatibility with the operation of the phase detector, eliminating the need for a lock detection scheme. The frequency detector is automatically tri-stated when the circuit gets close to phase lock. The next section of the chapter presents the CDR architecture and design issues. Following that, the implementation of the building blocks, loop characterization, and the experimental results are described.

2. Architecture

Because of the marginal performance of the technology used in this work, the clock frequency is chosen to be half of the data rate, as in the circuit described in Chapter 5. However, the previous circuit suffers from a limited capture range because it lacks a means of frequency detection. Various techniques for performing frequency detection without a reference clock have been introduced, but such techniques rely on a full-rate clock to obtain the frequency error signal. In this work, a new approach to performing phase and frequency detection using a half-rate clock is described. This technique both achieves a high speed and automatically retimes the data.

Shown in Fig. 6.1, the CDR consists of a phase and frequency detector (PFD), a voltage-controlled oscillator (VCO), a charge pump, and a low-pass filter (LPF). The PFD compares the phase and the frequency of the input data with those of a half-rate clock, providing two binary error signals for phase and frequency. These error signals are fed back to the VCO through the charge pump and the low-pass filter. After phase lock is achieved, the phase of the output clock is within a small offset from the phase of the input data. This guarantees that the clock frequency is equal to one half of the input bit rate. The PFD is designed such that, in addition to providing information about the phase error, it retimes the data as well. Consequently, the CDR exhibits no systematic offset, i.e., inherent skews between clock and data edges due to their nonidentical paths through the loop do not degrade the quality of detection. The VCO provides multiple phases over the full tuning range.

In order to minimize sensitivity to common-mode noise, the CDR circuit incorporates fully differential topologies for all of the building blocks.

3. Building Blocks

3.1. VCO

Shown in Fig. 6.2(a), the VCO consists of a four-stage differential ring oscillator with LC-tuned loads, providing a tuning range wide enough to encompass process and temperature variations. The number of stages is chosen such that multiple clock phases with 45° of spacing required in the PFD can be generated. This loop must have a negative feedback at low frequencies in order to provide multiple phases; otherwise, the four signals will be in phase. Figure 6.2(b) shows the implementation of each stage. The loads are formed using spiral inductors and MOS varactors. In order to determine the frequency of oscillation, we recognize that each stage in the ring must provide 45° of phase shift for oscillation to occur. As shown in Fig. 6.2(c), the load can be modeled by a parallel LC tank along with a parallel resistor The major contributor to this resistive loss in the tank is the limited Q of the inductor. Therefore, can be approximated as Setting the phase shift of the parallel tank to 45°, we arrive at the following equation.

The oscillation frequency can therefore be written as:
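
The two relations referred to here can be sketched as follows for the parallel tank of Fig. 6.2(c). The symbols L, C, R_p, and Q, the approximation R_p ≈ QLω, and the choice of the solution below the tank resonance are assumptions made for this sketch rather than notation confirmed by the text.

```latex
\begin{align*}
% Phase of the parallel RLC tank impedance, using R_p \approx Q L \omega
\angle Z(j\omega)
  &= \tan^{-1}\!\left[\frac{R_p\,(1 - LC\omega^2)}{L\omega}\right]
   \approx \tan^{-1}\!\left[Q\,(1 - LC\omega^2)\right] = \frac{\pi}{4} \\
% Solving for the oscillation frequency of the four-stage ring
\omega_{osc}
  &= \frac{1}{\sqrt{LC}}\,\sqrt{1 - \frac{1}{Q}} \\
% Generalization to n stages, each contributing \pi/n of phase shift
\omega_{osc}(n)
  &= \frac{1}{\sqrt{LC}}\,\sqrt{1 - \frac{\tan(\pi/n)}{Q}}
   \;\longrightarrow\; \frac{1}{\sqrt{LC}} \quad (n \to \infty)
\end{align*}
```

With Q held constant, the frequency scales as 1/√(LC), the same dependence as in a cross-coupled LC oscillator, and the correction term shrinks as the number of stages grows.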

Equation (6.2) suggests that the tuning characteristic of LC-tuned ring oscillators is identical to cross-coupled LC oscillators if the inductor has a relatively constant Q across the tuning range. Also, as the number of stages, n, increases, the oscillation frequency becomes less dependent on n, approaching This is in contrast to ring oscillators using resistive loads. The dominant portion of the tank’s parallel capacitance is contributed by the MOS varactor. The varactor capacitance varies by a factor of 2 across the tuning range. As shown in Fig. 6.2(b), resistor shifts the common-mode level of down so that the varactor gate-source voltage can assume both positive and negative values, providing a large tuning range. Each stage has a tail current source of 4 mA. The bias current is chosen to provide large voltage swings at the output to drive the following circuit with smaller phase noise.
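
The dependence on the number of stages and the effect of the varactor range can be checked numerically. The sketch assumes that each of the n stages contributes π/n of phase shift and that the tank loss is R_p ≈ QLω, which gives ω = √((1 − tan(π/n)/Q)/(LC)); this expression, the function name, and the values of L, C, and Q are assumptions for illustration only.

```python
import math


def lc_ring_frequency_hz(l_h, c_f, q, n_stages):
    """Oscillation frequency of an n-stage ring with LC-tuned loads.

    Assumes each stage supplies pi/n of phase shift and R_p ~= Q * L * omega.
    """
    detune = 1.0 - math.tan(math.pi / n_stages) / q
    if detune <= 0:
        raise ValueError("Q too low for this number of stages")
    return math.sqrt(detune / (l_h * c_f)) / (2 * math.pi)


# Hypothetical tank: 1 nH, 0.8 pF, Q = 5.
L_TANK, C_MIN, Q_IND = 1e-9, 0.8e-12, 5.0

for n in (3, 4, 8, 16):
    f = lc_ring_frequency_hz(L_TANK, C_MIN, Q_IND, n)
    print(f"n = {n:2d}: {f / 1e9:.2f} GHz")

# If the whole tank capacitance were a varactor with a 2:1 range and Q stayed
# constant, the frequency ratio across the range would be sqrt(2).
f_high = lc_ring_frequency_hz(L_TANK, C_MIN, Q_IND, 4)
f_low = lc_ring_frequency_hz(L_TANK, 2 * C_MIN, Q_IND, 4)
print(f"tuning ratio = {f_high / f_low:.3f}")
```

Because fixed parasitics add to the varactor capacitance in a real tank, the √2 ratio is an upper bound on the achievable tuning range in this model.
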

Simulation results indicate that the maximum self-resonance frequency for the inductors is achieved by forming them as a stack of two metal layers, M3 and M6 [50], despite the fact that the inductors required for oscillation at 5 GHz are usually small enough to be formed using a single layer of metal over a relatively small area. In order to arrive at the exact oscillation frequency in simulations, a distributed model is used for the inductor. Shown in Fig. 6.3, a lumped inductor is replaced by a chain of smaller inductors in series with resistors, modeling the loss of the tank. The layer-to-layer capacitance and the layer-to-substrate capacitance are distributed across the nodes of the resistive/inductive chain. In this model, and equal 1/8 of the inductor value and its series resistor. and equal 1/5 of and respectively.

As shown in [50], the model of Fig. 6.3 can be used to predict the self-resonance frequency of the tank. Theoretically, it can be shown that the effective capacitance of the inductor equals In reality, this value is closer to The VCO occupies a large chip area as a result of having eight spiral inductors. Therefore, the metal lines carrying the multi-phase clock signals are very long. These interconnects are laid out using wide traces of the top metal layer in order to reduce the resistance of the wire. This results in a large routing capacitance, since the fringe capacitance of the top metal layer in a CMOS technology is several times higher than its bottom-plate capacitance. If the buffers following the VCO are placed before the interconnects, the parasitics will introduce a large time constant at the buffer outputs, drastically reducing the voltage swing of the high-speed signal. To alleviate this problem, the buffers are placed after the interconnects so that the parasitics can be tuned out by the inductors in the VCO. The parasitic capacitances are precisely calculated so that the resulting oscillation frequency does not fall out of the range of interest.

One remedy for reducing the differential capacitance between adjacent lines is to route the signals such that two adjacent lines carry signals that are close in phase. Figure 6.4(a) depicts the signal arrangement that minimizes the differential capacitance. The signals carried over two adjacent lines are only 45° apart in the phase domain. Figure 6.4(b) depicts an arrangement in which differential signals are placed close to each other. This orientation maximizes the capacitance because the two signals sustain a maximum phase difference of 180°.
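
The effect of the phase difference on the coupling capacitance can be quantified with a simple model. For two equal-amplitude sinusoids separated by a phase θ, the voltage across the coupling capacitor has an amplitude of 2 sin(θ/2) times the single-line amplitude, so each line sees the physical coupling capacitance scaled by that factor. The function name and the equal-amplitude assumption are illustrative choices, not taken from the text.

```python
import math


def coupling_factor(phase_difference_deg):
    """Effective multiplier on the line-to-line capacitance.

    For v1 = A*sin(wt) and v2 = A*sin(wt - theta), the amplitude of (v1 - v2)
    is 2*A*sin(theta/2); the displacement current through the coupling
    capacitor, and hence the capacitance seen by each line, scales with it.
    """
    theta = math.radians(phase_difference_deg)
    return 2.0 * math.sin(theta / 2.0)


for phase in (45, 90, 180):
    print(f"{phase:3d} deg apart: {coupling_factor(phase):.2f} x C_coupling")
```

In this model, neighbors that are 45° apart load each other with well under the physical coupling capacitance, whereas a differential pair sees twice that value.
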

Although the orientation of Fig. 6.4(a) minimizes the coupling capacitance between the lines, it results in unequal lengths for the traces carrying the signals. At 5 GHz, only a few picoseconds of skew can significantly degrade the performance of the circuit. Therefore, the orientation of Fig. 6.4(b) was adopted for routing the signals. However, in order to minimize the capacitance, the traces are placed far apart from each other. The first-order parasitic capacitance models indicate that the value of the coupling capacitance stays constant until the spacing reaches a certain limit. Thereafter, the capacitance decreases linearly as the spacing increases. The spacing of was chosen such that the parasitic capacitances contribute less than 20% of the total capacitance seen at the output of each stage. Special attention was paid to equalizing the length of the traces carrying the high-frequency clock signal to the phase and frequency detectors.

The metal trace connecting the control line of the VCO to the pad is shielded by two metal layers that are connected to the VCO supply. Therefore, the supply noise is capacitively coupled to the VCO control, and the modulation of the varactor capacitance due to supply noise is less pronounced. The output clock signal can be taken from any of the clock phases. Alternatively, the sum of the four phases can be routed to the output. The second solution provides a larger swing at the input of the clock buffer. However, simulations indicate that at such high frequencies, the resulting sum is significantly distorted. For this reason, only one of the phases is passed to the output and additional dummy circuits are introduced, so that the capacitive loading on all four phases is equal.

Another approach to generating half-quadrature phases is to use a quadrature oscillator, generating the 0° and 90° phases, and to interpolate between these phases. However, this approach is susceptible to mismatch between the phases. The combination of the quadrature VCO and the interpolators also consumes more power for the same performance, compared to the VCO described here.

3.2. Phase and Frequency Detector

The PFD described in this work consists of two phase detectors (PD) and a modified double-edge-triggered flipflop (DETFF). The PD is derived from the data transition tracking loop (DTTL) described in [52] and [11]. In this PD, in-phase and quadrature phases of a half-rate clock signal sample the data in two double-edge-triggered flipflops. As shown in Fig. 6.5, four distinct possibilities can be identified for the cases when the clock is early or late, whether the data edge is positive or negative. For a positive edge, if the clock is early, the quadrature sample is negative and if the clock is late, the quadrature sample is positive. When the data edge is negative, the polarity of the quadrature samples is reversed. If either of the in-phase or quadrature samples is used to form the phase-error signal, the reversed polarity of the samples of positive and negative data edges provides inconsistent phase-error information. Since the phase error information is only present when a data transition occurs, the following set of rules can be proposed to obtain the desired phase error signal.

If the data makes a low-to-high transition, the quadrature sample goes to the phase-detector output.

If the data makes a high-to-low transition, the inverse of the quadrature sample goes to the phase-detector output.

Figure 6.6 shows the implementation of the PD according to these rules. Two latches operating on opposite clock phases and a multiplexer form a DETFF that samples the data using both the positive and negative transitions of a half-rate clock. The two signals and are therefore the in-phase and quadrature samples of data, respectively. A modified DETFF is used to implement the above rules. The output of the latch operating on the rising edge of the trigger signal goes to the multiplexer with no inversion (the first rule), whereas the output of the latch operating on the falling edge of the trigger signal is inverted before going to the multiplexer (the second rule). We therefore use the in-phase sample to clock the quadrature sample The output of the modified DETFF is the phase error signal.

This phase detector can operate at a high speed because it uses a half-rate clock. Since in the locked condition, the rising and falling edges of the quadrature clock coincide with data transitions, the in-phase clock transitions sample the data at its optimum point with no systematic offset, generating a full-rate output stream. Also, since the phase-error signal is revalidated only at data transitions, the ripple in the phase error signal is suppressed. The PD is independent of the data transition density, resulting in substantial reduction of pattern-dependent jitter.
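
The two rules above can be exercised with a small behavioral model. The sketch treats the data as an ideal NRZ waveform, takes in-phase samples near the bit centers and a quadrature sample near each bit boundary, and applies the rules whenever the in-phase samples show a transition. The function names, the ideal waveform, and the encoding (1 for a late clock, 0 for an early clock) are assumptions of this sketch, not a description of the circuit in Fig. 6.6.

```python
def sample(bits, t, bit_period=1.0):
    """Level of an ideal NRZ waveform at time t (clamped to the sequence)."""
    index = int(t // bit_period)
    index = min(max(index, 0), len(bits) - 1)
    return bits[index]


def half_rate_pd(bits, clock_offset, bit_period=1.0):
    """Return one decision per data transition: 1 = clock late, 0 = clock early.

    clock_offset is the sampling-time error; negative means the clock is early.
    """
    decisions = []
    for k in range(1, len(bits)):
        # In-phase samples on either side of the boundary between bits k-1 and k.
        before = sample(bits, (k - 0.5) * bit_period + clock_offset, bit_period)
        after = sample(bits, (k + 0.5) * bit_period + clock_offset, bit_period)
        # Quadrature sample, nominally aligned with the bit boundary.
        quad = sample(bits, k * bit_period + clock_offset, bit_period)
        if before == after:
            continue  # no transition: the previous decision is held
        # Rule 1: rising data edge passes the quadrature sample.
        # Rule 2: falling data edge passes its inverse.
        decisions.append(quad if after > before else 1 - quad)
    return decisions


DATA = [0, 1, 1, 0, 1, 0, 0, 1]
print(half_rate_pd(DATA, -0.2))  # clock early
print(half_rate_pd(DATA, +0.2))  # clock late
```

In this model the decision is the same for rising and falling data edges, which is the consistency the two rules are meant to provide, and nothing is emitted during runs of identical bits.
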

A difference between the data rate and twice the clock frequency will result in a beat frequency, formed as an alternating low-speed signal at the output of the PD. The average period of this signal represents the difference of the bit rate and twice the clock frequency. This signal, however, is not sufficient to determine the polarity of the frequency error. We therefore add a second PD to the circuit, whose structure is identical to the first PD. The only difference is that the in-phase and the quadrature clock signals applied to this block lead their counterparts in the other phase detector by 45°. Figure 6.7 depicts the output of both PDs for two cases when clock frequency is less or greater than half the data rate. From these waveforms, the following observations can be made:

If clock is slow, lags Therefore, if is sampled by the rising and falling edges of the results are negative and positive, respectively.

If clock is fast, leads Therefore, if is sampled by the rising and falling edges of the results are the inverse of the previous case.

We conclude that the modified DETFF can be used to extract the frequency error signal from and Figure 6.8 shows the PFD structure, where and lead by 45°, 90°, and 135°, respectively. The voltages and are used as the phase error and frequency error signals.
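
The principle of extracting the sign of the frequency error from two beat signals can be sketched behaviorally. The model below represents the two PD outputs as square waves at the beat frequency, offset by a quarter of the beat period (45° of the half-rate clock corresponds to a quarter of a bit period), and samples the second on both edges of the first, inverting the falling-edge samples as a modified DETFF would. The waveform model, the function names, and the polarity convention (+1 when the bit rate exceeds twice the clock frequency) are assumptions of this sketch and are not read from Fig. 6.7.

```python
import math


def beat_level(phase_rad):
    """Idealized PD output during a frequency offset: a square-wave beat."""
    return 1 if math.sin(phase_rad) >= 0 else -1


def frequency_error_samples(delta_f, duration=3.0, step=1e-3, start_phase=0.3):
    """Sample the second beat signal on the edges of the first one.

    delta_f is the (normalized) bit rate minus twice the clock frequency.
    Falling-edge samples are inverted so both edges report the same polarity.
    """
    samples = []
    previous = beat_level(start_phase)
    n_steps = int(round(duration / step))
    for i in range(1, n_steps + 1):
        phase = start_phase + 2 * math.pi * delta_f * i * step
        current = beat_level(phase)
        if current != previous:
            # Second PD output, a quarter beat period ahead of the first.
            second = beat_level(phase + math.pi / 2)
            samples.append(second if current > previous else -second)
        previous = current
    return samples


print(frequency_error_samples(+1.0))  # data faster than twice the clock
print(frequency_error_samples(-1.0))  # data slower than twice the clock
```

Every sample in a run carries the same sign, and the sign flips when the frequency error changes polarity, which is the property a charge pump needs in order to steer the VCO toward lock.
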

As shown in Fig. 6.7, if the PFD is designed such that has a unipolar output, the difference between and will have positive and negative unipolar pulses for slow and fast clock signals. The polarity of these pulses determines the sign of frequency error. A PFD that generates two signals for phase and frequency error must be designed such that its frequency error signal falls to zero in phase lock. As described in [34], the modified DETFF used to produce the frequency error signal generates unipolar tri-state pulses at the output. Figure 6.9 depicts how the multiplexer used in the DETFF is modified for this purpose.

3.3. Charge Pump

Figure 6.10 shows the implementation of the charge pump. Since the circuit drives the single-ended control of the varactors, it is designed to provide a single-ended output. In phase lock, the differential frequency error signal falls to zero. Therefore, is equally split between and having negligible effect on In order to reduce the ripple at the output, the charge-pump current is relatively small. Simulation results indicate that the capture range of the circuit is limited because the output of the charge pump cannot go from rail to rail. The current sources, and and the transistor impose a voltage drop. This drop can be minimized by proper choice of device dimensions.

3.4. Output Buffers

The output buffer delivers the high-speed clock and data signals to the output termination. As shown in Fig. 6.11, to achieve a wide bandwidth the buffer stages employ inductive peaking [43]. The value of the inductors is chosen so as to avoid peaking in the passband. Since the quality factor of the inductors is not critical here, the spiral structures have a line width of only to achieve a high self-resonance frequency. The value of the inductors ranges from 1.5 nH to 3.5 nH.

4. Loop Characterization

Figure 6.12(a) depicts a linear small-signal model of the CDR circuit in the vicinity of the lock point. In Chapter 5, the 3-dB bandwidth of the transfer function from the input of the phase detector to the VCO output and the value of jitter peaking in this system were calculated. The assumption is that the loop filter only consists of a series resistor and capacitor (Fig. 6.12(b)). This simple model can also be used for determination of the 3-dB bandwidth of the closed-loop VCO’s phase noise characteristic Shown in Fig. 6.12(c), this bandwidth can be approximated if the loop is heavily overdamped and the jitter transfer function has no peaking. Thermal noise enters the system from the input of the phase detector and the control of the VCO. If the transfer function from these inputs to the output is represented as and respectively, then the power spectral density of the output noise can be given as: where is the power spectral density of the input thermal noise. The 3-dB bandwidth is the frequency at which equals 0.5. Therefore

In this equation:

In a heavily overdamped system The mathematical term for can be substantially simplified using this approximation:

The assumption of will require that where represents the 3-dB bandwidth from the input of the phase detector to the oscillator output.
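
The behavior of the simple loop model can be explored numerically. The sketch below assumes a jitter transfer function H(s) = (K R s + K/C)/(s² + K R s + K/C) for a series R-C filter, with K lumping the PD, charge-pump, and VCO gains, and takes 1 − H(s) as the shaping applied to VCO phase noise. It then sweeps frequency to find the peaking and the two 3-dB corners. The model form, the function names, and all values are assumptions for illustration and do not reproduce the equations of this section.

```python
import math


def jitter_transfer(omega, k, r, c):
    """|H(j*omega)| for H(s) = (k*r*s + k/c) / (s**2 + k*r*s + k/c)."""
    s = complex(0.0, omega)
    forward = k * r * s + k / c
    return abs(forward / (s * s + forward))


def vco_noise_transfer(omega, k, r, c):
    """|1 - H(j*omega)|: the high-pass shaping seen by VCO phase noise."""
    s = complex(0.0, omega)
    forward = k * r * s + k / c
    return abs(s * s / (s * s + forward))


def characterize(k, r, c, points=4000):
    """Return (peaking, jitter 3-dB corner, VCO-noise 3-dB corner) from a log sweep."""
    corner = k * r
    omegas = [corner * 10 ** (-3 + 4 * i / (points - 1)) for i in range(points)]
    jitter = [jitter_transfer(w, k, r, c) for w in omegas]
    peak = max(jitter)
    start = jitter.index(peak)
    half_power = 1 / math.sqrt(2)
    w_jitter = next(w for w, g in zip(omegas[start:], jitter[start:]) if g < half_power)
    w_vco = next(w for w in omegas if vco_noise_transfer(w, k, r, c) > half_power)
    return peak, w_jitter, w_vco


# Hypothetical, heavily overdamped loop: 1 / (K * R**2 * C) = 0.005.
K_LOOP = 1.5e5                      # lumped loop gain, 1/(ohm * s)
R_FILTER = 200.0                    # ohm
C_FILTER = 1 / (K_LOOP * R_FILTER ** 2 * 0.005)

PEAK, W_JITTER, W_VCO = characterize(K_LOOP, R_FILTER, C_FILTER)
print(f"peaking = {20 * math.log10(PEAK):.3f} dB")
print(f"jitter corner / (K*R) = {W_JITTER / (K_LOOP * R_FILTER):.3f}")
print(f"VCO-noise corner / (K*R) = {W_VCO / (K_LOOP * R_FILTER):.3f}")
```

In this model both corners land close to K·R when the loop is heavily overdamped, and the peaking stays near 1 + 1/(K R² C).
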

5. Experimental Results

The CDR circuit has been fabricated in a CMOS process. Figure 6.13 shows a photograph of the chip, which occupies an area of ESD protection diodes are included for all pads except the high-speed ones. The circuit is tested in a chip-on-board assembly, running from a 1.8-V supply. Figure 6.14(a) depicts the VCO tuning characteristic. It achieves a tuning range of 1.2 GHz. The VCO achieves the highest signal purity at the lower bound of its tuning range (-102.35 dBc/Hz at 1-MHz offset). The open-loop VCO phase noise at 5 GHz is -86 dBc/Hz. The tuning characteristic of the VCO varies by 1% over process.

Figure 6.15(a) shows the spectrum of the clock in response to a 9.95328-Gb/s data sequence of length The phase noise at 1-MHz offset is approximately equal to -107 dBc/Hz. Figure 6.15(b) depicts the recovered clock in the time domain. The jitter performance of the CDR circuit is characterized by the Anritsu MP1777 jitter analyzer. A random sequence of length produces 9.9 ps of peak-to-peak and 0.8 ps of rms jitter on the clock signal. These values are respectively reduced to 2.4 ps and 0.4 ps for a random sequence of length SONET OC-192 specifies 10 ps as the maximum peak-to-peak jitter on the clock. Therefore, the measured results are within the standard specifications.

The measured jitter transfer characteristic of the CDR is shown in Fig. 6.16. The jitter peaking is 0.04 dB and the 3-dB bandwidth is 5.2 MHz. In order to measure the jitter tolerance, a random sequence of 10 Gb/s was applied to the circuit. The BER was for a PRBS of and the circuit did not pass the tolerance requirements defined by SONET. To identify the source of the error, the data rate was reduced to 5 Gb/s. The circuit still sustained lock while the VCO was oscillating at 5 GHz. The BER was smaller than and the circuit passed the SONET mask (Fig. 6.17). The jitter tolerance of the circuit can be limited by either the input buffer, the CDR circuit, or the output buffer. Since the jitter on the clock signal is not limiting the circuit’s performance, as verified by the jitter tolerance experiment at a lower rate, and the input does not impose a severe limitation on the bandwidth, the output buffer probably results in the high BER. Improvement of the output buffer bandwidth can result in improved performance, required to meet the SONET specification. Figure 6.18 depicts the full-rate retimed data.

Despite the small loop bandwidth, the frequency detector provides a capture range of 1.43 GHz, obviating the need for external references. The total power consumed by the circuit excluding the output buffers is 91 mW from a 1.8-V supply. The VCO, the PFD, and the clock and data buffers consume 30.6 mW, 42.2 mW and 18.2 mW, respectively.

6. Conclusion

A 10-Gb/s clock and data recovery circuit designed in CMOS technology performs frequency acquisition, phase locking, and data regeneration. Achieving an rms jitter of 0.8 ps, this circuit is the first CMOS CDR circuit to meet the jitter generation requirements defined by SONET. The power consumption of this circuit is much smaller than the power consumption of similar circuits fabricated in bipolar or GaAs processes.

Chapter 7 CONCLUSION

The number of Internet nodes doubles approximately every 100 days, leading to an average bit rate of a few terabits per second on the backbone. The bandwidth requirements are growing at an extremely fast pace. Applications such as online virtual reality will require data rates that are 10,000 times higher than currently available ones [53]. With fiber optics being the only communication medium capable of handling such high data rates, this trend has suddenly created a widespread demand for high-speed optical and electronic devices, circuits, and systems.

The new optical revolution has gradually replaced modular, general-purpose building blocks by end-to-end solutions that benefit from device, circuit, and architecture codesign. Greater levels of integration on a single chip enable higher performance and lower cost. Mainstream VLSI technologies such as CMOS continue to take over the territories thus far claimed by GaAs and InP devices.

In the past two decades, CMOS technology has rapidly penetrated the analog integrated circuit design arena, providing low-cost, high-performance solutions and rising to dominate the market. More than 90% of the analog and mixed-signal products in today’s semiconductor industry are designed and fabricated in pure CMOS technologies. Exploitation of the CMOS process for fabrication of the electronic interface in the optical system allows for integration of high-speed front-end circuits and low-speed framers and mappers on the same chip. This integration can reduce the package count, board size, and cost of the system.

The two widely accepted commercial systems, namely SONET OC-48 and OC-192, operate at 2.5 and 10 Gb/s, respectively. The 2.5-Gb/s CMOS transceiver has already been introduced by a few companies and an extensive amount of research has been performed to improve the design of these systems. However, implementation of the 10-Gb/s CMOS transceivers lags the 2.5-Gb/s receivers by a few years because these systems have only become realizable in relatively advanced technologies such as the process that has become available in the last two years.

In this book, the design of the world’s first and second 10-Gb/s CDR circuits has been described. Targeting the performance requirements of the SONET OC-192 standard, the second circuit satisfies the jitter generation specification. The jitter tolerance specification can be satisfied by further improvement of the output data buffer bandwidth and reduction of the circuit’s BER.

Future research in the field of optical transceivers can be divided into two major categories. On the one hand, circuit designers can look into implementation of systems operating at higher rates. The next optical standard (SONET OC-768) introduces a data rate that is very close to 40 Gb/s. Device scaling of the CMOS process will soon allow operation at such speeds. However, the integrity and purity of the signal are issues that are becoming extremely critical at such high speeds. Only a few picoseconds of timing jitter can have a detrimental effect on the performance of the system. Furthermore, connection of the circuit core to the output environment at these speeds, in the presence of parasitics, is extremely difficult. New broadband circuit techniques need to be developed to address these issues and many other challenges that arise in implementation of 40-Gb/s systems.

On the other hand, research in this field can be directed toward integration. Integration of receivers, transmitters, and perhaps digital circuits on the same chip is indeed a very tough challenge.
If the receiver and the transmitter incorporate separate VCOs running at close frequencies, special attention must be paid to avoiding signal coupling and reducing clock jitter. The digital circuits can heavily pollute the signal environment, introducing noise through the substrate and the supplies. In such an environment, meeting the jitter generation and transfer requirements is critical. The noise usually manifests itself as peaking in the jitter transfer characteristic.

The goal of the research described here was not only to demonstrate the capability of the CMOS process for fabrication of broadband circuits such as 10-Gb/s CDR circuits, but also to provide architectural and circuit techniques that can be used in any commercial system incorporating clock and data recovery. The focus of this work was implementing CDR circuits using the half-rate concept. This technique will allow designers to increase the maximum operating rate of the CDR circuit by 60 to 80 percent in any given technology. This feature is attractive because the speed capability of fabrication processes always lags the demand for higher bit rates.

Different types of VCOs and phase detectors were used in the CDR circuits to demonstrate their performance at speeds as high as 10 Gb/s. LC oscillators benefit from larger swings, lower phase noise, and higher accuracy in the prediction of the resonance frequency. Their drawbacks are their relatively large area and narrow tuning range. However, as their frequency of operation increases, the size of the integrated spiral inductors needed to form the LC tank shrinks. At the same time, the relatively precise oscillation frequency of LC oscillators relaxes the requirement for a wide tuning range. The design and optimization of LC oscillators have advanced extensively owing to the research performed at UCLA and many other institutions in the last few years. These circuits will be found in most high-speed optical transceivers in the very near future.

Both linear and binary phase detectors are attractive for implementation in a CDR circuit. The performance of linear phase detectors can be more easily modeled and predicted. They benefit from lower charge pump activity at the lock point because, unlike that of binary phase detectors, the output of linear phase detectors goes to zero in phase lock. On the other hand, binary phase detectors are less susceptible to peaking in their jitter transfer characteristic because they have a single-pole-like jitter transfer characteristic [51]. One major advantage of binary phase detectors over linear phase detectors is that they can be extended to perform referenceless frequency detection in addition to phase detection. This is because binary topologies are capable of providing a strong beat frequency in the presence of a mismatch between the clock and data frequencies.
With this in mind, a binary system is more suitable where unaided frequency acquisition must be performed. However, systems that rely on an external reference signal for frequency acquisition can incorporate a linear phase detector. The two implementations described in this book illustrate this concept.

The CDR circuit remains the most critical block of the optical receiver. In the years to come, we will see new techniques targeting improved performance, higher data rates, higher integration, lower power consumption, and lower cost.
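The remark above that a binary phase detector does not settle to zero output can be illustrated with a small behavioral sketch in Python. It models an Alexander-style early/late decision [33], which is one classic binary topology and not necessarily the circuit used in this book. The function names, the ideal NRZ waveform, and the sign convention (+1 for a late clock, -1 for an early clock) are all assumptions made for illustration.

```python
import math


def alexander_decision(s1, s2, s3):
    """Early/late decision from three consecutive samples.

    s1: data sample at bit n, s2: sample near the expected data edge,
    s3: data sample at bit n+1. Returns 0 when there is no transition,
    -1 when the clock is early, +1 when the clock is late (assumed convention).
    """
    if s1 == s3:
        return 0                       # no data transition, no phase information
    return -1 if s2 == s1 else 1       # edge sample still equals old bit: clock early


def sample(bits, t):
    """Ideal NRZ waveform: bit k occupies the interval [k, k+1) in unit intervals."""
    return bits[math.floor(t)]


def pd_outputs(bits, phase_error_ui):
    """Detector outputs for a fixed clock phase error, valid for |error| < 0.5 UI."""
    outputs = []
    for n in range(len(bits) - 1):
        s1 = sample(bits, n + 0.5 + phase_error_ui)   # mid-bit sample of bit n
        s2 = sample(bits, n + 1.0 + phase_error_ui)   # sample at the nominal edge
        s3 = sample(bits, n + 1.5 + phase_error_ui)   # mid-bit sample of bit n+1
        outputs.append(alexander_decision(s1, s2, s3))
    return outputs


bits = [0, 1, 1, 0, 1, 0, 0, 1]
small_late = pd_outputs(bits, 0.01)
large_late = pd_outputs(bits, 0.40)
small_early = pd_outputs(bits, -0.01)
```

In this model a 0.01-UI error and a 0.4-UI error produce identical output sequences, since only the sign of the error is reported. A linear detector would instead scale its output with the error and approach zero at lock.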

References

[1] webopedia.internet.com/networks/networking_standards/SONET.html
[2] SONET OC-192 Transport System Generic Criteria, GR-1377-CORE, Issue 5, Dec. 1998.
[3] Cypress Hotlink, User’s Guide, Cypress Semiconductor, April 1999.
[4] A. X. Widmer, P. A. Franaszek, “A DC-Balanced, Partitioned-Block, 8B/10B Transmission Code,” IBM Journal of Research and Development, vol. 27, pp. 440-451, Sept. 1983.
[5] R. Walker, B. Amrutur, T. Knotts, “64B/66B Coding Update,” IEEE 802.3ae Meeting, Albuquerque, March 2000.
[6] H. Kim, J. Bauman, “A 12 GHz 30 dB Modular BiCMOS Limiting Amplifier for 10 Gb SONET Receiver,” ISSCC Digest of Technical Papers, vol. 43, pp. 160-161, Feb. 2000.
[7] M. Neuhauser, H.-M. Rein, H. Wrenz, “Low-Noise, High-Gain Si Bipolar Preamplifiers for 10 Gb/s Optical Fiber Links - Design and Realization,” IEEE Journal of Solid-State Circuits, vol. 31, pp. 24-29, January 1996.
[8] E. M. Cherry, D. E. Hooper, “The Design of Wideband Transistor Feedback Amplifiers,” Proc. IEE, vol. 110, pp. 375-389, February 1963.
[9] H.-M. Rein, M. Moller, “Design Considerations for Very High Speed Si Bipolar ICs Operating up to 50 Gb/s,” IEEE Journal of Solid-State Circuits, vol. 31, pp. 1076-1090, August 1996.
[10] B. Gilbert, “A New Wideband Amplifier Technique,” IEEE Journal of Solid-State Circuits, vol. 3, pp. 353-365, December 1968.
[11] A. W. Buchwald, Design of Integrated Fiber-Optic Receivers Using Heterojunction Bipolar Transistors, Ph.D. Thesis, University of California, Los Angeles, Jan. 1993.
[12] J. Savoj, B. Razavi, “A 10-Gb/s CMOS Clock and Data Recovery Circuit,” Digest of Symposium on VLSI Circuits, pp. 136-139, June 2000.


[13] F. Herzel, B. Razavi, “A Study of Oscillator Jitter due to Supply and Substrate Noise,” IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 46, pp. 56-62, Jan. 1999.
[14] B. Razavi, RF Microelectronics, Upper Saddle River, NJ: Prentice Hall, 1998.
[15] B. Razavi, Design of Analog CMOS Integrated Circuits, New York, NY: McGraw Hill, 2000.
[16] B. Razavi, ed., Monolithic Phase-Locked Loops and Clock Recovery Circuits, Piscataway, NJ: IEEE Press, 1996.
[17] T. C. Weigandt, B. Kim, P. R. Gray, “Analysis of Timing Jitter in CMOS Ring Oscillators,” Proc. IEEE ISCAS, vol. 4, pp. 27-30, June 1994.
[18] W. B. Kuhn, N. K. Yanduru, “Spiral Inductor Substrate Loss Modeling in Silicon RF IC’s,” Microwave Journal, pp. 66-81, March 1999.
[19] C. M. Hung, L. Shi, I. Lagnado, K. K. O., “A 25.9 GHz Voltage-Controlled Oscillator Fabricated in a CMOS Process,” Digest of Symposium on VLSI Circuits, pp. 100-101, June 2000.
[20] D. B. Leeson, “A Simple Model of Feedback Oscillator Noise Spectrum,” Proc. of IEEE, vol. 54, pp. 329-330, 1966.
[21] J. J. Rael, A. A. Abidi, “Physical Processes of Phase Noise in Differential LC Oscillators,” Proceedings of the Custom Integrated Circuits Conference, pp. 569-572, May 2000.
[22] T.-P. Liu, “A 6.5 GHz Monolithic CMOS Voltage-Controlled Oscillator,” ISSCC Digest of Technical Papers, pp. 404-405, Feb. 1999.
[23] A. Rofougaran, J. Rael, M. Rofougaran, A. Abidi, “A 900 MHz CMOS LC-Oscillator with Quadrature Outputs,” ISSCC Digest of Technical Papers, pp. 392-393, Feb. 1996.
[24] B. Razavi, “A 1.8 GHz CMOS Voltage-Controlled Oscillator,” ISSCC Digest of Technical Papers, pp. 388-389, Feb. 1997.
[25] C. Lam, B. Razavi, “A 2.6 GHz/5.2 GHz CMOS Voltage-Controlled Oscillator,” ISSCC Digest of Technical Papers, pp. 402-403, Feb. 1999.
[26] J. J. Kim, B. Kim, “A Low-Phase-Noise CMOS LC Oscillator with a Ring Structure,” ISSCC Digest of Technical Papers, pp. 430-431, Feb. 2000.
[27] T.-P. Liu, “1.5-V 10-12.5 GHz Integrated CMOS Oscillators,” Digest of Symposium on VLSI Circuits, pp. 55-56, June 1999.
[28] B. Kleveland, C. H. Diaz, D. Dieter, L. Madden, T. H. Lee, S. S. Wong, “Monolithic CMOS Distributed Amplifier and Oscillator,” ISSCC Digest of Technical Papers, pp. 70-71, Feb. 1999.
[29] H. Wu, A. Hajimiri, “A 10 GHz CMOS Distributed Voltage Controlled Oscillator,” Proceedings of the Custom Integrated Circuits Conference, pp. 581-584, May 2000.


[30] J. A. McNeill, “Jitter in Ring Oscillators,” IEEE Journal of Solid-State Circuits, vol. 32, pp. 870-879, June 1997.
[31] C. A. Sharpe, “A 3-State Phase Detector Can Improve Your Next PLL Design,” EDN, pp. 55-59, Sept. 1976.
[32] C. Hogge, “A Self-Correcting Clock Recovery Circuit,” IEEE Journal of Lightwave Technology, vol. LT-3, pp. 1312-1314, December 1985.
[33] J. D. H. Alexander, “Clock Recovery from Random Binary Data,” Electronics Letters, vol. 11, pp. 541-542, Oct. 1975.
[34] A. Pottbacker, U. Langmann, H. U. Schreiber, “A Si Bipolar Phase and Frequency Detector IC for Clock Extraction up to 8 Gb/s,” IEEE Journal of Solid-State Circuits, vol. 27, pp. 1747-1751, December 1992.
[35] S. B. Anand, B. Razavi, “A 2.5-Gb/s Clock Recovery Circuit for NRZ Data in CMOS Technology,” Proceedings of the Custom Integrated Circuits Conference, pp. 379-382, May 2000.
[36] J. C. Scheytt, G. Hanke, U. Langmann, “A 0.155, 0.622, and 2.488 Gb/s Automatic Bit Rate Selecting Clock and Data Recovery IC for Bit Rate Transparent SDH Systems,” ISSCC Digest of Technical Papers, pp. 348-349, Feb. 1999.
[37] G. Gutierrez, S. Kong, B. Coy, “2.488 Gb/s Silicon Bipolar Clock and Data Recovery IC for SONET (OC-48),” Proceedings of the Custom Integrated Circuits Conference, pp. 575-578, May 1998.
[38] C. F. Schaeffer, “The Zero-Beat Method of Frequency Discrimination,” Proceedings IRE, vol. 30, pp. 365-367, August 1942.
[39] F. M. Gardner, “Properties of Frequency Difference Detectors,” IEEE Transactions on Communications, vol. COM-33, pp. 131-138, Feb. 1985.
[40] B. Razavi, J. Sung, “A 2.5-Gb/s 15-mW BiCMOS Clock Recovery Circuit,” Digest of Symposium on VLSI Circuits, pp. 83-84, June 1995.
[41] L. M. De Vito, “A Versatile Clock Recovery Architecture and Monolithic Implementation,” Invited Paper, Monolithic Phase-Locked Loops and Clock Recovery Circuits, Theory and Design, Edited by B. Razavi, IEEE Press, New York 1996.
[42] S. B. Anand, B. Razavi, “A 2.75-Gb/s CMOS Clock Recovery Circuit with Broad Capture Range,” ISSCC Digest of Technical Papers, pp. 214-215, Feb. 2001.
[43] J. Savoj, B. Razavi, “A CMOS Interface Circuit for Detection of 1.2 Gb/s RZ Data,” ISSCC Digest of Technical Papers, pp. 278-279, Feb. 1999.
[44] M. Wurzer, et al., “40-Gb/s Integrated Clock and Data Recovery Circuit in a Silicon Bipolar Technology,” Proceedings of the Bipolar/BiCMOS Circuits and Technology Meeting, pp. 136-139, Sept. 1998.
[45] M. Rau, et al., “Clock/Data Recovery PLL Using Half-Frequency Clock,” IEEE Journal of Solid-State Circuits, vol. 32, pp. 1156-1159, July 1997.


[46] K. Nakamura, et al., “A 6 Gb/s CMOS Phase Detecting DEMUX Module Using Half-Frequency Clock,” Digest of Symposium on VLSI Circuits, pp. 196-197, June 1998.
[47] E. Mullner, “A 20 Gbit/s Parallel Phase Detector and Demultiplexer Circuit in a Production Silicon Bipolar Technology with ” Proceedings of the Bipolar/BiCMOS Circuits and Technology Meeting, pp. 43-45, Sept. 1996.
[48] B. Razavi, Y. Ota, R. G. Swarz, “Design Techniques for Low-Voltage High-Speed Digital Bipolar Circuits,” IEEE Journal of Solid-State Circuits, vol. 29, pp. 332-339, March 1994.
[49] J. Savoj, B. Razavi, “A 10-Gb/s CMOS Clock and Data Recovery Circuit with Frequency Detection,” ISSCC Digest of Technical Papers, pp. 78-79, Feb. 2001.
[50] A. Zolfaghari, A. Chan, B. Razavi, “Stacked Inductors and 1-to-2 Transformers in CMOS Technology,” Proceedings of the Custom Integrated Circuits Conference, pp. 345-348, May 2000.
[51] Y. M. Greshishchev, et al., “A Fully Integrated SiGe Receiver IC for 10-Gb/s Data Rate,” IEEE Journal of Solid-State Circuits, vol. 35, pp. 1949-1957, Dec. 2000.
[52] T. O. Anderson, W. J. Hurd, W. C. Lindsey, “Transition Tracking Bit Synchronization System,” U.S. Patent 3,626,298, Dec. 1971.
[53] G. Stix, “The Triumph of the Light,” Scientific American, Jan. 2001.

Index

Acquisition time, 26
Aided acquisition, 53
Amplifier, 67
    common-gate, 67
    common-source, 67
    low-noise, 67
    wideband, 67
BER, 92
Buffer, 73
CDR, 11, 22, 77
    full-rate, 27
    half-rate, 27, 78, 97
    open-loop, 22
    phase-locking, 23
    speed, 24
CMOS process, 6, 92, 115
Capture range, 26, 92
Charge pump, 24, 87, 97, 106
Cherry-Hooper, 16
Clock-multiplying unit, 8
DETFF, 103
Data
    NRZ, 5, 45
    RZ, 61
    binary, 45
    regeneration, 78
    spectrum, 45
    transition density, 85
Decision circuit, 58
Demultiplexer, 71
Deserialize, 5
Duty cycle, 29
ESD, 110
Edge detection, 5, 23
Encoding
    64B/66B, 5
    8B/10B, 5
FIFO, 8
Feedback
    shunt-shunt, 14
Fiber optics
    dispersion, 4
    multi-mode, 3
    single-mode, 3
    transceiver, 7
Filter
    band-pass, 23
    low-pass, 24, 97
    matched, 62
    selectivity, 23
Framer, 6
Frequency detector, 52, 97
    Pottbacker, 57
    referenced, 53
    referenceless, 53, 55
    rotational, 57
Frequency division, 9
    self-resonance, 100
Gilbert amplifier, 16
Hot-carrier effect, 90
IC, 5
ISI, 19, 21, 24, 61
Inductive peaking, 17
Inductor, 18
    Q, 18
    distributed-model, 100
    integrated, 18
    stacked, 69, 100
Injection locking, 9
Integrate-and-dump, 63, 69
Integrator, 64
Interface, 61
    electronic, 2
    optical system, 4
Interleaving, 64
Internet backbone, 1
Jitter, 25, 32, 69, 77, 110
    RMS, 90
    analyzer, 90
    bimodal, 29, 80
    generation, 25, 96
    pattern-dependent, 103
    peak-to-peak, 90
    thermal, 32
    tolerance, 26
    transfer, 25, 88, 91
Laser diode, 11
Latch, 84
    current-steering, 7
Limiter, 16
Logic
    current-steering, 87
Loss
    Displacement, 32
    Eddy, 32
    substrate, 33
Mapper, 6
Multiplexer, 7, 85
Noise, 22
    input-referred, 14, 68
    switching, 36
    thermal, 35, 108
Optical
    amplifier, 4
    carrier, 2
    communications, 2
    point-to-point network, 3
    ring network, 2
PLL, 5, 24, 53
    bandwidth, 25
    characterization, 108
    dual-loop, 54
    filter, 88
    jitter, 41
Phase detector, 42, 83
    Alexander, 49
    Hogge, 46
    Pottbacker, 51
    bang-bang, 50
    binary, 45, 48
    characteristic, 86
    gain, 86
    linear, 45, 83
    pattern dependency, 85
    triwave, 47
Phase/frequency detector, 43, 97, 102
Photo detector, 11
Power, 26
Quadricorrelator, 55
SNR, 21, 63
SONET, 2
Serialize, 5
Silicon bipolar, 6
Supply scaling, 27
TIA, 11, 13
    common-gate, 13
    noise, 14
Transceivers, 2
VCO, 8, 24, 29, 77, 80, 97–98
    LC, 32, 95
    coarse control, 82
    distributed, 41
    fine control, 82
    gain, 82
    half-quadrature, 102
    multi-phase, 36, 98
    phase noise, 34, 110
    quadrature, 36
    ring, 30, 80, 95
    sensitivity, 82
    tuning, 32, 110
Varactor, 34
Wave-division multiplexing, 2, 7
XOR, 23, 43, 85
    symmetric, 85