Handbook of Digital CMOS Technology, Circuits, and Systems (ISBN 3030371948, 9783030371944, 9783030371951)

This book provides a comprehensive reference for everything that has to do with digital circuits. The author focuses equ


English Pages 653 Year 2020


Table of contents :
There’s Something About Electronics......Page 6
Analog Circuits......Page 7
Digital Circuits......Page 8
Mixed-Signal Circuits......Page 9
It’s an Abstract Art......Page 10
What is This All About?......Page 12
How to Use This Book......Page 17
Assumptions, Simplifications, Accuracy, and Managing to Do Anything......Page 19
The Story as It Is Told......Page 20
Contents......Page 23
1.1 The Band Model......Page 27
1.2 Intrinsic Silicon......Page 32
1.3 Band Model with Doping......Page 36
1.4 Extrinsic Silicon......Page 40
1.5 Drift......Page 43
1.6 Diffusion......Page 47
1.7 Forming a Homojunction......Page 49
1.8 PN Junction in Equilibrium......Page 52
1.9 Junction Capacitance......Page 55
1.10 Forward and Reverse Bias......Page 57
1.11 Minority Carrier Injection......Page 59
1.12 Forward-Biased PN Junction Current......Page 61
1.13 Bipolar Junction Transistor......Page 67
1.14 Materials Interfaces......Page 75
1.15 MOS Capacitor Preliminaries......Page 81
1.16 Modes of the MOS Capacitor......Page 84
1.17 MOS Capacitor Characteristics......Page 91
1.18 MOSFET Linear Regime......Page 93
1.19 MOSFET Saturation Regime......Page 94
1.20 Body Effect......Page 100
1.21 Channel Length Modulation......Page 103
2.1 PMOS......Page 106
2.2 Regions of the MOSFET......Page 107
2.4 Abandoning BJT......Page 110
2.5 Scaling MOSFET......Page 113
2.6 What is a Logic Family......Page 115
2.7 Resistive Load Inverter......Page 118
2.8 Open-Circuited Transistor......Page 120
2.9 Enhancement Load Inverter......Page 121
2.10 Enhancement Load VTC......Page 122
2.11 Static Power......Page 123
2.12 NAND and NOR Enhancement Load......Page 124
2.13 Random Logic in Enhancement Load......Page 128
2.14 Depletion Load Logic......Page 129
2.15 Pseudo-NMOS Logic......Page 131
2.16 Limitations of Ratioed Logic......Page 132
3.1 Basics of the CMOS Inverter......Page 135
3.2 CMOS VTC......Page 136
3.3 Preliminaries of Delay......Page 140
3.4 MOS Capacitance and Resistance......Page 144
3.5 Simplified Delay Model......Page 147
3.6 Non-static Power......Page 149
3.7 CMOS NAND and NOR......Page 152
3.8 CMOS Complex Logic......Page 154
3.9 Sizing, Delay, and Area......Page 159
3.10 Supply and Width Scaling......Page 165
3.11 Limitations of CMOS......Page 166
4.2 Sizing an Inverter Chain......Page 168
4.3 Gates Versus Inverters: Preliminaries......Page 171
4.4 Normalizing Gate Intrinsic Delay......Page 172
4.5 Normalizing Gate External Delay......Page 173
4.6 Architecture, Inputs, and Effort......Page 175
4.7 Optimal Sizing in a Logic Chain......Page 176
4.8 Logical Effort for Multiple Inputs......Page 179
5.1 High-Impedance Nodes......Page 180
5.2 Dynamic CMOS and Why it is Great......Page 181
5.4 Leakage in Dynamic Logic......Page 185
5.5 Charge Sharing......Page 190
5.6 Cascading Dynamic Logic......Page 194
5.7 Logical Effort in Dynamic Gates......Page 198
6.1 Sequential Versus Combinational......Page 200
6.2 Latches, Registers, and Timing......Page 203
6.3 The Static Register......Page 204
6.4 Dynamic Registers......Page 210
6.5 Imperfect Clocks and Hold-Time......Page 211
6.6 Pipelines, Critical Path, and Slack......Page 215
6.7 Managing Power in a Pipeline......Page 222
6.8 Examples on Pipelining......Page 228
6.9 Impact of Variations......Page 236
7.1 Setting and Location......Page 239
7.2 Photolithography Iteration......Page 241
7.3 Account of Materials......Page 243
7.4 Wafer Fabrication......Page 245
7.5 Operations and Equipment......Page 248
7.6 LOCOS......Page 259
7.7 Advanced Issues in CMOS Processing......Page 269
7.8 Account of Layers......Page 293
8.1 What Is a Layout......Page 296
8.2 Stick Diagrams......Page 297
8.3 Standard Cells......Page 301
8.4 Design Rules: Foundations......Page 309
8.5 Design Rules—Sample......Page 313
8.6 Fixed-Point Simulation......Page 319
8.7 Physical Design......Page 324
8.8 FPGAs......Page 333
9.1 Design Philosophy......Page 338
9.3 IEEE Library and std_logic......Page 340
9.4 Types, Attributes, and Operators......Page 343
9.6 Structural Connections......Page 348
9.7 Generics and Constants......Page 356
9.9 The Process Statement......Page 363
9.10 Signals and Variables......Page 366
9.11 Selection in a Process......Page 368
9.12 Latches and Implicit Latches......Page 369
9.13 Registers and Pipelines......Page 378
9.14 Memories......Page 386
9.15 Counters......Page 390
9.16 State Machines......Page 394
9.17 Testbenches—Preliminaries......Page 400
9.18 Functions and Procedures......Page 402
9.19 Wait, Assertions, and Loops......Page 408
9.20 File I/Os......Page 415
9.21 Packages and Configurations......Page 422
9.22 Good Design Practices......Page 426
10.1 Steep Retrograde Body Effect......Page 431
10.2 Velocity Saturation......Page 432
10.3 MOSFET Leakage......Page 436
10.4 DIBL......Page 442
10.5 MOSFET Structures for DIBL......Page 447
10.6 Miscellaneous Scaling Effects......Page 450
10.7 Impacts on CMOS......Page 455
11.1 Binary Addition and Full Adders......Page 459
11.2 Ripple Carry Adder......Page 461
11.3 Generate—Propagate Logic......Page 462
11.4 Carry-Save and Bypass Adders......Page 465
11.5 Lookahead Addition......Page 469
11.6 Group Generates and Propagates......Page 471
11.7 Parallel Prefix Adders......Page 473
11.8 Binary Multiplication......Page 476
11.9 Array Multipliers......Page 477
11.10 Wallace and DADDA Multipliers......Page 479
11.11 Booth Multiplication......Page 484
12.1 Architectures and Definitions......Page 490
12.2 NOR ROM Arrays......Page 493
12.3 NAND ROM Arrays......Page 498
12.4 NVMs......Page 501
12.5 SRAM Cell......Page 511
12.6 Sense Amplifiers......Page 515
12.7 SRAM Timing......Page 519
12.8 DRAM Cells......Page 523
12.9 Decoders and Buffers......Page 529
13.1 Basics......Page 538
13.2 Lumped C Wires......Page 541
13.3 Silicon Wires......Page 543
13.4 Scaling Wires......Page 544
13.5 Interchip Communication......Page 546
13.6 Supply and Ground......Page 552
13.7 Clock Networks......Page 555
13.8 Metastability......Page 559
13.9 Synchronization......Page 565
14.1 Fundamentals of Testing......Page 570
14.2 Logical Hazards......Page 578
14.3 Stuck-at Fault Model......Page 586
14.4 Scan Paths......Page 590
14.5 Built in Self-test......Page 595
14.6 IC Packaging and Boundary Scan......Page 599
14.7 Testing Memories......Page 605
14.8 Reliability......Page 608
Glossary......Page 611
Index......Page 651


Karim Abbas

Handbook of Digital CMOS Technology, Circuits, and Systems


Karim Abbas Cairo University Giza, Egypt

ISBN 978-3-030-37194-4
ISBN 978-3-030-37195-1 (eBook)
https://doi.org/10.1007/978-3-030-37195-1

© Springer Nature Switzerland AG 2020 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

To mom, for teaching me how to want to learn,
To dad, for teaching me to be who I want to be,
To Sehar, for teaching me it’s all worth it,
To Sophia and Julia, for being Sophia and Julia.

Preface

Digital integrated circuits are incredible. They include more components than most continents have people; and each of these components can be smaller than all bacteria, most viruses, and some molecules. They are also ubiquitous: they are everywhere, doing everything. It is not an exaggeration to say that semiconductors define how we live.

One would expect such an incredible outcome to come from an incredible story. These complicated systems must have a rigorously disciplined design approach backing them up. And to an extent this is true, at least on the surface. Understanding an integrated circuit requires understanding how its components behave at multiple levels of abstraction. We must understand how the semiconductor behaves, and this requires a foray into some weird physics. We must understand how the circuits work, and this requires us to adopt a paradigm shift in how our brains work. We also have to understand how the system functions at a higher level, using more abstract algorithmic approaches to solve problems that circuits and devices cannot.

It is all very neat. It is hierarchical. We can go from the top down or from the bottom up. And there are always tools to aid us at every level of the design, and plenty of space to trace back, detect problems, and solve them. But the true story is much messier. Things can and do go wrong. Spectacularly wrong. And the exciting part of the story is not when things work seamlessly. The truly fascinating part is how to wade through a mess of seemingly unworkable problems, and how to draw conclusions from scant evidence. The most amazing thing about microchips is that they actually manage to work in the first place.

There’s Something About Electronics

There is something strange and unique about electronics. Innovation is part of what makes us human. Discovering something new that affects the way we live and how our history proceeds has been a regular part of our story. But some discoveries are like no other. Some are paradigm shifts that cause everything to change. And electronics is one such pivotal moment. But perhaps it was never a specific moment; it was more of a gradual but ever-escalating process.

Electronics had relatively humble beginnings. The earliest computers were important and impressive, but they were also bulky, cumbersome, and expensive. And thus, they were more of a specialty item used for military or large-scale business applications. It took decades of consumer electronics, personal computers, wireless platforms, and the Internet for the true revolution to occur. And it is still ongoing. Electronics are integral to two emerging trends with the potential to change the future: embedded systems and artificial intelligence.

Electronic systems have had a very reliable trajectory of improvement. They keep getting smaller, faster, cheaper, and more mobile. The pace of change is steady, but if you examine its path over a few decades, the change is dramatic. From the inception of integrated circuits in the 70s to the edges of CMOS scalability, transistor channel sizes have dropped nearly four


orders of magnitude. Speeds have grown nearly in pace, and cost has continued to decrease. This has allowed us to create ever more complex systems with better economics.

There are two secrets behind this steady path. First, electronics are self-perpetuating. Better electronics make better processors. Better and more powerful processors allow us to design, characterize, fabricate, and test better electronics, which in turn allow us to build even better processors, completing the virtuous cycle. Second, electronics have always been tightly knit with economics. Moore’s law, which predicted the pace of development of electronics, was originally as much a study of economic viability as it was a study of technology. There has always been keen awareness that it is not simply enough to be able to make something; you must be able to make it at a cost that allows you to sell it. This kept advances in electronics relevant, as it linked them to the market.

But things are not that easy. Semiconductor fabrication is a complicated process. We have to fabricate features that are smaller than some molecules, and we have to fabricate them with incredible precision. Semiconductor fabrication requires mastery of and advances in disparate fields like organic and inorganic chemistry, polishing, metal coating, optics, welding, and photography.

As systems grew larger and more complicated, it became impossible to manage the complexity manually. Computer-aided design became an integral part of electronic design. Optimization techniques are necessary to manage the many interrelated parts of an integrated circuit, and ever more complex approaches are necessary to properly verify and test said designs.

And as devices grew smaller, we started to see weird physics. Things did not happen the way we projected them to happen. Power did not drop the way we expected it to, the growth of circuit speed started to slow down, and general oddities and reliability issues started to predominate.
The path was not as easy as Moore charted it. The problems of digital circuit design sound insurmountable. It can be daunting to contemplate the scale and size of designs, the size and behavior of components, and the disparate fields of science that all need to advance in tandem. But this is what we will do in this book.

The Signals We Give

Electronic circuits are categorized by the nature of the signals they process. This book addresses digital integrated circuits. To understand what distinguishes digital circuits, and what makes them great, we have to contrast them with nondigital signals and systems.

Analog Circuits

These are circuits that deal with analog signals, that is, signals whose values are continuous. The information carried in the signal may be in its amplitude, frequency, or phase. But in all cases, this information exists on a continuous scale. Analog signals have incredible precision because they allow a theoretically infinite number of points to represent data. However, analog signals are fundamentally limited by their susceptibility to noise. As shown in Fig. 1, if an analog signal is exposed to noise, this noise will inextricably change the data that the signal carries. This puts a damper on the aforementioned incredible precision of analog signals.

But does noise have to be “added” to analog signals? Analog circuits are circuits that process analog signals. Noise is a fact of life for all circuits, regardless of their category. Noise is a fundamental physical phenomenon that occurs whenever charged particles exist at temperatures above absolute zero. Thus analog circuits inevitably add noise to any input analog signal. A good analog circuit will add less noise, but it will still add noise.


Fig. 1 Analog signal with added noise. The noise cannot be removed from the signal once it is added

Fig. 2 When an analog signal passes through analog circuits, its SNR inevitably deteriorates. Good analog circuits degrade the SNR more slowly, but the addition of noise is inevitable

Noise on a signal is characterized using the signal-to-noise ratio, or SNR. SNR is the ratio of signal power to noise power in the total signal. It is usually measured in dB, with the signal power typically being decades higher than the noise power. As shown in Fig. 2, an analog chain will inevitably lead to the deterioration of SNR. Each resistive component in any electronic circuit adds noise. Any amplifier will amplify noise as much as it amplifies the signal, and then add its own noise on top. Thus, all analog blocks add noise to a signal. The signal-to-noise ratio will therefore always deteriorate through any analog chain, and this deterioration is irreversible once it happens.
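As a quick numeric illustration of the definition above (the function names are our own), the SNR in dB is simply 10·log10(Psignal/Pnoise), and a chain of stages can only lose SNR:

```python
import math

def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels: 10 * log10(Psignal / Pnoise)."""
    return 10.0 * math.log10(signal_power / noise_power)

# A signal two decades (100x) above the noise floor sits at 20 dB.
assert abs(snr_db(1.0, 0.01) - 20.0) < 1e-9

def snr_after_stage(sig_p, noise_p, gain, stage_noise_p):
    """One amplifying stage: noise is amplified along with the signal,
    and the stage then adds its own noise on top."""
    return snr_db(sig_p * gain, noise_p * gain + stage_noise_p)

# The chain can only lose SNR, never regain it.
assert snr_after_stage(1.0, 0.01, gain=10.0, stage_noise_p=0.05) < snr_db(1.0, 0.01)
```

The second assertion is the deterioration argument from the text in miniature: the amplifier scales signal and input noise by the same gain, so any noise the stage itself contributes necessarily lowers the ratio.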

Digital Circuits

Digital signals take discrete values, as opposed to the continuous scale on which analog signals exist. Digital circuits are circuits that process digital signals. The overwhelming majority of digital circuits deal with binary logic. That is to say, they are circuits which assume a clean input can only be one of two electrical values, corresponding to logic “0” and logic “1”.

Digital circuits are still circuits. They still contain resistances with energetic charge carriers. They still generate noise. Even if the digital circuit itself is not noisy, there are many sources of interference around a typical digital system that couple to it. Thus, the amplitude and phase of a digital signal are as corrupted by noise as those of an analog signal. But this does not really matter. It is a bit misleading to say that digital circuits accept only two electrical values as inputs. Digital circuits expect only two electrical values as clean digital values corresponding to “0” and “1”; they will accept an entire range of electrical values corresponding to each of the logic values.

Fig. 3 Digital signals do not have to have a specific electrical value. In fact, every logic value has a corresponding range of acceptable electrical values. This range is breathing space within which noise can perturb a signal without corrupting its logic value

Fig. 4 When a noisy digital signal travels through digital circuits, the circuits clean it up, removing the noise and restoring the nominal logic values

This is the concept of the noise margin, and it is part of what sets digital circuits apart. Figure 3 shows how a digital circuit will view a signal at its input. The axis represents electrical values. As shown, the circuit does not consider only a single value to represent “0” or “1”; there is instead a range for each. Thus, there is a margin within which the input to the current stage can change and still be considered “correct”. This is a margin for noise that can be added to the clean signal, or in other words, a noise margin.

But why can digital circuits accept a wide range of electrical values and still see them as valid logic inputs? This ties in with the second advantage of digital circuits: the regenerative property. This is illustrated in Fig. 4. When a digital circuit observes a noisy signal at its input, it will accept it and produce a cleaner output. In effect, the circuit will remove some of the noise. If the digital chain is long enough, it can effectively remove all the noise in the signal.

Does this mean that digital circuits are impervious to noise? If we reexamine Fig. 3, we will find that if enough noise is added to the signal, it will be pushed beyond the range of acceptable values. For example, if excessive noise is added to a clean logic “0”, it will exceed the “highest value of ‘0’” point. The regenerative property will still kick in and clean up this value, but it may misinterpret it as a logic “1” instead of a logic “0”.
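A toy sketch of noise margins and regeneration (the threshold voltages below are invented for illustration, not taken from any real process): an input anywhere below V_IL still reads as a clean “0”, regeneration re-emits the nominal level, and only excessive noise flips the interpreted value.

```python
# Hypothetical inverter thresholds for a 2.0 V supply (illustrative numbers
# only, not from any specific process).
V_OL, V_IL, V_IH, V_OH = 0.1, 0.8, 1.2, 1.9

def interpret(v):
    """How a gate classifies an electrical value at its input."""
    if v <= V_IL:
        return 0
    if v >= V_IH:
        return 1
    return None  # forbidden region: behavior is undefined

def regenerate(v):
    """Idealized regeneration: a pair of inverters re-emits the clean
    nominal level for whichever logic value the input is read as."""
    bit = interpret(v)
    return {0: V_OL, 1: V_OH, None: v}[bit]

# A clean "0" (0.1 V) plus 0.5 V of noise is still read as "0" and restored:
assert interpret(0.1 + 0.5) == 0 and regenerate(0.6) == V_OL
# Excessive noise (1.2 V) pushes it past V_IH: it is "cleaned up" as a "1".
assert interpret(0.1 + 1.2) == 1
# The low and high noise margins here are V_IL - V_OL and V_OH - V_IH (0.7 V each).
```

The two assertions mirror the argument in the text: within the margin, regeneration erases the noise; past the margin, regeneration still produces a clean level, but the wrong one.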

Mixed-Signal Circuits

Most consumer electronics are mounted on platforms that end up communicating wirelessly. As we will shortly discuss, this means that these systems must combine analog and digital circuits. These two signal domains must communicate with each other. This requires a transformation of signals from analog to digital and vice versa. The design of analog-to-digital and digital-to-analog converters is called mixed-signal design. Mixed-signal design requires familiarity with both analog and digital signals and
systems. But it also requires a thorough understanding of the behavior of signals as they get transformed across domains, in other words, a strong foundation in digital signal processing is necessary.

It’s an Abstract Art

So which category of electronics is more important? Which is more ubiquitous? Which is more “powerful”? Historically, analog circuits were once used to do very rudimentary and unscalable signal processing in the form of analog computers. Analog computers are now relics. The age of computers is closely associated with digital circuits. When we talk about electronics we are, most of the time, talking about digital electronics. Even the pop conception of electronics in various media is restricted to digital electronics. And there are many good reasons for this.

The above may insinuate that analog circuits are obsolete and unimportant. And this is absolutely false. Analog circuits continue to be essential and will remain so for very fundamental reasons. The majority of modern electronic platforms use wireless communications. When the signal goes to the antenna it is, by definition, an analog signal. This signal might come from and go to a large and complicated core of digital electronics, but it is both originally and eventually analog. This necessitates a long and very complicated (particularly at the receiver) chain of analog circuits to transmit or receive the signal. It also requires the use of mixed-signal circuits to convert signals.

Analog electronic circuits are as challenging to design as digital circuits, if not more so. While they typically contain far fewer transistors, they are significantly more dependent on the characteristics of the device. Thus, when a new technology is introduced, digital circuits typically emerge first as they are easier to characterize and design, while analog circuits have to wait while analog designers work their magic. But in terms of the sheer volume of products, versatility, uses, scale, and diversity, there is no denying that digital circuits reign supreme.
Digital circuits are so robust and easy to work with that any function that can be done in the digital domain will be, even if an analog alternative is readily available. A cell phone, for example, is a complicated system with multiple subsystems that may be integrated on a single chip or distributed across multiple chips. A modern smartphone has a very powerful and energy-efficient processor. The processor runs the operating system, as well as application-level software. The phone also contains a radio for data and voice communication. The radio contains many subsystems on multiple layers. For example, it contains a medium access control processor to arbitrate how the phone accesses the shared environment. It also contains data compression tools, channel encoders and decoders, modulators and demodulators, as well as multiple signal conditioning circuits.

So which of the above functions are performed by digital circuits? The answer is: all of them. While the majority of digital circuits are processors, the bulk of a transceiver is also digital. Application-specific circuits are also all digital. This makes the absolute majority of electronics, by sheer volume, digital.

This book covers the bulk of electronic circuits: digital integrated circuits. That is, digital circuits which are designed to be fabricated on a single crystal of silicon. Digital integrated circuits range in complexity from very simple circuits with a few dozen transistors to complicated systems on a chip with billions of components. All digital circuits are fabricated on silicon, which is a versatile semiconductor with excellent electrical and mechanical properties.

This book develops a full understanding of how such highly complicated systems are designed, and how they function. This is no easy undertaking, and it requires an understanding of many seemingly unrelated concepts. One way we can approach such a difficult problem is to try and model it at different levels of abstraction.
A lower level of abstraction allows for a more detailed understanding of the
behavior of the circuit, but it would be impossible to use such a low level to model a very large system. Thus, the low levels are used to create models which are then used to develop a higher level understanding of the system. The book covers all levels of abstraction, with a particular focus on how these levels interact with each other and where it would be more reasonable to use one or the other. Figure 5 shows the four levels of abstraction we will use in this book. They are discussed in detail below with a focus on the model produced from each and how it is used at higher levels. The abstraction levels from the lowest to the highest are as follows:

Fig. 5 Levels of abstraction. Every design can be seen at the four levels, but at different points it makes sense to abstract the design up to a certain level


1. Physics. At this level, we try to model how charge carriers behave. The focus is on material properties and the different behaviors of conductors, insulators, and semiconductors. We also focus on how different types of materials behave when they are brought in contact with each other. When we bring different types of materials together, we form microelectronic devices. And when we understand how charge carriers behave in such devices, we can develop simplified current–voltage equations that describe the terminal behavior of said devices.

2. Circuits. When many devices are brought together to form a circuit, it becomes too cumbersome, and not very productive, to try to keep track of the physics of individual devices. However, assuming all devices are isolated from each other, and given we have already developed current–voltage equations for the devices, we can move to the circuit level of abstraction. At this level, the models obtained from the physics level are used to solve multi-device circuits. We obtain currents and terminal voltages; we can also calculate delay, power dissipation, and rough estimates for the area of the circuit. Once the electrical behavior of the circuit has been determined, we can simplify the static (steady-state) behavior of digital circuits into logic truth tables.

3. Logic. At this level, we use the truth tables obtained from the circuit level to model logic gates. This allows us to understand the logic behavior of much larger circuits without worrying about their electrical properties.

4. System. If multiple logic circuits are combined, they form a large system. The system may have millions of logic circuits and billions of transistors. However, because we have characterized each at a lower level of abstraction, we can use hierarchical design to manage the incredible complexity of the system.

Note that the levels of abstraction are not arranged in any order of favorability. None of them is “better” than the others.
We need to understand the design at all levels at different times. For example, some problems that appear at the system level must be addressed at the system level, otherwise they would be too complicated to solve. On the other hand, some problems that manifest at the system level will require us to understand what is happening at the logic or circuit or even physics level.
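The move from the circuit level to the logic level can be illustrated with a crude switch-level model (a simplification of our own, not the book's notation): treat each MOSFET as an ideal switch, evaluate the pull-up and pull-down networks for every input combination, and what remains is a truth table.

```python
from itertools import product

def cmos_nand2(a, b):
    """Switch-level CMOS NAND2: series NMOS pull-down, parallel PMOS pull-up.
    An NMOS switch conducts on a gate input of 1; a PMOS conducts on a 0."""
    pull_down = (a == 1) and (b == 1)   # series NMOS: both must conduct
    pull_up = (a == 0) or (b == 0)      # parallel PMOS: either one suffices
    assert pull_up != pull_down         # complementary networks never fight
    return 0 if pull_down else 1

# Abstracting up: the electrical network collapses into a logic truth table.
truth_table = {(a, b): cmos_nand2(a, b) for a, b in product((0, 1), repeat=2)}
print(truth_table)  # {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 0}
```

Once the truth table is in hand, everything electrical about the gate (resistances, capacitances, thresholds) has been abstracted away, which is exactly what makes hierarchical design of million-gate systems tractable.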

What is This All About?

Table 1 lists the 14 chapters of this book, marking the levels of abstraction that each chapter delves into. The table shows that most chapters deal with multiple levels of abstraction. This is on purpose, because we always want to stress that these levels of abstraction are intertwined and derived from each other. Note also that the later chapters of the book tend to cover higher levels of abstraction. This follows the bottom-up approach of the book, where we develop a thorough understanding at a lower level, then use it to understand higher levels. It also goes with one of the major themes of the book: managing complexity. Digital circuits are extremely complicated, and a lot can and will go wrong with them. One of our major concerns will be how to manage this complexity so that things do not get out of hand.

Chapter 1 starts by discussing the physics of semiconductors. We discover that the main reason semiconductors are interesting lies in the sensitivity of their electrical conductivity to impurities. By the careful insertion of impurities from groups 13 and 15, the properties of silicon change substantially. We develop a semiclassical band model to understand the quantitative behavior of semiconductors. Hence, we develop the concept of a hole and contrast n-type and p-type silicon. With quantitative models at hand, we proceed to understand the two charge transport mechanisms: drift and diffusion. When we stick n-type silicon to p-type silicon, we find ourselves dealing with the most basic of electronic devices: the PN junction. To understand the PN junction, we introduce

Table 1 Levels of abstraction in different chapters. Rows: 1. Devices; 2. Ratioed logic; 3. CMOS; 4. Logical effort; 5. Dynamic logic; 6. Pipelines; 7. CMOS process; 8. Design flow; 9. HDL; 10. Scaling; 11. Arithmetic; 12. Memories; 13. Wires; 14. Design for testability. Columns: Physics, Circuits, Logic, Systems

methods to look at electrochemical equilibrium when two different materials are brought into contact. The PN junction allows us to introduce some of the most important concepts in electronic devices: for example, the formation and modulation of charge-depleted regions, nonlinear IV characteristics, and the development of device models divided by regions of operation.

The PN junction also allows us to study the simplest type of transistor, the Bipolar Junction Transistor. In the BJT, we observe the first example of transistor action: a three-terminal device where the current between two terminals is controlled by the voltage on a third terminal. We notice that transistors resemble switches. The BJT, however, has issues that limit its use in digital circuits; only by Chap. 2 will it become clear why.

Almost half of Chap. 1 is dedicated to the MOSFET. The device is studied in detail in its vertical structure (the MOS capacitor), as well as the horizontal channel. We develop a model for channel formation, charge distribution, and current in an ohmic MOSFET. We also deduce conditions under which the MOSFET current saturates and develop an understanding of its most important secondary effects. By the end of Chap. 1, we conclude that electronic devices operate in regions of operation described by different IV equations.

In Chap. 2, we introduce the concept of a “logic family” as an approach to building logic gates. We recognize the inverter as the foundational gate of any family. We then proceed to understand the fanout limitation of BJT logic, after which the BJT is abandoned in favor of the MOSFET. The MOSFET is used to create logic families based on the driver–load architecture. We try out different active loads with the hope of finding a perfect logic family. To define what “perfect” would look like, the concept of voltage transfer characteristics (VTC) is introduced as a representation of the static behavior of the inverter.
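As a sketch of what "regions of operation described by different IV equations" means in practice, here is the standard long-channel square-law NMOS model (the parameter values are illustrative, not from any particular process):

```python
def nmos_id(vgs, vds, vt=0.5, k=200e-6, w_over_l=2.0):
    """Long-channel square-law NMOS drain current in amperes. Illustrative
    parameters: threshold vt in volts, process transconductance k in A/V^2.
    Channel-length modulation and other secondary effects are ignored."""
    if vgs <= vt:
        return 0.0                                    # cutoff: no channel
    vov = vgs - vt                                    # overdrive voltage
    if vds < vov:                                     # linear (ohmic) regime
        return k * w_over_l * (vov * vds - vds ** 2 / 2)
    return 0.5 * k * w_over_l * vov ** 2              # saturation regime

# Below threshold the device is off; the linear and saturation expressions
# meet continuously at the boundary vds = vgs - vt.
assert nmos_id(0.3, 1.0) == 0.0
assert abs(nmos_id(1.0, 0.5) - nmos_id(1.0, 0.4999)) < 1e-7
```

Each branch of the function is one region of operation, selected by the terminal voltages; this piecewise structure is exactly the kind of simplified device model that the circuit level of abstraction consumes.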
By observing the VTC, we define a few key voltage values: the values that represent “clean” logic outputs, and the values that represent the boundaries of logic “0” and “1” in Fig. 2. We also derive expressions for power dissipation. Transistor count and aspect ratio are used as a back-of-the-envelope indication of area, and we encounter the first design decision: how to choose aspect ratios to control logic values. By extending the drivers of inverters, we design the first complex logic gates. This introduces the concept of a pull-down network (PDN). MOSFETs in parallel and series are found to be equivalent to a single transistor whose aspect ratio is some combination of the

X

X

Preface

xv

individual transistor’s aspect ratios. This allows us to design bigger circuits, understanding their outputs in response to different inputs. By the end of Chap. 2, we will give up on trying to find an “ideal” inverter. We discover that the driver–load architecture makes it impossible to get rail-to-rail clean logic values. We also discover that static power dissipation is inevitable with all families that use the driver– load structure. These families will be collected under the umbrella term: ratioed logic. In Chap. 3, we introduce an inverter (and family) that is apparently ideal. The CMOS inverter has rail-to-rail logic, its VTC looks as close to that of an ideal inverter as it gets, and it does not seem to dissipate static power. However, to build CMOS gates, we find that we have to abandon the idea of a driver and a load. Instead, we develop complementary pull-up and pull-down networks. We have to use PMOS transistors as extensively as we use NMOS transistors. But other than that, there does not seem to be any major drawback for CMOS. However, things are not as good as they seem. By digging deeper, we discover that CMOS is a great but flawed logic family. Understanding its flaws is necessary for proper design. And to develop this understanding we begin by deriving expressions for capacitive loads at the outputs of CMOS gates. CMOS gates are heavily loaded by the gate capacitance of subsequent gates. They are also self-loaded by their own drain diffusion capacitance. This capacitive load is charged and discharged when the inverter switches its state from “0” to “1”. Because this charging happens through the nonzero resistance of on MOSFETs, the process by necessity takes time. Because CMOS gates are loaded by both pull-up and pull-down transistors, they have a worse delay performance than ratioed logic. We also find that while switching CMOS gates will always dissipate some amount of power called dynamic power and that this power can be substantial. 
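The series/parallel transistor equivalence used for pull-down networks in Chap. 2 can be sketched numerically. Assuming all devices share the same process transconductance, on-resistance scales as L/W, so series aspect ratios combine like parallel resistors while parallel aspect ratios simply add.

```python
def series_w_over_l(ratios):
    """Equivalent aspect ratio of series MOSFETs.

    Resistances add, and resistance goes as L/W, so the (W/L)s
    combine like parallel resistors: 1 / sum(1/r)."""
    return 1.0 / sum(1.0 / r for r in ratios)

def parallel_w_over_l(ratios):
    """Equivalent aspect ratio of parallel MOSFETs.

    Conductances add, so the (W/L)s simply sum."""
    return sum(ratios)
```

For example, a two-transistor NAND-style series stack of (W/L) = 2 devices behaves like a single (W/L) = 1 transistor, which is why series stacks must be upsized to keep logic levels under control.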
However, the attraction of CMOS is just too strong. In an effort to control its delay, we try to resize transistors to control the output time-constant. However, this attempt to reduce a gate’s resistance also increases the gate’s input capacitance. Thus, trying to size a CMOS gate in isolation is a futile effort; proper sizing has to consider a complete chain.

In Chap. 4, we pick up on the cue given in Chap. 3 and develop a model for delay optimization in a chain of logic. We begin by modeling the sizing of a chain of inverters for the optimal drive of a large capacitive load. The results of this problem will be used later to design power and clock distribution grids. The inverter chain design problem forms the basis of a much broader concept: logical effort. We develop techniques to calculate electrical and logical effort for arbitrary logic gates. Logical and electrical effort are normalized measures of the complexity of a gate, representing the extra effort it takes to drive loads regardless of its sizing. Logical effort is used to optimize the distribution of loads in a generalized logic chain, leading to optimal delay.

CMOS gates have a lot of advantages, but their high capacitive loading limits their delay performance. In Chap. 5, we introduce the dynamic CMOS logic family. Dynamic CMOS seems to preserve all the advantages of CMOS while also getting rid of its dismal delay. However, dynamic gates require storage on high-impedance nodes. This imposes severe signal integrity issues. The chapter introduces very important concepts like leakage and charge sharing. These are phenomena that are present in all MOSFET circuits, but they only have an effect in the presence of high-impedance nodes. Dynamic circuits are used as high-speed alternatives for the remainder of the book, particularly in arithmetic, memories, and latching.

Chapter 6 introduces sequential circuits. Latches and registers are distinguished from each other.
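The inverter chain sizing problem described under Chap. 4 has a clean closed form: for an N-stage chain driving a load C_L from input capacitance C_in, delay is minimized when every stage bears the same fanout f = (C_L/C_in)^(1/N). A sketch, with delay normalized to the intrinsic inverter delay and an assumed illustrative parasitic delay p:

```python
def chain_delay(n, path_fanout, p=1.0):
    """Normalized delay of an n-stage inverter chain whose total
    electrical effort `path_fanout` is split equally among stages.
    Each stage contributes its stage fanout plus a parasitic delay p
    (p = 1 is an assumed illustrative value)."""
    f = path_fanout ** (1.0 / n)     # equal per-stage fanout
    return n * (f + p)

# Driving a load 64x the input capacitance: sweep the chain length.
best_n = min(range(1, 10), key=lambda n: chain_delay(n, 64))
```

Sweeping n shows the familiar tradeoff: too few stages means each stage is crushed by its load, too many means paying the parasitic delay over and over, with the optimum per-stage fanout near 4 for typical parasitics.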
We introduce important concepts in the timing of registers, such as setup-time, hold-time, and clock-to-Q delay.


This will be our first encounter with storage elements. Positive feedback is introduced as the principal static storage mechanism, while storage on a capacitor is introduced as the primary dynamic storage mechanism. Registers are combined with combinational logic blocks to form synchronous pipelines. While clocks are introduced in Chap. 5, they are only properly understood in Chap. 6. The primary timing parameter of a synchronous pipeline is the clock period. While calculating the clock period, we are introduced to the concept of the critical path. This leads to a series of important definitions and concepts, including setup-time violations, hold-time violations, and slack. Timing phenomena in synchronous pipelines form the highest level of understanding at the circuit level, and these definitions are used widely in later chapters that cover system-level concepts. To solidify an understanding of the design of synchronous pipelines, Chap. 6 ends with three involved design examples.

Chapter 7 takes a step back from the circuit level and asks the question: How are the circuits described in Chaps. 2 through 6 manufactured? The photolithographic CMOS process is introduced in detail with a focus on how highly miniaturized transistors can be fabricated. The chapter discusses all operations of the CMOS process in detail. This includes important steps such as oxidation, ion implantation, and vapor deposition. But we also discuss oft-ignored operations such as chemical mechanical polishing and mask alignment. An understanding of many of these operations is necessary to justify design decisions made in later chapters. Part of the chapter discusses special issues that can only be understood at such a low level. Especially important is latchup, a phenomenon that puts great limitations on the fabrication of CMOS circuits. The LOCOS process is used to illustrate a simple CMOS flow. Each step in the process is performed using a photomask.
We gather the collection of masks used to fabricate a wafer, giving each a different color code. The collection of masks overlaid on each other is our first encounter with a layout. By the end of Chap. 7, we discover that the layout is the end of the design process and the beginning of manufacturing.

In Chap. 8, we consider the design flow, with the understanding that the layout is its final outcome. Stick diagrams are used as an intermediate step between layouts and schematic diagrams. We develop simple rules for heuristically drawing stick diagrams, but also introduce the Euler path technique as an example of systematic stick diagram minimization. We discuss how standard cells allow regular ASIC design. Each step of the design flow is considered with a focus on the library and how its components are used. We also conclude that the design flow is iterative. The outputs of a flow include many reports as well as the final layout. The design rule set and design rule checks are discussed as necessary for obtaining good yield. We introduce a sample scalable design rule set. We also discuss layout versus schematic as the ultimate check on circuit functionality before fabrication starts.

Taking a step upward from Chap. 8, in Chap. 9 we discuss hardware description languages. An HDL or similar high-level description is the main input to the ASIC or FPGA design flow. We use VHDL as an illustrative example of HDLs. While we do consider syntax throughout the chapter, the focus is on good design practices. VHDL is deceptively easy to read and write. This can tempt some designers to write VHDL as they would write a high-level programming language. The chapter aims at establishing a connection between the written code and the resulting hardware. This is done by introducing a library of synthesizable constructs that the designer can reliably use. We also discuss general practices that make VHDL code readable, predictable, and portable.
The chapter also introduces two miscellaneous but very important concepts: latch-based pipelines and state machines. Latches are generally undesirable, and for the majority of the chapter we discuss them as unwanted accidental constructs. However, we discuss how slack borrowing can be used by highly seasoned designers to construct some of the fastest pipelines possible. State machines are the brains of most digital circuits. State machines can be sprawling and confusing. However, we introduce a three-process approach to state machines that makes their design and coding manageable and prevents the creation of unpredictable hardware.

The MOSFET model developed in Chap. 1 is valid for very long channel lengths; when the transistor scales down, most parts of the model crumble. Chapter 10 makes very fundamental changes to our understanding of the MOSFET by discussing various scaling effects. The assumption of a linear field-to-velocity relationship is the first to break down when channel length drops. The increased fields in short channels lead to a very different current–voltage relationship in the ohmic region. But more critically, they lead to a different saturation mechanism, where the current stops increasing because velocity stops increasing, not because the channel pinches off. We also consider a lot of strange phenomena that happen as the gate oxide gets thinner. This particularly includes the possibility that electrons move through the oxide, leading, for example, to the hot carrier effect. A large portion of the chapter is dedicated to covering leakage. Leakage is any current that flows in a MOSFET other than the on-state drift current between the drain and source. While the terms leakage and subthreshold current are sometimes used interchangeably, we distinguish the two. We develop an understanding of the dilemma of oxide thickness in terms of the tradeoff between subthreshold conduction and gate tunneling. We focus on the problem of the gate losing control over the channel, and the drain acquiring some control through drain-induced barrier lowering.
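The subthreshold conduction mentioned here follows a simple exponential law worth previewing: below threshold, drain current falls off exponentially with gate voltage rather than dropping to zero. This is a textbook weak-inversion sketch; the values of I0, the threshold voltage, and the ideality factor n are illustrative assumptions, not numbers from this book.

```python
import math

def subthreshold_current(vgs, vds, i0=1e-7, vt=0.4, n=1.5, vtherm=0.0259):
    """Subthreshold (weak-inversion) drain current model.

    Exponential in the gate overdrive, with a slope set by the
    ideality factor n and the thermal voltage (~26 mV at room
    temperature). The vds term models the current's saturation
    once vds exceeds a few thermal voltages."""
    return (i0 * math.exp((vgs - vt) / (n * vtherm))
               * (1.0 - math.exp(-vds / vtherm)))
```

The model makes the oxide-thickness dilemma concrete: every n·vtherm·ln(10) of gate voltage below threshold buys only one decade of leakage reduction, so a degraded subthreshold slope (larger n) directly costs standby power.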
This leads to the introduction of exotic, but now dominant, MOSFET structures such as the FinFET. By the end of the chapter, we discuss how the circuit design techniques and results developed in earlier chapters are impacted by the scaling of the MOSFET. We find that while many of the concepts remain unchanged, there are subtle but fundamental shifts to be aware of. We also develop a taste for the physical phenomena that will form the ultimate limits of CMOS scaling.

Chapter 11 starts taking a higher level look at digital systems. Our focus in the chapter is on designing arithmetic circuits. Digital systems consist of controllers (state machines), storage (memories), and processing units. The most complicated processing units consist of arithmetic datapaths. We introduce the concepts of binary long addition and multiplication. While the operations seem simple enough and are very similar to their decimal counterparts, we soon discover that the scaling behavior of arithmetic delay has some fundamental limits. We discover some interesting facts about adders, primary among them the critical nature of carry calculation. This leads us to change the way we see addition, leading to the design of some of the fastest adders possible. We conclude that multipliers are most likely to be the critical paths of digital datapaths. We design naive parallel multipliers, but quickly abandon them for advanced concepts like Booth encoding and Wallace/DADDA tree multipliers.

By this point in the book, we have developed a good understanding of how datapaths are designed. But we would be in the dark about large-scale storage. Chapter 12 covers the topic of memories, which are very dense and fast storage structures. Memories introduce very different design challenges from random logic. This leads to some strange design decisions, such as extremely long silicon wires, transistors with inaccessible gates, or signals that do not perform a full logic swing.
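The critical role of the carry is easy to see in a behavioral sketch of a ripple-carry adder, where every bit position must wait for the carry from the position below it:

```python
def ripple_add(a_bits, b_bits):
    """Add two equal-length bit lists (LSB first), one full adder
    per bit position.

    The carry ripples through every position, which is why the delay
    of this adder grows linearly with the word length, and why fast
    adders attack the carry chain first."""
    carry, sum_bits = 0, []
    for a, b in zip(a_bits, b_bits):
        s = a ^ b ^ carry                     # sum bit of a full adder
        carry = (a & b) | (carry & (a ^ b))   # generate OR propagate
        sum_bits.append(s)
    return sum_bits, carry
```

The carry expression already hints at the generate/propagate view that lookahead and parallel prefix adders exploit: (a & b) generates a carry locally, while (a ^ b) merely propagates an incoming one.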
We discuss different memory types and different array architectures with an emphasis on how and why each type of memory is used. SRAM is used as a versatile illustrative example of self-timing in memories. The challenges and oddities of DRAM design and operation are discussed in detail. Attention is given to nonvolatile memories due to their ubiquitous use and their custom-made devices. The chapter considers accessory circuits to be as important as the memory core. A section is dedicated to sense amplifiers, and another is dedicated to decoder and buffer design. Decoder design, in particular, is an elegant and challenging problem that can be solved using the logical effort technique.

Memory arrays bring into focus one aspect of design that we had ignored until this point: wires. Chapter 13 covers the topic of CMOS wires and how they impact circuits and systems. We consider how metal and silicon wires are modeled in combination with circuits. This leads to the introduction of multiple time-constant circuits, and the concept of Elmore delay. But we also consider wires at the system level. In other words, how systematic wires are distributed through the chip, and how wires are used to allow the chip to communicate with the outside world. This leads to our first encounter with pins and pin-conditioning circuitry. We also consider power distribution networks and ground and power bounce. Clock distribution networks are also studied in detail because they offer their own challenges and requirements. Skew and jitter are considered as particular design challenges. Various clock distribution networks are contrasted in terms of their performance and versatility. The chapter also covers communication across clock domains. This brings in the concept of metastability, which we spend considerable time defining and understanding. We solve the problem of metastable sampling by using synchronizers and asynchronous FIFOs.

Chapter 14 is the highest level of abstraction in the book. It covers a topic that at first seems tangential: testing and design for testability. By the end of the chapter, it should be clear that testing is far from tangential: it is at the core of making electronics usable.
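The Elmore delay mentioned here reduces a distributed RC wire to a simple sum: each resistance in the ladder contributes its value times all the capacitance it must charge downstream. A sketch over a generic RC ladder (the per-segment values in the test are hypothetical, not from any real process):

```python
def elmore_delay(segments):
    """Elmore delay at the far end of an RC ladder.

    `segments` is a list of (R, C) pairs ordered from driver to far
    end. Each resistor charges all capacitance downstream of it, so
    the delay is sum over i of R_i * (sum of C_j for j >= i)."""
    delay = 0.0
    downstream_c = sum(c for _, c in segments)
    for r, c in segments:
        delay += r * downstream_c   # this R charges everything past it
        downstream_c -= c
    return delay
```

For a uniform wire cut into n segments, the sum collapses to R·C·n(n+1)/2, which is the source of the familiar rule that wire delay grows quadratically with wire length.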
The chapter introduces important concepts like yield, defects, faults, fault coverage, and defect level. This is our first and only encounter with the commercial and economic aspects of chip design and fabrication. The bulk of the chapter is spent on understanding why making sure that a chip works is challenging. Test design, optimization, self-testing, controllability, and observability are some of the concepts discussed in detail. But the chapter also covers chips as finished products, considering how they are packaged and assembled. This leads us to printed circuit boards, IC package types, and practical issues of pin access and testability for finished multicomponent systems. This culminates in an introduction to the boundary scan technique and JTAG. The chapter introduces metrics to measure test results and reliability. It also introduces systematic methods to add circuitry that performs testing. We consider memory testing as a separate topic due to its peculiar nature.
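The relationship among yield, fault coverage, and defect level can be previewed with the widely cited Williams–Brown model, DL = 1 − Y^(1−T). This model comes from the testing literature, not from a derivation in this preface, and it is only an estimate:

```python
def defect_level(process_yield, fault_coverage):
    """Williams-Brown estimate of the fraction of shipped parts that
    are actually defective, given process yield Y and test fault
    coverage T: DL = 1 - Y ** (1 - T).

    Perfect coverage (T = 1) ships no defective parts; zero coverage
    ships defects at the raw rate (1 - Y)."""
    return 1.0 - process_yield ** (1.0 - fault_coverage)
```

The model captures the economic core of the chapter: with a 50% yield, pushing fault coverage from 90% to 99% cuts the defect level by roughly an order of magnitude, which is why coverage is worth the considerable effort it demands.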

How to Use This Book

The best way to read this book is in order. It tells a story. It starts at a very low level, with electrons moving through a crystal, and ends with packaged chips tested in a commercial facility before shipping to customers. The story builds up and makes more sense if read in the sequence it is written. There are a lot of interdependencies in the book. Most of them are backward-looking. For example, understanding how pipelines are timed in Chap. 6 is very difficult without understanding how combinational delay is calculated in Chap. 3. Modifying one’s view of how a MOSFET behaves in Chap. 10 requires a foundational view to modify, which is introduced in Chap. 1. Some chapters also make forward-looking references. These are mostly not dependencies; they are mentions of application. In other words, when a chapter mentions a later chapter, the information mentioned is not necessary for understanding; it is simply an example of use. For example, in Chap. 8, we mention Chap. 12 for examples of the use of


Table 2 Concepts introduced, Chaps. 1 through 10

1. Devices: Electron, hole, n-type, p-type, drift, diffusion, PN junction, depletion region, BJT, regions of operation, MOS capacitor, MOSFET, channel length modulation, body effect
2. Ratioed logic: Logic family, diode-connected transistor, load and driver, Voh, Vol, Voltage Transfer Characteristic, Vil, Vih, noise margins, ratioed logic, static power dissipation, Pull-Down Network, equivalent aspect ratio
3. CMOS: Ratioless logic, gate (external) capacitance, drain (self) capacitance, transistor equivalent resistance, time-constant, delay model, propagation delay, dynamic power dissipation, Pull-Up Network
4. Logical effort: Intrinsic delay, fanout, total fanout, logical effort, electrical effort, total effort, chain effort, optimal fanout
5. Dynamic logic: High impedance node, low impedance node, signal integrity, clock, precharge, evaluate, charge sharing
6. Pipelines: Latches, registers, master and slave, setup-time, hold-time, critical path, clock period, slack, setup-time violation, hold-time violation, clock overlap
7. CMOS process: Photolithography, PVD, CVD, oxidation, CMP, ion implantation, mask, LOCOS, latchup, SOI, layers, self-alignment
8. Design flow: Layout, stick diagrams, design flow, standard cell, synthesis, place and route, library, design rules, design rule checks, LVS, FPGA
9. HDL: Concurrent and sequential, synthesizable constructs, good coding practices, state machine, latch-based design, slack borrowing, test benches, non-synthesizable constructs
10. Scaling: Velocity saturation, velocity overshoot, hot carrier effect, leakage, tunneling, subthreshold conduction, narrow channel effects, drain-induced barrier lowering, FinFET

Table 3 Concepts introduced, Chaps. 11 through 14

11. Arithmetic: Binary addition, binary multiplication, ripple carry adder, carry-save, carry-bypass, Manchester carry chain, generate-delete-propagate logic, lookahead addition, parallel prefix adder, array multipliers, Booth encoding, Wallace trees, DADDA multipliers
12. Memories: NAND and NOR ROM, pseudo-NMOS and precharged loads, SRAM, NVM, FLOTOX, FLASH, sense amplifier, row decoder, drive buffer, column decoder, self-timing
13. Wires: Wire delay, RC model, Elmore delay, ground and supply bounce, pin pad, ESD protection, level conversion, skew, jitter, clock grid, clock tree, hybrid clock network, metastability, synchronizer, asynchronous FIFO
14. Testing: Defect, fault, error, test vector, observation, gold standard, fault coverage, yield, static hazards, dynamic hazards, stuck at fault model, SCAN technique, Boundary Scan Technique, flip-chip technology, DIP, surface mount, BGA, PCB, PCB design flow, multilayer PCB, memory walks, NPSF, reliability, failure rate, burn-in, useful life, MTBF

nonmetal wires for routing. This is not necessary to understand routing in Chap. 8, but is an interesting exception that is worth mentioning.

The book can also be partitioned and used to teach shorter courses on digital design. The partitioning can be done in any way the instructor sees fit. Below is one non-unique suggestion for dividing the book into four roughly equal-length courses:

Chapters 1 and 10 (Electronic devices) This course covers the basics of semiconductor physics and devices. It also introduces a lot of short channel effects and develops a MOSFET model more suitable for advanced technologies.

Chapters 2–6 (Digital circuit design) This is an entirely circuits-focused course. It considers various digital logic families and strategies for sizing and for controlling delay, area, and power. It culminates in how synchronous pipelines are designed and characterized.


Chapters 7–9 (Digital VLSI) This course focuses on design and fabrication. It follows a bottom-up approach, first introducing the fabrication process in detail, then discussing how the design flow leads to the mask sets used in fabrication. HDL is discussed in detail as a practical first step in the design flow.

Chapters 11–14 (Advanced topics in digital integrated circuits) Miscellaneous topics in digital design. Although most of the topics can be understood independently, a full background in a digital design course is needed to appreciate their importance.

Assumptions, Simplifications, Accuracy, and Managing to Do Anything

Digital integrated circuits are complicated. They contain a daunting number of components. Each component is extremely small. The devices have challenging physics. And everything becomes much harder to deal with when brought together. Throughout this introduction, we have considered levels of abstraction, hierarchical design, and bottom-up learning to be the answer to this challenge. And they are. However, there is a very important caveat. Abstraction involves assumptions and simplifications. And assumptions and simplifications by necessity mean that the derived performance deviates from the actual performance. This is inevitable; otherwise, we would learn nothing.

In Chap. 14, we will talk about finished chips and assembled systems. We will assume that anything that can go wrong with these chips can be represented in a truth table. But if we go back to Chaps. 7 and 8, we discover that the things that can go wrong are actually way too complicated for a mere truth table to handle. In Chaps. 7 and 8, we use sample libraries, example processes, and design rules that are all significantly simpler than reality. And everything that these chapters use to build their libraries is based on a delay model developed in Chap. 3. But as Chap. 3 itself admits, the delay model is a gross oversimplification. And even if we used a more accurate delay model, the transistor model used in Chap. 3 is much simpler than the understanding developed in Chaps. 1 and 10 of how an electronic device behaves. At every step in the book, we make more assumptions. And the assumptions are built on assumptions. When we do not make assumptions, we use oversimplified illustrative examples. So is this whole exercise futile? Would the final understanding we develop reflect reality, or would all the inherited simplifications lead to a skewed idea of how everything works? The answer is that it is a balancing game.
Digital circuit design is very tightly associated with computer-aided design tools. In fact, the majority of Chap. 8 covers how the tool flow works. And all of Chap. 9 considers how the designer should talk to the automated design flow. This is necessary when dealing with the large designs and complex devices in a typical integrated circuit. But Chaps. 6–9 will clearly show that a designer who is not fully aware of what each step in the automated flow does will never be able to produce a working design. So the designer must be “aware”, and this awareness must be at a fundamental level. It is important that we understand the limitations of the device and delay models in Chap. 3. But if we refuse to move on, we will not be able to appreciate how a pipeline works in Chap. 6, which in turn would stop us from understanding how the design flow works in Chap. 8, how to design properly in Chap. 9, or how to speed up critical circuits in Chap. 11. Thus, we must be aware of the assumptions we make. We must even appreciate that some of them are dramatic oversimplifications. This allows us to appreciate the level of complexity that design tools have to work with, which will sometimes help us discover problems while debugging designs. However, if we do not accept that we have to make these assumptions and move on, we will never develop the level of understanding needed to even start talking to the design tools.


The Formula of the Book

Each chapter is divided into a number of sections. Neither chapters nor sections are of equal size. However, both chapters and sections have organic unity. While, as stated earlier, dependencies are inevitable, each chapter can stand on its own. Each chapter presents a unity of topic and a focus on a certain area that makes it easy to read standalone, or to refer back to for a specific piece of information. Sections also stand on their own, but in a different way. Each section presents a single idea. The sections have a definite flow within the chapter and must be read in sequence. But each section covers a single idea, and only a single idea. This allows sections to remain focused. Each section starts with a list of objectives. This is simply a list of things that the section intends to do. Sections differ substantially in their length and complexity. However, the magnitude of the idea that each section presents is more or less the same.

The Story as It Is Told

Most digital integrated circuits are made using MOSFETs. To understand MOSFETs, we have to endure a paradigm shift into a semi-quantum model of solid-state physics. This allows us to understand how the MOSFET operates and why its current flows the way it does. While this might sometimes seem tedious, it is nevertheless necessary for understanding almost any behavior of larger circuits, especially ones that use smaller devices.

The simplest logic gates are made using an architecture that uses a driver and a load to create a potential divider that leads to inversion. These gates have their specialty uses, and their delay can be impressive. However, they have some of the worst noise margins and static voltage transfer characteristics possible. Better inverters are made using CMOS. In CMOS, noise margins are maximized, static power is nullified, and VTC properties are almost identical to those of the ideal inverter. At least at face value. A deeper understanding of CMOS reveals fundamental limits on delay due to large self- and external loading. We are also faced with a serious tradeoff when we try to size an entire chain of logic. Faster logic gates are usually made using dynamic CMOS. Dynamic CMOS has great delay and area performance. However, signal integrity issues mean that dynamic circuits, in general, cannot hold signal values the way static nodes can. This is an issue that affects all digital circuits, be they combinational, sequential, or even large-scale storage in the form of memories.

Most digital circuits are synchronous pipelines that combine combinational logic with registers. Timing in a pipeline is both very systematic and challenging. Performance is dominated by the critical path, the path whose delay requires the longest clock period. Paths other than the critical path finish with “slack”. Paths that finish with negative slack lead to setup-time violations and cause the circuit to fail unless it is operated at a lower frequency.
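Setup and hold checks reduce to simple arithmetic once path delays are known. A sketch of a per-path slack report; the timing parameter values used in the test are hypothetical:

```python
def path_slacks(t_clk, paths, t_setup=0.1, t_hold=0.05):
    """Setup and hold slack for each register-to-register path.

    Each path is a tuple (t_clk_to_q, t_logic_max, t_logic_min).
    Setup slack: t_clk - (t_cq + t_logic_max + t_setup); negative
    slack means a setup violation at this clock period.
    Hold slack: (t_cq + t_logic_min) - t_hold; negative slack is a
    hold violation, which no clock period can fix."""
    report = []
    for t_cq, t_max, t_min in paths:
        setup_slack = t_clk - (t_cq + t_max + t_setup)
        hold_slack = (t_cq + t_min) - t_hold
        report.append((setup_slack, hold_slack))
    return report
```

Note the asymmetry the sketch exposes: setup slack depends on the clock period and can be bought back by slowing the clock, while hold slack does not involve the clock period at all.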
Clock overlaps necessitate holding data at the inputs of registers. Paths that fail to hold data long enough lead to hold-time violations, a serious failure mode for the circuit. Knowing how to identify and solve setup- and hold-time violations is the most important aspect of circuit-level design for digital integrated circuits.

A CMOS circuit is fabricated using a process called photolithography. Photolithography is at once a very simple and elegant flow, and one that can have significant complications. Photolithography involves exposing a silicon wafer through a set of photomasks, which are used to draw a pattern on a material called photoresist, using light. Thus the name photolithography. The masks for photolithography are obtained from an automated design flow. The designer provides a higher level description of the circuit they desire to create. Intervening tools then map this description into a combined layout for the entire chip. Photomasks for lithography can then be derived directly from the layout. While the process is highly automated, it requires the user to be aware of what is happening at each step. At the level of complexity of VLSI, things can and will continuously go wrong. And when things go wrong, nobody can fix them except someone aware of every level of the design, from circuit-level concepts such as the critical path down to how gates are drawn in layout.

The high-level description that the designer provides as an input to the design flow is itself a complicated affair. The description is usually written in a hardware description language (HDL). HDLs are very powerful but can also lead to a lot of confusion. They at once have a syntax similar to programming languages and behaviors that in no way parallel programming languages. Thus, writing effective HDL requires a lot of discipline and a set of good coding principles that constrain the designer even when the tool does not. Writing good HDL requires that the designer be fully cognizant of what it is that they are trying to design.

Most digital circuits are synchronous pipelines that contain combinational circuits sandwiched between registers. And for the overwhelming majority of applications, said combinational circuits perform arithmetic. Thus, to be able to design intelligently, we have to be aware of how arithmetic can be done in hardware. Addition is the big deal. Additions are the most prevalent arithmetic operations in any algorithm. But more importantly, any complicated arithmetic operation is ultimately performed in hardware using additions. Hardware implementation of arithmetic is deceptively simple. The circuits look very analogous to long addition and long multiplication, and for good reason: they directly implement long addition and long multiplication. But the problem is not about doing maths; the problem is how to do it efficiently.
We find that our main problem is that the performance of our adders and multipliers deteriorates quickly when we try to use them with operands that contain a larger number of bits. This requires insight that combines hardware and algorithms to produce more efficient adders and multipliers. This is a very exciting area, where we observe patterns in how numbers behave, then exploit these patterns to increase efficiency.

In highly scaled technologies, arithmetic combinational cores can be very fast. Often, this causes functional units to be much faster than we need them to be. Consider the circuit in the left sub-figure of Fig. 6, for example. We are using three functional units. FU1 and FU2 act in parallel on external inputs A through D. FU3 then accepts the outputs of FU1 and FU2 as inputs. But because of improving technology, it is very likely that we find this circuit operating much faster than the application actually demands. Thus, in the right sub-figure of Fig. 6, only a single FU is used, producing a third of the throughput of the left sub-figure. In this case, three consecutive cycles are used to produce the final result Z. Because inputs have to be time-multiplexed onto the single unit, the approach is called time-sharing, and it is extremely common in modern digital circuits.

Fig. 6 Time-shared architecture. FU1 and FU2 act in parallel, while FU3 combines their outputs. To the right, a single FU performs all three functions. In the first cycle, A and B are combined to produce X. In the second cycle, C and D are combined to produce Y. In the third cycle, X and Y are combined to produce Z
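The three-cycle schedule of Fig. 6 can be sketched behaviorally. The combining function fu is hypothetical and stands in for whatever operation FU1 through FU3 actually perform:

```python
def time_shared(fu, a, b, c, d):
    """One functional unit reused across three cycles, with registers
    holding the intermediate results between cycles.

    Cycle 1: X = fu(A, B); Cycle 2: Y = fu(C, D); Cycle 3: Z = fu(X, Y).
    Throughput drops to one result every three cycles, in exchange
    for two fewer functional units."""
    x = fu(a, b)       # cycle 1, X stored in a register
    y = fu(c, d)       # cycle 2, Y stored in a register
    return fu(x, y)    # cycle 3, final result Z
```

The registers implied by the comments are exactly the storage cost discussed next: every intermediate result that crosses a cycle boundary must be held somewhere.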


One side effect of time-sharing is the need for a lot of storage. The inputs of the functional unit have to be multiplexed across different cycles, and a lot of intermediate results have to be stored. This promotes the need for dense and very fast mass storage devices. Mass storage is the domain of memories. Memories are very different from standard CMOS. While the same fundamentals apply, the design goals are very different. Memories incentivize density. Speed is also of prime importance. On the other hand, every cell in a memory is identical, reducing the complexity of their design relative to random logic.

Another aspect that affects highly scaled digital circuits is the increased role played by seemingly auxiliary components in determining performance. We might expect logic gates, and maybe memories, to determine the performance of our chip. However, we end up observing that signal wires, supply and ground, I/O pins, and even the network used to distribute the clock can play an even larger role.

Designing digital circuits is very complicated. There are multiple levels of abstraction, and multiple levels of things that can go wrong. And things do go wrong all the time. Thus, assuming that we will always get working chips if we follow the design flow properly is naive. In fact, even commercially mature operations occasionally produce a defective chip. It is reasonable to expect that a system that contains billions of extremely small components will occasionally fail to work. It is even more reasonable to assume that even if all chips initially work, some will break down with continued use. Thus, digital design is not about producing faultless, perfect chips. It is about producing incredibly complicated, fundamentally fallible, but ultimately economically viable systems. This certainly requires people with a deep understanding of everything that can and will go wrong.
And this is the ultimate aim of this book: understanding what can go wrong and what to do about it, rather than what happens when everything goes perfectly.

Contents

1  Devices   1
   1.1  The Band Model   1
   1.2  Intrinsic Silicon   6
   1.3  Band Model with Doping   10
   1.4  Extrinsic Silicon   14
   1.5  Drift   17
   1.6  Diffusion   21
   1.7  Forming a Homojunction   23
   1.8  PN Junction in Equilibrium   26
   1.9  Junction Capacitance   29
   1.10 Forward and Reverse Bias   31
   1.11 Minority Carrier Injection   33
   1.12 Forward-Biased PN Junction Current   35
   1.13 Bipolar Junction Transistor   41
   1.14 Materials Interfaces   49
   1.15 MOS Capacitor Preliminaries   55
   1.16 Modes of the MOS Capacitor   58
   1.17 MOS Capacitor Characteristics   65
   1.18 MOSFET Linear Regime   67
   1.19 MOSFET Saturation Regime   68
   1.20 Body Effect   74
   1.21 Channel Length Modulation   77

2  Ratioed Logic   81
   2.1  PMOS   81
   2.2  Regions of the MOSFET   82
   2.3  BJT Logic   85
   2.4  Abandoning BJT   85
   2.5  Scaling MOSFET   88
   2.6  What is a Logic Family   90
   2.7  Resistive Load Inverter   93
   2.8  Open-Circuited Transistor   95
   2.9  Enhancement Load Inverter   96
   2.10 Enhancement Load VTC   97
   2.11 Static Power   98
   2.12 NAND and NOR Enhancement Load   99
   2.13 Random Logic in Enhancement Load   103
   2.14 Depletion Load Logic   104
   2.15 Pseudo-NMOS Logic   106
   2.16 Limitations of Ratioed Logic   107

3  CMOS   111
   3.1  Basics of the CMOS Inverter   111
   3.2  CMOS VTC   112
   3.3  Preliminaries of Delay   116
   3.4  MOS Capacitance and Resistance   120
   3.5  Simplified Delay Model   123
   3.6  Non-static Power   125
   3.7  CMOS NAND and NOR   128
   3.8  CMOS Complex Logic   130
   3.9  Sizing, Delay, and Area   135
   3.10 Supply and Width Scaling   141
   3.11 Limitations of CMOS   142

4  Logical Effort   145
   4.1  Sizing in a Chain   145
   4.2  Sizing an Inverter Chain   145
   4.3  Gates Versus Inverters: Preliminaries   148
   4.4  Normalizing Gate Intrinsic Delay   149
   4.5  Normalizing Gate External Delay   150
   4.6  Architecture, Inputs, and Effort   152
   4.7  Optimal Sizing in a Logic Chain   153
   4.8  Logical Effort for Multiple Inputs   156

5  Dynamic Logic   157
   5.1  High-Impedance Nodes   157
   5.2  Dynamic CMOS and Why it is Great   158
   5.3  Delay, Period, and Duty Cycle   162
   5.4  Leakage in Dynamic Logic   162
   5.5  Charge Sharing   167
   5.6  Cascading Dynamic Logic   171
   5.7  Logical Effort in Dynamic Gates   175

6  Pipelines   177
   6.1  Sequential Versus Combinational   177
   6.2  Latches, Registers, and Timing   180
   6.3  The Static Register   181
   6.4  Dynamic Registers   187
   6.5  Imperfect Clocks and Hold-Time   188
   6.6  Pipelines, Critical Path, and Slack   192
   6.7  Managing Power in a Pipeline   199
   6.8  Examples on Pipelining   205
   6.9  Impact of Variations   213

7  CMOS Process   217
   7.1  Setting and Location   217
   7.2  Photolithography Iteration   219
   7.3  Account of Materials   221
   7.4  Wafer Fabrication   223
   7.5  Operations and Equipment   226
   7.6  LOCOS   237
   7.7  Advanced Issues in CMOS Processing   247
   7.8  Account of Layers   271

8  Design Flow   275
   8.1  What Is a Layout   275
   8.2  Stick Diagrams   276
   8.3  Standard Cells   280
   8.4  Design Rules: Foundations   288
   8.5  Design Rules: Sample   292
   8.6  Fixed-Point Simulation   298
   8.7  Physical Design   303
   8.8  FPGAs   312

9  HDL   317
   9.1  Design Philosophy   317
   9.2  The Entity   319
   9.3  IEEE Library and std_logic   319
   9.4  Types, Attributes, and Operators   322
   9.5  Architecture   327
   9.6  Structural Connections   327
   9.7  Generics and Constants   335
   9.8  Multiplexing and Choice   342
   9.9  The Process Statement   342
   9.10 Signals and Variables   345
   9.11 Selection in a Process   347
   9.12 Latches and Implicit Latches   348
   9.13 Registers and Pipelines   357
   9.14 Memories   365
   9.15 Counters   369
   9.16 State Machines   373
   9.17 Testbenches: Preliminaries   379
   9.18 Functions and Procedures   381
   9.19 Wait, Assertions, and Loops   387
   9.20 File I/Os   394
   9.21 Packages and Configurations   401
   9.22 Good Design Practices   405

10 Scaling   411
   10.1 Steep Retrograde Body Effect   411
   10.2 Velocity Saturation   412
   10.3 MOSFET Leakage   416
   10.4 DIBL   422
   10.5 MOSFET Structures for DIBL   427
   10.6 Miscellaneous Scaling Effects   430
   10.7 Impacts on CMOS   435

11 Arithmetic   439
   11.1 Binary Addition and Full Adders   439
   11.2 Ripple Carry Adder   441
   11.3 Generate-Propagate Logic   442
   11.4 Carry-Save and Bypass Adders   445
   11.5 Lookahead Addition   449
   11.6 Group Generates and Propagates   451
   11.7 Parallel Prefix Adders   453
   11.8 Binary Multiplication   456
   11.9 Array Multipliers   457
   11.10 Wallace and Dadda Multipliers   459
   11.11 Booth Multiplication   464

12 Memories   471
   12.1 Architectures and Definitions   471
   12.2 NOR ROM Arrays   474
   12.3 NAND ROM Arrays   479
   12.4 NVMs   482
   12.5 SRAM Cell   492
   12.6 Sense Amplifiers   496
   12.7 SRAM Timing   500
   12.8 DRAM Cells   504
   12.9 Decoders and Buffers   510

13 Wires and Clocks   519
   13.1 Basics   519
   13.2 Lumped C Wires   522
   13.3 Silicon Wires   524
   13.4 Scaling Wires   525
   13.5 Interchip Communication   527
   13.6 Supply and Ground   533
   13.7 Clock Networks   536
   13.8 Metastability   540
   13.9 Synchronization   546

14 Testing   551
   14.1 Fundamentals of Testing   551
   14.2 Logical Hazards   559
   14.3 Stuck-at Fault Model   567
   14.4 Scan Paths   571
   14.5 Built-in Self-test   576
   14.6 IC Packaging and Boundary Scan   580
   14.7 Testing Memories   586
   14.8 Reliability   589

Glossary   593
Index   633

1 Devices

1.1 The Band Model

1. Understand the difference between a classical and a semi-modern model for an atom
2. Understand how Pauli's exclusion principle creates bands
3. Realize why we measure energy in eV
4. Distinguish semiconductors, insulators, and conductors in terms of gaps and bands
5. Understand the concept of a "hole" and hole current
6. Develop intuitive preliminaries for current flow in intrinsic silicon.

Figure 1.1 shows a classical model of a hydrogen atom. Atoms consist of a nucleus and orbiting electrons. The nucleus is massive and contains protons and neutrons. Protons have a positive charge; electrons have an equal negative charge. Neutrons are uncharged. All atoms are electrically neutral; thus, all atoms contain the same number of electrons and protons.

Non-isotopic hydrogen is the simplest atom. It contains a nucleus with a single proton orbited by a single electron. The electron has a charge of −q and a mass of 9.1 × 10⁻³¹ kg. The proton has a charge of +q and a mass of 1.67 × 10⁻²⁷ kg, where q is

q = 1.6 × 10⁻¹⁹ C

The opposite charges of the proton and electron in Fig. 1.1 create an electrostatic force attracting the two to each other. This keeps the electron bound in a circular orbit around the nucleus. If the hydrogen atom is heated, the electron gains some of the thermal energy. The newly energized electron can then increase the radius at which it orbits the nucleus by an amount commensurate with the gained energy. If the electron gains enough energy, with the word "enough" admittedly being undefined for now, it can break from the force binding it to the nucleus. The electron then becomes "free", no longer orbiting the nucleus.
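The mass figures above already hint at why free electrons, rather than the ions they leave behind, carry essentially all the current: under the same field, both feel the same force magnitude qE, so their accelerations differ only by the mass ratio. A quick sketch using only the constants quoted in the text (the field strength is an arbitrary illustrative value, not from the text):

```python
# Constants quoted in the text
q = 1.6e-19       # elementary charge, C
m_e = 9.1e-31     # electron mass, kg
m_p = 1.67e-27    # proton mass, kg

E = 1000.0        # arbitrary illustrative field strength, V/m (assumed)

# Both particles experience the same magnitude of force F = qE,
# so their accelerations a = F/m differ only by the mass ratio.
F = q * E
a_electron = F / m_e
a_proton = F / m_p

print(f"force on either particle: {F:.3e} N")
print(f"electron acceleration:    {a_electron:.3e} m/s^2")
print(f"proton acceleration:      {a_proton:.3e} m/s^2")
print(f"ratio (electron/proton):  {a_electron / a_proton:.0f}")
```

The electron accelerates roughly 1835 times harder than the proton, regardless of the field chosen, which is why the ion "barely" moves while the freed electron readily carries current.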

It leaves behind a single proton. The lone proton is called an ion.

An electric field is an effect in a region of space: a point charge placed in the region will experience a force. Depending on the conditions of the charge, this force may or may not move it; but if the charge is movable, the field will accelerate it. An electron that has been freed from a hydrogen atom has a very small effective mass, so a field will definitely cause it to move. This motion yields electric current if there is a closed loop for the current to flow through. Thus, "free" electrons are critically important for the electrical properties of the material.

The proton, or more correctly the ion, left behind by the electron is also charged, so the field exerts a force on it as well. However, it is massive, and the force will barely be able to move it. If the electron were still bound to the atom, the electric field may or may not have been able to move it: if the force resulting from the field is larger than the force of electrostatic attraction, the field may be able to break down the hydrogen atom, freeing the electron to carry current.

The classical model in Fig. 1.1 provides good but limited insight into the electrical behavior of materials. It gives some qualitative understanding, but particularly for solid materials it fails to provide a quantitative model. A modern model for the hydrogen atom involves solving its Schrödinger wave equation. For solid materials like silicon, the wave equations are complicated further by the fact that a large number of atoms are brought into close proximity, forming bonds. However, we can develop a simpler model that suits our purposes: the band model.

Modern physics is distinguished from classical physics in three different but interrelated ways:

• Quantization: Physical quantities take discrete values rather than continuous values. When applied to the model of the hydrogen atom in Fig. 1.1, the electron can only

© Springer Nature Switzerland AG 2020 K. Abbas, Handbook of Digital CMOS Technology, Circuits, and Systems, https://doi.org/10.1007/978-3-030-37195-1_1


Fig. 1.1 A classical model for the atom that uses Newtonian principles. A low-mass electron (marked by −q) is bound to a high-mass proton (marked by +q), orbiting it in a circular motion

have discrete values of energy, and thus can only occupy discrete orbits around the nucleus. Discretization also applies to physical quantities other than energy
• Particle–wave duality: Matter with mass displays properties of waves, exhibiting propagation, diffraction, reflection, and interference patterns. Electromagnetic radiation exhibits properties of particles, such as momentum and discrete quanta
• Uncertainty: We can never be certain about physical quantities; we only know statistics. This is more fundamental than simply having a statistic that describes the behavior of the physical quantity. It is, in fact, impossible to know the physical quantity for certain.

The band model is a semiclassical model for atoms. Quantization is taken into consideration, but particle–wave duality is ignored. This allows us to develop accurate models for conductivity while still using classical field theory to estimate current. We will have to use particle–wave duality in Chap. 10 to explain how very small transistors behave.

Figure 1.2 shows the band model of a hydrogen atom. The figure shows the energy of the electron on the vertical axis. This is the only value of concern, and it is strictly quantized. That is to say, the electron can only take the discrete values of energy marked by black bars. The horizontal axis is location in a one-dimensional space. The white spaces between the bars represent energy ranges that are "forbidden". Thus, an electron can exist in any of the allowed energy levels. If it gains enough energy to jump to a higher allowed energy level, it will go to exactly


that level. If it loses enough energy, it will fall to the next allowed level below. The electron cannot jump up or down to any energy value in the forbidden gaps. If an electron gains more than enough energy to jump to an allowed level, it will lose the difference, normally as heat, and rest at the next lowest allowed level.

Silicon differs from hydrogen in that it belongs to group 14 and that it is a solid. Because silicon belongs to group 14, it has four electrons in its outer orbital. Because it is a solid, it forms a crystal structure (Chap. 7) in which a large number of atoms are brought together in close proximity. A concept of modern physics called Pauli's exclusion principle states that no two electrons in close proximity can occupy the same energy level. If two such electrons are forced together by a chemical bond, their energy levels "split", as shown in Fig. 1.3. If a very large number of atoms are brought together, the splitting reaches such a degree that it is more useful to talk about continuums of allowed levels. A continuum of energy levels is called a band.

Silicon atoms contain multiple electrons. Each electron or group of electrons in an isolated silicon atom occupies a specific energy level. Thus, once silicon atoms are brought together, each of these energy levels forms a band. Strictly speaking, Pauli's principle does not prohibit two electrons from having the same energy; it prevents them from occupying the same state. But this distinction requires an understanding of the concept of spin, which is beyond the scope of this book.

Thus, a silicon crystal forms multiple energy bands, as shown in Fig. 1.4. The energy bands are separated by forbidden gaps. Electrons prefer to rest at the lowest possible allowed energy level, so electrons populate the lower energy bands. Thus, even if there is some thermal energy in the system, lower energy bands are fuller than higher energy bands.

As we examine the energy bands going from the lowest up, we will eventually hit a band that is mostly full followed by a band that is mostly empty. These two bands are, respectively, marked as the valence band and the conduction

Fig. 1.2 Quantum model of the hydrogen atom. The y-axis is energy value. The black bars represent allowed energy values for the hydrogen electron. White spaces are forbidden energy ranges


Fig. 1.3 Energy level splitting and band formation. The more levels brought together, the closer the spacing. Eventually (bottom), levels form a continuous band

band in Fig. 1.4. These two bands, and particularly the forbidden gap separating them, are critical in determining the electrical properties of the material.

At this point, it is useful to examine a classical model of a silicon crystal. While we will immediately abandon this classical model, it gives a lot of insight into how electrical conduction happens. Figure 1.5 shows a planar view of a silicon crystal. The silicon crystal is a three-dimensional structure (Chap. 7), so the figure is a projection rather than an accurate representation. A silicon atom has four electrons in its outermost energy level, and we show only these outer electrons. Thus, the blue circles in Fig. 1.5 are not only nuclei but also contain all the internal orbits of electrons. Silicon has an atomic number of 14. Thus, the nucleus has a charge of +14q, and the whole atom has an electron charge of −14q. And

Fig. 1.4 Energy bands in a solid, with the two bands of interest and the intervening bandgap marked. Lower energy bands are fuller than higher bands. The highest full band and the lowest empty band are the most critical. Bands are separated by forbidden gaps where electrons cannot reside, with the gap separating the valence and conduction bands being the most significant


thus, the blue circles in Fig. 1.5, which exclude the four outer electrons, have a charge of (14 − 14 + 4)q = +4q.

In the crystal structure, each silicon atom is surrounded by four other silicon atoms, and it forms a covalent bond with each of them. This causes each silicon atom to perceive eight electrons in its outer shell, which is a stable chemical state, forming a stable crystal.

At 0 K all covalent bonds are intact, and all electrons are bound to the crystal structure. As temperature rises, some electrons (not many at room temperature) gain enough energy to break free from their covalent bonds. These electrons become free: not free from the crystal, but free within it. Thus, if an electric field is applied, these electrons will move through the crystal, causing current to flow (Sect. 1.5).

As shown in Fig. 1.5, an electron that becomes free leaves behind an empty position in a covalent bond, making that bond chemically unstable. The bond really "needs" an electron to fill the empty position. Thus, if an electric field is applied, electrons from other covalent bonds can be swept into this empty position. This moves the empty position to the newly broken bond, which can, in turn, accept electrons from other bonds. This electron displacement within the covalent bonds also causes current flow. The empty position in Fig. 1.5 is called a hole, and it is an extremely important charge carrier in semiconductors.

We can already see two types of "charge carriers". A charge carrier is a charged particle that can cause current flow. The two here are electrons and holes. The classical view in Fig. 1.5 gives us a good qualitative understanding of semiconductors; a quantitative understanding, however, requires us to step back to the band model.


Fig. 1.5 Classical model of a silicon crystal at 0 K (left) and as temperature rises (right), showing the formation of a hole. This is a planar view, not modeling the crystal structure of silicon

In Fig. 1.4, we noticed that the two bands of interest in the silicon crystal are the highest nearly full band and the lowest nearly empty band; these bands are shown in Fig. 1.6. The lower band is called the valence band, the higher band is called the conduction band, and the amount of energy separating the two is called the bandgap energy.

The valence band is mostly full because it represents the electrons participating in covalent bonds. At 0 K the valence band is completely full, because all electrons are bound in the covalent bonds. The conduction band represents those

Fig. 1.6 Silicon valence band, conduction band, and bandgap at 0 K. Ec is the lowest level in the conduction band. Ev is the highest level in the valence band. Conduction band is referred to as CB. Valence band is referred to as VB

electrons that have broken free from the covalent bonds. The conduction band is completely empty at 0 K because no electrons are free.

As temperature rises, some electrons gain enough energy to jump up from the valence band to the conduction band. The energy necessary to make the jump is at least the bandgap energy. The electrons that make the jump become "conduction electrons". When we use the word "electron" in electronic devices, we mean an electron that can participate in current flow; in other words, a free electron, or strictly speaking, an electron in the conduction band.

The electron that jumps to the conduction band leaves behind an empty state in the valence band; this empty state is called a "hole". As shown in Fig. 1.7, an external field applied to the semiconductor will cause the conduction band electron to drift in the direction opposite the field. This causes a current to flow in the direction of the electric field. This current is called the electron current. Electrons will only move if there is a free state to move to. This is why it is easy for electrons in the conduction band to move: the conduction band is full of empty states.

The electric field also exerts a force on electrons in the valence band. Normally, electrons in the valence band do not see any free states, and thus would not budge. But because the conduction electron left behind an empty state, the valence band electrons can move into that free state. The free state is called a hole. When an electron moves within the valence

1.1 The Band Model

Fig. 1.7 Electron–hole formation and drift under a field

band to fill the hole, it will itself leave a hole behind. We can try to keep track of how the valence electrons move against the field in the valence band, but it is much simpler to notice that the empty state itself will move in the same direction as the field. Thus, it is more useful to think of the empty valence state, the hole, as a charge carrier itself. Because the hole moves in the same direction as the electric field, its charge is positive. It represents a lack of an electron, thus the magnitude of its charge is q. The hole flow will cause a current to flow in the same direction as the field, this current is called the hole current. The total current flowing through the semiconductor is the sum of the electron current and the hole current. Notice that the two currents flow in the same direction and are thus added up. In pure silicon, hole current is as important as electron current, although the two are not generally equal. In some doped semiconductors, hole current is significantly more important than electron current, while in others it is negligible. In short, we cannot ignore holes in semiconductors. They cause significant current flow. In some cases,

Fig. 1.8 Conductor with partially filled band (left), conductor with overlapping bands (right)


they can be dominant. And some critically important devices will rely exclusively on current carried by holes. The band structure of an insulator is very similar to that of a semiconductor. There are bands; the lower ones are mostly full, and the higher ones are mostly empty. The only difference is that the bandgap energy between the conduction and valence bands in insulators is huge. But how high would a bandgap energy have to be before a material is classified as an insulator? To answer this question, we have to define a useful unit to measure energies in a band structure. We are measuring the energy of an electron, which can be extremely small. Using joules to measure such energies is very unwieldy. From the definition of electric potential, we can define the electronvolt (eV) as a measure of energy:

1 eV = 1.6 × 10^-19 J

An electronvolt is the amount of energy needed to bring one electron to a potential of 1 V. One can also think of it as simply a scaled version of the joule. For silicon, the bandgap energy is roughly 1.1 eV. Most semiconductors tend to have bandgap energies around this value. Good insulators have bandgap energies higher than 8 eV; however, some materials with much lower bandgaps are also classified as insulators. To more properly distinguish insulators from semiconductors, we have to examine the sensitivity of the material to impurities. This will be addressed in Sect. 1.4. Conductors are very different from both insulators and semiconductors. We can also distinguish them in terms of the bandgap energy, but the band structure can sometimes look confusing. As shown in Fig. 1.8, there are two ways in which a conductor can form. The first is that the highest occupied band is only partially filled. The second is when the bandgap between the conduction and valence bands is nonexistent or even negative.
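The eV-to-joule conversion above is easy to sanity-check numerically. The sketch below is plain Python using only the constant given in this section; the function names are for illustration.

```python
# Convert between electronvolts and joules.
# 1 eV = 1.6e-19 J (elementary charge times 1 volt), per this section.
EV_IN_JOULES = 1.6e-19

def ev_to_joules(e_ev):
    """Energy in eV -> energy in J."""
    return e_ev * EV_IN_JOULES

def joules_to_ev(e_j):
    """Energy in J -> energy in eV."""
    return e_j / EV_IN_JOULES

# The silicon bandgap of 1.1 eV is a tiny energy in SI units:
print(ev_to_joules(1.1))   # ~1.76e-19 J
```

The tiny SI value is exactly why band-structure energies are quoted in eV rather than joules.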


Both scenarios in Fig. 1.8 lead to the creation of a band where there are a lot of electrons and also a lot of empty states. Current flow requires two things: the presence of electrons to carry the current, and the presence of free states to allow these electrons to move. Metals have an energy band that is only half full; this band is rich in both electrons and free states. This translates into very high conductivity, and the potential to carry a very large current. The following questions are helpful as preliminaries for developing a quantitative model for conduction in silicon:

• Is there a relation between the number of electrons and the number of holes in pure silicon?
• What happens to the energy of electrons and holes as they rise through the conduction/valence bands?
• What exactly causes higher/lower conductivity?
• What is the level of electrons/holes at room temperature?

These questions can be answered as follows:

• The number of electrons and holes in pure (intrinsic) silicon must be equal. As Fig. 1.7 shows, the formation of an electron inevitably leads to the creation of a hole in its place. A hole is caused by an electron jumping to the conduction band and leaving behind an empty state. Thus, the two are inextricably linked. The annihilation of an electron likewise leads to the disappearance of a hole: if an electron drops to the valence band, it permanently fills the hole, leading to its permanent disappearance. Thus, the numbers of electrons and holes are necessarily equal in pure silicon. Note that the addition of any impurities would immediately break this balance (Sect. 1.4)
• Electron energy increases the higher the electron rises in the conduction band, which sounds redundant, and it is. However, the lower the hole sits in the valence band, the higher its energy. The former is immediately obvious; after all, going up the conduction band by definition means electron energy increases. Electrons prefer to sink to the lowest possible allowed energy value. Thus, in the valence band, electrons by default sink lower in the band, and holes by default tend to be pushed up in the valence band. As valence electrons become more energetic, they rise in the valence band, pushing the hole lower. Thus, hole energy increases as it sinks lower in the valence band. It is useful to think of electrons as water that prefers to sink lower. Holes can be thought of as analogous to bubbles, which by default prefer to float up, requiring additional energy to push them lower
• Higher conductivity is the result of the presence of electrons and empty states for such electrons to move through. By default, the valence band is full of electrons but poor in empty states. The conduction band is rich in empty states but poor in electrons to move through these states. Thermal energy increases conductivity in a twofold manner: it introduces more electrons into the conduction band, while simultaneously creating empty states in the valence band for electrons to move through
• The bandgap energy in silicon is 1.1 eV. At room temperature, thermal energy kT is roughly 25 meV. Thus, thermal energy is significantly lower than the bandgap energy, and intrinsic carrier concentration tends to be very low. In fact, the conductivity of pure silicon at room temperature is dismal. Proper conduction in silicon requires the introduction of impurities.
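The comparison between kT and the bandgap can be made concrete with a few lines of Python. The constants are standard values; "room temperature" is assumed to be 300 K.

```python
# Compare thermal energy kT with the silicon bandgap at room temperature.
K_BOLTZMANN = 1.38e-23   # Boltzmann constant, J/K
Q = 1.6e-19              # elementary charge, C (also J per eV)
T = 300.0                # assumed room temperature, K

kT_ev = K_BOLTZMANN * T / Q      # thermal energy expressed in eV
bandgap_ev = 1.1                 # silicon bandgap from this section

print(round(kT_ev * 1000, 1), "meV")   # roughly 25-26 meV
print(round(bandgap_ev / kT_ev, 1))    # the bandgap is about 40x larger
```

The factor-of-forty gap between kT and Eg is what makes thermally generated carriers so rare in pure silicon.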

The hole is a very important abstract concept. We will continuously use it throughout this book. And at higher levels of abstraction, it can be easy to forget that the hole is a virtual particle. But it is always good to be cognizant that it is ultimately electrons that flow whenever current flows in a solid. If electrons flow in the conduction band, we call it electron current. If electrons flow in the valence band, we call it hole current.

Conductors and metals are inextricably tied. In fact, the two terms can be used interchangeably at normal operating temperatures. Metals occupy odd-numbered groups in the periodic table. There must be some reason this leads to high conductivity. In Fig. 1.4, an atom forms multiple energy bands. Each band accepts two electrons from each atom. The reason two electrons from the same atom can be accepted without breaking Pauli's principle is that the two electrons can have opposite "spin". In metals, the odd number of electrons means that the highest energy band can support two electrons from each atom but only has one electron from each. Thus, this highest occupied band is only "half occupied", and the metal has a band with plenty of free electrons and plenty of free states for these electrons to move through. This leads to the extremely high conductivity of metals. While the distinction between insulators and semiconductors is still vague, one thing can be stated clearly: in most good insulators, the bandgap energy is so large that electron and hole concentrations at room temperature are essentially null. Thus, the conductivity of insulators is nearly zero.

1.2

Intrinsic Silicon

1. Understand the concept of a unique Fermi level
2. Recognize the Fermi–Dirac function
3. Realize the role of the density of states
4. Derive carrier concentrations in intrinsic silicon at thermal equilibrium
5. Derive the position of the intrinsic Fermi level
6. Understand the constancy of the hole–electron product
7. Derive the dependence of intrinsic carrier concentration on temperature.

Figure 1.9 shows the band diagram of pure silicon. Ec is the edge of the conduction band, its lowest energy level. Ev is the edge of the valence band, its highest energy level. The bandgap energy Eg is the difference between Ec and Ev, thus Eg = Ec − Ev. Figure 1.9 also shows a new, critically important level marked as Ef. This is called the Fermi level, and it is instrumental in calculating charge carrier concentrations in pure or doped silicon. The Fermi level is the average energy of electrons in the material. Because the populations of electrons and holes in pure silicon are equal, the Fermi level lies in the middle of the bandgap. We will shortly prove this and show in the process why it does not lie exactly in the middle. It is important to understand that the Fermi level is a value rather than a real energy level. Because it lies in the bandgap, it is not an allowed energy value for electrons. No electron can lie at the Fermi level even though the average energy of electrons is Ef. For an electron to exist somewhere, there has to be a valid energy level there, and there are no valid energy levels in the bandgap. Figure 1.10 shows the Fermi–Dirac probability distribution. It is the probability density function for electron energy. It indicates how likely an electron is to be at a certain energy level. The function's main parameters are the Fermi level and temperature. The function takes the form:

Pn(E) = 1 / (1 + e^((E − EF)/kT))

Fig. 1.9 Fermi level in the bandgap of intrinsic silicon. It lies roughly midway between the edge of the conduction band Ec and the edge of the valence band Ev

Fig. 1.10 The Fermi–Dirac probability function. Electrons are much more likely to exist at lower energy levels, with the probability fast dropping toward the conduction band. By definition, the probability is 0.5 at the Fermi level

By definition, the probability of finding an electron at EF is 0.5. This is because EF is the average electrochemical energy in the material. This fact applies whether the material is pure or doped, at 0 K or at extremely high temperatures. The Fermi–Dirac function indicates that as the energy rises toward infinity, the probability that an electron exists tends toward zero. As energy drops toward negative infinity, the probability of finding an electron tends toward unity. Figure 1.11 shows the dependence of the Fermi–Dirac function on temperature. At 0 K, the function takes a block shape with the probability equal to 1 for E < EF and the probability equal to 0 for E > EF. This translates into the valence band (VB) being totally full because there is a certainty that electrons exist in all of its energy levels. The conduction band (CB) is totally empty because the probability that an electron exists in any of its levels is null. As temperature rises, the function becomes smoother. This increases the tail of the function in the CB, meaning that some electrons will start to exist there. But in all cases, the probability of finding electrons in the VB remains much higher than in the CB, regardless of temperature.

Fig. 1.11 Dependence of Fermi–Dirac on temperature
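The behavior of the Fermi–Dirac function is easy to explore numerically. The sketch below assumes a mid-gap Fermi level at 0.55 eV above the valence band edge (an illustrative value, half the 1.1 eV silicon gap).

```python
import math

def fermi_dirac(e_ev, ef_ev, t_kelvin):
    """Probability that a state at energy E is occupied by an electron."""
    kT = 8.617e-5 * t_kelvin          # Boltzmann constant in eV/K
    return 1.0 / (1.0 + math.exp((e_ev - ef_ev) / kT))

EF = 0.55   # assumed mid-gap Fermi level, eV above the valence band edge

# Exactly 0.5 at the Fermi level, at any temperature:
print(fermi_dirac(EF, EF, 300))    # 0.5
print(fermi_dirac(EF, EF, 1000))   # 0.5

# Drops fast above EF, approaches 1 below it:
print(fermi_dirac(EF + 0.3, EF, 300))  # tiny
print(fermi_dirac(EF - 0.3, EF, 300))  # essentially 1
```

Raising the temperature argument makes the transition around EF smoother, reproducing the trend sketched in Fig. 1.11.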


Figure 1.11 and the observations on it follow the logic of electron–hole pair formation. We already know from Sect. 1.1 that the valence band is mostly full of electrons while the conduction band is mostly empty. The function supports this. We also know that as temperature rises, the formation of electron–hole pairs increases. This is also supported by Fig. 1.11. As the temperature increases, the "tail" of the Fermi–Dirac function extends further into the conduction band, indicating a higher probability of finding an electron there. What is the probability that a hole exists at energy level E? The definition of a hole is a state that does not carry an electron. The existence and nonexistence of electrons are complementary events, thus the probability of finding a hole can be obtained from the probability of finding an electron as

Pp(E) = 1 − 1/(1 + e^((E − EF)/kT)) = e^((E − EF)/kT) / (1 + e^((E − EF)/kT)) = 1 / (1 + e^(−(E − EF)/kT))

Note that the existence of "electrons" is nearly certain in the valence band and highly improbable in the conduction band. The existence of "holes" is nearly certain in the conduction band, while very rare in the valence band. However, this statement is a little inaccurate. When we use the term "electron" in this chapter, or indeed in the whole book, without qualification we will always mean a conduction band electron. When we use the word "hole" we will always mean an empty state in the valence band. Empty states in the conduction band are not holes; electrons in the valence band are not "electrons". The reason will become clear when we consider current-carrying mechanisms. But in qualitative terms, current requires the presence of two things:

• A charge carrier
• An empty state for the charge carrier to move into

The conduction band is full of empty states but poor in electrons. The valence band is poor in empty states but very rich in electrons. At the end of the day, in pure silicon the electron (conduction band) and hole (valence band) currents will be comparable in magnitude. The overall current in the material will be the summation of the two current components. Now how do we obtain a quantitative measure of the concentration of charge carriers (electrons and holes) in a piece of pure silicon? Concentration is defined as the number of charge carriers per unit volume. Does the Fermi–Dirac function or its complement indicate the presence of an electron or hole? No, the Fermi–Dirac function is a conditional probability: it indicates the probability that an electron will occupy the energy level E given that an available state exists at such a level.

For example, from the Fermi–Dirac function, what is the definition of the Fermi level? The Fermi level is obviously the energy level at which the probability of finding an electron is 0.5. Does this mean that there is a 50% chance of finding an electron at the Fermi level? No, because the Fermi level exists in the bandgap. In fact, the probability of finding an electron at the Fermi level is zero. Thus, the Fermi–Dirac function is the probability of finding an electron in a state provided that the state exists. The existence of electrons is thus the product of two things: that a state exists, and that the probability of the electron existing allows it. Thus, in an infinitesimally small energy range E + dE, the concentration of electrons (n) is the density of states in this range, multiplied by the probability that these states are occupied:

n(E + dE) = NCB(E) Pn(E) dE

NCB(E) = density of states in the CB around level E, in cm^-3
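The complement relation between the electron and hole probabilities can be verified directly. This sketch uses the simplified final form of Pp and checks it against 1 − Pn at a few sample energies (kT at 300 K assumed).

```python
import math

KT = 0.02585  # thermal energy at 300 K, eV

def p_electron(e, ef, kT=KT):
    """Fermi-Dirac probability of an electron at energy e."""
    return 1.0 / (1.0 + math.exp((e - ef) / kT))

def p_hole(e, ef, kT=KT):
    """Hole probability, written directly in its simplified form."""
    return 1.0 / (1.0 + math.exp(-(e - ef) / kT))

# The two forms are complementary at every energy: Pp(E) = 1 - Pn(E).
for e in [0.0, 0.4, 0.55, 0.7, 1.1]:
    assert abs(p_hole(e, 0.55) - (1.0 - p_electron(e, 0.55))) < 1e-12
print("Pp(E) = 1 - Pn(E) holds at all sampled energies")
```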

When we say we want to find the electron density in a material, we do not mean the electron density at a certain energy level. Instead, we mean the density of all conductive electrons at all meaningful levels. Thus, to obtain n (the concentration of electrons per cubic centimeter) we integrate n(E + dE) over the entire conduction band:

n = ∫ NCB(E) Pn(E) dE, integrated from EC to ∞

The density of states in the conduction band is

NCB(E) = (4π/h³) (2me)^(3/2) √(E − EC)

The density of states function indicates that the material does not have the same concentration of energy levels at every value of energy. In fact, the density of states is 0 at Ec, and it increases as we rise above the edge of the conduction band. The only constant of interest in the equation is me, the effective mass of the electron. This is not exactly the physical mass of the electron from Sect. 1.1; it is adjusted to account for the fact that it is harder for the electron to move through the silicon crystal than through vacuum. Similarly, we can obtain the concentration of holes as

p = ∫ NVB(E) Pp(E) dE, integrated from −∞ to EV

where:

NVB(E) = (4π/h³) (2mh)^(3/2) √(EV − E)


The density of states functions in the valence and conduction bands are mirror images of each other, with the main difference being the effective masses. The effective mass of holes is higher than the effective mass of electrons due to the fact that the valence band is crowded with electrons, making motion more difficult. Figure 1.12 shows a graphical representation of the calculations above. The graphs to the left are the probability density functions for finding an electron and a hole: specifically, the Fermi–Dirac function and its complement. The middle graphs show the density of states in the valence band and the conduction band. The density of states is very low close to the band edges but increases steadily toward plus and minus infinity. The graphs on the right are the products of the Fermi–Dirac and density of states graphs. They represent the density of carriers in the CB and VB. Electron concentration is low at the edge of the CB because the density of states is low. It rises toward a maximum and then drops precipitously, since the Fermi–Dirac function drops for high energy values. A similar pattern is observed for holes. The area under the two rightmost curves in Fig. 1.12 is the integration of the functions, which from the discussion above is the electron concentration (n) and the hole concentration (p). Because we know p = n in intrinsic silicon, we can also conclude that the areas under the two rightmost curves in Fig. 1.12 must be equal.


The integrals used to obtain n and p can be evaluated using gamma functions to yield the following results:

n = NC e^(−(EC − EF)/kT)
p = NV e^(−(EF − EV)/kT)

NC and NV are constants independent of the value of energy. They are called the effective densities of states. NC is the effective density of states at the edge of the conduction band. It is a number that represents the totality of the density of states in the CB if the entire band were collapsed to a single energy level at EC. A similar definition exists for NV. The two constants can be evaluated in terms of effective mass and temperature only as

NC = 2 (2π me kT / h²)^(3/2)
NV = 2 (2π mh kT / h²)^(3/2)

In intrinsic silicon n = p = ni. The carrier concentrations can be used to confirm the position of the Fermi level:

n = p
NC e^(−(EC − EF)/kT) = NV e^(−(EF − EV)/kT)
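The effective density of states formula can be evaluated numerically. The sketch below assumes a density-of-states effective mass of 1.08 m0 for silicon electrons, a commonly quoted value that is not given in this section.

```python
import math

# Effective density of states, NC = 2 * (2*pi*me*kT/h^2)^(3/2), in SI units.
H = 6.626e-34        # Planck constant, J*s
K = 1.381e-23        # Boltzmann constant, J/K
M0 = 9.109e-31       # electron rest mass, kg

def effective_dos(m_eff_kg, t_kelvin):
    """Effective density of states in m^-3."""
    return 2.0 * (2.0 * math.pi * m_eff_kg * K * t_kelvin / H**2) ** 1.5

# me* = 1.08 m0 is an assumed DOS effective mass for silicon electrons.
nc_per_cm3 = effective_dos(1.08 * M0, 300.0) * 1e-6   # m^-3 -> cm^-3
print(f"NC ~ {nc_per_cm3:.2e} cm^-3")   # on the order of 1e19 cm^-3
```

The result, a few times 10^19 cm^-3, is an enormous number of available states compared with the intrinsic carrier concentration derived below.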

Fig. 1.12 The probability densities (left), density of states (middle), and carrier concentrations (right) for holes (bottom) and electrons (top)


ln(NC/NV) = (EC − EF)/kT − (EF − EV)/kT

EF = (EC + EV)/2 + (kT/2) ln(NV/NC)

Contrary to what we have been affirming so far, the Fermi level does not seem to lie exactly in the middle of the bandgap. However, the second term in the expression for EF is vanishingly small for two reasons:

• The ratio NV/NC is close to unity, because the two effective densities differ only through the effective masses of electrons and holes, so its logarithm is small
• The term kT is a very small energy value relative to a semiconductor bandgap: approximately 25 meV at room temperature, versus 1.1 eV for the bandgap

Thus, for all practical purposes we can say that for intrinsic silicon:

EF ≈ (EC + EV)/2

Fig. 1.13 Dependence of ni on temperature
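The size of the correction term can be checked directly. The effective masses below (mh* = 0.81 m0, me* = 1.08 m0) are assumed illustrative values for silicon, not given in this section; only their ratio matters here, since NV/NC = (mh*/me*)^(3/2).

```python
import math

# How far does the intrinsic Fermi level sit from mid-gap?
# EF = (EC + EV)/2 + (kT/2) * ln(NV/NC), with NV/NC = (mh*/me*)^(3/2).
kT = 0.02585                      # thermal energy at 300 K, eV
ratio = (0.81 / 1.08) ** 1.5      # NV / NC, assumed silicon effective masses
shift_ev = (kT / 2.0) * math.log(ratio)

print(f"shift from mid-gap: {shift_ev * 1000:.1f} meV")
# A few meV out of a 1100 meV gap: the mid-gap approximation is excellent.
```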

At this point, it is very instructive to try to find the dependence of intrinsic carrier concentration ni on temperature T. This can be obtained by noticing that the product of n and p is equal to the square of either:

n = p = ni
np = ni²
ni² = NC NV e^(−(EC − EF)/kT) e^(−(EF − EV)/kT) = NC NV e^(−(EC − EV)/kT) = NC NV e^(−Eg/kT)

where: Eg = bandgap

This gives a very useful expression for intrinsic carrier concentration:

n = p = ni = √(NC NV) e^(−Eg/2kT)

Intrinsic carrier concentration dependency on temperature is shown in Fig. 1.13. The dependence is best shown using a logarithmic relationship:

ln(ni) = 0.5 ln(NC NV) − Eg/2kT

Intrinsic concentrations in silicon are very small for all practical temperatures. This means that silicon is a very bad conductor with a very high resistance when it exists in its pure form. In the following two sections, we will qualitatively and quantitatively understand how the introduction of a small concentration of impurities can dramatically change the properties of the semiconductor. In fact, we will never use intrinsic silicon to build any semiconductor device.

When is the approximation of the Fermi level lying in the mid-gap of pure silicon inappropriate? From the derivation of this section we notice that the error term is due to the difference in the effective mass of electrons and holes. This term is significantly suppressed by multiplying by kT. At very high temperatures, however, thermal energy can be significant. This magnifies the effect of the difference in effective masses, causing the Fermi level to migrate down from the mid-gap.

1.3

Band Model with Doping

1. Understand how impurities affect the band diagram of a crystal
2. Recognize the high probability of ionization for dopants
3. Understand how holes and electrons are greatly imbalanced in doped silicon
4. Conclude that the Fermi level must move from the middle of the bandgap in doped silicon
5. Distinguish the effect of donors and acceptors.

Table 1.1 lists selected semiconductors, insulators, and their bandgaps. Silicon and germanium are the most recognizable semiconductors. Germanium has some historical and specialty significance, but most modern microchips are made using silicon. Both are elemental semiconductors, meaning that they consist of a single element. Some compounds from groups 13–15 or even 12–16 have molecules that effectively act as semiconductors. These "compound" semiconductors have some specialty uses, but again silicon remains the dominant semiconductor for integrated circuits. Insulators are distinguished from semiconductors primarily through the bandgap. Bandgap is large for insulators. But finding a specific value for cutoff is very difficult. Values
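The expression for ni above can be evaluated numerically. The room-temperature NC and NV values and the 1.12 eV bandgap used below are assumed illustrative numbers for silicon, not given in the text.

```python
import math

# Intrinsic carrier concentration: ni = sqrt(NC * NV) * exp(-Eg / 2kT).
# nc300, nv300, and eg_ev are assumed silicon values (cm^-3 and eV).
def ni_per_cm3(t_kelvin, nc300=2.8e19, nv300=1.8e19, eg_ev=1.12):
    kT = 8.617e-5 * t_kelvin           # thermal energy in eV
    scale = (t_kelvin / 300.0) ** 1.5  # NC and NV both scale as T^(3/2)
    return math.sqrt(nc300 * scale * nv300 * scale) * math.exp(-eg_ev / (2 * kT))

print(f"ni(300 K) ~ {ni_per_cm3(300):.1e} cm^-3")   # around 1e10 cm^-3
print(f"ni(400 K) ~ {ni_per_cm3(400):.1e} cm^-3")   # orders of magnitude higher
```

With roughly 5 × 10^22 silicon atoms per cubic centimeter, an ni near 10^10 cm^-3 means only about one atom in 10^12 contributes a carrier, confirming how poor a conductor pure silicon is.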

Table 1.1 List of semiconductors and insulators with bandgaps. Bandgap energy is a good metric for determining the category under which the material should fall, but it is not perfect

Material          Group   Bandgap (eV)   Type
Silicon           14      1.1            Elemental semiconductor
Germanium         14      0.67           Elemental semiconductor
Gallium Arsenide  13–15   1.43           Compound semiconductor
Zinc oxide        12–16   3.37           Compound semiconductor
Silicon dioxide   14–16   9              Insulator
Silicon nitride   14–15   5              Insulator
Diamond           14      5.5            Borderline

anywhere between 4 and 6 eV are sometimes used as the cutoff. But in reality, the transition between insulators and semiconductors is not very well-defined. Carbon is a particularly interesting example. It has a borderline bandgap, and its conductivity can vary wildly depending on its molecular structure. Graphite, a form of carbon, is a very good conductor. On the other hand, carbon in the form of diamond is most often considered an insulator. But in some cases, it exhibits semiconductor properties. Thus, bandgap should not be the only metric for distinguishing insulators from semiconductors. As we will shortly see, any semiconductor has a choice of suitable dopants. Dopants are impurities that greatly upset the balance of charge carriers in the material, thus changing its conductivity dramatically. Insulators normally have no suitable dopants. Dopants come from the groups to the right and left of the group to which the semiconductor belongs. Thus, for 13–15 semiconductors, dopants come from groups 12–16. For single-element semiconductors such as silicon, dopants come from groups 13–15. Dopants from group 15 are called donors (for reasons that will become clear shortly). Dopants from group 13 are called acceptors. Table 1.2 lists silicon donors and acceptors with some interesting properties. Figure 1.14 shows a classical view of what happens when donor atoms are inserted into the silicon crystal at a very low concentration. Some of the silicon atoms are displaced by the donor atoms. The donor atom belongs to group 15, thus it has five electrons in its outer shell. As with silicon atoms in Sect. 1.1, the green circle in the middle of the donor is not only its nucleus, it is also all its internal electron orbits. This whole internal structure has a net charge of +5q, while five electrons orbit in the final shell. Four of the five outer shell electrons of the donor in Fig. 1.14 contribute to covalent bonds with the surrounding four silicon atoms.
This causes the four surrounding silicon atoms to become chemically stable but leaves an extra electron in the outer shell of the donor. This extra electron can become free very easily. The question is, when we say “very easily”, what do we mean?

Fig. 1.14 Doping with donors, classical view. The extra electron is readily freed, leaving behind a positively charged ion

In intrinsic silicon, the creation of an electron–hole pair requires the valence electron to acquire an amount of energy equal to the bandgap, roughly 1.1 eV. For a donor's extra electron to become free, it needs a much smaller amount of energy called the ionization energy. Table 1.2 shows that the values of ionization energies are much lower than the bandgap energy. The significance of ionization energy will only become clear when we develop a band model for doped silicon. Recall that the definition of a hole in the classical sense is an empty position in the covalent bonds. When the donor donates its extra electron, the remaining silicon and donor atoms are all chemically stable with four electrons in their covalent bonds. An empty location is not created in Fig. 1.14. The donor creates an electron without creating a hole. There are still electron–hole pairs formed by thermal

Table 1.2 Donors and acceptors for silicon. The ionization energy is the distance between the dopant level and the nearest silicon band, and it is the most important electrical metric for the dopant

Material     Donor/acceptor   Atomic number   Ionization energy (eV)
Antimony     Donor            51              0.039
Phosphorus   Donor            15              0.045
Arsenic      Donor            33              0.054
Titanium     Donor            22              0.21
Boron        Acceptor         5               0.045
Aluminum     Acceptor         13              0.067
Gallium      Acceptor         31              0.072
Indium       Acceptor         49              0.16

generation, along the same lines as intrinsic silicon. However, there are certainly excess electrons that escape the donor atoms. This type of semiconductor (doped with donors) is called n-type and is characterized by the fact that the concentration of electrons is higher than the concentration of holes:

n > p

Why is the energy required to free the electron called ionization energy? Once the extra electron leaves the donor atom in Fig. 1.14, it leaves behind the rest of the donor atom. Since the original donor atom is electrically neutral, the removal of the extra electron with a charge of −q leaves the rest of the donor atom with a net charge of +q. Thus, the donor atom becomes a positive ion. What is the difference between an ion and charge carriers (electrons and holes)? Ions contain inner shell electrons, bonded outer shell electrons, and the nucleus. Thus, although the ion is charged, it is extremely heavy relative to electrons and holes. An electric field will cause electrons and holes to move, causing current flow, while ions will never move. A solid where only ions exist will thus not normally conduct electricity, despite being charged. Figure 1.15 shows what happens when pure silicon is doped with acceptors from group 13. Atoms from group 13 contain only three electrons in the outer shell. Due to the low doping concentration, an acceptor atom occasionally replaces a silicon atom in the crystal lattice. The acceptor atom in Fig. 1.15 forms covalent bonds with three surrounding silicon atoms. However, there is one silicon atom which observes an empty location in its covalent bond. An empty covalent bond position is, by definition, a hole, and it allows current flow by accepting valence electrons into it. When the acceptor atom accepts one of the valence electrons from the surrounding silicon atoms, the acceptor will have an extra negative charge −q. Thus, the acceptor atom becomes ionized into a negative ion.
The energy necessary to move the valence electron into the hole is called the ionization energy.
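The gap between dopant ionization energies and the full bandgap can be made vivid with Boltzmann factors. The sketch below compares e^(−E/kT) for the 0.045 eV phosphorus/boron ionization energy from Table 1.2 against the 1.1 eV bandgap; this is a rough illustration, not a full occupation statistics calculation.

```python
import math

# Boltzmann factors: how easy is dopant ionization versus full
# electron-hole pair generation across the bandgap at 300 K?
kT = 0.02585                 # thermal energy, eV
e_ionization = 0.045         # eV, phosphorus/boron from Table 1.2
e_bandgap = 1.1              # eV, silicon bandgap

f_dopant = math.exp(-e_ionization / kT)
f_intrinsic = math.exp(-e_bandgap / kT)

print(f"dopant factor:    {f_dopant:.2e}")     # order 0.1
print(f"intrinsic factor: {f_intrinsic:.2e}")  # vanishingly small
print(f"ratio: {f_dopant / f_intrinsic:.1e}")  # ionization vastly more likely
```

The ratio of roughly 10^17 between the two factors is why it is safe to assume essentially complete dopant ionization at room temperature.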

Fig. 1.15 Doping with acceptors, classical view. The acceptor atom creates chemical instability by introducing an extra hole

Note that the acceptor added the hole without creating a free electron in conjunction. Thermal generation still allows some electrons to be created, but holes will certainly outnumber them. Silicon with added acceptors is called p-type silicon and always observes:

p > n

To find a quantitative model for doped silicon along the lines of intrinsic silicon, we must first develop a band model for doped silicon. Dopants are foreign materials to the crystal in which they are introduced. Materials are distinguished by two factors:

• They have different Fermi levels (or more strictly work functions, Sect. 1.14). This causes misalignment of energy levels when materials are brought together.
• They have different bandgaps. This also contributes to misalignment.


Fig. 1.16 n-type silicon band structure. The donor level lies in the bandgap near the conduction band. This makes it nearly certain at room temperature that the donor will be ionized, creating an extra electron

Thus, when donors are inserted into the silicon lattice, the bands of the donor do not align with the bands of silicon. However, the donor is inserted at a very low concentration. This low concentration means that the donors are so well-spaced that Pauli's exclusion principle does not kick in. Thus, donor atoms observe discrete levels instead of bands. And these levels will not align with the bands of silicon due to the different bandgaps. Figure 1.16 shows what happens when donors are used in doping. Donors form discrete energy levels instead of bands. Since the Fermi levels do not align, the donor levels do not necessarily fall within the bands of silicon. For a material to be a donor, one of its valence levels must fall within the bandgap of silicon. For the donor to be good, this donor level Ed must fall near the edge of the conduction band Ec. The amount of energy needed to move the electron from the donor level to the conduction band is called the ionization energy. Ionization energy is equal to Ec − Ed. Since the donor level lies very near the edge of the conduction band, it is very easy for donor electrons to acquire enough thermal energy and jump to the conduction band, much easier than it is for silicon electrons to traverse the whole bandgap. In fact, at room temperature we can safely assume that all donor atoms are ionized. Note that the electron that leaves the donor level does create an "electron", that is to say a free electron in the conduction band. However, no empty location is left behind in the valence band. Thus, a hole is not created in the process. Electron–hole pairs can still be formed by the thermal generation of electrons jumping from the valence band to the conduction band. However, the electron concentration will always exceed the hole concentration. This is why the material is called n-type. The material as a whole still has to be electrically neutral. In thermal generation, when an electron is formed, the


Fig. 1.17 p-type silicon band structure. An empty acceptor level lies near the valence band of silicon, allowing an electron to jump into it and create an extra hole. This event (ionization) is near certain at room temperature

charge of the electron is balanced by the charge of the corresponding hole it leaves behind. When a donor donates an electron, the charge is balanced by the positive charge of the donor ion left behind. The distinction is important because holes are charge carriers, while ions do not participate in current flow. Figure 1.17 shows the band structure of silicon with added acceptors. The acceptor atoms again form discrete levels because of the low concentration. Due to different materials misaligning, an empty level of the acceptor falls in the bandgap of silicon. For a “good” acceptor, the empty acceptor level must fall near the edge of the valence band Ev. Since the acceptor level is empty and very close to the valence band, valence electrons find it very easy to jump to the acceptor level. It is much easier for these electrons to go to the acceptor level than it is to jump to the conduction band. The amount of energy required for the electrons to go the acceptor level is Ea–Ev and is called the ionization energy. This is much smaller than the amount of energy required for “thermal generation” of electron–hole pairs, which is Eg. The electrons that leave the valence band leave behind a hole. When they reach the acceptor level, they create a negative ion. No corresponding free electron is created in the conduction band. Thus, silicon with added acceptors will necessarily have more holes than electrons. Figure 1.18 shows the carrier concentration in n-type silicon. The middle curve is the density of states. This is a material property of silicon and is unaffected by the addition of donors. The rightmost curve is the concentration of charge carriers. The curve in the valence band represents holes, the curve in the conduction band is electrons.

14

The integration of the rightmost curve in Fig. 1.18 is the electron concentration n (conduction band), and hole concentration p (valence band). The integration is the area under the curves. In Sect. 1.2, we stated that the area under the two curves must be equal because in intrinsic silicon n = p. However, in n-type silicon n > p. Thus, the area under the conduction band curve in the rightmost graph in Fig. 1.18 must be larger than the area under the valence band curve. The rightmost curve in Fig. 1.18 is the product of the density of states and the Fermi–Dirac function, that is to say, the first two curves in the figure. But since the density of states remains unchanged and roughly symmetric, then the Fermi–Dirac function must shift. In Fig. 1.12, the Fermi–Dirac function was essentially symmetric, with the tail extending into the conduction band equal to that extending into the valence band. In Fig. 1.18, the function has to be asymmetric, with a larger tail extending into the conduction band, to allow for a larger electron concentration n. One of the definitions of the Fermi level is the level at which the probability of finding an electron is 0.5. The above discussion and Fig. 1.18 clearly indicate that the addition of donors causes the Fermi level to shift up in the bandgap, moving closer to the conduction band edge. This causes the probability of finding electrons in the conduction band to rise, raising the concentration of electrons n. Note also from Fig. 1.18 that there is some sort of zero-sum game being played between the electrons and holes. Trying to raise the concentration of electrons by moving the Fermi level up toward the conduction band, will necessarily cause the tail into the valence band to drop, causing the hole concentration to plummet. We will develop a more systematic model for this interplay of electrons and holes in Sect. 1.4.

1

Devices

Figure 1.19 shows the same pattern as Fig. 1.18 but for p-type silicon. The concentration of holes must be larger than the concentration of electrons. We can conclude that the only way for this to happen would be for the Fermi level to fall below its intrinsic level and closer to the valence band edge. For normal temperature ranges, we showed in Sect. 1.2 that the Fermi level lies roughly in the middle of the bandgap. Thus, it is safe to assume that if the Fermi level lies below the middle of the gap, then the material is p-type. Conversely, if the Fermi level lies above the midpoint of the bandgap, then the material is n-type.

1.4

Extrinsic Silicon

1. Derive exact expressions for carrier concentration in doped silicon 2. Obtain approximate expressions for carrier concentration in doped silicon 3. Understand charge neutrality 4. Define carrier depletion 5. Understand the properties of depletion regions 6. Prove the mass action law. Table 1.3 lists different energy values for silicon and their definitions. Most are rehashed from earlier sections, but we define a new energy level Ei. This is the Fermi level in intrinsic silicon and it always lies in the middle of the bandgap. The actual Fermi level in doped silicon is Ef. The interplay of Ef and Ei can be critical in defining carrier densities. To obtain approximate values of carrier concentrations in n- and p-type, we can make two assumptions:

Fig. 1.18 Band structure, probability density (left), density of states (middle), and carrier concentrations (right) in n-type silicon

1.4 Extrinsic Silicon

15

Fig. 1.19 Band structure, probability density (left), density of states (middle), and carrier concentrations (right) in p-type silicon Table 1.3 Energy levels and energy distances in silicon with typical values

Symbol

Definition

Ev

Upper edge of valence band

Ec

Lower edge of conduction band

Ed

Donor level

Near Ec

Ea

Acceptor level

Near Ev

Eg

Bandgap (Ec–Ev)

1.1 eV in silicon

Ec–Ed

Ionization energy of donors

Very small < 0.1 eV

Ea–Ev

Ionization energy of acceptors

Very small < 0.1 eV

Ei

Fermi level in intrinsic silicon

Middle of bandgap, 0.55 eV above Ev and below Ec (roughly)

Ef

Fermi level in doped silicon

Anywhere in the bandgap

• All dopants are ionized • Carriers due to ionization are much higher than those due to thermal generation In n-type silicon, we assume that all donor atoms are ionized: NDþ ¼ Ionized donor concentration ND ¼ Donor concentration NDþ  ND We also assume that the concentration of electrons due to thermal generation is much smaller than the concentration of electrons due to donor ionization. Each ionized donor produces one electron; thus, the electron concentration is equal to the donor concentration, with the balance of thermally generated electrons being negligible: n  NDþ  ND In Sect. 1.2, we showed that the product of the electron and hole concentrations produces a constant value equal to the square of the intrinsic carrier concentration:

Typical values

np ¼ n2i ni ¼ Intrinsic electron concentration We will shortly prove that this relation, called the mass action law, is always valid regardless of the type and level of doping. We have already shown this qualitatively in Sect. 1.3 where increasing the concentration of one carrier necessarily reduces the other. We can use the mass action law to obtain the value of hole concentration in n-type silicon as p¼

n2i n2 ¼ i n ND

For n-type n is called the majority carrier concentration, and p is called the minority carrier concentration. And for most practical values n  p. For p-type we can repeat the same calculations. Except in this case, p is the majority carrier formed mostly by ionization of acceptors, and n is the minority concentration: NA ¼ Ionized acceptor concentration NA ¼ Acceptor concentration

16

1

NA  NA

These two relations correlate the displacement of the Fermi level from the middle of the bandgap with carrier concentration. If we multiply the two concentrations, we obtain the mass action law, an extremely useful result:

p  NA  NA n¼

n2i n2 ¼ i p NA

np ¼ ni eðEF Ei Þ=kT ni eðEi EF Þ=kT

In Sect. 1.3, we concluded qualitatively that the Fermi level must lie above the mid-gap in n-type and below it in ptype. Using the expressions of electron and hole concentration above, we can prove this, in n-type: n ¼ ND ¼ NC eðEC EF Þ=kT 

NC ðEC  EF Þ ¼ kT ln ND



But for intrinsic silicon:   NC ðEC  Ei Þ ¼ kT ln ni Thus: ðEC  Ei Þ  ðEC  EF Þ ¼ kT ln

    NC NC  kT ln ni ND

  ND EF  Ei ¼ kT ln ni But Nd  ni, thus, Ef must lie above Ei. Similarly, for p-type:     NV NV ðEi  EV Þ  ðEF  EV Þ ¼ kT ln  kT ln ni NA   NA Ei  EF ¼ kT ln ni And since Na  ni, the Fermi level must lie below the intrinsic Fermi level. To prove the mass action law, we must first express carrier concentrations in terms of intrinsic concentration, this leads to very useful results: n ¼ NC eðEC EF Þ=kT ¼ ni NC eðEC EF Þ=kT =ni n ¼ ni NC eðEC EF Þ=kT =NC eðEC Ei Þ=kT

np ¼ n2i This result says that regardless of purity/impurity level and concentration, the product of the two carrier types remains constant for a certain material at a certain temperature. This product is the square of the intrinsic concentration. Another conclusion is that a rise in electron concentration must be accompanied by a proportionate drop in hole concentration and vice versa. We made two assumptions in obtaining the above expressions: that full ionization occurs and that carriers from thermal generation are negligible. The former can safely be assumed to be true at room temperature. The latter, however, is not always true. In some cases, we have to add both donors and acceptors, and we end up with an effective concentration of the dopant that has a positive balance. In these situations, we have to calculate exact values for carriers that take into consideration thermal generation. To find the exact carrier concentrations we must define the concept of charge neutrality. Charge neutrality means that the net charge in a piece of silicon must be zero. If we assume that we dope using both donors and acceptors, the majority dopant will define the type of silicon. Thus, if NA > ND the silicon will be p-type. If ND > NA, the silicon will be n-type. There are four types of charges in any piece of silicon. Electrons are negative charge carriers at concentration n. Holes are positive charge carriers at concentration p. There are also immobile negative ions at concentration NA, and immobile positive ions at concentration ND. Charge neutrality dictates zero net charge: ND þ p ¼ NA þ n This, of course, assumes full ionization. Using this with mass action law, we can obtain exact results for n and p: np ¼ n2i p¼

n ¼ ni eðEC EF EC þ Ei Þ=kT n ¼ ni eðEF Ei Þ=kT Similarly, for p-type: p ¼ ni eðEi EF Þ=kT

Devices

ND þ

n2i n

n2i ¼ NA þ n n

n2 þ ðNA  ND Þn  n2i ¼ 0

1.4 Extrinsic Silicon



17

 qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 1 ND  NA þ ðND  NA Þ2 þ 4n2i 2

And similarly:  qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 1 NA  ND þ ðNA  ND Þ2 þ 4n2i p¼ 2 Now we can try to reach the approximations from these exact expressions: In n-type: ND  NA n¼

 qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 1 ND  NA þ ðND  NA Þ2 þ 4n2i 2 

qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 1 ND þ ðND Þ2 þ 4n2i n 2 ND  ni 

1 ND þ n 2

qffiffiffiffiffiffiffiffiffiffiffiffi ðND Þ2 ¼ ND

 qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 1 ND þ ðND Þ2 þ 4n2i ¼ 0 p 2 And we can do a similar approximation for p-type. Thus, the exact expressions yield the approximate expressions if we assume that the minority carrier is negligible relative to the majority carrier. So, when do we use the approximate expressions and when do we use the exact expressions? The equations make it obvious that when one charge carrier is much higher than the other, the approximate expressions can be used. In doped silicon this will always be the case, with two important exceptions: • If the dopant level is low, or if both dopant types are added leaving a low net level of one of them. In this case, neither charge carrier is negligible relative to the other • If the temperature is very high. This leads to very high rates of thermal generation. Since thermal generation creates both charge carrier types in equal numbers, it will cause the levels of electrons and holes to become closer If a piece of silicon has its charge carriers removed for some reason, what kind of material does it leave behind? The area with removed charge carriers will still have charges. The charges are ions, positive in the case of n-type and negative in the case of p-type. Thus, this region is not electrically neutral. The ions in this region are also immobile because they consist exclusively of ions.

This region (where charge carriers are removed) is alternatively called the depletion region or the space-charge region. It is called a depletion region because it is fully depleted of mobile charge carriers (electrons and holes). It is called the space-charge region because it contains immobile charges distributed along the volume. Thus, the depletion region is defined by two seemingly contradictory, but very interesting factors: • It is fully devoid of mobile charge carriers. Thus, it has insulating properties. • It is full of immobile space charges. Thus, it can form electric fields. Depletion regions and their modulation is at the center of the operation of PN junctions. Depletion regions also form in all kinds of transistors. Depletion regions will show some of the most contradictory and fascinating properties in the entire chapter. Sometimes they will act as insulators. Sometimes they will conduct huge current. At all times they form critically important built-in fields.

1.5 1. 2. 3. 4.

Drift

Distinguish thermal velocity from drift velocity Define mean free time and mean free path Define drift and mobility Understand carrier drift and current directions for holes and electrons.

Carrier transport is any phenomenon that causes charge carriers to move. From Sect. 1.4, we know that the only charges that can move are electrons and holes. Thus, they are called charge carriers. Any mechanism that causes electrons and holes to move will lead to a charge flux, the charge flux leads to current flow. In this section, we will consider the first and most obvious mechanism of carrier transport: drift under the effect of an electric field. The main source of motion for electrons and holes comes from thermal energy. The charge carriers absorb the thermal energy and convert it into kinetic energy. The velocity of electrons and holes, in this case, is called thermal velocity, it can be calculated as follows for typical values: 3 Ethermal ¼ kT 2 1 Ekineticelectron ¼ mn v2th 2 1 Ekinetichole ¼ mp v2th 2

e

h

18

1

At room temperature: vth

e

¼ 2:3  107 cm/s

vth

h

¼ 2:2  107 cm/s

The amplitudes of these velocities are extremely high. In fact, they are only three orders of magnitude lower than the speed of light. However, thermal velocity does not contribute to current conduction. Figure 1.20 shows the typical path that a charge carrier takes due to thermal energy. The path is random, in fact, it is a zero-mean normal process. Thus, the mean path due to thermal energy is null. The carrier moves very fast instantaneously, but on average it effectively remains in its place. Thermal velocity is significant as a primary source of noise (thermal noise) but does not contribute to current conduction. When an electric field is applied to an area with charge carriers, the charge carriers will observe a force. The force will cause acceleration, which will cause the charge carriers to move from a still position. This motion is extremely choppy. The charge carriers will keep hitting imperfections (mainly dopants) in the crystal. These imperfections will slow the carriers down, essentially down to a standstill. If the field still exists, the charge will start accelerating again. The charge also observes thermal velocity that causes it to simultaneously move in random directions. However, despite the collisions and the thermal velocity, there is a net motion due to the electric field, as shown in Fig. 1.21. This motion is against the direction of the field in the case of electrons, and with the direction of the field in the case of holes. The motion of both or either electrons or holes yields a net current in the direction of the field. The process of motion due to a field is called drift. The resulting current is called the drift current. The drift of a charge carrier is uneven. It accelerates, hits an imperfection, and comes to a standstill, it then repeats the whole cycle again. 
It is useful to consider the statistical properties of drift rather than its instantaneous behavior: • The time it takes for the particle to collide with a crystal imperfection is random, however, it has a statistical mean. This mean is called the mean free time. It is

Devices

Fig. 1.21 Net motion in a certain direction due to drift despite scattering due to imperfections and random motion due to thermal energy

• • • •

significant in determining how long the charge carrier travels on average before colliding The mean free time corresponds to a mean free path, which is the average distance it travels before colliding and coming to a complete stop We will assume that every time the carrier collides, it will come to a complete stop. Thus, acceleration will start every time from null velocity From the above we can conclude that the mean free time and the mean free path also correspond to a final velocity before collision Between zero and the final velocity, there is a mean velocity. This mean velocity is extremely important. It is the velocity we will use in determining current. We can make the following initial definitions: sfree ¼ mean free time vfinal ¼ mean final velocity before collision n ¼ electric field

The final momentum of the electron before colliding can be defined in two ways: mn vfinal ¼ mean final momentum Fsfree ¼ mean final momentum The electric force is defined as F ¼ qn And equating the two expressions of momentum: qnsfree ¼ mn vfinal Thus, the final velocity is: vfinal ¼ 

Fig. 1.20 Paths of thermal velocity cancel out. The position of the carrier is a Gaussian random variable with zero mean around the original position

qnsfree mn

This represents a maximum value of velocity. A more useful measure is the average value of velocity, a single

1.5 Drift

19

figure representing how fast the electron would be moving if it were uninterrupted. We call this the drift velocity and we can approximate it as half the final velocity: vdrift ¼ Average drift velocity of electrons vdrift ¼

vfinal qnsfree ¼ 2 2mn

The drift velocity is proportional to the electric field. The constant of proportionality is an important parameter known as mobility: vdrift ¼ ln n ln ¼

qsfree ¼ electron mobility 2mn

Similarly, we can derive a relationship for hole mobility. Mobility is a constant under certain conditions. Mobility is dependent on both the effective mass of the charge carrier and the mean free time. Thus, it is highly dependent on conditions in the crystal lattice and on temperature. More imperfections in the crystal lattice mean more frequent collisions, a shorter mean free time, and a lower mobility. Most imperfections come from doping; thus, mobility degrades with increased doping. A major assumption we are making here is that mean free time is a constant. In fact, it is a function of electric field. The stronger the field, the shorter the time the charge carrier can move before colliding. This will lead to diminishing returns. This manifests in the critically important phenomenon of velocity saturation and will be front and center in Chap. 10. Hole mobility is usually smaller than electron mobility due to the higher effective mass and the shorter mean free path for electrons moving in the covalent bonds. This disparity between electrons and holes will be a very important design consideration at the circuit level. Figure 1.22 shows a circuit with a piece of pure silicon connected to an external voltage source. The band structure shows two features: An inconstant Fermi level, and tilted bands. Fermi level is constant (meaning flat) throughout the whole material if the material is in a condition called thermal equilibrium. Thermal equilibrium means: • Steady-state condition with any changes in ambient temperature, lighting, or pressure having been given enough time to settle • The above point translates into the rates of thermal generation and recombination of hole–electron pairs

Fig. 1.22 Band tilting in intrinsic silicon due to external potential

being equal. This leaves a net amount of thermally generated charges in the valence and conduction band derived in Sect. 1.2 • No external energy source. In circuits this means no external voltage source. Since Fig. 1.22 has an externally connected voltage source, the Fermi level cannot be constant. An inconstant Fermi level immediately corresponds to an external applied voltage and is usually associated with current flow if there is a loop for such current to flow through. Notice the amount of tilting for all the energy levels is the same. This amount of tilting must correspond to the external voltage difference. Since the energy levels indicate the energy of an electron, then the amount of tilting corresponds to the amount of energy lost as the electron flows through the semiconductor. This corresponds to the applied voltage because energy and potential are related: DEC ¼ qVS Note that the bands are higher at the lower voltage. This is due to the fact that energy bands indicate the energy of electrons, while voltages correspond to the potential of imaginary positive particles. The external voltage yields a field across the semiconductor. The field is constant, which is indicated by the fact that the bands are linearly tilted (Sect. 1.14 will explore the

20

1

shape of tilting in more detail). Electrons move from the higher energy (lower potential) down the slope of the conduction band under drift conditions. The electrons then flow into the metal wire, where the external voltage source takes care of raising the electron energy again by qVs. The loop is complete and current can flow. Holes move up the valence band from the higher potential to the lower potential. The fact that holes seem to move up in energy is explained by the fact that valence electrons will correspondingly move down the energy band. Hole energy decreases the higher we go in the valence band. Recall that it is easier for valence electrons to settle at lower levels of the valence band, forcing holes in rest state to bubble up to the edge of the band. It might be helpful to remember the analogy of holes as bubbles. Figure 1.23 shows a semiconductor with a cross section A. Although the figure shows a square cross section, we will not make use of this fact in the next derivation, thus all results are valid regardless of the geometry of the wire cross section. Current is the amount of charge that crosses a certain area per unit time: I¼

dQ dt

vn ¼ ln n But velocity is the rate of change of displacement: dx dt

Back to Fig. 1.23, the amount of charge dQ in the current expression is the charge that crosses the cross-sectional area in the infinitesimally small time dt. This is the amount of charge in a cuboid with cross-sectional area A and height dx, thus: dQ ¼ ðqnÞAdx Here we have made use of the fact that charge concentration in n-type silicon is roughly −q  n Coulombs. Notice that n is the concentration of electrons in number per cubic centimeter, Fig. 1.23 Current as the charge in a volume. Because current is the rate of flow of charges, it is also the charge contained in a prism whose height is the magnitude of velocity

rather than charge concentration in Coulomb. Substituting for the expression of dQ in the current expression: I¼

dQ dx ¼ qnA ¼ qnAvn dt dt

And from the definition of drift velocity: I ¼ qln nAn And the current density for electrons is Jn ¼

I ¼ qln nn A

We can perform a similar derivation for holes, yielding a hole drift current density: Jp ¼ qlp pn And the total drift current density is J ¼ Jn þ Jp ¼ qðln n þ lp pÞn The definition of current density in terms of electric field and conductivity thus allows us to obtain expressions of conductivity: J ¼ rn

Assuming an external potential is applied across the shape in Fig. 1.23. This creates an electric field, which causes carriers in the bar to move at a certain drift velocity. Assuming n-type silicon, the velocity of electrons is

vn ¼ ln n ¼

Devices

r ¼ qðln n þ lp pÞ The conductivity of silicon is primarily a function of carrier concentration. Carrier concentration is affected by net doping more than any other factor. Thus, conductivity is affected by doping more than any other factor. We can finally see why the electrical properties of silicon are so sensitive to impurities. For a linear resistor in which the field is linearly related to the potential, the above results can be extended to obtain an expression for linear resistance. This is a special form of ohm’s law: J ¼ qðln n þ lp pÞn ¼ rn J¼

rV L

I rV ¼ A L V ¼ I:

L ¼ IR rA



L rA

Notice that all the above is contingent on the hole and electron currents being added rather than subtracted. Notice

1.5 Drift

21

Fig. 1.24 Free particles diffusing in an environment to achieve a uniform distribution

that the electric field will cause electrons to move against it. However, the current caused by the electron drift will be against the direction of electron flow, thus the electron current will be in the same direction of the field. Holes will flow in the same direction of the field because they have a positive charge. And also, because they have a positive charge, their current is in the same direction as their flow. Thus, both electron and hole currents flow in the same direction as the electric field, and they must be added when calculating the total semiconductor current.

continue until a uniform distribution of the free particle has been achieved, Fig. 1.24. Diffusion in semiconductors concerns the diffusion of charge carriers from areas of high concentration to areas of low concentration through the crystal lattice. The motion of charge carriers causes current flow. This component of current is called diffusion current. The rate at which electrons move, also known as the electron flux can be calculated as Un ¼ Dn

Semiconductors are not interesting just because of how much their conductivity changes, but because of why it changes. In n-type silicon, the conductivity is mainly controlled by electrons, in p-type silicon the conductivity is mainly controlled by holes. This asymmetry leads to very interesting behavior when the two types are brought together.

1.6 1. 2. 3. 4.

Dn is a physical constant called diffusivity. It is a material property. But it is also a function of temperature. It represents how porous the material is to carrier diffusion. The flux is also proportional to the sharpness of the gradient of charge distribution expressed through dn/dx. Charge flux is the number of charge carriers passing per unit area per second. Current density can be obtained from charge flux by multiplying the charge flux by the charge per carrier: Jn ¼ qUn ¼ qDn

Diffusion

Understand carrier diffusion as a natural process Derive diffusion current for electrons and holes Define total currents for electrons and holes Derive Einstein’s relationship for mobility diffusivity.

dn dx

and

Diffusion is a natural process through which free particles move from areas of higher concentration to areas of lower concentration. Diffusion is the mechanism by which gases expand to fill a container, or by which a material dissolves in a solvent to create a homogenous solution. Diffusion will

dn dx

A similar expression can also be obtained for hole diffusion current. It will also be a function of hole diffusivity (which is different from electron diffusivity), and hole concentration gradient. However, the hole diffusion current density has a negative sign. This can be justified by inspecting Fig. 1.25. If the hole gradient is positive, the hole flux is in the negative direction, causing a hole diffusion current in the negative direction. For electrons, the flux would also be in the negative direction, but the current would be in the positive direction. Expressions for total hole and electron current (including drift and diffusion) can now be obtained as follows:

22

1

Fig. 1.25 Direction of hole diffusion and current density. A positive gradient creates a negative diffusion current since current is defined in the positive direction

Jn ¼ qln nn þ qDn

dn dx

Jp ¼ qlp pn  qDp

dp dx

There is a fundamental relationship between diffusivity and mobility. This is called Einstein’s relationship. It makes sense for the two to be closely related because if it is easy for charge carriers to move under the effect of an external field, it must also be easy for them to diffuse. In both cases, the main mechanism that impedes current flow is imperfections in the crystal lattice. Thus, both will benefit and suffer in tandem. Consider Fig. 1.26, the piece of silicon has a doping gradient, however, it has no external applied voltage. The lack of applied voltage corresponds to a constant Fermi level. The variable doping means that the distance between Ec and Ef must vary through the material. This yields the band bending shown in the figure. At this point, it is important to revisit the concept of thermal equilibrium. Thermal equilibrium was defined in Sect. 1.5. It is closely associated with a constant Fermi level. In fact, thermal equilibrium always indicates a constant Fermi level, and a constant Fermi level always indicates thermal equilibrium.

Devices

We must examine what thermal equilibrium entails, not just its definition. In Fig. 1.22, we mentioned that a non-flat Fermi level by necessity indicates an externally applied field. Conversely, if the Fermi level is constant, we can conclude that there is no externally applied field. A constant Fermi level also immediately indicates that no net current is flowing through the material. On the other hand, a nonconstant Fermi level will yield a net current out of the device terminals if a complete loop exists. But the bands in Fig. 1.26 are bent. In Sects. 1.7, 1.8, and 1.14, we will conclude that band bending necessarily indicates an electric field. This follows directly from Gauss law. But the constant Fermi level in Fig. 1.26 means there can be no external electric field. Thus, the field in the figure must be internal or built-in. But built-in or external, an electric field will cause drift current to flow. So, some currents will flow in Fig. 1.26. However, because no net current can flow out of the device, all the current components must cancel out. The doping gradient in Fig. 1.26 means there is necessarily a diffusion current, but since the total current of the device terminals is null, there must be an equal and opposite drift current to cancel it out. This drift flows because of the built-in field we concluded must exist above. Assuming the material is n-type throughout, the current equation is dn ¼0 dx    dn jqln nnj ¼ qDn  dx

Jn ¼ qln nn þ qDn

ln nn ¼ Dn

dn dx

But we already have an expression for electron concentration: n ¼ NC eðEC EF Þ=kT The gradient of electrons can thus be calculated in terms of conduction band edge and Fermi level as   EC EF dn 1 dEC dEF ¼  NC e kT  dx kT dx dx There is no external field, thus: dEF ¼0 dx And thus: dn n dEC ¼ dx kT dx

Fig. 1.26 Material with a doping gradient and short circuit. The Fermi level is constant due to the short circuit. The distance between Ei and Ef varies, indicating variable doping

But the electric field, applied voltage, and bending in the conduction band edge can be related as

$$\frac{dE_C}{dx} = -q\frac{dV}{dx} = q\xi$$

$$\frac{dn}{dx} = -\frac{nq}{kT}\xi$$

Substituting in the current balance:

$$\mu_n n \xi = D_n \frac{nq}{kT}\xi$$

And finally, Einstein's relationship:

$$D_n = \mu_n \frac{kT}{q}$$

Similarly:

$$D_p = \mu_p \frac{kT}{q}$$

Fig. 1.27 Work function and electron affinity. Vacuum level is never drawn to scale

The quantity kT/q is the voltage corresponding to the average thermal energy in the system and is approximately 26 mV at room temperature (often rounded to 25 mV).
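As a quick check, the thermal voltage and the Einstein relation can be evaluated with standard physical constants (the mobility value below is an assumed, typical figure for silicon):

```python
# Thermal voltage kT/q from standard constants, and the Einstein relation.
k = 1.380649e-23       # Boltzmann constant [J/K]
q = 1.602176634e-19    # electron charge [C]

T = 300.0              # room temperature [K]
Vt = k * T / q         # thermal voltage [V]
print(round(Vt * 1000, 1))   # about 25.9 mV; the text rounds to 25 mV

# Einstein relation, with an assumed typical silicon electron mobility:
mu_n = 1350.0          # [cm^2/(V s)]
D_n = mu_n * Vt        # diffusion constant [cm^2/s]
print(round(D_n, 1))
```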

1.7 Forming a Homojunction

1. Realize the significance of the Fermi level in equilibrium
2. Define work function and electron affinity
3. Understand transient events in PN junction formation
4. Understand why and how the depletion region forms
5. Recognize current balance in the depletion region
6. Understand steady-state conditions in PN junctions.

Figure 1.27 shows the band diagram of a piece of n-type silicon. The band structure shows familiar levels such as Ec, Ev, Ef, and Ei. It also shows a new level Eu and two new energy values: the electron affinity and the work function. Eu is the vacuum level; it is the lowest energy level at which an electron is completely freed from the material but still at its surface. This is measured with the material in vacuum, thus the term vacuum level. The work function is usually denoted by the symbol φ. It is defined as Eu − Ef. Thus, the work function is the amount of energy required to free an average electron from the material. The electron affinity has the symbol χ. It is defined as Eu − Ec and is the energy required to free the least energetic conduction electron from the material. Electron affinity is a function only of the material used to make the lattice, while work function is also a function of the amount and type of doping. Work function is usually more useful since it represents the energy difference to a level with a well-defined electron probability. The electron affinity of silicon is 4.05 eV. Its work function ranges between 4.05 and 5.15 eV. We will further discuss these levels in Sect. 1.14.
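The doping dependence of the work function can be illustrated numerically; a sketch assuming standard silicon values for χ and for the conduction band effective density of states N_C (the doping levels are arbitrary examples):

```python
import math

# Work function of n-type silicon: phi = chi + (E_C - E_F), where
# E_C - E_F = kT * ln(N_C / N_D). chi and N_C are standard silicon values
# at 300 K; the doping levels N_D are arbitrary examples.
chi = 4.05      # electron affinity of silicon [eV]
N_C = 2.8e19    # effective density of states in the conduction band [cm^-3]
kT = 0.0259     # thermal energy at 300 K [eV]

for N_D in (1e15, 1e17, 1e19):
    phi = chi + kT * math.log(N_C / N_D)
    print(f"N_D = {N_D:.0e} cm^-3 -> work function = {phi:.2f} eV")
```

Heavier n-type doping pushes the Fermi level toward the conduction band, so the work function approaches the electron affinity, consistent with the range quoted above.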

Figure 1.28 shows two pieces of silicon, one n-type and one p-type, brought together. The n-type and p-type silicon have similar Ec, Ev, Ei, and electron affinity. However, they differ in Fermi level and work function. If we bring the two materials together until they form a junction, the situation in Fig. 1.28 cannot be an equilibrium condition. There are several ways to see this:

• Since Ef represents the mid-probability point, there are more electrons in the conduction band of the n-type than in the corresponding energy levels of the p-type conduction band. This is a concentration gradient that would cause electrons to diffuse from n to p. The opposite happens for holes in the valence band, with holes diffusing from p to n. This means there are excess carriers injected into both regions, and thermal equilibrium cannot be observed in Fig. 1.28.

Fig. 1.28 P- and n-type silicon brought together in nonequilibrium. A carrier gradient occurs across the interface leading to diffusion and indicating this cannot be an equilibrium condition

1 Devices

Fig. 1.29 Diffusion, space-charge region, drift, and equilibrium. The depletion regions on the two sides of the interface create a built-in electric field which causes a drift current to flow. This current cancels out diffusion current in thermal equilibrium

• An electron can move from an energy level in the conduction band of the p-type out into the metal and all the way around a short circuit into the conduction band of the n-type. This is possible because these are all at the same energy level. However, this electron is now moving against a gradient, which means it has gained energy, which is not possible without an external source.
• We can move an electron from the conduction band of the p-type up to vacuum, then laterally in vacuum to the n-type side, and then down to the same level it came from on the p-side. This electron has now gained energy without an external source, since the work function on the n-side is less than on the p-side.

But it is the fact that there is a concentration gradient that is most critical. The gradient will necessarily lead to diffusion. The diffusion current cannot continue to flow in equilibrium, since equilibrium by definition means there is no external current flow. So, since Fig. 1.28 shows a situation where a large diffusion current must flow, the figure is not an equilibrium diagram. In fact, it is a special kind of diagram called a flat band diagram, whose significance will be further discussed in Sect. 1.14.

Will diffusion continue until all electrons have migrated from n to p and all holes have migrated from p to n? This would only happen if the carriers that diffuse left behind an electrically neutral region. This is not the case: in Sect. 1.4 we explored what happens if majority carriers are removed from a semiconductor. What is left behind is an area of immobile ionic charges. This region is both insulating and space-charged.

Thus, as electrons diffuse from n to p, they leave behind a depletion region with positive ions. As holes diffuse in the opposite direction, they leave behind a depletion region with negative ions. As more carriers diffuse, the depletion region widens. The two depletion regions are charged with opposite polarities, as shown in Fig. 1.29. This creates an electric field between the two polarities, from the n-side (positively charged depletion region) to the p-side (negatively charged depletion region). This field is not imposed by an external voltage source but is built-in due to the diffusion process itself. This electric field will then cause a drift current to flow. Electrons will flow against the field from the p-side to the n-side, while holes will drift from the n-side to the p-side. In equilibrium there should be no net current. Thus, in equilibrium, the diffusion current due to the gradient will be equal in magnitude to the drift current caused by the depletion region left behind by diffusion. The width of the depletion region is just enough to build a field that is strong enough for drift to cancel out diffusion.

So, what would the band diagram look like in equilibrium? The band diagram should reflect the built-in field, which translates into a built-in potential. This built-in potential is as real as any voltage; however, it cannot be measured using external measuring equipment. Since there is no net current flowing through the junction, since the built-in fields and potentials cannot be externally observed, and since the material is in thermal equilibrium, the Fermi level must be constant. We begin by drawing such a level in Fig. 1.30.


The difference between the conduction band edge on the n-side and on the p-side corresponds to the built-in potential through the electron charge:

$$V_{bi} = \frac{\Delta E_C}{q}$$

Notice that all levels except the Fermi level bend in parallel. The Fermi level very closely approaches the intrinsic Fermi level in the depletion region, corresponding to the fact that charge carrier concentration is negligible there. Thus, the PN junction is divided into four rough areas:

• n neutral zone, where the Fermi level to conduction band distance is the same as in n-type material at equilibrium and the bands are flat
• n depletion zone, which has net positive ions and exhibits band bending
• p depletion zone, which has net negative ions and exhibits band bending
• p neutral zone, where the bands are flat and level differences reflect equilibrium carrier concentrations.

Fig. 1.30 PN junction band diagram in equilibrium. The amount of bending (qVbi) corresponds to the built-in potential (Vbi) which corresponds to the built-in field. The slope causes a carrier drift, but this is countered by carrier diffusion yielding zero net current and a constant Fermi level

Away from the junction, the depletion region disappears and on the n-side there is a region where electrical neutrality is restored, leading to carrier concentrations as follows:

$$n + N_A = p + N_D \rightarrow n \approx N_D$$

Similarly, on the p-side there is a region away from the depletion region where electrical neutrality is restored and the equations of Sect. 1.4 apply:

$$n + N_A = p + N_D \rightarrow p \approx N_A$$

In these two "neutral" regions, shown in green in Fig. 1.29, the band structure is a normal n-type or p-type band structure, as again reflected in Fig. 1.30. The transitional area between the two neutral regions is the depletion region, shown in Fig. 1.29 as an ionized layer on either side of the interface. In Fig. 1.30, the depletion region appears as a smooth bending of all levels except the Fermi level.

The band diagram in Fig. 1.30 raises a few questions:

• What does the fact that the Fermi level is constant imply? It implies that the material is at thermal equilibrium. There is no externally applied potential or field, and there is no net current flowing out of the device terminals.
• What does the fact that all other levels bend mean? It means there is some field and potential difference that causes the bands to bend. However, since this field and potential cannot be observed on the Fermi level, they must be a built-in field and potential.
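The neutrality approximations can be checked by solving the neutrality condition exactly; a minimal sketch assuming a typical intrinsic concentration for silicon at room temperature:

```python
import math

# Exact equilibrium electron concentration in a neutral region: combining
# charge neutrality n + N_A = p + N_D with mass action p = n_i^2 / n gives
# n = (N_D - N_A)/2 + sqrt(((N_D - N_A)/2)^2 + n_i^2).
n_i = 1e10   # assumed intrinsic concentration of silicon at 300 K [cm^-3]

def equilibrium_n(N_D, N_A=0.0):
    net = (N_D - N_A) / 2.0
    return net + math.sqrt(net * net + n_i * n_i)

N_D = 1e16                 # illustrative n-type doping [cm^-3]
n = equilibrium_n(N_D)
p = n_i ** 2 / n
print(n)   # essentially equal to N_D
print(p)   # minority holes, about twelve orders of magnitude fewer
```

For any practical doping level the correction from n_i is negligible, which is why n ≈ N_D and p ≈ N_A are used throughout.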

Depleted silicon can be very confusing. It exhibits insulating properties; however, it is very dangerous to assume that it is an insulator. Depletion regions are very deficient in charge carriers, and thus have relatively low conductivity. However, they still have the same bandgap as silicon. Thus, depletion regions do not offer a large potential barrier if charges are injected from surrounding areas. In fact, if a favorable electric field exists, very large currents can flow unimpeded through depletion layers. This is the case for the saturation current in a MOSFET, which passes between the end of the channel and the drain (Sect. 1.19). It is also the case for forward-biased PN junctions (Sect. 1.10). But it is most evident in BJTs, where the base–collector junction has a very healthy depletion region but still sees considerable current flow (Sect. 1.13). An insulator is not just a material with low conductivity; it is a material with a very large bandgap. This is, for example, the case with silicon dioxide, where the barrier it offers to adjacent silicon stops current flow regardless of the direction of the field (Sect. 1.15). For this reason, we can generally assume depletion regions have insulating properties, but we should never call them insulators.

1.8 PN Junction in Equilibrium

1. Prove the constancy of the Fermi level
2. Sketch charge density in the depletion region
3. Derive electric field and potential in the depletion region
4. Derive the depletion region width
5. Understand asymmetry in the depletion region.

We have so far assumed that a constant Fermi level in thermal equilibrium is a self-evident fact. But we can actually prove this relatively easily. First, we relate the electric field to potential, and potential to energy levels:

$$\xi = -\frac{dV}{dx}$$

$$E_i = -qV$$

$$\xi = \frac{1}{q}\frac{dE_i}{dx}$$

The total current is the sum of drift and diffusion currents. Considering only electron current, with no loss of generality:

$$J_n = q\mu_n n \xi + qD_n \frac{dn}{dx}$$

But the electron concentration can be obtained in terms of the Fermi and intrinsic Fermi levels:

$$n = n_i e^{(E_F - E_i)/kT}$$

And the gradient can be obtained as

$$\frac{dn}{dx} = n_i e^{(E_F - E_i)/kT}\,\frac{1}{kT}\left(\frac{dE_F}{dx} - \frac{dE_i}{dx}\right) = \frac{n}{kT}\left(\frac{dE_F}{dx} - \frac{dE_i}{dx}\right)$$

Thus, the total current is

$$J = \mu_n n \frac{dE_i}{dx} + \frac{qD_n n}{kT}\left(\frac{dE_F}{dx} - \frac{dE_i}{dx}\right)$$

Using Einstein's relationship:

$$J = \mu_n n \frac{dE_i}{dx} + \mu_n n\left(\frac{dE_F}{dx} - \frac{dE_i}{dx}\right) = \mu_n n \frac{dE_F}{dx}$$

Thermal equilibrium means by definition there is zero net current flow:

$$J = \mu_n n \frac{dE_F}{dx} = 0$$

$$\frac{dE_F}{dx} = 0$$

And thus, the Fermi level must be constant throughout the material.

There is a degree of circular logic in proving both Einstein's relation and the constancy of the Fermi level: to prove one, we have to assume the other is true. This is inevitable, but both can be axiomatically assumed true.

In a PN junction, the depletion region is charged. However, the entire PN junction must remain charge neutral since no additional charges were pumped into the overall device. Since the neutral regions on the p-side and the n-side are charge neutral, the depletion regions as a whole must also be charge neutral. Thus, the total charge in the n-side depletion region must equal the total charge in the p-side depletion region. This leads to a very useful relationship. From Fig. 1.31, the total charges can be obtained in terms of the depletion region widths on the n-side and the p-side:

$$\text{Positive ion charge} = \text{negative ion charge}$$

$$N_D x_n A = N_A x_p A$$

$$N_D x_n = N_A x_p$$

where A is the cross-sectional area of the device. This means that the total depletion region is not symmetric about the interface: the depletion region extends more into the more lightly doped side to accumulate an equal total charge.

Fig. 1.31 Neutral regions and depletion region charge densities. The neutral zones are shown in green and have zero net charge. The depletion region has a negative charge on the p-side and positive charge on the n-side. But the overall depletion charge must also be null. The material has to be overall charge neutral

Figure 1.32 shows the PN junction band diagram. Energy levels (except for the Fermi level) are higher on the p-neutral side than on the n-neutral side. This corresponds to a built-in field and a built-in potential. Since the donor ions on the n-side are positive, the potential is higher on the n-side. This also follows from the fact that energy bands are lower on the n-side and that there is an inverse energy–potential relationship.

The charge density in the neutral zones is null, which corresponds to a null field in these regions. In the depletion regions, there is a uniform ionic space charge; the uniformity is due to the uniform doping in the device and the total depletion of carriers. The electric field is related to the charge density through Gauss's law:

$$\frac{d\xi}{dx} = \frac{\rho}{\epsilon}$$

Thus, a uniform charge density creates a linear electric field. The electric field in the p-side depletion region thus decreases linearly, while it increases linearly in the n-side depletion region. The general volumetric charge density can be obtained in terms of doping concentration and free carrier density as

$$\rho = q(p + N_D - n - N_A)$$

In the two depletion regions only ion charges occur, thus:

$$\frac{d\xi}{dx} = -\frac{qN_A}{\epsilon}, \quad -x_p < x < 0$$

$$\frac{d\xi}{dx} = \frac{qN_D}{\epsilon}, \quad 0 < x < x_n$$

Integrating, we obtain the expressions of the electric field:

$$\xi = \frac{qN_D}{\epsilon}x + C_1, \quad 0 < x < x_n$$

$$\xi = -\frac{qN_A}{\epsilon}x + C_2, \quad -x_p < x < 0$$

The constants C_1 and C_2 can be obtained from the boundary conditions. We know the field must be zero at both x = x_n and x = −x_p:

$$C_1 = -\frac{qN_D}{\epsilon}x_n$$

$$C_2 = -\frac{qN_A}{\epsilon}x_p$$

And thus, the electric field is

$$\xi = \frac{qN_D}{\epsilon}(x - x_n), \quad 0 < x < x_n$$

$$\xi = -\frac{qN_A}{\epsilon}(x + x_p), \quad -x_p < x < 0$$

Fig. 1.32 PN junction band diagram (top), charge density, electric field, and potential (bottom). The constant charge density creates a linear field. The linear field builds up a quadratic potential. Everything interesting happens in the depletion region

The maximum absolute value of the electric field occurs at x = 0. The value of the electric field must be continuous at x = 0; this is another boundary condition. Applying it is redundant, since it yields a result we already reached intuitively:

$$\xi_n(0) = \xi_p(0) = \xi_{max}$$

$$-\frac{qN_D}{\epsilon}x_n = -\frac{qN_A}{\epsilon}x_p$$

$$N_D x_n = N_A x_p$$

The potential is related to the electric field through

$$\xi = -\frac{dV}{dx}$$

From Fig. 1.32, the potential starts at 0 at −x_p and builds up to the entirety of the built-in potential V_bi at x_n. These are the two boundary conditions that will be used to obtain the constants of integration.
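The algebra behind the constant-Fermi-level argument can be verified symbolically; a sketch using sympy, with the carrier expression and Einstein relation written generically (the functions and symbols are placeholders, not tied to a particular profile):

```python
import sympy as sp

# Symbolic check: with the Einstein relation, the total electron current
# collapses to a term proportional to dE_F/dx, so zero net current forces
# a flat Fermi level.
x, q, kT, mu_n, n_i = sp.symbols('x q kT mu_n n_i', positive=True)
E_F = sp.Function('E_F')(x)
E_i = sp.Function('E_i')(x)

n = n_i * sp.exp((E_F - E_i) / kT)   # carrier concentration
xi = sp.diff(E_i, x) / q             # field from band bending
D_n = mu_n * kT / q                  # Einstein relation

J = q * mu_n * n * xi + q * D_n * sp.diff(n, x)
print(sp.simplify(J - mu_n * n * sp.diff(E_F, x)))   # 0
```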

In the p-side depletion region:

$$V = -\int \xi\, dx$$

$$V = \int \frac{qN_A}{\epsilon}(x + x_p)\, dx$$

$$V = \frac{qN_A}{\epsilon}\left(\frac{x^2}{2} + x_p x\right) + C_1, \quad -x_p < x < 0$$

Applying the boundary condition:

$$V(-x_p) = 0 = \frac{qN_A}{\epsilon}\left(\frac{x_p^2}{2} - x_p^2\right) + C_1$$

And thus:

$$V = \frac{qN_A}{2\epsilon}(x + x_p)^2, \quad -x_p < x < 0$$

Similarly, on the n-side, and using the boundary condition V(x_n) = V_bi, we can obtain

$$V = V_{bi} - \frac{qN_D}{2\epsilon}(x - x_n)^2, \quad 0 < x < x_n$$

V_bi, the total built-in potential, has been used as a constant; however, we must be able to calculate its value. V_bi is related to the total bending in any energy level except the Fermi level; thus it can be expressed in terms of the conduction band edge on the n-side, E_Cn, and the conduction band edge on the p-side, E_Cp, as

$$qV_{bi} = E_{Cp} - E_{Cn} = (E_{Cp} - E_F) - (E_{Cn} - E_F)$$

The distances between the conduction band edge and the Fermi level are strongly related to the electron concentration, thus:

$$n_n = \text{electron concentration on the n-side}$$

$$n_n = N_C e^{-(E_{Cn} - E_F)/kT} \approx N_D$$

$$(E_{Cn} - E_F) = -kT\ln\left(\frac{N_D}{N_C}\right)$$

$$n_p = \text{electron concentration on the p-side}$$

$$n_p = N_C e^{-(E_{Cp} - E_F)/kT} \approx n_i^2/N_A$$

$$(E_{Cp} - E_F) = -kT\ln\left(\frac{n_i^2}{N_C N_A}\right)$$

Substituting in the equation for V_bi:

$$qV_{bi} = -kT\ln\left(\frac{n_i^2}{N_C N_A}\right) + kT\ln\left(\frac{N_D}{N_C}\right)$$

$$qV_{bi} = kT\ln\left(\frac{N_A N_D}{n_i^2}\right)$$

$$V_{bi} = \frac{kT}{q}\ln\left(\frac{N_A N_D}{n_i^2}\right)$$

Applying the boundary condition at x = 0 due to continuity of the potential function:

$$V_p(0) = V_n(0)$$

$$V_{bi} - \frac{qN_D}{2\epsilon}x_n^2 = \frac{qN_A}{2\epsilon}x_p^2$$

$$V_{bi} = \frac{q}{2\epsilon}\left(x_n^2 N_D + x_p^2 N_A\right)$$

This relationship expresses the sum of the squares of the two depletion region width components. To obtain an expression for the total depletion region width x_n + x_p, we must also use the result obtained from the electric field boundary conditions to obtain an expression for x_n and an expression for x_p, then add them:

$$x_n^2 N_D + x_p^2 N_A = \frac{2\epsilon}{q}V_{bi}$$

$$x_p N_A = x_n N_D \rightarrow x_p = \frac{N_D}{N_A}x_n$$

$$x_n^2\left(N_D + \frac{N_D^2}{N_A}\right) = \frac{2\epsilon}{q}V_{bi}$$

$$x_n = \sqrt{\frac{2\epsilon}{q}V_{bi}\left(\frac{N_A}{N_D N_A + N_D^2}\right)}$$

Similarly:

$$x_p = \sqrt{\frac{2\epsilon}{q}V_{bi}\left(\frac{N_D}{N_D N_A + N_A^2}\right)}$$

Thus, the total depletion region width:

$$x_d = x_n + x_p$$

$$x_d = \sqrt{\frac{2\epsilon}{q}V_{bi}\left(\frac{N_A}{N_D N_A + N_D^2}\right)} + \sqrt{\frac{2\epsilon}{q}V_{bi}\left(\frac{N_D}{N_D N_A + N_A^2}\right)}$$

$$x_d = \sqrt{\frac{2\epsilon}{q}V_{bi}}\left(\sqrt{\frac{N_A^2}{N_D N_A^2 + N_A N_D^2}} + \sqrt{\frac{N_D^2}{N_D N_A^2 + N_A N_D^2}}\right)$$

$$x_d = \sqrt{\frac{2\epsilon}{q}V_{bi}}\,\frac{N_D + N_A}{\sqrt{N_D N_A^2 + N_A N_D^2}}$$

$$x_d = \sqrt{\frac{2\epsilon}{q}V_{bi}}\,\frac{N_D + N_A}{\sqrt{N_D N_A}\sqrt{N_D + N_A}}$$

$$x_d = \sqrt{\frac{2\epsilon}{q}V_{bi}}\,\frac{\sqrt{N_D + N_A}}{\sqrt{N_D N_A}}$$

$$x_d = \sqrt{\frac{2\epsilon}{q}V_{bi}\left(\frac{1}{N_D} + \frac{1}{N_A}\right)}$$
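A numeric sketch of these results for an assumed abrupt junction (the doping levels, n_i, and the permittivity are illustrative silicon values at 300 K, not from the text):

```python
import math

# Numeric sketch for an assumed abrupt junction; doping levels, n_i, and
# the permittivity are illustrative silicon values at 300 K.
q = 1.602e-19               # electron charge [C]
kT_q = 0.0259               # thermal voltage [V]
eps = 11.7 * 8.854e-14      # permittivity of silicon [F/cm]
n_i = 1e10                  # intrinsic concentration [cm^-3]
N_A, N_D = 1e17, 1e15       # assumed doping [cm^-3]

V_bi = kT_q * math.log(N_A * N_D / n_i ** 2)

x_n = math.sqrt(2 * eps / q * V_bi * N_A / (N_D * N_A + N_D ** 2))
x_p = math.sqrt(2 * eps / q * V_bi * N_D / (N_D * N_A + N_A ** 2))
x_d = math.sqrt(2 * eps / q * V_bi * (1 / N_D + 1 / N_A))

print(f"V_bi = {V_bi:.3f} V")
print(f"x_n = {x_n * 1e4:.3f} um, x_p = {x_p * 1e4:.4f} um")
# Consistency: charge balance N_D*x_n = N_A*x_p, and x_d = x_n + x_p
print(abs(N_D * x_n - N_A * x_p) / (N_D * x_n))
print(abs(x_d - (x_n + x_p)) / x_d)
```

Note how the depletion region extends almost entirely into the lightly doped n-side, as the charge-balance relationship predicts.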

This result is valid for any PN junction. In highly asymmetric junctions, one side is much more heavily doped than the other, which leads to a simpler expression for the depletion region width. When the p-side is more heavily doped, the junction is called a P+N junction; when the n-side is more heavily doped, it is called a PN+ junction. In a P+N junction:

$$N_A \gg N_D$$

$$x_d \approx \sqrt{\frac{2\epsilon V_{bi}}{qN_D}}$$

In a PN+ junction, on the other hand:

$$N_D \gg N_A$$

$$x_d \approx \sqrt{\frac{2\epsilon V_{bi}}{qN_A}}$$

This confirms the fact that the depletion region width is dominated by doping in the more lightly doped side. Of all the results obtained for the PN junction, the width of the depletion region is the most important. The PN junction is not important as an individual device, but as a phenomenon that occurs in larger devices. And in these devices, it is the size of the depletion region rather than current flow that we are most interested in.

Figure 1.33 shows a PN junction with an external voltage applied. The external voltage polarity is such that the n-side is higher than the p-side. Compared to Fig. 1.32, we notice that the potential difference is now the sum of the built-in potential and the external reverse bias. The expression for the depletion region width from above still applies, with modification for the applied voltage:

$$x_d = \sqrt{\frac{2\epsilon}{q}(V_{bi} + V_{bias})\left(\frac{1}{N_D} + \frac{1}{N_A}\right)}$$

Applying a voltage with this polarity leads to an increase in the width of the depletion region. This means that a thicker insulating layer is established, impeding current flow. We can also see that the electric field from the reverse bias will try to cause drift of minority carriers (holes from the n-side and electrons from the p-side), so the current will be very small. This is why PN junctions do not conduct in reverse bias. We will further explore the application of external bias in Sect. 1.10.

The patterns of bending seen in Fig. 1.32 are contingent on one fact: uniform doping. Uniformly doped p and n sides create depletion regions with uniform ion charge. This leads to the formation of a linear built-in field, which in turn creates a quadratic potential. The energy profile is an inverted and scaled version of the potential profile. In most practical PN junctions, the transition from p to n is gradual. This creates a non-constant ionic charge in the depletion region and sharper band bending.

1.9 Junction Capacitance

1. Understand why a junction capacitance exists
2. Understand the significance of junction capacitance beyond diodes
3. Derive the dependence of junction capacitance on the applied potential
4. Recognize the nonlinear nature of junction capacitance.

The depletion region is extremely deficient in charge carriers; for most purposes we can assume it is fully depleted. In many cases, the depletion region acts as an insulating layer. The n neutral region is a semiconductor, and the p neutral region is also a semiconductor. The semiconductor–insulator–semiconductor structure in Fig. 1.34 is a capacitance. This capacitance is called the junction capacitance and plays a critical role in semiconductor circuits.


Fig. 1.33 Reverse-biased junction with a raised potential barrier. The total energy barrier is q(Vr + Vbi) of which qVr is above the equilibrium barrier level

The junction capacitance is important for reasons that go beyond diodes. The PN junction interface exists as part of all important semiconductor devices. For example, it exists at the interface between the source and drain on one hand and the body on the other in MOSFETs. BJTs are essentially two PN junctions connected back to back, although asymmetric doping makes them much more interesting. Thus, the PN junction capacitance will be observed as a parasitic capacitance in all semiconductor devices. Since capacitance plays a primary role in defining the delay of a circuit, the junction capacitance is critical to understanding the performance of electronic circuits.

The junction capacitance can be approximated as a parallel plate capacitor. This approximation holds if the plate area is much larger than the thickness of the insulator. The plate area in Fig. 1.34 is the cross-sectional area of the PN junction, while the thickness of the insulator is the width of the depletion region. Since the width of the depletion region is fairly small, the parallel plate model is valid for most purposes. If the approximation holds, then the junction capacitance per unit area takes the following form in equilibrium:

$$C_j = \frac{C}{A} = \frac{\epsilon}{x_d}$$

$$C_j = \sqrt{\frac{q N_A N_D \epsilon}{2(N_A + N_D)V_{bi}}}$$

Since the depletion region width is a function of the applied voltage, the junction capacitance must also be a function of that voltage. The dependence takes the form:

$$C_j = \sqrt{\frac{q N_A N_D \epsilon}{2(N_A + N_D)(V_{bi} + V)}}$$

Fig. 1.34 Junction capacitance. The capacitor plates are the two neutral zones. The dielectric is the depletion region

where V is the amount of reverse bias. We see that as more reverse bias is added to the device, the depletion region widens, leading to a drop in the value of capacitance. More importantly, the dependence of capacitance on voltage is clearly nonlinear (Fig. 1.35). To use this model in linear circuits, efforts must be made to linearize the junction capacitance.
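The voltage dependence can be illustrated numerically; a sketch assuming illustrative silicon doping and material values (none of the numbers below come from the text):

```python
import math

# Junction capacitance per unit area vs. reverse bias using the
# parallel-plate model C_j = eps / x_d; all values are illustrative.
q = 1.602e-19
kT_q = 0.0259
eps = 11.7 * 8.854e-14      # permittivity of silicon [F/cm]
n_i = 1e10                  # intrinsic concentration [cm^-3]
N_A, N_D = 1e17, 1e15       # assumed doping [cm^-3]

V_bi = kT_q * math.log(N_A * N_D / n_i ** 2)

def C_j(V_reverse):
    """Junction capacitance per unit area [F/cm^2] at a given reverse bias [V]."""
    return math.sqrt(q * N_A * N_D * eps
                     / (2 * (N_A + N_D) * (V_bi + V_reverse)))

for V in (0.0, 1.0, 5.0, 10.0):
    print(f"V = {V:4.1f} V -> C_j = {C_j(V):.3e} F/cm^2")
```

The capacitance falls off as 1/sqrt(V_bi + V), which is the nonlinearity plotted in Fig. 1.35.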


Fig. 1.35 The dependence of junction capacitance on external voltage. The graph is shown on a log scale. This is a nonlinear capacitance by definition

When the diode is forward biased, it conducts current (Sects. 1.10 through 1.12). In such a case, there cannot be a capacitance at the interface, and the expression above is invalid. Notice that any path that conducts steady-state current is resistive rather than capacitive by definition.

1.10 Forward and Reverse Bias

1. Understand the change in the potential barrier and band diagram under bias
2. Distinguish the effect external bias has on field and drift
3. Recognize why drift plays a minor role in current conduction in PN junctions
4. Understand how the diffusion of minority carriers is the critical mechanism that defines current flow
5. Understand the injection of minorities as the primary mechanism of current conduction in PN junctions.

Figure 1.36 shows a PN junction in equilibrium. There is a slope in the energy levels from the p-side to the n-side. This slope corresponds to the built-in field. It causes electrons to drift down the slope from the p-side to the n-side and holes to bubble up from the n-side to the p-side. These two currents are drift currents, and they extract minorities from one side to pump them into the other. Thus, these currents extract electrons from the p-side, where they are very rare, and inject them into the n-side, where they are very plentiful. The opposite happens with holes.

Fig. 1.36 Junction at equilibrium, diffusion, drift, and field. Concentration gradient is just enough to allow diffusion to cancel drift

Figure 1.36 also shows that the conduction bands of the n-side and the p-side align so that at the same energy level there is still a gradient, with more electrons on the n-side than the p-side. This causes diffusion of electrons from the n-side to the p-side. A similar gradient in the valence band causes hole diffusion from the p-side to the n-side. Note that these diffusion currents extract majorities from one side and inject them into the other side, where they are a minority. Thus, electrons diffuse from the n-side, where they are very plentiful, to the p-side, where they are rare. This is the opposite of drift current. At thermal equilibrium, the misalignment of levels on the n- and p-sides is just enough that the total diffusion current is equal to the total drift current. This leads to a situation where all currents cancel out and zero net current flows out of the terminals.

Fig. 1.37 Junction in reverse bias, diffusion, drift, and field

Figure 1.37 shows a reverse-biased PN junction. The positive potential is applied to the n-side. This causes the potential barrier to increase relative to the equilibrium state. It increases the sharpness of the band bending, and it also increases the width of the depletion region. Band bending in Fig. 1.37 is sharper than in Fig. 1.36; this corresponds to a higher total field in the device. Note that the externally applied field is in the same direction as the built-in field and thus strengthens it. Because the field is stronger, we should see a higher drift current. However, in practice very little increase in drift current is observed. The reason is that drift current tries to extract electrons from the p-side and holes from the n-side. Thus, it is trying to move minorities from both sides. These minorities have limited


populations, and thus the increased field fails to substantially increase this drift component. In Fig. 1.37 we also notice that the conduction band on the n-side is pushed down relative to the equilibrium diagram in Fig. 1.36. This reduces the electron gradient seen across the depletion region, thus reducing the diffusion of electrons. Similarly, we observe a deterioration of hole diffusion from the p-side to the n-side. Thus, the application of a reverse bias across a PN junction tends to diminish the diffusion current component. It increases the drift currents, but only by a small amount, due to the dearth of minorities on either side. Thus, a reverse-biased PN junction conducts a very small current, resulting from a disturbance of the balance in favor of drift currents.

The position of the "Fermi level" in Fig. 1.37 is very interesting. In reality, there is no well-defined Fermi level for a reverse-biased PN junction. This is because the reverse bias means the device is not in thermal equilibrium, and the Fermi level is only meaningful in thermal equilibrium. But it is useful to draw a "Fermi level" for areas of the device where carrier concentrations are stable and follow the mass action law. This is a practice we will follow for all electronic devices. It can be seen in Fig. 1.37, where a Fermi level is drawn for each of the neutral zones. These Fermi levels correspond to the carrier concentrations in those zones. Notice we do not try to draw a Fermi level for the depletion region, since it is not a region that follows the mass action law. When we draw more than one Fermi level in a device, we call them quasi-Fermi levels. The Fermi levels in Fig. 1.37 may be ill-defined, but they are very useful. For example, notice that their separation corresponds exactly to the amount of applied reverse bias.

So, while band bending in all levels corresponds to the total potential, the skew in the Fermi level corresponds only to the external potential. This is a very useful tool to distinguish between built-in and externally applied fields. Figure 1.38 shows the junction in forward bias. The external potential causes the Fermi level to split into two quasi-Fermi levels separated by an energy corresponding to the applied voltage. This is similar but opposite to the effect observed in reverse bias. The amount of band bending observed in Fig. 1.38 is reduced relative to equilibrium in Fig. 1.36. This means that the built-in potential and electric field are reduced, causing the drift current to drop. On the other hand, the diffusion current will increase. This can be seen in the following three ways:

• The width of the depletion region drops considerably, making it easier for carriers to diffuse and minorities to be injected into both sides


are still responsible for the forward current in the PN junction. The forward current in the junction is very high because it extracts majorities from either side. The main takeaway from this section is that there is no symmetry to the behavior of diffusion and drift currents across the depletion region. Diffusion current increases extremely rapidly with forward bias. Drift current increases very slowly for reverse bias. Diffusion current extracts majorities and injects them as minorities. Drift current extracts minorities and injects them as majorities. The rate of increase of diffusion with forward bias is in fact, exponential. The reason is that the concentration of electrons increases exponentially toward the conduction band edge, thus a drop is the relative levels between the p- and n-sides leads to an exponential increase in diffusion current.

1.11

Fig. 1.38 Junction in forward bias, diffusion, drift, and field. Concentration gradient is increased across the depletion region. The balance is broken, and a large diffusion of carriers takes place across the interface

• The carrier gradient increases due to the drop in band bending, this can be seen by comparing Figs. 1.38 and 1.36. Since a lower conduction level on the n-side now corresponds to the same level on the p-side, this leads to an increase in gradient and an increase in diffusion • The total field in the depletion region is reduced. This field would normally sweep back carriers trying to diffuse. This effect is now diminished In forward bias, the diffusion current increases and the drift current decreases. This leads to the flow of a net current from the p-side to the n-side. There is one curious thing about this current: it is formed primarily by the injection of minority carriers. Diffusion current consists of injecting electrons into the p-side from the n-side and injecting holes from the p-side to the n-side. These minority injections are in excess of the equilibrium levels and will lead to excess carriers. Because the p-side is very rich in holes, the injected excess minority electrons will quickly recombine with the wealth of holes and the diffusion current will quickly disappear in the neutral p-side. Similarly, the diffused holes into the n-side will quickly recombine and disappear in the neutral n-side. Thus, on both sides, excess carriers must disappear deep in the neutral zones. In Sect. 1.12, we will see that even though the excess minority injections quickly dissipate on the other side, they

Minority Carrier Injection

1. Understand the concept of quasi-Fermi level
2. Derive minority carrier concentration at the edge of the depletion layer in equilibrium
3. Derive minority carrier concentration at the depletion interface in nonequilibrium
4. Relate minority carrier concentration to quasi-Fermi and external potential
5. Derive excess minority carrier concentration at the interface of the depletion layer.

In Sect. 1.10, we briefly discussed how the Fermi level is not well-defined in conditions of thermal nonequilibrium. The main reason is that conditions of thermal nonequilibrium often involve excess or deficient carriers. Having excess carriers means that a material has had a number of carriers above the equilibrium levels injected from an external source. These excess carriers cannot survive for too long; silicon will quickly recombine them, restoring equilibrium levels. However, the presence of excess carriers, even if transient, is instrumental to semiconductor devices. It is particularly important to forward-biased PN junctions. Having excess carriers means that the mass action law is broken. In other words, the product of electron and hole populations may be in excess of the equilibrium level or may fall short of it. Thus, the mass action law is only valid in thermal equilibrium:

$$n_0 p_0 = n_i^2$$

where the naught subscript indicates thermal equilibrium. The mass action law is inextricably related to the presence of a unique, true Fermi level. In Sect. 1.3, we showed how the Fermi level ties the two carrier populations together, and in


Sect. 1.4, we proved how a unique Fermi level creates the mass action law. In Sect. 1.10, we could still use a qualified "Fermi level" in electrically neutral zones where the mass action law holds true. But what about areas where it does not? This is particularly important when talking about the depletion region; specifically, its edges. To deal with these nonequilibrium areas we introduce the concept of quasi-Fermi levels. Any area which is not in equilibrium has two quasi-Fermi levels. One level corresponds to electrons and the other corresponds to holes:

$$E_{Fn} = \text{quasi-Fermi level for electrons}$$

$$E_{Fp} = \text{quasi-Fermi level for holes}$$

$$n = N_C e^{-(E_C - E_{Fn})/kT}$$

$$p = N_V e^{(E_V - E_{Fp})/kT}$$
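These two relations imply that the np product deviates from the equilibrium mass action law by exactly the quasi-Fermi splitting: np = ni² · e^(E_Fn − E_Fp)/kT. A quick numeric illustration, with assumed room-temperature values:

```python
import math

# With split quasi-Fermi levels, the np product deviates from the mass action
# law by exp((E_Fn - E_Fp)/kT):  n*p = ni^2 * exp(split/kT).
# The values of ni and kT below are assumed, illustrative numbers.
ni = 1.0e10        # intrinsic carrier concentration of silicon, cm^-3
kT = 0.02585       # thermal energy at 300 K, eV

def np_product(split_eV):
    """np product for a given quasi-Fermi level split E_Fn - E_Fp (in eV)."""
    return ni**2 * math.exp(split_eV / kT)

print(np_product(0.0))    # zero split (equilibrium): recovers ni^2 = 1e20
print(np_product(0.5))    # 0.5 eV split: np far exceeds ni^2
```

When the split is zero, the two quasi-Fermi levels collapse into one true Fermi level and the equilibrium product is recovered.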

The quasi-Fermi level is a very confusing concept. A critical key to understanding it is to understand that it is bogus. It means nothing. It is not a real level, nor does it say anything about the chemical energy like the Fermi level. It does not even express the probability of finding electrons, because the Fermi–Dirac function breaks down in nonequilibrium. So, what is it? It is simply a number obtained by reverse engineering the carrier concentrations. Because there are excess carriers, we cannot use the Fermi level in their calculation. Instead, we ask: if the electron and hole concentrations we have were equilibrium concentrations, what Fermi level would each correspond to? The answer is two levels, the quasi-Fermi level for electrons and the quasi-Fermi level for holes. Figure 1.39 shows the definition of minority carrier concentration at the interface of the depletion layer. Namely, the electron concentration at the end of the neutral p-side, n(−x_p), and the hole concentration at the beginning of the neutral n-side, p(x_n). These concentrations represent the amount of minorities injected from the other side and are thus critical in deriving the PN junction current. In thermal equilibrium, the minority carrier concentrations can be easily obtained using the mass action law and the approximate value for majority carrier concentration. These are the concentrations shown in Fig. 1.39. Once an external potential is applied, the equilibrium concentrations shown in Fig. 1.39 break down completely.

Fig. 1.39 Minority carrier concentration in thermal equilibrium

Particularly in forward bias, large numbers of minorities are injected from each side into the other. This naturally leads to an increase in electron concentration at −x_p and an increase in hole concentration at x_n. These injected minority concentrations do not exist in equilibrium. The mass action law cannot be used and, as discussed earlier, we must use the quasi-Fermi levels. There are two quasi-Fermi levels in the PN junction in forward bias: E_Fn and E_Fp. E_Fn is used to calculate electron concentration anywhere in the device. E_Fp is used to calculate hole concentration anywhere in the device. At −x_p, the electron concentration can be obtained using the quasi-Fermi level for electrons:

$$n(-x_p) = N_C e^{-(E_C - E_{Fn})/kT}$$

Similarly, we can obtain an expression for hole concentration at the interface of the n-side:

$$p(x_n) = N_V e^{(E_V - E_{Fp})/kT}$$

At thermal equilibrium, the two quasi-Fermi levels collapse into one real Fermi level. On the n-side, the electron concentration is unaffected by the forward bias, while on the p-side, the hole concentration is unaffected by the disequilibrium. This means that the energy difference E_Cn − E_Fn, where E_Cn is the conduction band edge on the n-side, is constant regardless of thermal nonequilibrium. Similarly, E_Fp − E_Vp is the same in equilibrium and nonequilibrium. Thus, E_Fn is the equilibrium Fermi level on the n-side, while E_Fp is the equilibrium Fermi level on the p-side.


Figure 1.38 shows the band diagram in forward bias. We can finally understand why there are two Fermi levels in the diagram and why we had to put the words "Fermi level" in quotations. The two Fermi levels are, in fact, quasi-Fermi levels. Far away from the depletion region, deep in the neutral zones, the mass action law is restored. Thus, away from the interface, each side's quasi-Fermi level can be used as a unique Fermi level to calculate both electron and hole concentrations. At the interface, on the other hand, there are excess charges. The mass action law does not hold, and the two Fermi levels must be used to calculate charges. E_Fn must be used to calculate electron concentration on both sides. E_Fp must be used to calculate hole concentration on both sides. Thus, electron concentration at the edge of the p neutral zone can be calculated as

$$n(-x_p) = N_C e^{-(E_{Cp} - E_{Fn})/kT}$$

The aim now is to obtain this concentration in terms of the equilibrium level in Fig. 1.39:

$$n(-x_p) = N_C e^{-(E_{Cp} - E_{Fn})/kT} = N_C e^{-(E_{Cp} - E_{Fp})/kT} \, e^{(E_{Fn} - E_{Fp})/kT}$$

Because E_Fp is the equilibrium Fermi level for the p-side:

$$N_C e^{-(E_{Cp} - E_{Fp})/kT} = n_{p0} = \text{equilibrium electron concentration on the p-side}$$

But also, E_Fn − E_Fp is related to the bias voltage, because in Sect. 1.10 we stated that the split in quasi-Fermi levels corresponds to the externally applied voltage:

$$E_{Fn} - E_{Fp} = qV_{bias}$$

Thus:

$$n(-x_p) = n_{p0} e^{qV_{bias}/kT}$$

This expression is also valid for reverse bias if the voltage is applied with the correct sign. We can obtain a similar expression for the hole concentration at x_n:

$$p(x_n) = p_{n0} e^{qV_{bias}/kT}$$

where p_{n0} is the hole concentration at equilibrium at the n-side depletion interface.

These equations show that with the application of a positive potential, the minority carrier concentrations at the edges of the depletion region increase dramatically. In fact, the increase is exponential. The middle column of Table 1.4 shows the ratio of carrier concentration at different biases relative to the equilibrium concentration.

It is also useful to define excess minority carrier concentrations, the difference between nonequilibrium and equilibrium levels:

$$\Delta n(-x_p) = n(-x_p) - n_{p0} = n_{p0} e^{qV_{bias}/kT} - n_{p0}$$

$$\Delta n(-x_p) = n_{p0} \left( e^{qV_{bias}/kT} - 1 \right)$$

$$\Delta p(x_n) = p_{n0} \left( e^{qV_{bias}/kT} - 1 \right)$$

Table 1.4 also shows the ratio of excess minority carrier concentration to the equilibrium minority concentration. Naturally, this ratio is not meaningful at zero bias since there is no excess minority carrier concentration in this condition (it is an equilibrium condition). The table shows a pattern of behavior that will remarkably coincide with diode current. If reverse bias is applied, the excess minority carrier ratio at the interfaces saturates at roughly −1. Thus, for all intents and purposes, the minority carriers at the interface disappear completely with the application of any reverse bias. With the application of a slight positive voltage, the ratio of excess carriers increases dramatically and very quickly. This asymmetry in the carrier concentration profiles has to do with the fact that forward bias facilitates diffusion of majority carriers out of each side, while reverse bias leads to an increase in drift of vanishingly small minorities out of each side. See Sect. 1.10 for a more detailed discussion of the asymmetry of PN junctions.

1.12 Forward-Biased PN Junction Current

1. Understand current conditions in the depletion region
2. Derive current continuity conditions
3. Understand the replacement of diffusion currents by majority currents in the neutral zones
4. Derive current equations
5. Examine the reverse saturation current
6. Understand the effect of imperfect depletion on current.

The following equation, called the current continuity equation, relates the change in current density to the recombination rate of carriers:

$$\frac{dJ_p}{dx} = -q\frac{\Delta p}{\tau}$$

A similar equation can be written for electrons. This equation can be proved by examining Fig. 1.40. In the figure, the current density going into a small length (or the current going into a small volume) must be equal to the current density exiting, unless the difference is made up by recombination or generation.


Table 1.4 Minority carrier injection dependence on the applied potential at room temperature

Vbias (V)   Ratio of minority carriers    Ratio of excess minority carriers
            relative to equilibrium       relative to equilibrium
−2          1.80E−35                      −1
−1          4.20E−18                      −1
−0.5        2.00E−09                      −1
0           1                             N/A
0.5         4.90E+08                      4.90E+08
1           2.40E+17                      2.40E+17
1.5         1.10E+26                      1.10E+26
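The entries of Table 1.4 follow directly from e^(qVbias/kT); a short script reproduces them, assuming the rounded thermal voltage of 0.025 V that the table's values appear to use:

```python
import math

# Reproducing the ratios in Table 1.4. The minority concentration at the
# depletion edge scales as exp(qV/kT); the excess ratio is exp(qV/kT) - 1.
kT_q = 0.025  # thermal voltage in volts (assumed rounding used by the table)

rows = []
for v in (-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5):
    ratio = math.exp(v / kT_q)      # n(-xp) / np0
    excess = ratio - 1.0            # excess ratio, saturates at -1 in reverse
    rows.append((v, ratio, excess))
    print(f"{v:+.1f} V  ratio = {ratio:.2e}  excess = {excess:+.2e}")
```

At +0.5 V this gives a ratio of about 4.9 × 10⁸, matching the table, while any bias more negative than a few tenths of a volt pins the excess ratio at −1.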

To restate this, if there is thermal nonequilibrium in the infinitesimal volume in Fig. 1.40, then the current densities going in and out can be unequal. This is because charges can appear or disappear in the volume by generation or recombination. The rate of recombination is proportional to the excess carriers in the volume. This makes sense, because the more excess carriers there are, the faster the material will try to recombine them. The excess carrier density is Δp. In the volume in Fig. 1.40, the excess carriers are Δp·A·Δx, and per unit area: Δp·Δx. Thus, the excess charge per unit area is qΔpΔx. The rate of recombination is also inversely proportional to a time-constant for recombination, τ. This time-constant is a material property but is also highly dependent on the opposite carrier concentration in the material. Thus, the recombination rate per unit area is qΔx·Δp/τ. And the current continuity difference equation in the volume is

$$J_p(x) = J_p(x + \Delta x) + q\Delta x \frac{\Delta p}{\tau}$$

Taking the limit, we can obtain the following differential form:

$$q\frac{\Delta p}{\tau} = -\frac{J_p(x + \Delta x) - J_p(x)}{\Delta x} \rightarrow -\frac{dJ_p}{dx}$$

The current continuity equation can be applied anywhere: in the depletion region, in the neutral zones, or at interfaces. It can be applied to minority or majority carriers, and it applies whether the area is in thermal equilibrium or not. In Sects. 1.10 and 1.11 we established that minority carrier injection is the most crucial current-carrying mechanism. And we also established that said minorities move by diffusion. Thus, we already know the current equations for these minorities. For holes, the diffusion current equation is specifically:

$$J_p = -D_p q \frac{dp}{dx}$$

Substituting in the current continuity equation, we obtain a second-order differential equation relating the second derivative of the carrier concentration to the excess concentration:

$$D_p q \frac{d^2 p}{dx^2} = q\frac{\Delta p}{\tau}$$

$$D_p \frac{d^2 p}{dx^2} = \frac{\Delta p}{\tau}$$

The doping profile in the neutral regions is uniform. Thus, the equilibrium carrier levels are not a function of x. Therefore, the derivative of hole concentration is equal to the derivative of excess hole concentration. In other words:

$$p = p_0 + \Delta p, \quad p_0 \neq p_0(x) \;\Rightarrow\; \frac{dp}{dx} = \frac{d\Delta p}{dx}$$

And:

$$D_p \frac{d^2 \Delta p}{dx^2} = \frac{\Delta p}{\tau}$$

Fig. 1.40 Current continuity. The current density entering a volume and that leaving it must be balanced by carrier generation or recombination in the volume
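The slab balance that produced the continuity equation can be checked with numbers; all values below are assumed for illustration:

```python
# A numeric check of the slab balance Jp(x) = Jp(x + dx) + q*dx*dp/tau:
# the hole current lost across a thin slab equals the recombination inside it.
# All parameter values are assumed, illustrative numbers.
q   = 1.602e-19   # electron charge, C
tau = 1.0e-6      # hole recombination time-constant, s (assumed)
dp  = 1.0e12      # excess hole density in the slab, cm^-3 (assumed)
dx  = 1.0e-4      # slab thickness, cm (about 1 um)

Jp_in  = 5.0e-3               # current density entering, A/cm^2 (assumed)
recomb = q * dx * dp / tau    # current density consumed by recombination
Jp_out = Jp_in - recomb       # current density leaving the slab

print(Jp_in, Jp_out, recomb)
```

The current leaving the slab is smaller than the current entering it by exactly the recombination term, which is the statement the differential form expresses in the limit dx → 0.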


The time-constant has to be marked to indicate that it is the time-constant of minority holes on the n-side, thus:

$$D_p \frac{d^2 \Delta p}{dx^2} = \frac{\Delta p}{\tau_{pn}}$$

Similarly, for electrons:

$$D_n \frac{d^2 \Delta n}{dx^2} = \frac{\Delta n}{\tau_{np}}$$

We define two new constants, Ln and Lp, the diffusion lengths. These are the geometric mean of the diffusivity and the time-constant. They have a unit of length and correspond to how far a carrier can move in a material of the opposite type before recombining. Thus, Lp is how far an injected excess hole can move in the n-side before recombining, while Ln is the distance an electron can move in the p-side before recombining. Both Ln and Lp are strong functions of doping: the higher the doping, the lower the diffusion length. This is because if the n-side is heavily doped, it has a lot of electrons, and an injected excess hole will find plenty of willing electrons to recombine with. We can restate the differential equations using diffusion lengths:

$$L_p^2 = \tau_{pn} D_p \quad \Rightarrow \quad \frac{d^2 \Delta p}{dx^2} = \frac{\Delta p}{L_p^2}$$

$$L_n^2 = \tau_{np} D_n \quad \Rightarrow \quad \frac{d^2 \Delta n}{dx^2} = \frac{\Delta n}{L_n^2}$$
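As a sanity check, the exponential solution of this equation can be verified numerically; the values of D_p and τ below are assumed, illustrative numbers:

```python
import math

# Numerical sanity check that dp(x) = dp0 * exp(-x/Lp) satisfies
# Dp * d2(dp)/dx2 = dp / tau   when   Lp**2 = Dp * tau.
Dp  = 12.0                  # hole diffusivity, cm^2/s (assumed)
tau = 1.0e-6                # hole recombination time-constant, s (assumed)
Lp  = math.sqrt(Dp * tau)   # diffusion length, cm (about 35 um here)

dp0 = 1.0e12                # excess holes at the depletion edge, cm^-3 (assumed)

def dp(x):
    return dp0 * math.exp(-x / Lp)

x, h = 2.0e-4, 1.0e-7       # evaluation point and finite-difference step, cm
second_deriv = (dp(x - h) - 2.0 * dp(x) + dp(x + h)) / h**2

lhs = Dp * second_deriv     # diffusion side of the equation
rhs = dp(x) / tau           # recombination side of the equation
print(lhs, rhs)             # the two sides agree to within discretization error
```

The two printed values match to many digits, confirming that the decaying exponential with length scale Lp = √(Dp·τ) solves the diffusion-recombination equation.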


The differential equations have a general solution of the form:

$$\frac{d^2 \Delta p}{dx^2} = \frac{\Delta p}{L_p^2} \;\rightarrow\; \Delta p(x) = C_1 e^{x/L_p} + C_2 e^{-x/L_p}$$

The two integration constants C1 and C2 require two known boundary conditions. Note that the obtained profiles are the profiles of excess carriers as they enter the neutral zone of the opposite type. Thus, the equation for Δp above is valid for x > x_n, i.e., from the edge of the depletion region and deep into the n neutral zone. We know two boundary conditions for this excess carrier concentration. First, at x equals infinity, that is to say extremely deep into the neutral zone, the hole concentration drops to equilibrium levels and the excess hole concentration drops to null. Second, at x = x_n, at the boundary of the depletion region, we know the excess carrier concentration in terms of equilibrium levels and applied bias (Sect. 1.11):

$$\Delta p(\infty) = p_{n0} - p_{n0} = 0$$

$$\Delta p(x_n) = p_{n0}\left(e^{qV_{bias}/kT} - 1\right) = \frac{n_i^2}{N_D}\left(e^{qV_{bias}/kT} - 1\right)$$

Solving for the constants, the minority carrier profiles can be obtained as

$$\Delta p(x) = p_{n0}\left(e^{qV_{bias}/kT} - 1\right) e^{-(x - x_n)/L_p}, \quad x > x_n$$

$$\Delta n(x) = n_{p0}\left(e^{qV_{bias}/kT} - 1\right) e^{(x + x_p)/L_n}, \quad x < -x_p$$

Fig. 1.41 Minority carrier concentration profiles in forward bias

Thus, the excess minority carriers begin at a very high level at the edge of the depletion region. This high level is exponentially related to the applied forward bias. As we move away from the edge of the depletion region, the excess carriers recombine with the plentiful opposite carriers. This recombination is rather rapid and leads to an exponential drop in excess minority concentration, reaching null away from the depletion region. The rate at which the concentration drops is a function of the diffusion length, which in turn is a function of doping levels. This profile is shown in Fig. 1.41. Excess holes are pumped by diffusion from the p-side through the depletion region. This is due to the lowering of the energy barrier caused by the forward bias. This excess level is maintained at a constant level throughout the depletion region. The level of holes in the depletion region was obtained in Sect. 1.11. The reason it remains constant will be explained shortly. Once these excess holes reach the neutral n-zone, they drop


exponentially with distance. A similar behavior is observed for electrons diffusing from the n-side through the depletion region and into the p neutral zone. The reason minority carriers decay rapidly in the neutral zones is that there are plenty of carriers of the opposite type, which increase the rate of recombination. For example, on the n-side there are lots of electrons which will quickly recombine with the injected excess holes, reducing their recombination time-constant significantly. The depletion region is extremely poor in free carriers. The assumption that depletion regions are entirely depleted of carriers is an approximation, but one we are ready to make at this point. Since there are no carriers in the depletion region, there is no recombination. This means that the minority carrier profile remains constant through the depletion region. We can now use the minority carrier profile to obtain the diffusion current profile in the neutral zones. We have already determined that drift current in the depletion region is negligible. Thus, we are only concerned with diffusion current. The expression of diffusion current is

$$J_p = -D_p q \frac{dp}{dx}$$

The excess carrier concentration has been determined in the neutral zones to be

$$\Delta p(x) = p_{n0}\left(e^{qV_{bias}/kT} - 1\right) e^{-(x - x_n)/L_p}, \quad x > x_n$$

$$\Delta n(x) = n_{p0}\left(e^{qV_{bias}/kT} - 1\right) e^{(x + x_p)/L_n}, \quad x < -x_p$$

The derivative of the hole concentration is the derivative of the excess concentration (recall that doping is uniform):

$$\frac{dp}{dx} = \frac{d\Delta p}{dx} = -\frac{p_{n0}}{L_p}\left(e^{qV_{bias}/kT} - 1\right) e^{-(x - x_n)/L_p}$$

We can use this derivative to find the diffusion current. There is only one problem. When we talk about "diffusion current" in a forward-biased PN junction, we are talking about diffusion through the depletion region. But the excess minority expressions obtained above apply to the neutral zones. This is not a huge problem if we realize these currents must be continuous at the interface between the depletion region and the neutral zones. Thus, the diffusion currents at x = x_n and x = −x_p are also the diffusion currents in the depletion region. Thus, we can use the excess hole derivative at x = x_n:

$$\left.\frac{d\Delta p}{dx}\right|_{x = x_n} = -\frac{p_{n0}}{L_p}\left(e^{qV_{bias}/kT} - 1\right)$$

And the hole diffusion current at the edge of the depletion region is

$$J_p(x_n) = \frac{qD_p p_{n0}}{L_p}\left(e^{qV_{bias}/kT} - 1\right)$$

Similarly, we can obtain an expression for the electron diffusion current at the edge of the depletion region:

$$J_n(-x_p) = \frac{qD_n n_{p0}}{L_n}\left(e^{qV_{bias}/kT} - 1\right)$$

These are the two diffusion current components at the edges of the depletion region. However, their summation is also the total current throughout the depletion region. This is for two reasons:

1. The drift current in the depletion region is negligible (Sect. 1.11), thus we need only consider the diffusion current.
2. The diffusion current is constant throughout the depletion region because carrier depletion means there is no recombination throughout the depletion region. Thus, the current anywhere in the depletion region is the current everywhere in the depletion region.

Thus, the total current in the depletion region is the summation of its two diffusion currents anywhere in the depletion region:

$$J_{depletion} = J_n(-x_p) + J_p(x_n)$$

$$J_{depletion} = \frac{qD_n n_{p0}}{L_n}\left(e^{qV_{bias}/kT} - 1\right) + \frac{qD_p p_{n0}}{L_p}\left(e^{qV_{bias}/kT} - 1\right)$$

$$J_{depletion} = q\left(\frac{D_n n_{p0}}{L_n} + \frac{D_p p_{n0}}{L_p}\right)\left(e^{qV_{bias}/kT} - 1\right)$$

In steady state, there is no charge accumulation anywhere in the device. This means that the total current anywhere in the PN junction must be the total current everywhere in the PN junction. Thus, the depletion region's current density obtained above is also the current density in the entire device. Thus, we can obtain the PN junction current as

$$I_{PN} = qA\left(\frac{D_n n_{p0}}{L_n} + \frac{D_p p_{n0}}{L_p}\right)\left(e^{qV_{bias}/kT} - 1\right)$$

The equilibrium minority levels are

$$n_{p0} = \frac{n_i^2}{N_A}, \quad p_{n0} = \frac{n_i^2}{N_D}$$

And thus, the current is

$$I_{PN} = qA n_i^2 \left(\frac{D_n}{L_n N_A} + \frac{D_p}{L_p N_D}\right)\left(e^{qV_{bias}/kT} - 1\right)$$
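This final expression can be evaluated numerically. A sketch follows; all device parameters are assumed, illustrative values for a small silicon diode, not figures taken from the text:

```python
import math

# Evaluating the PN junction current equation with assumed parameters.
q  = 1.602e-19               # electron charge, C
A  = 1.0e-4                  # junction area, cm^2
ni = 1.0e10                  # intrinsic concentration, cm^-3
Dn, Dp = 36.0, 12.0          # diffusivities, cm^2/s
Ln, Lp = 2.0e-3, 1.0e-3      # diffusion lengths, cm
NA, ND = 1.0e17, 1.0e16      # acceptor/donor doping, cm^-3
kT_q = 0.02585               # thermal voltage at 300 K, V

# Prefactor of the diode equation
I0 = q * A * ni**2 * (Dn / (Ln * NA) + Dp / (Lp * ND))

def I_pn(v):
    return I0 * (math.exp(v / kT_q) - 1.0)

print(I0)          # a few femtoamps of prefactor
print(I_pn(0.6))   # tens of microamps at 0.6 V forward
```

Even with a prefactor of only a few femtoamps, a forward bias of 0.6 V produces tens of microamps, a direct consequence of the exponential term.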


The current is obviously very dependent on the applied forward bias. It is also dependent on diode parameters through the external constant of proportionality:

$$I_{PN} = I_0\left(e^{qV_{bias}/kT} - 1\right)$$

$$I_0 = qA n_i^2 \left(\frac{D_n}{L_n N_A} + \frac{D_p}{L_p N_D}\right) = \text{reverse saturation current}$$

I0 is called the reverse saturation current because it is the maximum current that flows when the diode is reverse biased. As Vbias approaches minus infinity, I_PN approaches −I0. Figure 1.42 shows typical I–V curves for PN junctions. The PN junction current in reverse is negligible relative to the forward current. From the equation, we find that the reverse current tends to saturate toward a value of −I0. In the forward direction, current increases incredibly rapidly; thus, we say that the diode turns on very quickly. The reverse saturation current is independent of applied voltage; however, it is a strong function of temperature through the intrinsic carrier concentration. It is also a function of the doping profile and, through it, of the diffusion lengths. Reverse saturation current is very important in deep submicron technologies as one mechanism of charge leakage. We now have to consider how current behaves in the neutral zones of the PN junction. In Fig. 1.41, as holes diffuse into the neutral n-zone, they recombine very quickly with the abundant electrons. This leads to a drop in both the excess hole concentration and the hole profile gradient in the n neutral zone. This means the hole diffusion current drops rapidly (in fact, exponentially) in the n neutral zone.
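The saturation in reverse and the forward/reverse asymmetry can be made concrete with numbers; the value of I0 below is assumed for illustration:

```python
import math

# The diode equation saturates at -I0 in reverse and grows exponentially
# in forward.
I0 = 1.0e-14      # reverse saturation current, A (assumed)
kT_q = 0.02585    # thermal voltage, V

def I(v):
    return I0 * (math.exp(v / kT_q) - 1.0)

# A few tenths of a volt of reverse bias already pins the current at -I0,
# while the same magnitude of forward bias gives a vastly larger current:
print(I(-0.2))                 # essentially -I0
print(abs(I(0.2) / I(-0.2)))   # forward/reverse ratio in the thousands
```

At only ±0.2 V the forward current already exceeds the reverse current by a factor of a few thousand; at practical forward biases the ratio is astronomically larger.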

Fig. 1.42 I–V characteristics of a PN junction


However, the total current must remain constant everywhere in the device. Thus, in the n neutral zone, as the hole diffusion current drops, it must be replaced by another current component to keep the total current constant. As shown in Fig. 1.43, the minority carrier diffusion currents are replaced by majority carrier currents. Thus, on the n-side as the hole current dies out, it is replaced by an electron current. The opposite is true in the p-neutral zone. Are the majority currents in Fig. 1.43 diffusion or drift currents? They do not flow due to the application of an external field. An examination of the band diagram of a forward-biased PN junction shows that the entirety of the external potential falls on the depletion region. Thus, these majority currents cannot be drift currents. In fact, they are diffusion currents. As holes recombine with electrons on the n-side, this disturbs the equilibrium level of electrons. This creates a “vacuum” of electrons near the depletion region. This vacuum then sucks in electrons from adjacent areas in the neutral zones. This moves the vacuum along the neutral zone until electrons are finally sucked in from the metal contact to replace the recombined electrons. This process continues uninterrupted. Because the flow of electrons occurs due to a disturbance of electron gradient, it is a diffusion current. One major assumption in the derivation of junction current is that diffusion currents are constant through the depletion region. The depletion region is never entirely devoid of charge carriers. The lowest carrier concentration (highest depletion) is observed when the Fermi level coincides with the intrinsic Fermi levels. This corresponds to carrier levels comparable to pure silicon, which while much lower than doped (neutral) silicon, is still not null.


Fig. 1.43 Minority and majority currents in the neutral zones. As minorities recombine deep in the neutral zones, current continuity forces a majority current component to replace it

The imperfect depletion means that there is excess recombination and generation happening in the depletion region. The excess recombination happens mostly at the point where electron and hole concentrations are equal within the depletion region. This is the point where both carrier levels are high, providing the most capacity for recombination. This happens at the point where the equilibrium Fermi level meets the intrinsic Fermi level. In a symmetric junction, this occurs roughly in the middle of the depletion region. This phenomenon leads to an additional reverse current component, which modifies the total PN junction current as follows:

$$I_{PN} = \left(I_0 + Aqn_i\frac{x_d}{\tau_d}\right)\left(e^{qV_{bias}/kT} - 1\right)$$

$$I_{reverse} = I_0 + Aqn_i\frac{x_d}{\tau_d}$$

$$I_{depletion} = Aqn_i\frac{x_d}{\tau_d}$$

where x_d is the width of the depletion region and τ_d is the recombination time-constant in the depletion region.
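The depletion term can be evaluated against an assumed reverse saturation current. All values below are illustrative; in particular, τ_d is deliberately chosen large, reflecting the point that recombination in the carrier-poor depletion region is slow:

```python
# Evaluating the depletion recombination current I_depletion = A*q*ni*xd/tau_d
# against an assumed reverse saturation current I0. All values are assumed.
q     = 1.602e-19   # electron charge, C
A     = 1.0e-4      # junction area, cm^2
ni    = 1.0e10      # intrinsic concentration, cm^-3
xd    = 1.0e-4      # depletion width, cm (about 1 um)
tau_d = 1.0e-2      # depletion recombination time-constant, s (assumed large)
I0    = 1.0e-14     # reverse saturation current, A (assumed)

I_depletion = A * q * ni * xd / tau_d
print(I_depletion, I_depletion / I0)   # a secondary contribution here
```

With these assumed numbers the depletion term is a fraction of I0; how the two compare in a real device depends strongly on the actual time-constants and geometry.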

The recombination time-constant in the depletion region is much higher than the time-constants in the neutral zones due to the paucity of carriers. This makes the depletion current Idepletion much smaller than the reverse saturation current Io, relegating it to a secondary effect. However, it can have a significant impact on leakage in modern technologies (Chap. 10). A misconception about the PN junction is that its rectifying action is a result of the modulation of the width of the depletion region. This is not strictly true. Thinking of the depletion region as an insulator is misleading. The rectifying action of the PN junction is a result of the asymmetry in the populations of carriers that move in either direction. With positive bias, majorities diffuse in large quantities across the depletion region. The fact that the depletion region is thinned is not as important as the fact that the energy barrier is reduced. In reverse bias, the barrier to majorities increases. The fact that the depletion region is widened is a correlated result rather than a causative reason.


Everything we have derived regarding the PN junction is contingent on an asymmetric behavior of drift and diffusion in the depletion region. In particular, we have concluded that drift current is very small because of the dearth of the populations it is trying to move. But drift current is also dependent on the applied field through velocity and mobility. When we apply increasing reverse bias, does this not strengthen, and thus increase, the electric field in the depletion region? And if so, should the drift currents not increase linearly with the applied reverse potential? Why are we seeing a saturating behavior for drift currents? In Sect. 1.8 we showed that the more reverse bias is applied, the wider the depletion region. The electric field across the depletion region is roughly equal to the total potential across the device divided by the width of the depletion region. Increasing reverse potential increases both the voltage across the device and the width of the depletion region, albeit at slightly different rates. Thus, the field in the depletion region, expressed as the slope of band bending, will increase at a slow and saturating rate, and drift currents will saturate at low values.

1.13 Bipolar Junction Transistor

1. Learn the structure of the BJT
2. Recognize BJT active current components
3. Derive BJT collector and base current
4. Understand BJT parameters
5. List BJT regions of operation
6. Deduce BJT behavior in the active region and the transistor action
7. Realize that BJT saturation is more than just two forward PN junctions.

The device shown in Fig. 1.44 is called a bipolar junction transistor (BJT). It is called bipolar because both types of charge carriers can play a role in current flow, although as we will shortly see, only one dominates in the active mode. It is immediately obvious why it is called a junction transistor; it effectively consists of two PN junctions connected back to back. However, there are two distinguishing features that set it apart from just a pair of PN junctions:

• The particular device shown in Fig. 1.44 is called an npn BJT since it consists of two n regions and one p region. One of the n regions is significantly more heavily doped


than the other. This asymmetry in doping is integral to the operation of the BJT
• The intermediate p region between the two n regions is extremely narrow.

Fig. 1.44 Structure of an npn BJT. The emitter is more heavily doped than the collector. The base must be extremely narrow. These two features distinguish a BJT from a pair of PN junctions

In the npn transistor, the main charge carrier is electrons. A similar device can be constructed using a pnp structure. BJTs have favorable frequency response due to their large currents and limited parasitics relative to MOSFETs (see Chap. 3 for a thorough discussion of delay). However, BJTs are difficult to fabricate and have limited fanout. Thus, the use of BJTs in digital circuits is relegated to specialty applications. When BJTs are used in digital circuits, there is a tendency to use npn transistors only; the lower mobility of holes would make pnp transistors lose their competitive speed edge. The more heavily doped n terminal is called the emitter. The p region is the base, and the lightly doped n region is the collector. We will shortly see that the emitter emits a large number of carriers (electrons for npn), while the collector collects them from the base. The base is a control terminal that defines the current between the emitter and the collector. The PN junction I–V characteristics in Fig. 1.42 were obviously nonlinear. All electronic devices exhibit nonlinear characteristics. But even worse, for almost all electronic devices, we cannot use a single nonlinear equation to describe the I–V characteristics. The equation depends on the conditions on the device terminals. The major takeaway from studying an electronic device should be its model. The model can be used to represent the device in a certain state. But because device behavior can vary wildly, we need to divide its I–V characteristics into pieces. Each piece will cover certain ranges of inputs, with each range being described by a single I–V equation. Each part of the characteristic is called a region of operation, and the I–V characteristic used to describe the region is called the model. The easiest device to model is the PN junction.
Examining its I–V characteristics, we see that it behaves differently in two distinct regions. If the applied voltage is negative (on the p-side), then the device does not conduct any current. This is a region of operation, and it is called


reverse. The model used for this region depends on the accuracy level we want. For example, we can model the device as a perfect open circuit, allowing zero current flow regardless of applied potential. But we can also model it as a current source whose value is the reverse saturation current. The other region of operation for the PN junction is called the forward region, and it generally applies when the diode has positive potential applied to the p-side. In this case, the diode conducts a very large current. Again, there are several ways to model the device in this region. The most accurate model would be to replace the device with its exponential I–V equation. However, simpler linear models are sometimes more useful. For example, we can model the PN junction as a short circuit, or perhaps as a very small resistance approximating the slope of the PN I–V characteristics. Transistors are three-terminal devices. Thus, they potentially have more regions of operation and more complicated models than the PN junction. The regions of operation of the BJT are much easier to understand than those of the MOSFET. BJT regions of operation stem from the different permutations of forward and reverse bias on the BC (base–collector) and BE (base–emitter) junctions. Table 1.5 lists the four modes of operation. The most important mode of operation in analog circuits is the active region. It is also the mode that defines speed in digital circuits. In this mode, the BE junction is forward and the BC junction is reverse. Figure 1.45 shows the band diagram of the npn transistor in thermal equilibrium. Figure 1.46 shows the npn in the active region. The BE junction is forward biased, and its depletion region is greatly reduced. The BC junction is reverse biased; a large depletion region with a large field is built up. Figure 1.47 shows the band diagram of the npn transistor in the active region. Compare to the equilibrium band diagram in Fig. 1.45.
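The four bias combinations that define the BJT's modes of operation can be sketched as a small classifier. This is a simplified sketch: it uses a bare 0 V threshold to separate "forward" from "reverse", and the function name is an assumption for illustration:

```python
# The four BJT regions of operation, determined by whether the BE and BC
# junctions are forward- or reverse-biased. A sketch: a junction is treated
# as "forward" whenever its voltage is positive, ignoring the ~0.5-0.7 V
# turn-on threshold of a real silicon junction.
def bjt_region(v_be: float, v_bc: float) -> str:
    be_fwd = v_be > 0.0
    bc_fwd = v_bc > 0.0
    if be_fwd and not bc_fwd:
        return "active"
    if be_fwd and bc_fwd:
        return "saturation"
    if not be_fwd and not bc_fwd:
        return "cutoff"
    return "reverse active"

# npn in a typical amplifying bias: BE forward, BC reverse
print(bjt_region(0.7, -2.0))   # active
```

In a real circuit, the forward/reverse decision would use the junction turn-on voltage rather than 0 V, but the four-way classification itself is exactly the structure of the table.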
The energy barrier is reduced in the BE junction, while it is greatly increased in the BC junction. Because the BC junction is reverse, we might conclude from Sects. 1.10 through 1.12 that no current can flow through this junction, and thus that no current should be able to flow into the collector from either the emitter or the base. So far, we have also treated depletion regions as insulating, thus we would expect the BC junction depletion region to stop all current flow. However, we will shortly see that a large current will flow between the emitter and the collector. The value of this current is not controlled by the collector to emitter voltage, but rather by the voltage between the base and the emitter. When a current flowing between two terminals is controlled by the voltage on a third, we say that transistor action exists, with the device potentially acting as a switch.

Table 1.5 Regions of the BJT are defined by combinations of reverse and forward bias on the two junctions (BE horizontal, BC vertical)

              BE reverse    BE forward
  BC reverse  Cutoff        Active
  BC forward  Inverse       Saturation

When in active mode (Figs. 1.46 and 1.47), a large electron current diffuses from the emitter into the base. This current is simply the forward-biased PN junction electron current from Sect. 1.12. This electron current then diffuses through the p-type base. In a normal PN junction, the diffused electrons would travel a very short distance on the p-side before recombining and being replaced by a hole current. But assume that in Fig. 1.46, some of the electrons manage to reach the BC junction. These electrons will be immediately swept into the collector, forming a collector current.

The fact that the electrons are swept into the collector can be seen in two ways. In Fig. 1.47, the slope in the conduction band between the base and the collector encourages the electrons to be swept down. In Fig. 1.46, the large depletion region has an internal field pointing from the collector to the base. This field will cause the electrons to be swept into the collector.

But how can the electrons flow through the depletion region, when the depletion region is insulating? The depletion region is not an insulator. It has insulating properties because it is deficient in charge carriers; thus, it has low conductivity. But if charge carriers come from an outside source, and a favorable field exists across the depletion region, the carriers will definitely move through. A true insulator has to have a large bandgap, offering a large barrier to moving charges.

To reiterate, the sequence of events involved in active current flow is as follows:

• A very large flow of electrons is injected from the emitter into the base. A much smaller hole diffusion occurs from the base to the emitter. The asymmetry is due to the asymmetrical doping concentration between the emitter and the base.

When emitter electrons flow into the base, they can go one of three ways:

1. They can reach the metal contact of the base, where the electrons flow into the wire, forming part of the base current.
2. They can recombine in the base. As in the PN junction (Sect. 1.12), this would eventually happen to all diffusing minorities. Any electrons that recombine in the base will be replaced by a hole current. This hole current draws in holes from the metal contact, forming another part of the base current.

1.13 Bipolar Junction Transistor

Fig. 1.45 Band diagram of npn transistor in thermal equilibrium. There is an imbalance in the bands between the emitter and the collector due to the inequality of doping

Fig. 1.46 Active mode npn transistor. BE depletion region is very narrow. BC is large

3. The electrons can reach the collector without either recombining or being drawn into the metal contact. These electrons are then collected by the large collector field, forming the collector current.

For proper npn operation, component 3 above must be much larger than components 1 and 2. That is to say, the majority of electrons must reach the collector without either recombining in the base or reaching the base terminal.

Fig. 1.47 Band diagram of active BJT. The barrier is reduced at the BE junction, but is greatly increased at the BC junction. The dashed lines are the quasi-Fermi levels

Figure 1.48 shows how a BJT would be fabricated on an IC. It looks very different from the conceptual view in Fig. 1.44. This construction reduces components 1 and 2 of the current above, while maximizing component 3. The base is made extremely thin. This reduces the time that electrons have to travel in the p-type material, which reduces their recombination, thus reducing component 2. The metal contact in Fig. 1.48 is extremely small relative to the area between the collector and base. This reduces component 1 of the current. Because the collector completely envelops the base, the base–collector common area is huge. This allows the collector to catch the majority of electrons diffusing from the emitter, thus maximizing component 3 of the current.

Why did we never see a lot of charge carriers crossing the wide depletion region in the reverse-biased PN junction? The reason is that we would never have an abundance of electrons on the p-side of a reverse-biased PN junction. The only electrons that drift from p to n are minority electrons, whose density is very small. In the npn transistor, the

large current flow near the BC interface is due to injection from a third terminal: the emitter. Note that we also care about the large current flow at the interface rather than a large electron concentration.

Fig. 1.48 Actual construction of an npn BJT. The p region must be extremely thin, the base metal contact must be small, and the area between the base and collector must be maximized, allowing the collector to capture most diffused emitter electrons

Figure 1.49 shows charge flow components in an npn transistor in active mode. It is critical to notice that although we use the letter I to indicate flows in the figure, the arrows mark the direction of charge flow. Thus, for electrons, the resulting current will be in the opposite direction to that shown in the figure. The definitions of the charge flow components are the following:

• I1 = Injected electrons from emitter to base
• I2 = Recombined electrons in base
• I3 = Base hole diffusion to counter electron recombination
• I4 = Hole diffusion from base to emitter
• I5 = Electron flow into collector
• I6 = Electron flow in emitter to counter hole recombination

Fig. 1.49 Carrier flux components in an active npn transistor. Although marked as currents, the arrows show charge flux directions; thus, for electrons, current is in the opposite direction. I1 = electron injection from emitter to base. I2 = recombined electrons in base. I3 = base hole flow from metal contact. I4 = hole diffusion from base to emitter. I5 = electron flow into collector. I6 = electron flow into emitter to counter recombined base holes

We can conclude some relations between these charge fluxes. I1 is the injected electron diffusion from the emitter to the base. I4 is the opposite hole diffusion from the base to the emitter. Thus, I1 and I4 are the two current components of a forward-biased PN junction discussed in Sect. 1.12. Because the emitter is much more heavily doped than the base, we can conclude that I1 ≫ I4. I6 is the electron flow in the neutral zone of the emitter. It flows to replace the quickly dissipating hole diffusion I4. Thus I4 = I6. Currents I1, I4, and I6 can all be fully understood in terms of a forward PN junction. The total emitter current is the summation of the two currents I1 and I4, but it should ideally be roughly equal to I1:

I_nE = emitter electron current = I1
I_pE = emitter hole current = I4 = I6
I_E = I_nE + I_pE = I1 + I4 ≈ I1

The base current, that is, the current flowing into the B terminal of the device, is I3. It is the hole current that replaces the recombined electron diffusion current in the base, I2. Thus:

I_B = I2 = I3

By KCL, the collector current is the balance of the emitter diffusion current I1 and the base current. In other words, the collector current is formed by those electrons that have diffused into the base but failed to recombine, thus:

I_C = I_nE − I_B = I1 − I2

Both holes and electrons play a role in current conduction. However, we can already see from Fig. 1.49 that electrons play a much larger role. Also, from Fig. 1.49, we observe that the most interesting events occur in the base. This is where the diffused electrons recombine or manage to reach the collector, thus determining the relative magnitudes of the base and collector currents. If we obtain the minority carrier profile in the base, then its slope at the BE interface determines the diffusion current at this point. This electron flux (or part of it) is what gets swept across the BC junction, forming the collector current. On the other hand, the amount of charge lost to recombination in the base is equal to the base current (Sect. 1.12). The base electron profile can be obtained by solving the same excess minority carrier differential equation of Sect. 1.11 with different boundary conditions:

d²Δn/dx² = Δn/L_n²

Δn(0) = n_p0 (e^(qV_BE/kT) − 1) ≈ n_p0 e^(qV_BE/kT)

Δn(W_B) = n_p0 (e^(qV_BC/kT) − 1) ≈ −n_p0

where L_n is the diffusion length of electrons in the base. The excess minority carrier concentration at the BE interface (x = 0) is the amount of injected electrons from the


emitter and is the same as in Sect. 1.11. However, the other boundary condition is not at x = infinity. The base is extremely narrow; thus, we do not have enough space to assume we can go to infinity. Instead, our final boundary condition is at x = Wb, where Wb is the width of the base. Because the reverse bias at the base–collector junction is extremely large, the excess electron concentration at this junction falls into negative territory, indicating a deficit relative to equilibrium levels. The electron density at the collector–base interface is in fact roughly null. This is supported by common sense because any electrons that manage to reach this interface cannot remain there for long and will readily get swept into the collector. The solution for the differential equation differs from the case of the PN junction due to the limited width of the base and the different boundary conditions. The solution is thus a hyperbolic function:



Δn(x) = n_p0 (e^(qV_BE/kT) − 1) · sinh((W_B − x)/L_n) / sinh(W_B/L_n),  0 < x < W_B

Note that we are assuming the beginning of the base is at x = 0 and its end is at x = W_B. Strictly speaking, these boundaries should have taken into consideration the width of the depletion region on either end. However, the approximation is acceptable given that W_B is much greater than the depletion width. The hyperbolic solution above is very unwieldy. Fortunately, for small W_B/L_n (as is the case for a properly designed base), the function can be approximated linearly as

Δn(x) = n_p0 (e^(qV_BE/kT) − 1)(1 − x/W_B),  0 < x < W_B

which can be further approximated as

Δn(x) = n_p0 e^(qV_BE/kT) (1 − x/W_B),  0 < x < W_B
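The quality of the linear approximation can be checked numerically. The sketch below uses illustrative values we chose for the base width, diffusion length, and bias; they are not taken from the text:

```python
import math

# Sketch: compare the exact hyperbolic base profile with its linear
# approximation for a narrow base (illustrative numbers, not from the text).
WB = 0.1e-4    # base width, cm (0.1 um)
Ln = 1.0e-4    # electron diffusion length in the base, cm
np0 = 1.0e5    # equilibrium electron density in the base, cm^-3
VBE = 0.65     # forward base-emitter bias, V
VT = 0.02585   # thermal voltage kT/q at 300 K, V

def dn_exact(x):
    """Hyperbolic solution of the excess minority carrier equation."""
    return np0 * (math.exp(VBE / VT) - 1) * \
        math.sinh((WB - x) / Ln) / math.sinh(WB / Ln)

def dn_linear(x):
    """Linear (triangular) approximation, valid for WB << Ln."""
    return np0 * (math.exp(VBE / VT) - 1) * (1 - x / WB)

# With WB/Ln = 0.1, the two profiles agree to well under 1% in the base.
x = WB / 2
err = abs(dn_exact(x) - dn_linear(x)) / dn_exact(x)
print(f"relative error at mid-base: {err:.2e}")
```

Halving W_B/L_n shrinks the discrepancy roughly quadratically, which is why a properly designed (thin) base makes the triangular profile a safe simplification.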

There is a gradient of electron density at x = 0. This gradient defines the electron diffusion at the BE interface, which is by definition I1 (Fig. 1.49):

I1 = I_nE

I_nE = q A_E D_n (dn/dx)|_(x=0) = −(q A_E D_n n_p0 / W_B) e^(qV_BE/kT)

A_E is the common area between the emitter and the base. It is the cross-sectional area of the BE junction. From Fig. 1.48, we can conclude that this area is very large, especially relative to the base width.

We can use a similar derivation to find I6, since I6 flows under the same physics as I1. In this case, however, we must use the geometry of the emitter, where W_E is the width of the emitter:

I6 = I_pE

I_pE = (q A_E D_p p_n0 / W_E) e^(qV_BE/kT)

We can now obtain the total emitter current and confirm that it is dominated by electron diffusion into the base. Note (Sect. 1.12) that the total current in the forward BE junction is the summation of the two diffusion currents in the depletion region:

I_E = |I_nE| + I_pE = q A_E (D_n n_p0/W_B + D_p p_n0/W_E) e^(qV_BE/kT)

But W_E ≫ W_B, thus:

I_E ≈ q A_E (D_n n_p0 / W_B) e^(qV_BE/kT)

where:

n_p0 = equilibrium electron density in the base
p_n0 = equilibrium hole density in the emitter

The emitter current is primarily due to I1 for two reasons: the width of the base is much smaller than the width of the emitter, and the equilibrium electron density in the base is much larger than the equilibrium hole density in the emitter. Current directions should not be confusing. Electrons diffuse from the emitter to the base, and holes diffuse from the base to the emitter; both add up to form a current into the emitter from the base.
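A rough numeric sketch, with typical silicon values we assume (not taken from the text), shows how lopsided the two emitter components are:

```python
import math

# Sketch: magnitudes of the two emitter current components for a Si npn
# transistor. All numbers are illustrative assumptions, not from the text.
q = 1.602e-19              # electron charge, C
ni = 1.0e10                # intrinsic carrier density, cm^-3
NB, NE = 1.0e17, 1.0e19    # base and emitter doping, cm^-3
Dn, Dp = 20.0, 10.0        # diffusion constants, cm^2/s
WB, WE = 0.1e-4, 1.0e-4    # base and emitter widths, cm
AE = 1.0e-5                # BE junction area, cm^2
VBE, VT = 0.65, 0.02585    # bias and thermal voltage, V

np0 = ni**2 / NB           # equilibrium electrons in the base
pn0 = ni**2 / NE           # equilibrium holes in the emitter

InE = q * AE * Dn * np0 / WB * math.exp(VBE / VT)  # electron injection (I1)
IpE = q * AE * Dp * pn0 / WE * math.exp(VBE / VT)  # hole injection (I4)

print(f"InE/IpE = {InE / IpE:.0f}")  # electron component dominates
```

The exponential factor cancels in the ratio, so the dominance of I1 is set purely by the doping asymmetry and the geometry, exactly as the text argues.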

Fig. 1.50 Minority carrier profile in the base. It is approximately linear for a narrow base. We will roughly ignore the width of the depletion regions in calculation. There is a deficit of minority carriers at the collector


Figure 1.50 shows the linear profile of electrons in the base. As shown, the excess electron concentration drops to a negligible level at the collector interface. However, the gradient of the profile remains very sharp, causing a diffusion of electrons that will then be swept into the collector. The profile is thus roughly triangular. There are obviously electrons that exist at the base–emitter interface that disappear at the base–collector interface. These charges do not, in fact, disappear. They recombine in the base, causing a hole diffusion to be vacuumed in from the base metal contact. This is the base current. To calculate the lost charge, calculate the area under the curve in Fig. 1.50:

Q_B = 0.5 · W_B · q · Δn(0)

Q_B = 0.5 · W_B · q · n_p0 · e^(qV_BE/kT)

The current continuity equation relates current density to the recombination of excess charge and the recombination time in the base (Sect. 1.12):

dJ_n/dx = q Δn / τ_n

Substituting and integrating:

dJ_n/dx = 0.5 q n_p0 e^(qV_BE/kT) / τ_n

J_n = 0.5 q W_B n_p0 e^(qV_BE/kT) / τ_n

Yielding a very useful observation:

I_B = 0.5 q A_E W_B n_p0 e^(qV_BE/kT) / τ_n

Notice the following:

• Q_B is the total charge per unit area. To bring it down to a concentration for the continuity equation, divide by W_B
• We are calculating the recombination of electrons; however, the actual base current is a hole current that counters the recombination. Thus, although most of the equation seems to suggest an electron current, this is actually a hole current from the metal contact into the base
• The base current depends on the time constant for electron recombination in the base, and thus on the base doping. This is another reason to reduce base doping.

The collector current is the balance of the emitter current and the base current. For proper operation, most of the emitter current should pass into the collector rather than flow from the base. Thus, the base current must be significantly smaller than either the emitter or collector currents. To make this comparison, we have to express emitter and base currents in similar terms. First, we will ignore the hole component of the emitter current:

I_E ≈ q A_E (D_n n_p0 / W_B) e^(qV_BE/kT)

The expression for diffusion length allows us to reexpress the lifetime of electrons in the base in length terms, bringing the two equations closer:

L_n² = τ_n D_n → τ_n = L_n²/D_n

I_B = 0.5 q A_E W_B n_p0 D_n e^(qV_BE/kT) / L_n²

Thus, the two currents can be divided:

I_E/I_B = [q A_E (D_n n_p0/W_B) e^(qV_BE/kT)] / [0.5 q A_E W_B n_p0 D_n e^(qV_BE/kT) / L_n²]

I_E/I_B = L_n² / (0.5 W_B²) = 2L_n²/W_B²

For proper transistor operation, the ratio of emitter current to base current must be large. This is achieved by having the base width much smaller than the electron diffusion length in the base. This makes sense: if most of the electron flux manages to reach the collector, then the distance it has to travel before meeting the collector must be much smaller than the average distance it travels before recombining. That is to say:

If W_B ≪ L_n, then I_E ≫ I_B

There are four parameters that define the performance of a BJT. They are listed in Table 1.6. These parameters generally describe how much the base current degrades the emitter current as it flows to the collector. They also represent the purity of the emitter current in terms of its majority charge carrier. Emitter efficiency can be evaluated by expanding the current expressions. This factor should be nearly 1, since the emitter should be able to inject far more electrons into the base than the base can inject holes into the emitter.
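The payoff of a thin base can be made concrete with a quick sketch; the dimensions below are illustrative assumptions, not values from the text:

```python
# Sketch: the emitter-to-base current ratio IE/IB = 2*Ln^2/WB^2 for a
# narrow base (illustrative dimensions, not from the text).
WB = 0.1e-4   # base width, cm
Ln = 2.0e-4   # electron diffusion length in the base, cm

ratio = 2 * Ln**2 / WB**2
print(f"IE/IB ~ {ratio:.0f}")

# Halving the base width quadruples the ratio, since it scales as 1/WB^2.
ratio_thin = 2 * Ln**2 / (WB / 2)**2
```
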


Table 1.6 BJT efficiency parameters

  Parameter           Symbol   Equation           Meaning
  Emitter efficiency  γ        I1/(I1 + I6)       How much of the emitter current is purely electrons. Ideally 1
  Base transport      αT       IC/I_nE = I5/I1    How much of the electron flow into the base makes it to the collector. Ideally 1
  CB current gain     α0       γ·αT = IC/IE       Same as base transport factor, except it takes the hole emitter current into consideration
  CE current gain     β        IC/IB              Current gain from base to collector; ideally infinite if the base current is null

γ = I_nE/(I_nE + I_pE) = [q A_E (D_n n_p0/W_B) e^(qV_BE/kT)] / [q A_E (D_n n_p0/W_B) e^(qV_BE/kT) + q A_E (D_p p_n0/W_E) e^(qV_BE/kT)]

γ = (D_n n_p0/W_B) / (D_n n_p0/W_B + D_p p_n0/W_E) = 1 / (1 + (D_p p_n0 W_B)/(D_n n_p0 W_E))

And expanding the minority carrier concentrations in terms of the doping in the base and emitter, N_B and N_E:

γ = 1 / (1 + (D_p (n_i²/N_E) W_B)/(D_n (n_i²/N_B) W_E)) = 1 / (1 + (D_p N_B W_B)/(D_n N_E W_E))

The low base width, and particularly the differential doping, ensure that emitter efficiency is high:

W_B < W_E, N_E ≫ N_B → γ ≈ 1

The base transport factor can be estimated by realizing that the collector current is the balance of emitter electron injection and base current:

α_T = I5/I1 = I_C/I_nE = (I_nE − I_B)/I_nE = 1 − I_B/I_nE

Expanding from the current equations:

α_T = 1 − [0.5 q A_E W_B n_p0 D_n e^(qV_BE/kT)/L_n²] / [q A_E (D_n n_p0/W_B) e^(qV_BE/kT)]

α_T = 1 − W_B²/(2L_n²)

The base transport factor tends toward 1 if the base width is much smaller than the electron diffusion length:

L_n ≫ W_B → α_T ≈ 1

We already discussed how this is a requirement of proper BJT design. Common base current gain is a derivative of emitter efficiency and base transport factor. Note that although it is a derived quantity, it is very important.

We have already obtained a rough expression of common emitter current gain. We will derive it again here and show that for this purpose the entire emitter electron current can be considered to be the collector current:

β = I_C/I_B = (I_nE − I_B)/I_B = I_nE/I_B − 1

Expanding from the current equations:

β = [q A_E (D_n n_p0/W_B) e^(qV_BE/kT)] / [0.5 q A_E W_B n_p0 D_n e^(qV_BE/kT)/L_n²] − 1

β = 2L_n²/W_B² − 1 ≈ 2L_n²/W_B²

Again, proper operation dictates a very thin base relative to the electron diffusion length. There is obviously a relationship between the common emitter and common base current gains:

β = I_nE/I_B − 1 = (I_nE − I_B)/I_B = (1 − I_B/I_nE)/(I_B/I_nE)

α_T = 1 − I_B/I_nE → I_B/I_nE = 1 − α_T

β = (1 − (1 − α_T))/(1 − α_T) = α_T/(1 − α_T)

The collector current is thus roughly equal to the emitter current. From Sect. 1.12, the emitter current is a forward PN junction current, so it is an exponential function of the base to emitter voltage. So, in turn, the collector current is exponential in the base–emitter voltage. Because the current in one terminal, namely the collector, is a function of the voltage of the other terminals, namely the base–emitter voltage, the device exhibits transistor action. The active mode of operation is used for most analog circuits, where it allows the transistor to be used as a small-signal amplifier. It is also the mode that supplies the maximum current from the BJT; thus, it plays a role in the calculation of the delay of BJT logic gates.
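This exponential dependence is easy to sketch numerically. IS below is an assumed illustrative scale current, not a value from the text:

```python
import math

# Sketch: transistor action in active mode. The collector current depends
# exponentially on VBE (IS is an assumed illustrative scale current).
IS = 1.0e-15   # saturation-scale current, A
VT = 0.02585   # thermal voltage at 300 K, V

def ic(vbe):
    """Collector current in active mode, ignoring base-width effects."""
    return IS * math.exp(vbe / VT)

# A ~60 mV increase in VBE raises IC by about one decade at room temperature.
print(ic(0.66) / ic(0.60))
```

This steep sensitivity to V_BE, with near-insensitivity to V_CE, is precisely the transistor action the text describes.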


The three other modes or regions of operation of the BJT are cutoff, inverse, and saturation. In cutoff, both the BC and BE junctions are reverse biased. The reverse bias ensures no current can flow through either junction. Thus, in cutoff the BJT will have zero current through all three terminals. This mode can be used as an off switch. In practice, a very small current can flow in a cutoff BJT: the reverse saturation current passing through the junctions.

The inverse mode is very similar to the active mode, the only difference being that the collector–base junction is forward and the emitter–base junction is reverse. All the transistor parameters and current equations can be evaluated for the inverse mode as for the active mode; however, all instances of emitter geometry or doping are replaced with collector geometry or doping. The collector is lightly doped. This allows the depletion region to widen in reverse, thus sweeping electrons more effectively in active mode. It also means that the asymmetry in doping between base and collector is almost nonexistent, as opposed to the asymmetry in doping between the emitter and base. This makes the base current in inverse mode much higher than in active mode. We thus observe a deterioration of all the transistor parameters in Table 1.6 in the inverse mode. The inverse mode has very limited applications, simply because it is a far inferior copy of the active mode.

In saturation mode, both junctions are forward. Figure 1.51 suggests that this would lead to current flowing from the base into both the emitter and the collector through the forward-biased junctions. In reality, saturation mode is more complicated. There will be a net current flowing from the collector to the emitter. But there will also be a significant base current. Figure 1.52 shows what happens during saturation. The emitter injects immense amounts of electrons into the base. This also happens in the active region.
However, unlike the active region, the potential at the collector junction does not sweep these electrons away. Instead, the collector also injects a large amount of electrons into the base through the forward BC junction. There will be an excess electron concentration at both ends of the base as shown in Fig. 1.53. The level is higher at the emitter due to the higher emitter doping, but throughout the base the electron level is much higher than the equilibrium level.

Fig. 1.51 At first glance, saturation mode suggests we have two forward-biased PN junctions with current flowing into both terminals from the base. This assessment is false


Fig. 1.52 Base saturation. Massive amounts of electrons are injected from the emitter and collector into the base. They overwhelm the ability of the base to recombine, forming a low impedance n-type path from emitter to collector

The base is thus overwhelmed by electrons from both sides. Its ability to recombine these excess carriers through its holes is saturated, and the base is effectively changed into a stretch of temporary n-type material. Thus, a resistive path is created between the collector and the emitter, allowing current to flow between the two terminals. The resistance between the two terminals is the lowest possible due to the abundance of a unipolar charge carrier. Thus, a collector–emitter current will flow. The BJT in the saturation region is effectively an on switch. The excess minority carrier concentration at the BC junction is

Δn(W_B) = n_p0 (e^(qV_BC/kT) − 1)

which is relatively large due to the forward bias. The amount of electron charge in the base in saturation is much higher than in active mode. This is evident by comparing Figs. 1.50 and 1.53: the area under the trapezium is large due to the rise of electrons at the BC junction.

When the BJT is used as a switch, we usually use it in the cutoff and saturation modes. Cutoff is an open switch, while saturation creates the lowest resistance, thus being an on switch. When we switch between off and on, the BJT has to switch between cutoff and saturation. While switching from saturation to cutoff, the excess electron charge in the base must be dissipated before we reach a steady state. Active current is usually responsible for discharging this stored charge, and this is one of the fundamental limits on BJT speed.

Notice also that in saturation a base current will flow into both the collector and the emitter, in addition to the ohmic current that flows from the collector to the emitter. This means that base currents observed in saturation are much higher than in active mode. Thus, any effective common emitter current gain will be much lower than in active mode. Note also that transistor action is lost, since the collector current is now a function of all three terminal voltages.
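The extra stored charge can be estimated by comparing the triangular profile of Fig. 1.50 with the trapezoidal profile of Fig. 1.53. The concentrations below are illustrative assumptions, not values from the text:

```python
# Sketch: stored base electron charge (per unit junction area) in active
# vs saturation, per the triangular (Fig. 1.50) and trapezoidal (Fig. 1.53)
# profiles. Concentrations are illustrative assumptions.
q = 1.602e-19        # electron charge, C
WB = 0.1e-4          # base width, cm
dn_be = 1.0e14       # excess electrons at the BE edge, cm^-3 (assumed)
dn_bc_sat = 0.4e14   # excess electrons at the BC edge in saturation (assumed)

Q_active = 0.5 * q * WB * dn_be              # triangle area
Q_sat = 0.5 * q * WB * (dn_be + dn_bc_sat)   # trapezoid area

# The extra charge is what must be removed when switching saturation -> cutoff.
print(f"Q_sat/Q_active = {Q_sat / Q_active:.2f}")
```

The larger the forward BC bias, the taller the trapezoid's far edge, and the longer the turn-off transient; this is the storage-time penalty of deep saturation.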


Fig. 1.53 Excess minority carrier concentration in saturation. There are excess electrons everywhere

Fig. 1.54 Model for BJT in saturation

In saturation, we can model the BJT as two voltage drops representing the two forward junctions. This is shown in Fig. 1.54. In the BE junction, the voltage drop typically lies in the range of 0.6–0.7 V. This models a forward-biased PN junction as a constant voltage drop: the small amount of forward voltage the PN junction I–V characteristics need before the junction behaves like a short circuit. In Fig. 1.54, the BC junction is modeled with a drop of 0.5 V. The difference between the drops of the two junctions has to do with the lighter doping in the collector. This yields an overall collector to emitter voltage drop between 0.1 and 0.2 V in saturation. The fact that the collector to emitter voltage drop saturates at this value is the reason for naming this region of operation saturation. Calculating BJT currents in saturation is dictated more by the external circuit than by the BJT itself.

1.14 Materials Interfaces

1. Define and compare vacuum level, electron affinity, and work function
2. Understand the continuity of the vacuum level
3. Understand how to draw heterogeneous interfaces
4. Contrast metal, insulator, and semiconductor band behavior in fields
5. Derive insulator–semiconductor, semiconductor–metal, and insulator–metal interfaces
6. Distinguish ohmic from rectifying semiconductor–metal contacts.

In Sect. 1.7 we introduced a method to draw the band diagram of a PN junction. However, this method was ad hoc and left a few important questions unanswered. In this section, we develop a systematic method for drawing band diagrams for any material interface. This is necessary to understand the band diagram of the MOS capacitor.

We begin by reiterating the definition of the vacuum level. Figure 1.55 shows the vacuum level E_vac. The vacuum level is the lowest energy level at which an electron is considered to be free. Not free from the covalent bond, which would be any electron in the conduction band; instead, we mean free from the entire crystal structure. In other words, it is outside the material. It represents the least energetic, immobile condition of an electron just at the surface of the material.

The energy difference between the edge of the conduction band and the vacuum level is called the electron affinity. It is obviously a measure of how easy it is for an electron to become free. The difference between the Fermi level and the vacuum level is called the work function. Note that electron affinity is a material property, while work function is also a function of purity in the case of semiconductors. Work function is a more rigorous parameter than affinity since it includes more information about the state of electrons in the material. Notice that the Fermi level is the average energy of electrons; thus it contains strong information about the electron population. On the other hand, the edge of the conduction band could be empty or full depending on the doping conditions.

Fig. 1.55 Vacuum level, electron affinity, and work function


Fig. 1.56 Band diagram of a PN junction showing the vacuum level. The vacuum level is not drawn to scale

Now consider bringing together two pieces of silicon with opposite doping, in other words a PN junction. The equilibrium band diagram is replicated from Sect. 1.7 in Fig. 1.56, with the vacuum level shown. Only the Fermi level is constant. All other levels bend corresponding to the built-in field. This also applies to the vacuum level, which bends in parallel with all other levels. This is necessary because both sides of the junction are silicon, and the only way for electron affinity to be the same on both sides of the junction is for the vacuum level to bend.

Figure 1.57 shows the band diagram of a metal. In a metal, the conduction and valence bands are meshed together and are thus not well-defined. Only the Fermi level is significant in a metal, and only it needs to be drawn. In a metal, all levels below the Fermi level are fully occupied at 0 K, while all levels directly above it are empty. There are valid states below and above E_F. Any rise in temperature will cause significant numbers of charges to rise above E_F. This causes plenty of charges to exist where there are plenty of states, and conductivity is extremely high.

Fig. 1.57 Band diagram of a metal in equilibrium (left) and with applied field (right). Only the Fermi level is significant. The level does not bend regardless of applied field

Figure 1.57 also shows the metal with an applied potential. An applied potential leads to a finite current density:

J_metal = finite

The conductivity of a metal is effectively infinite; combined with the finite current density, this indicates a null field:

J_metal = finite = σ_metal · ξ

σ_metal → ∞

ξ = J_metal/σ_metal = 0

The electric field is proportional to the derivative of the potential; thus the voltage drop on the metal is null:

ξ = −dV/dx = 0

V_metal = constant

But the Fermi level (or any other level) in a metal is proportional to the potential; thus, the Fermi level in a metal must always remain constant regardless of the potential applied to the device:

E_F = −qV_metal = constant

We can use a similar derivation to deduce how the band diagrams of insulators and semiconductors behave in response to an externally applied potential (or field). Figure 1.58 is the band diagram of an insulator. It is mainly distinguished by its huge bandgap. We traditionally do not draw the Fermi level in insulator band diagrams, because drawing it would be futile. Insulators generally do not have suitable dopants. Thus, their Fermi level does not lie close to E_c or E_v; instead, it is usually deep in the bandgap. There is a large range in the bandgap where the Fermi level is extremely far from both the conduction band and the valence band. This means that both electron and hole concentrations are effectively null, leading to very low conductivity. Thus, while we can draw the Fermi level in an insulator, why would we? After all, the Fermi level is drawn mainly to indicate carrier levels.


Figure 1.58 also shows an insulator under a field, that is to say with an applied external voltage. The relation between electric field and charge density is

Semiconductor band behavior in a field has already been covered implicitly. But to reiterate, there are two cases for semiconductors:

dn q ¼ dx e

• If the field is applied to an electrically neutral semiconductor, then it develops linear tilt identical to insulators. This is the case in Fig. 1.22, with a semiconductor of a single type exposed to an external potential • If the field is applied to a depletion region (as in the PN junction), then we observe a quadratic tilt in energy. This was derived in detail in Sect. 1.8. But in short, the depletion region is not electrically neutral, instead it has a uniform volumetric ionic charge density. Integrating this yields a linear field in the depletion region. The linear field, when integrated, yields quadratic band bending.

q ¼ volumetric charge density Ideally, we manufacture insulators to be uncharged, thus: q¼0 dn q ¼ ¼0 dx e n ¼ constant Relating this constant field to voltage across the insulator and thus energy bands: n ¼ constant ¼

dV dx

V ¼ linear Einsulator ¼ linear Thus when exposed to a constant external field, insulators develop a linear tilt in their band diagram. This tilt is shown in Fig. 1.58. Notice that the tilt affects all levels, including the vacuum level. This is necessary to preserve electron affinity across the entire insulator.

Drawing band diagrams when different materials are brought together is a little more challenging than when only differently doped silicon is junctioned. We can junction different semiconductors together; this creates what is known as a heterojunction. We can also junction semiconductors to metals; this is very common when contacting silicon with wires (Chap. 7). We can also have semiconductor–insulator and metal–insulator junctions, which are critically important for MOS capacitors (Sect. 1.15).

The electron affinity rule is an extremely important rule for drawing junction band diagrams. The rule is very simple and immediately obvious; however, its impact is large. The rule states that the electron affinity on the two sides of a junction has to be preserved. That is to say, in a junction between material A and material B, the affinity on the A side has to be the affinity of material A, while the affinity on the B side has to be that of material B. The following three rules can be combined to draw any equilibrium band diagram:

• Electron affinity rule
• Constant Fermi level (recall the material is in equilibrium)
• Continuous vacuum level (an electron moving along the surface cannot observe a sudden jump in energy).

These rules can be combined into a systematic approach to band drawing:

Fig. 1.58 Band diagram of an insulator in equilibrium and with applied constant field. The Fermi level is not significant. An applied field leads to linear tilting of the insulator

1. Draw a single, flat vacuum level for the entire structure.
2. Draw the band diagram of all the materials relative to the constant vacuum level. While doing this we will find that we do not have a unique Fermi level. Thus the diagram we are drawing cannot be an equilibrium diagram. It is called a flat band diagram and is defined by the flat vacuum level. It is also defined as a condition where there are no net fields in the material. The flat band diagram is useful as a first step in drawing the equilibrium diagram.
3. The equilibrium band diagram will derive from the flat band diagram. It will have a unique Fermi level. Materials will move differently to accommodate this. The way each material moves is defined by the types of materials in the junction.
   a. In any structure with a metal, the metal will define the Fermi level. Its Fermi level will not move, tilt, or bend. Everything else must bend in order to reach the metal Fermi level.
   b. A semiconductor will bend by building up an internal field due to depletion or accumulation. In a uniformly doped semiconductor, the built-in field will be linear and the band bending will be quadratic. If the semiconductor junctions with a metal, it will provide all the bending required. If it junctions with another semiconductor or insulator, the bending will be split.
   c. An uncharged insulator will bend linearly.
4. For semiconductors and insulators, pin all points at metal–semiconductor, semiconductor–semiconductor, insulator–metal, or semiconductor–insulator interfaces. These points will NOT move because of the electron affinity rule. However, the Fermi level must move to the metal level.
5. Far from the junction, the band edges of the semiconductor will approach their neutral-zone conditions (same as flat band carrier concentrations).

Fig. 1.59 Drawing the band diagram of a heterojunction. Left, the flat band diagram. Right, the equilibrium band diagram

This is best illustrated by examples. Figure 1.59 shows an application of the above rules for drawing the band diagram of a heterojunction: a junction of two different semiconductors. Since the two semiconductors are different materials, they have different bandgaps, and none of their levels align because they have different affinities.

The left sub-figure shows the flat band diagram; the main fulcrum of this diagram is the constant vacuum level. Each side of the junction is drawn independently with its own affinity and work function. There will be two Fermi levels, indicating the diagram is not in thermal equilibrium. It is important to note and “pin” the interface points A, B, C, and D. The right diagram is the equilibrium diagram. It is drawn from the flat band diagram as follows:

• The Fermi level is constant
• Far away from the junction, the two semiconductors return to their flat band levels and charge-neutral states
• Points A through D are pinned at the interface. The barriers between points C and D and between points A and B are preserved. This is the only way the vacuum level can be continuous while electron affinity is preserved for the two materials on the two sides of the interface
• Between the neutral zones far from the interface and the pinned points at the interface, the band diagrams bend. The bending is quadratic because depletion regions form. The relative amount of bending between the two sides depends on the bandgaps and the doping.

The equilibrium diagram in Fig. 1.59 has a vacuum level that bends in parallel to the other levels (except the Fermi level). This is a corollary of the electron affinity rule. If the vacuum level did not bend, the electron affinity on the two sides of the interface would not be preserved. There are many possibilities for how a heterojunction band diagram ends up looking. This depends on the relative bandgaps and the relative positions of the Fermi levels. Heterojunctions can become even more complicated in the case of compound semiconductors. However, applications of heterojunctions are more limited than those of other types of material junctions.
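The flat band construction fixes the band-edge discontinuities that stay pinned at a heterojunction interface (the role of points A–D in Fig. 1.59). A minimal bookkeeping sketch; the affinity and bandgap values below are illustrative assumptions, not taken from the text:

```python
# Flat band construction: draw every material against one flat vacuum level.
# Ec sits chi below vacuum; Ev sits chi + Eg below vacuum.
# Material parameters are illustrative assumptions (eV).
materials = {
    "A": {"chi": 4.07, "Eg": 1.42},
    "B": {"chi": 4.00, "Eg": 0.74},
}

def edges(m):
    """Band edges relative to a vacuum level placed at 0 eV."""
    Ec = -m["chi"]
    Ev = -(m["chi"] + m["Eg"])
    return Ec, Ev

EcA, EvA = edges(materials["A"])
EcB, EvB = edges(materials["B"])

# Interface discontinuities preserved by the electron affinity rule:
dEc = EcB - EcA   # conduction band offset
dEv = EvA - EvB   # valence band offset (sign shows direction of the step)
print(dEc, dEv)
```

These offsets survive into the equilibrium diagram; only the amount of quadratic bending on either side changes with doping.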


Fig. 1.60 A rectifying semiconductor–metal contact, flat band (top left), equilibrium (top right), reverse (bottom left), and forward (bottom right)

Figure 1.60 shows a metal–semiconductor interface. The flat band diagram can be drawn directly from the work functions of the metal and the semiconductor. In the equilibrium condition, we can assume the metal Fermi level remains in its location and constant; this is justified by the effectively infinite conductivity of the metal. The points where the semiconductor band edges meet the interface remain “pinned”. Away from the interface, the semiconductor bands return to their flat band distance from the Fermi level. Between the interface and the region far away from it, the semiconductor bands bend. The bending is quadratic, since the material develops a depletion region with a uniform ionic charge and a linear built-in field. The junction in Fig. 1.60 forms a rectifying contact called a Schottky diode. The semiconductor is n-type. A depletion region develops at equilibrium near the interface. The depletion region is rich in positive ions. This creates an electric field from the semiconductor to the metal, which causes a drift of electrons from the metal to the semiconductor. In equilibrium, this is balanced out by a diffusion of electrons from the CB of the semiconductor into the corresponding level in the metal. Since we are in thermal equilibrium, the net current is null. If a positive voltage is applied to the semiconductor, the Fermi level on the semiconductor side is pulled down, as in the bottom left of Fig. 1.60. This strengthens the built-in field. However, the barrier seen by electrons moving from the metal to the semiconductor remains unchanged. Thus, the electron current from the metal to the semiconductor does not change substantially even though the field should aid it. Meanwhile, the movement of electrons from the semiconductor to the metal is diminished. Thus only a small net current can be observed with this polarity, limited by the amount of charge that can cross the barrier from the metal side.

If we apply a positive voltage to the metal, as in Fig. 1.60 bottom right, the semiconductor Fermi level moves up. Notice the metal Fermi level cannot move; thus the semiconductor levels move instead. The barrier seen on the metal side is still the same, so the flow of electrons from the metal to the semiconductor remains roughly the same. The built-in field is weakened and the barrier seen on the semiconductor side decreases. This leads to an exponential increase in electrons diffusing from the semiconductor to the metal. A very large current flows from the metal to the semiconductor, corresponding to the diffusion of electrons from the semiconductor to the metal. Notice this behavior is very similar to a PN junction. The metal-to-semiconductor electron drift is analogous to the reverse saturation current. When a positive voltage is applied to the semiconductor, no appreciable current flows. When a positive voltage is applied to the metal, a huge current flows. The huge forward current is a diffusion current; the small reverse current is a drift current. These are all striking similarities to the PN junction. The Schottky diode has some interesting properties and specialty applications. However, the overwhelming majority of semiconductor–metal interfaces are wires contacting device terminals. In that case, the contact must have very low resistance and must conduct in both directions without rectification. To avoid the Schottky diode effect, ohmic contacts have to be fabricated with very heavily doped semiconductors. This is shown in Fig. 1.61. The Fermi level is closer to the conduction band, indicating heavier doping. This yields sharper semiconductor band bending in the equilibrium diagram in the top right. Again, in equilibrium, the current from the metal to the semiconductor is canceled by the current from the semiconductor to the metal.
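The diode-like asymmetry described above can be sketched with the usual exponential I–V form: a fixed barrier-limited reverse current, and a forward diffusion current that grows exponentially with the voltage on the metal. The saturation current value is an illustrative assumption, not a number from the text:

```python
import math

# Rectifying (Schottky-like) contact I-V sketch: the metal->semiconductor
# drift current is fixed by the barrier, while the semiconductor->metal
# diffusion current scales exponentially with forward bias on the metal.
kT_q = 0.0259          # thermal voltage at room temperature, V
I_s = 1e-9             # barrier-limited saturation current, A (assumption)

def contact_current(v_metal):
    """Net current for a voltage v_metal applied to the metal side."""
    return I_s * (math.exp(v_metal / kT_q) - 1.0)

print(contact_current(0.3))    # forward: large diffusion current
print(contact_current(-0.3))   # reverse: saturates near -I_s
```

The same functional form describes the PN junction, which is why the text stresses the analogy: only the physical origin of I_s differs.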


Fig. 1.61 Ohmic metal– semiconductor contact flat band (top left), equilibrium (top right), reverse (bottom right), and forward (bottom left)

The sharp bending creates a larger energy barrier on the metal side, so to first order we seem to have made the rectification problem even worse. However, we will shortly show that this is not the case. The sharp bending indicates an enormous built-in field, and it also means that the depletion region is very narrow. This can be seen from the band diagram, but it also follows from the expression in Sect. 1.8 for the depletion region width as a function of doping. When a positive potential is applied to the metal, the semiconductor-side barrier is reduced, leading to a large diffusion current into the metal, as in the Schottky contact. Thus current is conducted with a small resistance in this direction. But this was not the problematic direction to begin with. When a positive potential is applied to the semiconductor, the semiconductor bands are pulled down as in Fig. 1.61, bottom right. This increases the barrier on the semiconductor side, while maintaining the same barrier on the metal side. In fact, the barrier on the semiconductor side seems to be worse than for the Schottky diode. However, the depletion region is so thin with this amount of bending that electrons can easily tunnel from the metal to the semiconductor. This tunneling current increases exponentially and can be very large. Thus, the contact allows current to flow in both directions unimpeded. In Chap. 10, we will develop a more thorough understanding of tunneling. Insulator interfaces are drawn using the electron affinity rule and the fact that any voltage drop on an uncharged insulator must yield a constant field. Figure 1.62 shows an
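The claim that heavy doping thins the depletion region can be made quantitative with the Sect. 1.8 result. A rough sketch using the one-sided form W = sqrt(2·ε·φb/(q·N)); the barrier height and doping levels are illustrative assumptions:

```python
import math

# One-sided depletion width W = sqrt(2*eps_si*phi_b / (q*Nd)):
# raising the doping by four orders of magnitude shrinks W by 100x,
# thin enough for electrons to tunnel through the barrier.
q = 1.602e-19               # elementary charge, C
eps_si = 11.7 * 8.854e-12   # silicon permittivity, F/m
phi_b = 0.7                 # built-in barrier in V (illustrative assumption)

def depletion_width(Nd_cm3):
    """Depletion width in meters for doping given in cm^-3."""
    Nd = Nd_cm3 * 1e6       # convert cm^-3 to m^-3
    return math.sqrt(2 * eps_si * phi_b / (q * Nd))

print(depletion_width(1e15))  # lightly doped: on the order of a micron
print(depletion_width(1e19))  # degenerately doped: under ten nanometers
```

At the degenerate doping used in ohmic contacts the barrier is only a few nanometers thick, which is why tunneling dominates and the contact conducts in both directions.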

Fig. 1.62 Insulator– semiconductor interface showing a surface field due to continuity conditions at the interface dissipating deep in the semiconductor

insulator–semiconductor interface. The vacuum level must be continuous, and the band edges are pinned at the interface. The field in the insulator is constant, but it is linear in the semiconductor. The boundary condition at the interface is dictated by Gauss’ law at boundaries, which says that in the absence of surface charges:

εsi ξsi = εox ξox

Thus, the surface field in the semiconductor is dictated by the field in the insulator. The Fermi level is constant through the semiconductor and is not drawn in the insulator. Figure 1.63 shows an insulator–metal interface. The electron affinity rule applies. Any voltage drops, whether built-in or external, must apply fully (and linearly) on the insulator, with no drops on the metal. Note that the insulator band edges are pinned at the interface.
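The boundary condition can be checked numerically. The relative permittivities 11.7 for silicon and 3.9 for silicon dioxide are standard values, not given in the text:

```python
# Gauss' law at a charge-free insulator-semiconductor boundary:
# eps_si * xi_si = eps_ox * xi_ox, so the field steps down by the
# permittivity ratio when crossing from the oxide into the silicon.
eps_r_si = 11.7   # relative permittivity of silicon (standard value)
eps_r_ox = 3.9    # relative permittivity of SiO2 (standard value)

def surface_field_si(xi_ox):
    """Silicon surface field for a given oxide field (same units)."""
    return (eps_r_ox / eps_r_si) * xi_ox

print(surface_field_si(3e8))  # an oxide field of 3e8 V/m -> ~1e8 V/m in Si
```

For silicon and its oxide the ratio is exactly one third, so the semiconductor surface field is always substantially smaller than the oxide field.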


Fig. 1.63 Insulator–metal interface showing the transfer of the entire field to the insulator

To reiterate how materials behave in response to fields (or potentials):

• Metals can support no internal field. The potential on the metal is constant, and thus the field is null. Metal bands observe no bending
• Insulators are generally uncharged. They support a constant field. This leads to a linear voltage profile and thus to band tilting
• Semiconductors develop depletion regions under the influence of fields. The depletion region presents a constant volumetric charge, which yields a linear built-in field. The voltage profile is quadratic and band bending is observed.

The ratio by which an applied voltage, and thus the external field, is split between the two materials at an interface determines how much band bending is observed on either side. This is an additional degree of freedom in heterojunctions that can change the shape of the diagram substantially. Determining the split is usually nontrivial, but there are a few pointers:

• In a metal interface to either an insulator or a semiconductor, the entire potential is transferred to the nonmetal. None of the bending is observed by the metal
• In an interface between two pieces of the same semiconductor, the amount of bending is split between the two sides. This is determined by the doping profiles (Sect. 1.8)
• In a heterojunction, the split is nontrivial, but it is a function of doping and material properties
• In a semiconductor–insulator interface, the potential is split nontrivially (Sect. 1.16), normally with more drop on the insulator.


Figure 1.64 shows the band diagram of heavily doped polysilicon. Because the doping is extremely heavy, we observe three things. First, the conductivity is very high, theoretically approaching that of metals. Second, the Fermi level moves so close to the band edge that it practically coincides with it. Thus, in n-type polysilicon Ef = Ec, while in p-type polysilicon Ef = Ev. Most polysilicon is n-type. Third, the high conductivity dictates a band behavior similar to metals. Thus, for all intents and purposes, heavily doped polysilicon will be treated the same way metal was treated in this section. Polysilicon is used to make MOSFET gates; thus its band diagram will be used instead of that of metals in Sects. 1.15 through 1.18. The reason polysilicon must replace metal has to do with the fabrication process and is discussed in detail in Sect. 7.5. Table 1.7 summarizes the behavior of different material types at interfaces. The rules and observations developed in this section will be used in the next section to understand the behavior of the MOS capacitor. The MOS capacitor is the main building block of the MOSFET transistor.

1.15 MOS Capacitor Preliminaries

1. Understand the structure of a MOS capacitor
2. Draw the flat band diagram of a MOS capacitor
3. Draw the band diagram of a MOS capacitor in equilibrium
4. Derive flat band voltage in terms of plate work functions
5. Understand potential drops across MOS capacitor terminals
6. Recognize surface potential and substrate surface depletion.

A MOS capacitor is the basic building block of a MOSFET, the most important transistor. It forms the vertical structure that allows the gate to control current conduction. Understanding the fundamentals of the MOS capacitive action is more demanding than understanding the lateral conduction mechanisms in a MOSFET. Figure 1.65 shows a sketch of a MOS capacitor. The acronym MOS stands for Metal–Oxide–Semiconductor. The acronym is a slight misnomer, since the “metal” plate is almost never metal: a MOS capacitor is almost always manufactured using polysilicon in place of the metal plate. The reason for this has to do with the physics of fabrication and self-alignment (see Sect. 7.5). Thus a MOS capacitor is typically made of a polysilicon plate, an insulating silicon dioxide layer, and a lightly doped silicon substrate. As discussed in Sect. 1.14, very heavily doped polysilicon can generally be assumed to behave like a metal with a


Fig. 1.64 Band diagram of heavily doped polysilicon. The conduction band edge and Fermi levels coincide

moderate bandgap. The polysilicon plate is called the gate, with the letter G often used as an abbreviation. The bottom semiconductor plate is alternatively called the body or the substrate. The letter B is used exclusively to indicate the body/substrate. The band diagram of the MOS capacitor is a three-material diagram. The first step, according to Sect. 1.14, is to draw the flat band, or constant vacuum level, diagram. The MOS structure is rather complicated. Conventionally, MOS band diagrams are drawn horizontally as shown in Fig. 1.66. Note this is a 90° rotation of the geometry of Fig. 1.65, but it is convenient since it is in line with the band diagrams of PN junctions and BJT transistors. The polysilicon plate in Fig. 1.66 has coincident Ec and Ef. In Sect. 1.14, we explained that this is due to the extreme doping. We will only draw the Fermi level for the gate, since it is the only relevant level. We will also assume it behaves like a metal, showing no band bending under the effect of fields. Note also that the Fermi level of the gate in Fig. 1.66 is identical to the conduction band edge in the substrate. This is because both materials are silicon, and thus have identical bandgaps and electron affinities. The oxide has a very large bandgap. In flat band, it shows no band tilting. Its alignment with the body and gate in Fig. 1.66 is determined by the electron affinities of the materials. As discussed in Sect. 1.14, the Fermi level is irrelevant in insulators, and is thus not drawn. The bottom plate in Figs. 1.65 and 1.66 is p-type silicon. Its valence band edge lies close to its Fermi level. The Fermi level on either side of the oxide in Fig. 1.66 is different. This is the expected situation in flat band condition, since flat band, in general, is not a state of thermal equilibrium. The difference between the two Fermi levels is equal to qVfb. Vfb is known as the flat band potential. This is the potential that needs to be applied to bring the device to flat band condition.

Table 1.7 Behavior of material bands. Share of bending is the percentage of the total drop that falls on the material in an interface

Fig. 1.65 Sketch of a MOS capacitor. The top plate is polysilicon. The bottom plate is silicon, in this case p-type. The oxide separating the two creates a capacitor

Fig. 1.66 Flat band diagram of MOS capacitor. Note Ef in polysilicon aligns with Ec in the body because in polysilicon Ef = Ec

Table 1.8 lists all the material properties necessary to draw the flat band diagram. In fact, the bandgaps and electron affinities are sufficient to complete Fig. 1.66. The last two lines are derived quantities, but they are important. As

Material         Significant levels    Behavior in a field               Share of bending
Insulator        Ec and Ev only        Linear tilting if uncharged       Large share
Semiconductor    Ec, Ev, and Ef        Quadratic bending when depleted   Moderate share
Metal            Ef only               No bending                        No share


Table 1.8 Parameters of the MOS capacitor. Notice that there is an asymmetry in the energy barriers seen at the oxide in the valence and conduction bands. It is significantly harder for holes to cross the oxide than it is for electrons. The same barriers are seen on the polysilicon side because both the body and the gate are silicon

Parameter      Name                                   Amount (eV)
Egox           Bandgap of silicon dioxide             9
Egsi           Bandgap of silicon                     1.1
χsi            Electron affinity of silicon           4
χox            Electron affinity of silicon dioxide   0.95
Ecox − Ecsi    Conduction band barrier at oxide       3.1
Evsi − Evox    Valence band barrier at oxide          4.8
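The two barrier entries follow from the affinities and bandgaps by the electron affinity rule; a quick arithmetic check (the tabulated entries are rounded):

```python
# The oxide barriers follow from the electron affinity rule:
# conduction band barrier = chi_si - chi_ox
# valence band barrier   = (chi_ox + Eg_ox) - (chi_si + Eg_si)
# Values from Table 1.8, in eV.
Eg_ox, Eg_si = 9.0, 1.1
chi_si, chi_ox = 4.0, 0.95

cb_barrier = chi_si - chi_ox                      # seen by electrons
vb_barrier = (chi_ox + Eg_ox) - (chi_si + Eg_si)  # seen by holes

print(cb_barrier)  # ~3.05 eV, tabulated as 3.1
print(vb_barrier)  # ~4.85 eV, tabulated as 4.8
```

The valence barrier exceeds the conduction barrier by roughly the difference of the two bandgap mismatches, which is the asymmetry the table caption points out.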

Fig. 1.66 suggests there is a misalignment in the bands of the oxide and silicon. This creates a different barrier at the valence band than at the conduction band. It takes 4.8 eV for a hole in silicon to cross the oxide, while it takes 3.1 eV for an electron to make the jump. By definition (Sect. 1.14), and by virtue of the nonconstant Fermi level, Fig. 1.66 cannot be an equilibrium band diagram. In the PN junction (Sects. 1.7 and 1.8), the flat band diagram was transformed into an equilibrium diagram through diffusion. As carriers diffuse from either side to the other, an internal potential builds in. This potential pushes the bands on either side of the interface. This continues until the Fermi levels align, leading to a state of thermal equilibrium. In the MOS structure, there is a carrier gradient between the gate and the body. This is evident in the nonconstant Fermi level in Fig. 1.66. However, carriers cannot diffuse between the gate and the substrate because the insulating oxide lies in the middle. We can, however, conclude from Sect. 1.14 that the polysilicon Fermi level will not move. To bring the diagram to an equilibrium state, the Fermi level in the substrate is brought to the gate level. The interface points between the substrate and the oxide are pinned. And deep in the substrate, bands return to their flat band levels. The oxide and substrate bend accordingly. The thermal equilibrium band diagram is shown in Fig. 1.67. The following questions are immediately obvious from Fig. 1.67:

1. What is the total built-in potential in the structure?
2. How is the built-in potential distributed between the oxide and the substrate?
3. Why does the polysilicon not bend at all?
4. What effect does band bending have on the substrate?

Preliminary answers can already be provided for all these questions:

1. The total built-in potential is the difference in the Fermi levels of the polysilicon and the substrate in flat band conditions. Since we are assuming they are both silicon, this is also the difference in the work functions of the two materials. Notice that for heavily doped polysilicon, work function and electron affinity are the same. From the flat band diagram, this potential was defined as Vfb, the flat band potential. Thus, the total built-in potential in Fig. 1.67 is equal to the flat band potential in Fig. 1.66.
2. This requires a detailed analysis that will be provided shortly. The built-in potential is shared between the oxide and the substrate only. Since the conductivity of the oxide is much lower than that of the substrate, we can project that the oxide potential (Vox) will be higher than the surface potential (Vs).
3. From Sect. 1.14, heavily doped polysilicon behaves like a metal. We have already concluded that metals observe no voltage drops and no bending. This applies to internal fields as much as it does to external fields.
4. At the substrate near the oxide, the band bending in Fig. 1.67 brings the Fermi level closer to mid-gap. This leads to carrier depletion near the surface of the substrate.

The formation of a native depletion region in the substrate plate at thermal equilibrium may not be immediately intuitive. In a metal–insulator–metal capacitor, for example, the

Fig. 1.67 MOS capacitor band diagram in thermal equilibrium. The oxide potential is marked as Vox, the substrate potential is marked as Vs and is alternatively called surface potential. The sum of the two drops must be equal to the flat band voltage Vfb


situation is much simpler. The Fermi level in both plates remains constant, and any potentials, whether built-in or external, fall entirely on the oxide. A built-in potential in a metal–insulator–metal capacitor may develop if the two plates are made of different metals. In a MOS capacitor, the bottom plate has a finite conductance, and thus can and will bend. Understanding the depletion is easier if we compare the MOS capacitor to a PN junction, where the formation of native depletion and built-in fields is immediately obvious. The bending has to occur to reconcile two conflicting requirements: that the substrate is lightly doped p-type away from the interface, and that the substrate must support some of the work function difference to the polysilicon. Note that in equilibrium in Fig. 1.67, electron affinities must be preserved and the vacuum level is continuous. This is the electron affinity rule. As a consequence, the vacuum level is constant in the polysilicon, tilts linearly in the insulator, and bends in parallel to the substrate bands. From Table 1.8, the electron affinity of silicon is much higher than that of the oxide, which makes sense given that the oxide bandgap is large. One might wonder why a bandgap of 9 eV makes silicon dioxide a null conductor, as opposed to 1.1 eV in silicon. Recall that carrier concentrations, and thus conductivity, are exponential functions of energy differentials. Thus a roughly ninefold increase in the energy gap does not yield a simple ninefold drop in conductivity; it yields an exponentially large drop. This is also the reason we do not care where the Fermi level is in the insulator. In almost the entirety of the bandgap, the Fermi level is so far away from both the valence band and the conduction band that no appreciable carriers are seen in either. By examining Figs. 1.66 and 1.67, we can conclude a few things about the flat band potential. First, it is equal to the total built-in potential in equilibrium, as discussed earlier.
It is also the gate to body potential that needs to be compensated to bring the device to flat band conditions. Thus flat band occurs when an external potential Vgb = −Vfb is applied. Because bands in the gate and substrate align in the flat band condition, the flat band potential is a function of their work function difference. In the gate, there is no difference between the work function and the electron affinity; this is a direct result of heavy doping. Thus the flat band potential can be determined from material properties and doping as

qVfb = Efg − Efs
Efg = gate Fermi level
Efs = substrate Fermi level

And because the difference between the Fermi levels is the difference between the work functions:

qVfb = Efg − Efs = ws − wg
ws = substrate work function
wg = poly work function

But for polysilicon, the work function is equal to the electron affinity:

wg = χg = poly electron affinity
qVfb = Efg − Efs = ws − χg

This is the magnitude of the flat band potential. When applied to the MOS at equilibrium, it has to be applied to the substrate in order to reach a flat band condition. Conservation of energy dictates that the applied potential between the gate and the substrate must equal the sum of potential drops between the two terminals. This equals the sum of the metal, insulator, and semiconductor drops. The metal (polysilicon) drop is null. The insulator drop is significant and equal to Vox. The semiconductor drop is called the surface potential, and takes the symbol Vs. However, this does not take into consideration the built-in potential. As discussed above, the built-in potential is the flat band potential, and it is only above this potential that drops start to build up across the capacitor. Thus, the null of the voltage equation has to be moved by the flat band voltage:

Vgb = Vox + Vs − Vfb

where:
Vgb = gate to substrate (body) voltage

Note that the gate to substrate voltage is termed gate to body voltage to avoid confusion with gate to source voltage. When a voltage Vgb = −Vfb is applied, both Vox and Vs are null. This is the definition of the flat band condition: a condition in which all the bands are flat. One last thing to note in Fig. 1.67 is that the amount of bending in the vacuum level is equal to the flat band potential. It will be interesting to keep an eye on the vacuum level in Sect. 1.16, because it is a good representative of the net potential across the structure.
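The flat band potential can be evaluated numerically. A sketch assuming a p-type substrate whose work function is χsi + Eg/2 + kT·ln(NA/ni) — a standard expression not derived in the text — together with the Table 1.8 values; the doping level is an illustrative assumption:

```python
import math

# Flat band potential Vfb = ws - chi_g (work functions in eV):
# for an n+ poly gate, wg = chi_g = chi_si; for the p-type substrate,
# ws = chi_si + Eg/2 + kT*ln(NA/ni).
kT = 0.0259                # thermal energy at room temperature, eV
chi_si, Eg_si = 4.0, 1.1   # Table 1.8 values
ni = 1.5e10                # intrinsic concentration, cm^-3 (standard value)
NA = 1e17                  # substrate acceptor doping, cm^-3 (assumption)

ws = chi_si + Eg_si / 2 + kT * math.log(NA / ni)
chi_g = chi_si             # heavily doped n-type poly: Ef = Ec
Vfb = ws - chi_g           # numerically in volts for single electron charges

print(Vfb)                 # roughly 1 V for this doping level
```

Heavier substrate doping pushes the substrate Fermi level closer to Ev, increasing ws and therefore the flat band potential, but only logarithmically.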

1.16 Modes of the MOS Capacitor

1. Understand the accumulation of carriers
2. Recognize why depletion occurs in the bottom plate
3. Understand the charge model in the MOS capacitor
4. Derive voltage drops in depletion mode
5. Define inversion
6. Understand and derive threshold voltage


7. Understand strong inversion
8. Derive inversion charge density.

The most important equation governing MOS capacitor behavior is the equation for the sum of voltage drops from gate to body derived in Sect. 1.15. It is repeated here for convenience:

Vgb = Vox + Vs − Vfb

Observing the two band diagrams in Figs. 1.66 and 1.67, we can make the following conclusions about carrier densities. Because of band bending, we have to distinguish the hole concentration at the surface from the deep body hole concentration. When we use the qualifier “surface”, we mean conditions at the substrate–oxide interface:

• If we compare the hole concentration near the surface of the substrate, we find that it is higher in the flat band diagram than in equilibrium. This can be seen by comparing the two band diagrams in Sect. 1.15. The Fermi level at the surface is closer to the valence band edge in flat band than in equilibrium. In fact, the hole concentration at the surface of the substrate in flat band is equal to the hole concentration deep in the substrate, away from the oxide interface
• The flat band condition occurs when we apply Vgb = −Vfb. If a further negative voltage is applied to the gate, the surface bands of the substrate will be pushed lower relative to the flat band diagram. This is shown in Fig. 1.68. In this case, the hole concentration rises even further toward the substrate surface. This mode of operation is called accumulation. It is any condition where the

Fig. 1.68 Band diagram in accumulation mode. A negative voltage is applied to the gate, pushing the substrate band lower. Holes are accumulated near the surface relative to the equilibrium state. Note the Fermi level splitting is always indicative of external potential only


concentration of holes near the surface is higher than the deep body hole concentration. The flat band potential and flat band condition are a transitional point. If the semiconductor bands are pulled down relative to the flat band condition, more holes are accumulated than in the body. If they are pulled up, the surface hole concentration falls below the flat band value
• With the application of a positive voltage to the gate, i.e., Vgb positive, the band diagram on the substrate side is pulled up relative to the flat band diagram in Fig. 1.66. This yields the situation shown in Fig. 1.69. In this case, the Fermi level is pushed closer to the intrinsic Fermi level. The surface is further depleted compared to the equilibrium condition. A surface depletion region builds up. This is called the depletion mode of the MOS capacitor
• If an even more positive voltage is applied than in depletion mode, the band diagram tilts even higher at the surface as more of the potential drop falls upon the substrate. This pushes the Fermi level past mid-gap. The substrate near the surface thus starts to accumulate more electrons than holes. This is called inversion, since the semiconductor seems to be inverted from p to n. This is shown in Fig. 1.70.

Notice that the bottom plate of the capacitor behaves very differently from a metal plate in a traditional capacitor. In a metal–insulator–metal capacitor, the application of a positive voltage to the top plate immediately leads to the accumulation of electrons on the bottom plate. In a MOS capacitor, the bottom plate is silicon. It has dopants, and with the application

Fig. 1.69 MOS band diagram in depletion mode. A positive voltage is applied to the gate. The substrate bands are pulled up bringing the Fermi level closer to mid-gap. A large depletion region starts to build


of positive gate voltage, it forms a negative charge in the form of a depletion layer. This will continue until inversion occurs and electrons start to gather below the oxide.

Fig. 1.70 Band diagram of MOS capacitor in inversion. The Fermi level at the surface dips below the intrinsic Fermi level indicating the formation of an electron layer. Strong inversion is reached when the Fermi level dips below the intrinsic Fermi level by as much as it is above it deep in the substrate

In flat band condition, the surface hole concentration is the same as in the bulk of the substrate. Thus, the hole concentration is equal to the doping concentration:

pfb = NA

In accumulation mode (Fig. 1.68), the carrier concentration is higher than the flat band concentration. By examining the figure, it is obvious that the amount of improvement in hole accumulation over the flat band concentration is related to the amount of bending in the bands relative to flat band. This allows a very useful derivation if we notice that the difference between the flat band and accumulation Fermi levels is the surface potential of the semiconductor:

pacc = ni e^((EFacc − EFi)/kT) = accumulation mode hole concentration
EFacc = Fermi level in accumulation mode

The flat band carrier density is defined as

pfb = ni e^((EFfb − EFi)/kT) = flat band hole concentration
EFfb = Fermi level in flat band

pacc/pfb = e^((EFacc − EFfb)/kT)

By definition, the surface potential Vs (Fig. 1.67) is the amount of bending at the substrate. Thus, it is also the amount by which the substrate Fermi level deviates from the flat band level:

EFacc − EFfb = qVs

And we can relate accumulation carriers, flat band carriers, and surface potential as

pacc = pfb e^(qVs/kT)

And since the flat band hole density is the same as the acceptor concentration:

pacc = NA e^(qVs/kT)

Although we derived the hole concentration in accumulation mode, the expression is valid for all modes of the capacitor: accumulation, depletion, and inversion. Since we are talking about a capacitive structure, we should be able to relate this accumulated charge in the plate to the applied potential and the value of capacitance. To approach a model for the value of the MOS capacitance, we must study the charge accumulation model. The starting point is Gauss’ law at the interface:

εox ξox = εsi ξsi + σsi

where:
σsi = area charge density at the interface

Gauss’ law at the interface was introduced in Sect. 1.15. In the form stated above, surface charges are also taken into consideration. We will assume the insulating oxide is manufactured ideally, and thus no surface charges accumulate on the oxide side. Surface charge in the form of holes, electrons, or ions is accumulated on the substrate side, depending on the region of operation of the capacitor. The voltage drop, and thus the field, on the substrate is generally significant. However, it is always much smaller than the potential and field in the oxide. This is due to the difference in conductivity, with the high-resistance insulator hogging most of the drop. Thus, we cannot generally ignore the surface potential, but when it appears in the same equation as the oxide drop, it can be ignored. Because both potentials appear in the gate to body potential equation, we can simplify the expression by ignoring the surface potential:

Vgb = Vox + Vs − Vfb ≈ Vox − Vfb

We can also consider the surface field of the semiconductor negligible relative to the oxide field:

Vs ≪ Vox → ξsi ≪ ξox

This allows us to simplify Gauss’ law, removing the effect of the surface potential and limiting its impact to charges at the substrate surface and the oxide field:

Although we derived the hole concentration in accumulation mode, the expression is valid for all modes of the capacitor: accumulation, depletion, and inversion. Since we are talking about a capacitive structure, we should be able to relate this accumulated charge on the plate to the applied potential and the value of the capacitance. To approach a model for the value of MOS capacitance, we must study the charge accumulation model. The starting point is Gauss law at the interface:

ε_ox ξ_ox = ε_si ξ_si + σ_si

where σ_si is the area charge density at the interface. Gauss law at the interface was introduced in Sect. 1.15. In the form stated above, surface charges are also taken into consideration. We will assume the insulating oxide is manufactured ideally, so that no surface charges accumulate on the oxide side. Surface charge in the form of holes, electrons, or ions accumulates on the substrate side depending on the region of operation of the capacitor.

The voltage drop, and thus the field, on the substrate side is generally significant. However, it is always much smaller than the potential and field in the oxide. This is due to the difference in conductivity, with the high-resistance insulator hogging most of the drop. Thus, we cannot ignore the surface potential in general, but when it appears in the same equation as the oxide drop, it can be neglected. Because both potentials appear in the gate to body potential equation, we can simplify the expression by ignoring the surface potential:

V_gb = V_ox + V_s − V_fb ≈ V_ox − V_fb

We can also consider the surface field of the semiconductor negligible relative to the oxide field:

V_s ≪ V_ox → ξ_si ≪ ξ_ox

This allows us to simplify Gauss law, removing the effect of the surface potential and limiting the picture to the charges at the substrate surface and the oxide field:


ε_ox ξ_ox ≈ σ_si

The surface charge density is also the density of accumulated charge at the semiconductor–oxide interface:

ε_ox ξ_ox = σ_si = Q_acc = accumulated charge

Fig. 1.71 Geometry of the depletion region. x_d is the main parameter of concern, indicating both the depth of the depletion region and the amount of accumulated ion charge

And thus:

ξ_ox = Q_acc / ε_ox

The field in the oxide is constant as long as the insulator is uncharged (Sect. 1.14). Thus, the potential across the oxide is linear. This allows us to state the field in terms of the oxide potential and the oxide thickness t_ox as:

ξ_ox = −V_ox / t_ox

And thus:

−V_ox / t_ox = Q_acc / ε_ox

Since this is a relation between accumulated charge and applied voltage, it is strongly suggestive of a linear capacitance. If the gate and substrate plates are substantially larger than the oxide thickness, the capacitance can be approximated as that of a parallel plate capacitor:

C = ε_ox A_ox / t_ox

And the capacitance per unit area can be obtained as:

C / A_ox = ε_ox / t_ox = C_ox

This can be used to obtain the Q–V relationship in accumulation:

V_ox = −Q_acc t_ox / ε_ox = −Q_acc / C_ox = V_gb + V_fb

Q_acc = −C_ox (V_gb + V_fb)

The negative sign indicates that the bottom plate accumulates positive charge (holes) in response to a negative gate potential. The opposite sign applies to the charge on the top (polysilicon) plate. In Sect. 1.17 we will derive MOS capacitance more rigorously.

In depletion mode (Fig. 1.69), all excess charges have been depleted. We will assume full depletion, that is to say all dopants are fully ionized and there are negligible mobile carriers. Figure 1.71 shows the depletion region formed under the oxide. The depletion region depth x_d is an important parameter that can be used to estimate how the gate to substrate voltage is distributed between the oxide and the substrate. An expression for the depletion region depth can be obtained if the expression for PN junction depletion in Sect. 1.8 is applied to the MOS capacitor. Instead of V_bi, we use the surface potential of silicon:

x_d|pn = √(2ε_si (V_bi + V_r) / (q N_A))

x_d|MOS = √(2ε_si V_s / (q N_A))

In depletion mode, the surface charge at the bottom plate is purely due to ions. There are no free charges, and the acceptor ions carry negative charge. The charge per unit area can be obtained as:

σ_si = Q_acc = −q N_A x_d

This can then be plugged into the capacitance equation to obtain an expression relating the oxide and surface potentials:

V_ox = −Q_acc / C_ox = q N_A x_d / C_ox = (q N_A / C_ox) √(2ε_si V_s / (q N_A)) = √(2 q ε_si N_A V_s) / C_ox

The total gate to body potential is:

V_gb = V_ox + V_s − V_fb

The surface potential can be obtained in terms of the depletion region width as:

V_s = (q N_A / 2ε_si) x_d²

And since the surface and oxide potentials are related:

V_ox = q N_A x_d / C_ox

Thus the gate to body potential can be expressed as:

V_gb = (q N_A / C_ox) x_d + (q N_A / 2ε_si) x_d² − V_fb
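The quadratic above can be solved directly for x_d. Below is a minimal numeric sketch; every parameter value (N_A = 10^17 cm⁻³, t_ox = 10 nm, V_fb = 0.9 V in this chapter's sign convention) is an illustrative assumption, not a value taken from the text.

```python
import math

q, eps0 = 1.602e-19, 8.854e-12       # elementary charge (C), vacuum permittivity (F/m)
eps_si, eps_ox = 11.7 * eps0, 3.9 * eps0

NA  = 1e17 * 1e6                     # assumed acceptor doping: 1e17 cm^-3 -> m^-3
tox = 10e-9                          # assumed oxide thickness, m
Vfb = 0.9                            # assumed flat band potential, V
Cox = eps_ox / tox                   # oxide capacitance per unit area, F/m^2

def depletion_depth(Vgb):
    """Positive root of (qNA/2eps_si)*xd^2 + (qNA/Cox)*xd - (Vgb + Vfb) = 0."""
    a = q * NA / (2 * eps_si)
    b = q * NA / Cox
    c = -(Vgb + Vfb)
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)

Vgb = 0.0                            # a bias inside the depletion regime
xd  = depletion_depth(Vgb)           # depletion depth, m
Vs  = q * NA * xd**2 / (2 * eps_si)  # surface potential recovered from xd
Vox = q * NA * xd / Cox              # oxide potential recovered from xd
assert abs(Vox + Vs - Vfb - Vgb) < 1e-9   # the two drops reconstruct the applied bias
```

Once x_d is known, the individual drops V_s and V_ox follow immediately, which is exactly the procedure described next.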

This quadratic equation can be solved for the value of x_d, the depletion region depth. Once this is found, we can find exact values for both the surface and oxide potentials. This result is valid only for the depletion mode.

If an even more positive potential than necessary for depletion is applied to the gate, the Fermi level starts to climb above the intrinsic Fermi level, as shown in Fig. 1.70. The following relationships show that if the Fermi level is above mid-gap, the material starts to become n-type as the electron population exceeds the hole population:

n = n_i e^((E_F − E_i)/kT)

p = n_i e^((E_i − E_F)/kT)

But when exactly can we say that the material has "become n-type"? The equations show that the electron population increases exponentially from intrinsic levels as the Fermi level ascends. The increase is very slow initially but escalates rapidly as the Fermi level rises further toward the conduction band edge. A good point at which to consider that the bottom plate has "become n-type" is when the electron concentration at the surface below the oxide equals the hole concentration deep in the bulk of the substrate. The deep bulk hole and electron concentrations are:

p_bulk = N_A

n_bulk = n_i² / N_A

The surface hole and electron concentrations in flat band are the same as those in the deep bulk. In inversion, however, the surface hole and electron concentrations are:

p_inversion = n_i² / N_A

n_inversion = N_A

By inspecting the inversion band diagram in Fig. 1.70, we conclude that for inversion to occur, the amount of band bending at the surface relative to the flat band condition must equal double the Fermi level to mid-gap distance in the bulk. By bulk, we always mean deep in the substrate, beyond the point at which surface fields have any effect. One Fermi to mid-gap difference is necessary to bring the surface Fermi level to mid-gap; the other is required to raise it above mid-gap by as much as it was below it in the bulk. This can be translated into a surface potential at inversion:

V_s,th = surface potential at inversion

V_s,th = 2(E_i − E_Fbulk) / q

We can also define a bulk potential, equal to the voltage corresponding to the bulk Fermi level to mid-gap distance:

φ_b = (E_i − E_Fbulk) / q

V_s,th = 2φ_b

The gate to body voltage at which the substrate inverts from p-type to n-type is a very important parameter. This voltage is defined as the threshold voltage, V_th. It can be expressed in terms of the threshold surface potential and the threshold oxide potential:

V_gb = V_ox + V_s − V_fb

V_th = V_ox,th + V_s,th − V_fb

The threshold surface potential has already been defined as double the bulk potential. The flat band potential is well defined in terms of the work functions of the gate and substrate, and thus the substrate doping alone. The oxide threshold voltage can be obtained from the expression for the oxide voltage in terms of the surface potential:

V_ox,th = √(2 q N_A ε_si V_s,th) / C_ox = √(4 q N_A ε_si φ_b) / C_ox

Thus the threshold voltage can be expressed as:

V_th = 2φ_b + √(4 q N_A ε_si φ_b) / C_ox − V_fb

The bulk potential can be obtained in terms of the bulk doping as:

p_bulk = N_A = n_i e^(qφ_b / kT)

φ_b = (kT/q) ln(N_A / n_i)
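As a quick numeric sanity check of the threshold expression, here is a sketch with assumed values (N_A = 10^17 cm⁻³, n_i = 10^10 cm⁻³, t_ox = 10 nm, V_fb = 0.9 V in this chapter's sign convention; none of these numbers come from the text):

```python
import math

q, eps0, kT_q = 1.602e-19, 8.854e-12, 0.0259   # constants; kT/q at 300 K
eps_si, eps_ox = 11.7 * eps0, 3.9 * eps0

NA  = 1e17 * 1e6        # assumed doping, m^-3
ni  = 1e10 * 1e6        # assumed intrinsic concentration, m^-3
tox = 10e-9             # assumed oxide thickness, m
Vfb = 0.9               # assumed flat band potential, V

Cox    = eps_ox / tox
phi_b  = kT_q * math.log(NA / ni)                       # bulk potential
Vox_th = math.sqrt(4 * q * NA * eps_si * phi_b) / Cox   # oxide drop at threshold
Vth    = 2 * phi_b + Vox_th - Vfb

print(round(phi_b, 3), round(Vth, 2))   # ≈ 0.417 and 0.42 for these assumed numbers
```

The three contributions (2φ_b to bend the bands, the depletion-charge drop across the oxide, and the flat band offset) are all comparable in size here, which is typical for this doping range.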

If the gate potential increases further beyond threshold voltage, a phenomenon known as strong inversion occurs. Theoretically, we can assume that the higher the gate voltage


above inversion, the higher the electron concentration and thus the charge at the surface. This would also dictate further band bending at the surface and a higher surface potential. However, the exponential nature of the carrier–energy equations means that beyond the threshold voltage, any tiny increase in surface potential leads to a huge increase in electron concentration. This huge increase in surface charge translates through Gauss law into a large oxide potential. Thus, the oxide potential is much higher than the surface potential. So much so that we can assume any increase in gate voltage above the threshold level goes entirely to the oxide potential, with the surface potential saturating at the threshold level:

V_gb ↑ → σ_si ↑↑ → ΔV_ox ≫ ΔV_s

Because:

ε_ox ξ_ox ≈ σ_si

In other words, a slight increase in surface potential would lead to a great increase in surface charge. But since the oxide and surface potential deltas must add up to the increase in applied gate voltage, the boundary condition which relates surface potential and oxide potential to surface charge dictates that most of the change goes into the oxide voltage and the surface charge. This is because the surface charge term is large in response to any small change in surface potential, thus keeping the surface potential low. Note that oxide potential and field are related linearly. So are surface field and potential.

Step by step: an increase in gate to body voltage above threshold levels is divided between the oxide and surface potentials:

ΔV_gb = ΔV_ox + ΔV_s

But the surface charge in the substrate is exponentially proportional to the surface potential. Thus, the excess surface charge is exponentially proportional to the excess surface potential:

σ_si ∝ e^(qΔV_s/kT)

But the Gauss relation must still hold:

ε_ox ξ_ox = ε_si ξ_si + σ_si

And the increases in the oxide and surface fields are proportional to their respective potentials:

|Δξ_si| ∝ ΔV_si

|Δξ_ox| ∝ ΔV_ox

This by necessity indicates that the increase in surface potential must be muted. An increase in surface potential would lead to a very large increase in surface charge that cannot be squared with the Gauss relations. Thus, most of the excess gate to body potential must fall on the oxide only:

ΔV_ox ≫ ΔV_si

Thus the surface potential tends to saturate at its value around the threshold voltage:

V_s,inv → 2φ_b

But the depletion region depth is a function of the surface potential. Thus, as the surface potential saturates, the depth of the depletion region also saturates. The depletion region in inversion mode therefore has a constant depth:

x_d = √(2ε_si V_s / (q N_A)) → √(2ε_si V_s,inv / (q N_A))

x_d,inv = √(4ε_si φ_b / (q N_A))

Below the threshold voltage, the surface potential increases as the gate voltage increases. This allows the depletion region to become deeper. The negative charge needed on the bottom plate is exclusively in the form of a deepening layer of negative ions. At and above the threshold voltage, the surface potential and the depletion layer saturate. Any further increase in negative charge on the bottom plate must thus come from electrons. Applying the voltage equation for the MOS capacitor in inversion mode:

V_gb = V_ox,inv + V_s,inv − V_fb

V_s,inv ≈ V_s,th = 2φ_b

The surface potential in inversion is identical to the threshold surface potential because its value saturates. The flat band potential is a material property, so it is also a constant. The total accumulated charge on the bottom plate consists of two components: the depletion region charge and the inversion layer charge:

Q_substrate = Q_dep + Q_inv

The depletion charge in strong inversion saturates due to the saturation of the depletion region depth:

Q_dep = −q N_A x_d,inv = −q N_A √(4ε_si φ_b / (q N_A)) = −√(4 q N_A ε_si φ_b)

The oxide potential is related to the accumulated charge through the capacitance:

V_ox,inv = −Q_substrate / C_ox

And thus, with Q_inv taken as a magnitude:

V_ox,inv = (Q_inv + √(4 q N_A ε_si φ_b)) / C_ox

Substituting the expressions for the surface and oxide potentials into the expression for the gate to body voltage:

V_gb = 2φ_b + √(4 q N_A ε_si φ_b) / C_ox − V_fb + Q_inv / C_ox

But the threshold voltage has already been defined as:

V_th = 2φ_b + √(4 q N_A ε_si φ_b) / C_ox − V_fb

And thus the gate to body voltage in terms of the threshold voltage:

V_gb = V_th + Q_inv / C_ox

And the amount of mobile electron charge in the inversion layer is:

Q_inv = (V_gb − V_th) C_ox

This will be critical in finding the MOSFET current. Notice two things. First, this inversion charge is known to be electrons; we have already substituted for the sign in the equation, so Q_inv here represents only the magnitude of the charge. More properly, a negative sign should be added to account for the fact that the inversion charge is negative. Second, the equation is nearly identical to that of a linear capacitor, with the exception of the threshold voltage. The threshold voltage accounts for two things: depleting the majority carriers and then bending the bands further to form the inversion layer. Both stem from the fact that the bottom plate is a semiconductor rather than a metal.

One approximation we made while deriving all these equations is that the oxide is uncharged. In most practical devices, the oxide carries some level of native charge. The oxide charge is by default "trapped" in the insulator, unable to exit due to the unavailability of states to move through. Trapped oxide charges occur mainly due to oxide contamination. This can happen during manufacture if conditions for oxidation or CVD are not ideal (Chap. 7). It can also happen during operation, particularly due to the trapping of hot carriers (Chap. 10). The presence of volumetric charge in the oxide changes Gauss law: it adds a constant charge to the differential equation, which leads to a (weak) linear component in


the oxide field, and a quadratic component to the potential. This would require us to repeat all the previous analyses. However, a very useful approximation can be made in which the effects of all volumetric charges in the oxide are modeled at the oxide–substrate interface as additional surface charges. This simply modifies Gauss law at the interface by adding an offset to the surface charge. This charge might aid or hinder inversion depending on its sign.

Figure 1.72 shows the flat band diagram in the presence of oxide charges. If the oxide charges are positive, this leads to a downward bend in the oxide bands, since the potential is higher at the Si–Ox interface. This introduces a negative charge into the equation. The amount of downward bending in energy is proportional to the voltage drop due to the trapped charges. The voltage drop due to the trapped charges is, in turn, proportional to the charges through the capacitance:

ΔV_fb = Q_trapped / C_ox

V_fb = V_fb0 + Q_trapped / C_ox

V_th = 2φ_b + √(4 q N_A ε_si φ_b) / C_ox − V_fb0 − Q_trapped / C_ox

Thus, if the trapped charges are positive, they reduce the threshold voltage. If the trapped charges are negative, they raise the threshold voltage. In all cases, these uncontrolled trapped charges lead to an unpredictable, unintended shift in threshold voltage. Section 10.6 discusses a

Fig. 1.72 Flat band diagram with oxide charges. V_fb0 is the uncharged flat band potential


phenomenon where trapped oxide charges play a critical role in determining the reliability of modern transistors.
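The threshold shift from trapped charge follows directly from ΔV_th = −Q_trapped/C_ox. A minimal sketch, with an assumed trapped-charge areal density and oxide thickness (illustrative values, not from the text):

```python
# Threshold voltage shift from positive trapped oxide charge, modeled at the
# Si-SiO2 interface. All numbers below are illustrative assumptions.
q, eps0 = 1.602e-19, 8.854e-12
Cox = 3.9 * eps0 / 10e-9          # assumed 10 nm oxide, F/m^2

N_trapped = 5e10 * 1e4            # assumed trapped density: 5e10 cm^-2 -> m^-2
Q_trapped = q * N_trapped         # positive trapped charge, C/m^2

dVth = -Q_trapped / Cox           # positive trapped charge lowers Vth
print(dVth)                       # roughly -23 mV for these numbers
```

Even this modest trapped density moves V_th by tens of millivolts, which is why uncontrolled oxide charge is a reliability concern.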

1.17 MOS Capacitor Characteristics

1. Understand MOS capacitance–voltage characteristics
2. Understand MOS charge components
3. Derive MOS capacitance in accumulation and inversion
4. Justify MOS capacitance in inversion.

Figure 1.73 shows the CV characteristics of a MOS capacitor. The characteristics are not immediately obvious. The figure resembles a normal capacitor in the accumulation and inversion regimes, with a constant capacitance per unit area equal to C_ox. In the depletion region, however, the capacitance has a curious behavior. To understand Fig. 1.73, we have to understand how the different charge components behave in the different regimes.

Figure 1.74 shows the depletion region depth versus the applied voltage across the capacitor. In accumulation, there is no depletion region. Starting around the flat band potential, the semiconductor Fermi level starts to approach mid-gap and the material starts to deplete. The depth of the depletion region is related to the applied voltage through a quadratic relation:

V_gb = (q N_A / C_ox) x_d + (q N_A / 2ε_si) x_d² − V_fb

Thus the depth of the depletion region grows in a square root fashion through the depletion regime. When the gate to body voltage reaches the threshold voltage, the capacitor enters the inversion regime. As discussed in detail in Sect. 1.16, the depth of the depletion region saturates in the inversion regime at the value:

Fig. 1.74 Depth of the depletion region versus capacitor voltage. There is no depletion region in accumulation. In depletion mode, the depth of the depletion region increases with applied voltage. In (strong) inversion the depth of the depletion region saturates at x_d,inv

x_d,inv = √(4ε_si φ_b / (q N_A))
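Plugging in representative numbers gives a feel for the saturated depletion depth; the doping and intrinsic concentration below are assumed for illustration (N_A = 10^17 cm⁻³, n_i = 10^10 cm⁻³ at room temperature):

```python
import math

q, eps0 = 1.602e-19, 8.854e-12
eps_si  = 11.7 * eps0                  # silicon permittivity, F/m
NA      = 1e17 * 1e6                   # assumed doping, m^-3
phi_b   = 0.0259 * math.log(NA / 1e16) # bulk potential; ni = 1e10 cm^-3 = 1e16 m^-3

xd_inv = math.sqrt(4 * eps_si * phi_b / (q * NA))   # saturated depletion depth
print(xd_inv * 1e9)                                 # roughly 104 nm for these numbers
```

Heavier doping shrinks this depth as 1/√N_A (with only a weak logarithmic growth through φ_b), one reason the depletion region gets shallower in scaled technologies.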

There are three possible charge types that can gather at the silicon–insulator interface: holes, electrons, and ions. In a p-type substrate the ions are negative. Ions are immobile, forming a depletion region, while electrons and holes are mobile charge carriers. Figure 1.75 shows the concentration of the three charge types versus the applied gate voltage.

In accumulation mode, by definition, there are no electrons at the surface. The potential in accumulation mode also does not support the formation of any depletion charge. Thus in accumulation mode, the surface charge is made up entirely of holes. Hole accumulation is linear with the applied voltage through (Sect. 1.16):

Q_acc = −C_ox (V_gb + V_fb)

In depletion mode, the hole concentration falls to negligible levels. This indicates full carrier depletion. There are also no (or negligible) electrons in depletion mode. The only charge type in depletion mode is the depletion charge. The depletion charge increases in proportion to the depth of the depletion region:

Q_dep = −q N_A x_d

Fig. 1.73 Gate to body capacitance per unit area versus gate to body voltage. For V_gb < −V_fb the capacitor is in accumulation mode. For −V_fb < V_gb < V_th, the capacitor is in depletion mode. For V_gb > V_th the capacitor is in inversion

But from Fig. 1.74, we notice that x_d increases nonlinearly in the depletion mode; thus the depletion charge also increases in the same fashion. This can be observed for −V_fb < V_gb < V_th in the top subplot of Fig. 1.75. In inversion mode, the hole concentration stays suppressed due to the mass action law. The depletion region depth saturates at x_d,inv, leading to a saturation of the depletion charge at:

Q_dep = −√(4 q N_A ε_si φ_b)


Fig. 1.75 MOS surface charge density in a p-type substrate versus applied gate voltage. Ions (top), electrons (middle), and holes (bottom)

In inversion mode, the electron concentration increases linearly (Sect. 1.16):

Q_inv = (V_gb − V_th) C_ox

The intercept is at the threshold voltage, below which we can assume no electrons are gathered. Notice that this, like the treatment above, is an approximation. The threshold voltage is not the point at which electrons completely disappear at the substrate interface. Instead, it is the point at which the electron concentration equals the bulk hole concentration. Below threshold we should thus ideally account for an exponentially dropping electron concentration. This causes one of the major leakage mechanisms, known as subthreshold conduction, which will become very important in Chap. 10.

Where do the electrons that form the inversion layer come from? They are minorities from the bulk which are forcefully extracted and attracted to the semiconductor–oxide interface. Because these minorities are very scarce, the time-constant for moving a MOS capacitor into inversion mode is huge. This concern is immediately addressed in the MOSFET in the next section, where the electron-rich source and drain significantly lower the time-constant.

Figure 1.76 shows the total charge on the lower plate of the MOS capacitor. It is simply the summation of the three charge components in Fig. 1.75. The slope of the Q–V curve is the capacitance. There is a negative sign because we are looking at the bottom plate charge. The slope in accumulation and inversion yields a constant capacitance of C_ox, which matches Fig. 1.73. In depletion, we observe a nonlinear capacitance. The slope progressively deteriorates toward the threshold voltage, where it is suddenly restored to −C_ox. Again, this matches the behavior of Fig. 1.73. The variability of the capacitance in the depletion regime is due to the nonlinear dependence of the depletion depth on the applied voltage.

The MOS capacitor characteristics derived in this section are based on a major assumption: that the substrate potential is negligible relative to the oxide potential. This assumption is valid in deep inversion and deep accumulation. However, it introduces substantial errors in the transitional regions. In the depletion regime, had we not made this assumption, we would need to solve the surface field and charge concentration equations simultaneously. This can be done recursively. However, the correction this provides is very small and does not justify the effort required.

The main impact of the surface potential is that the accumulation and inversion regimes start gradually. This section and Sect. 1.16, and particularly Fig. 1.74, suggest that accumulation starts at a hard boundary at −V_fb and inversion starts sharply at V_th. In fact, both regimes start gradually, transitioning smoothly from the depletion regime. The approximation is not very drastic for the accumulation regime because V_fb is a very large value; the range of voltages around V_fb where the secondary effect of V_s matters is small relative to V_fb. The most dramatic impact of the approximation is thus around V_th. The approximation introduces some error for V_gb values slightly above V_th, but this is not our main area of concern. Our main concern is what happens below V_th. If an accurate calculation is performed, we find that a channel, albeit a weak one, exists for a large voltage range below V_th. This leads to the phenomenon of subthreshold conduction, which was at one point the main limiter of transistor scaling (Chap. 10).
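The piecewise bottom-plate charge described in this section can be sketched numerically; differentiating it recovers the CV behavior of Fig. 1.73. All parameter values are illustrative assumptions (V_fb follows this chapter's sign convention, so accumulation is V_gb < −V_fb):

```python
import math

q, eps0 = 1.602e-19, 8.854e-12
eps_si, Cox = 11.7 * eps0, 3.9 * eps0 / 10e-9   # assumed 10 nm oxide
NA    = 1e17 * 1e6                               # assumed doping, m^-3
phi_b = 0.0259 * math.log(NA / (1e10 * 1e6))     # bulk potential
Vfb   = 0.9                                      # assumed flat band potential, V
Vth   = 2 * phi_b + math.sqrt(4 * q * NA * eps_si * phi_b) / Cox - Vfb

def Q_bottom(Vgb):
    """Bottom-plate charge per unit area in the three regimes (C/m^2)."""
    if Vgb < -Vfb:                       # accumulation: hole charge, Q > 0
        return -Cox * (Vgb + Vfb)
    if Vgb < Vth:                        # depletion: ion charge, via the quadratic
        a, b = q * NA / (2 * eps_si), q * NA / Cox
        xd = (-b + math.sqrt(b * b + 4 * a * (Vgb + Vfb))) / (2 * a)
        return -q * NA * xd
    # inversion: saturated depletion charge plus linear electron charge
    return -math.sqrt(4 * q * NA * eps_si * phi_b) - Cox * (Vgb - Vth)

# Numerical slope -dQ/dV recovers Cox in accumulation and inversion (Fig. 1.73)
dV = 1e-4
C_acc = -(Q_bottom(-2.0 + dV) - Q_bottom(-2.0)) / dV
C_inv = -(Q_bottom(1.5 + dV) - Q_bottom(1.5)) / dV
assert abs(C_acc - Cox) / Cox < 1e-6 and abs(C_inv - Cox) / Cox < 1e-6
```

Taking the same numerical slope at biases inside the depletion window yields a value below C_ox that varies with bias, reproducing the dip of the CV curve.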


Fig. 1.76 Total substrate surface charge versus gate to substrate voltage

1.18 MOSFET Linear Regime

1. Obtain an expression for the charge at any point in the channel
2. Obtain an expression for the current at a point in the channel
3. Derive the current in the linear regime of the MOSFET
4. Understand the nonlinear nature of the MOSFET current.

MOSFET is an acronym for Metal–Oxide–Semiconductor Field-Effect Transistor. The device is shown in Fig. 1.77. It consists of a vertical structure that is identical to a MOS capacitor, with two additional terminals corresponding to two additional regions. These two regions are heavily doped and are of the opposite type to the body. In Fig. 1.77 the body is p-type. The two regions, called the source and the drain, are thus heavily doped n-type, indicated by n+. The source and the drain address the time-constant issue discussed at the end of Sect. 1.17 by providing a plentiful source of electrons.

The MOSFET is a symmetric device along the vertical axis; there is no difference between the construction of the source and the drain. This is in contrast to the BJT. The source and the drain each form a PN junction with the body. For proper operation, these PN junctions must be guaranteed to be reverse biased. The safest way to do this is to connect the body of the MOSFET in Fig. 1.77 to the lowest potential in the circuit.

If the gate to body potential puts the MOS capacitor in either accumulation or depletion mode, there can be no conductive path between the source and the drain: the two are separated by two reverse-biased PN junctions. On the other hand, if the gate to body potential is above the threshold voltage, an inversion layer forms under the oxide. In the MOSFET the inversion layer is alternatively known as the channel. The channel forms an n-layer between the source and the drain, which are both also n-type. Thus a relatively low-impedance semiconductor resistor is created between the source and the drain. Any potential applied between the source and the drain then causes current conduction.

Whether or not a current flows between the source and the drain is determined by the presence or absence of a channel. This, in turn, depends on the gate potential. Thus, the current between two terminals (source and drain) is dependent on the voltage on a third (gate), which is transistor action.

The source and the drain are only distinguished by the connected potentials. The drain is always the terminal with the higher potential. Thus current, if it flows, will flow from the drain to the source. While this may sound counterintuitive, notice that the MOSFET is a unipolar device. In the device in Fig. 1.77, the channel is n-type and the main charge carrier is electrons. Thus although the current flows from drain to source, the main charge carriers, electrons in this case, flow from source to drain.

In the MOS capacitor, the voltage of the bottom plate was entirely determined by the body terminal. In the MOSFET, the bottom plate voltage varies along the length of the channel, ranging from the drain voltage V_d down to the source voltage V_s. To address this, we assume an arbitrary point c anywhere in the channel between the source and the drain. The voltage difference between the gate and this point is V_gc, which can be obtained in terms of V_gs as:

V_gc = V_gs − V_cs

Fig. 1.77 MOSFET sketch, showing channel geometry. L is the channel length, W is the channel width, and tox is the thickness of the oxide. S is the source, D is the drain, G is the gate, and B is the body

The electron charge per unit area at this point is thus (Sect. 1.17):

Q_inv = C_ox (V_gs − V_cs − V_th)


When a drain to source voltage is applied, a lateral field is established. This field causes a drift of electrons from source to drain, and a current from drain to source. The drift current is (Sect. 1.5):

I = Q_inv W v

The velocity can be obtained from the mobility and the electric field as:

v = μ_n ξ(x)

Fig. 1.78 Current–voltage of MOSFET in the linear regime

Notice that we are not assuming a constant field in the channel. We are only assuming that velocity is a linear function of field. The field can be related to the voltage through:

ξ(x) = −dV_cs/dx

Substituting the expression for the field into the expression for velocity, and the expression for velocity into the expression for current:

I = −μ_n C_ox W (V_gs − V_cs − V_th) dV_cs/dx

This differential equation relates the current I at point c to the voltage V_cs at the same point. However, in steady state, the current must be equal at every location in the channel. Thus, if we find the current from the differential equation, it is the current anywhere in the channel, and thus the current between the drain and the source. We could find this current by finding an expression for V_cs and then differentiating it. However, it is much simpler to just integrate the equation along the channel:

∫₀ᴸ I dx = −∫₀^Vds μ_n C_ox W (V_gs − V_cs − V_th) dV_cs

I·L = −μ_n C_ox W [(V_gs − V_th) V_ds − V_ds²/2]

I = μ_n C_ox (W/L) [(V_gs − V_th) V_ds − V_ds²/2]

I = k_n′ (W/L) [(V_gs − V_th) V_ds − V_ds²/2]

The constant k_n′ is equal to the product of the oxide capacitance per unit area and the electron mobility. It is a constant determined entirely by the technology. W/L is the aspect ratio of the channel, and it is determined by the designer (Chap. 7). Note that in the last steps we dropped the negative sign in the current expression. The negative sign only indicates that the current and the charge carriers flow in opposite

directions. We can safely ignore signs as long as we remember that the current flows from drain to source in NMOS.

Figure 1.78 shows the I_ds versus V_ds curve for a MOSFET. This region of operation is called the linear region, despite the obvious nonlinearity. The curve shows a nearly linear dependence for very low values of V_ds, which then turns strongly nonlinear. Because an n-only path exists between the source and drain, we expect an ohmic behavior; the region of operation is thus sometimes also known as the ohmic region. The current equation shows the current depending on three terminal voltages, so the region of operation is also known as triode. The nonlinearity exhibited in both the equation and the curve deserves a deeper look and will be the focus of part of the next section.
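The linear-region equation is easy to tabulate. A minimal sketch, with assumed values for k_n′, W/L, and V_th (illustrative parameters, not taken from the text):

```python
def ids_linear(Vgs, Vds, kn_prime=200e-6, W_over_L=10.0, Vth=0.4):
    """Triode-region drain current I = kn' (W/L) [(Vgs-Vth)Vds - Vds^2/2].
    Valid for Vgs > Vth and Vds <= Vgs - Vth. kn' in A/V^2 (assumed)."""
    return kn_prime * W_over_L * ((Vgs - Vth) * Vds - Vds**2 / 2)

# Effective conductance I/Vds falls as Vds grows: the curve bends away
# from its initial ohmic slope, as in Fig. 1.78
g_small = ids_linear(1.0, 0.05) / 0.05
g_large = ids_linear(1.0, 0.50) / 0.50
assert g_large < g_small
```

The falling effective conductance is exactly the competition described in the next section: higher carrier velocity versus a thinning channel at the drain.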

1.19 MOSFET Saturation Regime

1. Understand channel strangling at the drain
2. Deduce the voltage profile across the channel
3. Deduce the charge profile across the channel
4. Deduce the current profile across the channel
5. Draw potentials, energy, charge, and current in pinch off
6. Understand the saturation current and the saturation condition
7. Recognize why current saturates and where the excess voltage drop goes.

The charge in the channel is nonconstant. The reason is that the charge at any point is a function of Vcs, but the voltage Vcs is itself a function of distance thus Vcs = Vcs(x) and Q = Q(Vcs) = Q(x). Vcs ranges between 0 at the source and Vds at the drain, thus Vcs(0) = 0, and Vcs(L) = Vds. This means that the charge is nonconstant in the channel, with the charge being the lowest at the drain and the highest at the source. A nonconstant charge creates a nonlinear field (recall Gauss’ law). A nonlinear field will, in turn, dictate a nonlinear voltage profile across the channel. The top subplot in Fig. 1.79 shows the distribution of voltage across the channel. As expected it is nonlinear.


Fig. 1.79 Top to bottom: voltage, charge, energy, and velocity in a MOSFET channel. x is the distance along the channel from the source to the drain

Because the magnitude of the channel charge is a linear function of Vcs, the charge profile has a shape similar to the voltage profile. The energy band diagram is proportional to the voltage profile, and thus has a similar shape as well, although it is inverted due to the negative energy–voltage relation. We assume the drain and source have zero resistance. The magnitude of the inversion charge at the source is

$$|Q_{inv}(0)| = C_{ox}\,(V_{gs} - V_{th})$$

whereas the magnitude of the charge at the drain is obviously lower:

$$|Q_{inv}(L)| = C_{ox}\,(V_{gs} - V_{ds} - V_{th})$$

The higher the drain voltage, the larger the lateral field in the channel, and the higher the velocity of charge carriers.


But from the above equation, the higher the drain voltage, the lower the amount of charge at the drain. Thus raising the drain voltage causes two conflicting effects (recall that drift current is roughly the product of charge and velocity):

• Increased electron velocity, promoting a higher channel current
• A decreased amount of charge in the channel, promoting a lower channel current.

The interaction of these two effects yields the nonlinear curve in Fig. 1.78. The higher drift velocity increases the current, while the decreased charge dampens this increase. Obviously, the increased velocity is winning throughout Fig. 1.78, with the current increasing through the figure. The I–V equation used to draw Fig. 1.78 is parabolic:

$$I = K\left[(V_{gs} - V_{th})V_{ds} - \tfrac{1}{2}V_{ds}^2\right]$$

Any parabola must have an extremum, in this case a maximum point. This maximum point is of interest, because if Fig. 1.78 were extended beyond the maximum of the parabola, we would have an I–V curve with a negative slope. This would indicate a negative large-signal resistance, which is impossible. Thus something interesting must be happening at this maximum point. The maximum of the parabola occurs where the derivative with respect to Vds vanishes:

$$\frac{\partial I}{\partial V_{ds}} = 0 \;\Rightarrow\; (V_{gs} - V_{th}) - V_{dssat} = 0$$

$$V_{dssat} = V_{gs} - V_{th}$$

The value of Vds at which the maximum of the parabola occurs is known as Vdssat, and it is a critically important parameter. Higher drain potential lowers the inversion charge at the drain. There must be a value of drain potential at which the channel disappears completely at the drain. This can be found by equating the drain-side inversion charge to zero:

$$|Q_{inv}(L)| = C_{ox}(V_{gs} - V_{ds} - V_{th}) = 0$$

$$V_{ds} = V_{gs} - V_{th}$$

Thus, curiously, the value of drain voltage at which the channel disappears at the drain is also the value of voltage at which the I–V characteristics reach their maximum, Vdssat. This phenomenon is called pinch off, and the point at which it occurs is called the pinch-off point. For values of Vds higher than Vdssat, the transistor is considered to be in a new region of operation, called saturation. The current–voltage regime described in Sect. 1.18 is called the linear, ohmic, or triode regime. It is called the linear or ohmic regime because the device exhibits the properties of a (nonlinear) resistor. It is called the triode region because all three terminal voltages have an effect on the current value. The triode region in MOSFET is analogous to the saturation mode of the BJT. Thus Vdssat is defined as the point at which the MOSFET exits the linear regime and enters the saturation regime.

The question now is what happens to the current at and beyond the pinch-off point. The answer is both immediately obvious and controversial. The current in saturation is the same as the current at pinch off. That is to say, the current for any value of Vds above Vdssat saturates at its value at pinch off:

$$I_{sat} = I_{pinchoff} = K\left[(V_{gs} - V_{th})V_{dssat} - \tfrac{1}{2}V_{dssat}^2\right]$$

$$I_{sat} = \frac{K(V_{gs} - V_{th})^2}{2}$$

where:

$$K = \mu_n C_{ox}\,\frac{W}{L}$$

Notice that for saturation current to flow, the transistor has to have a channel, and that channel has to be pinched off. Thus for the MOSFET to be in saturation:

$$V_{gs} > V_{th}$$

$$V_{ds} > V_{gs} - V_{th}$$

Figure 1.80 shows MOSFET I–Vds characteristics for different values of Vgs. Raising Vgs increases the amount of charge in the channel, and thus it increases the current in both regimes (ohmic and saturation). Sweeping Vds, however, will always lead to a maximum current that cannot be exceeded. This maximum current is called the saturation current. There are two major questions about saturation current: how can it even flow, and why does its value saturate? When the channel completely disappears or pinches off at the drain, what is it replaced with?
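The saturation condition and current can be collected into a single piecewise model. The following Python sketch (illustrative parameter values, not from the book) shows that the triode expression evaluated at Vds = Vdssat equals Isat, so the two regimes meet continuously at pinch off.

```python
# Minimal piecewise long-channel NMOS current model (illustrative values).
# K = un * Cox * W / L lumped into a single parameter.

def ids_nmos(vgs, vds, vth=0.5, k=1e-3):
    """Drain current (A) in cutoff, triode, or saturation."""
    if vgs <= vth:
        return 0.0                          # cutoff: no channel
    vdssat = vgs - vth                      # pinch-off voltage
    if vds < vdssat:                        # triode: channel not pinched off
        return k * ((vgs - vth) * vds - 0.5 * vds**2)
    return 0.5 * k * (vgs - vth)**2         # saturation: maximum of the parabola

vgs, vth, k = 1.5, 0.5, 1e-3
vdssat = vgs - vth
i_triode_at_pinchoff = k * ((vgs - vth) * vdssat - 0.5 * vdssat**2)
i_sat = ids_nmos(vgs, vdssat + 1.0)         # any Vds above Vdssat
```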

Fig. 1.80 MOSFET I–Vds characteristics. Current increases nonlinearly in triode (ohmic) region and then saturates at a maximum value for a given Vgs in the saturation regime. In saturation, only Vgs has an impact on current


Fig. 1.81 Charge and voltage in the channel at pinch off

Recall from Sects. 1.16 and 1.17 that before an inversion layer forms, a depletion layer has to form at the surface of the substrate, and its depth has to saturate at a maximum value, Xdinv. Thus, the MOSFET channel is completely surrounded by a depletion layer of depth Xdinv (Sect. 1.17). So when the channel pinches off at the drain, we observe a small depletion region between the end of the channel and the drain. Figure 1.81 illustrates this. When Vds = Vdssat, the channel voltage profile is stretched up and the charge profile is pulled down so that |Qinv| is zero at the drain. Thus, an infinitesimally small depletion region occurs between the end of the channel and the edge of the drain. So how can current flow through this depletion point at the drain? And why is this current constant for any value of Vds greater than Vdssat?

Figure 1.82 shows the band diagram of a pinched-off channel. It is a scaled version of the voltage profile. Drift current was shown to be the product of charge density and velocity (Sect. 1.5). Velocity is proportional to the electric field. The electric field is proportional to the gradient of the voltage. Thus, the magnitude of the electric field is also proportional to the gradient of the band diagram. It is worth wondering whether drain voltage rising above the pinch-off point would cause the area under the oxide to accumulate holes. The answer is no, because hole accumulation requires a vertical bottom-to-top field. Since the body is at ground, the field will always be top to bottom, preventing any accumulation unless the gate voltage is negative. To first order, the drain only couples to the body through the depletion region and is thus unable to affect mobile charges in the channel beyond defining the electron density at the drain. In very short channels, on the other hand, the drain can couple to the channel (Chap. 10).

In Fig. 1.82, the slope of the curve increases toward the drain. This indicates that velocity also increases toward the drain. This can also be seen in Figs. 1.79 and 1.83. The inversion charge drops toward the drain. Current is the product of velocity and charge. In steady state, the current must be the same everywhere. Thus, toward the drain the velocity increases and the charge decreases, but they maintain a constant product equal to the saturation current. We can obtain an expression for the channel-to-source voltage as follows:

$$V_{cs} = (V_{gs} - V_{th})\left(1 - \sqrt{1 - \frac{x}{L}}\right)$$

This expression applies only if the channel is pinched off at the drain. At the source the voltage drop is zero, while at the drain the voltage drop is Vgs − Vth. The differential equation describing the channel current is

$$I(x) = \mu_n C_{ox} W\,(V_{gs} - V_{cs} - V_{th})\,\frac{dV_{cs}}{dx}$$

This is the current at point x, but it also has to be the current everywhere in the channel. Since we have an expression for the channel voltage in pinch off, we can plug it in and find out what the current expression boils down to:


Fig. 1.82 Band diagram of channel at pinch off

$$\frac{dV_{cs}}{dx} = \frac{V_{gs} - V_{th}}{2L}\,\frac{1}{\sqrt{1 - \frac{x}{L}}}$$

$$I(x) = \mu_n C_{ox} W\,(V_{gs} - V_{cs} - V_{th})\,\frac{dV_{cs}}{dx} = \frac{\mu_n C_{ox} W}{2L}\,\frac{(V_{gs} - V_{th})(V_{gs} - V_{cs} - V_{th})}{\sqrt{1 - \frac{x}{L}}}$$

Substituting $V_{gs} - V_{cs} - V_{th} = (V_{gs} - V_{th})\sqrt{1 - \frac{x}{L}}$ cancels the square root:

$$I(x) = \frac{\mu_n C_{ox} W}{2L}\,(V_{gs} - V_{th})^2 = I_{sat}$$
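A quick numerical check of this derivation can be made by evaluating the local charge–velocity product at several points along the channel. This is an illustrative Python sketch; the lumped value for μnCoxW and the bias values are made up, not from the book.

```python
import math

# Verify numerically that with the pinch-off profile
# Vcs(x) = (Vgs - Vth) * (1 - sqrt(1 - x/L)), the local current
# I(x) = un*Cox*W * (Vgs - Vcs - Vth) * dVcs/dx is the same everywhere.
# All parameter values are illustrative.

vgs, vth, L = 1.5, 0.5, 1e-6     # V, V, m
un_cox_w = 1e-3                  # un * Cox * W lumped together

def vcs(x):
    return (vgs - vth) * (1.0 - math.sqrt(1.0 - x / L))

def i_of_x(x):
    dvcs_dx = (vgs - vth) / (2.0 * L) / math.sqrt(1.0 - x / L)
    return un_cox_w * (vgs - vcs(x) - vth) * dvcs_dx

isat = un_cox_w * (vgs - vth) ** 2 / (2.0 * L)
currents = [i_of_x(f * L) for f in (0.0, 0.25, 0.5, 0.9, 0.999)]
```

The boundary conditions Vcs(0) = 0 and Vcs(L) = Vgs − Vth also fall out of the profile directly.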

The derivative of this expression of current with respect to x is null, and thus the current can be shown to be Isat at any point in the channel, including at the drain and the source. This shows that current will indeed still flow at and above pinch off, despite the presence of a depletion region at the end of the channel.

Figure 1.83 shows what happens if the drain voltage rises above the pinch-off point. The charge profile is identical to the pinch-off case in Fig. 1.81. This is because the charge has already disappeared in pinch off, so any additional voltage cannot have any further impact. The voltage profile is very interesting. We see the exact same profile as in Fig. 1.81. The excess voltage falls entirely on the drain in zero distance. This excess voltage, called Vdrop, can be calculated as

$$V_{drop} = V_{ds} - V_{dssat}$$

This drop actually appears across the depletion region that is created at the drain–channel interface. The slope of the conduction band in Fig. 1.83 is a representation of the channel field. Electrons fall down the slope of the channel. They accelerate from the source to the drain as the slope increases in that direction. Meanwhile, the charge population drops toward the drain, and the product remains constant.

But what happens at the drain in Fig. 1.83? Through the channel, the slope of the band diagram remains the same as in pinch off, and thus the velocity profile remains the same as in pinch off. But at the drain, the electrons observe a sudden drop in energy. The electrons accept the drop and fall into the trap of the drain. Thus the current can and will flow in the depletion region, since the energy drop is in a favorable direction for the charge carriers in the channel. This is analogous to electrons being swept across the BC depletion region in BJT. We already know from the discussion of pinch off above that current is a function of both the slope of the band diagram (velocity) and the amount of charge in the channel. In Fig. 1.83, both the charge profile and the velocity profile throughout the channel remain identical to the pinch-off case. Thus the current above pinch off remains the same as in pinch off, and the current saturates for any value of Vds above Vdssat.

There remains one contentious point: the balance of drain voltage above pinch off falls entirely on the drain. This creates a sudden drop in the energy bands. This suggests an infinite field, so how does this create a finite current into the drain? Notice that the assumption that the electric field is infinite is contingent on the assumption that the depletion region is perfectly insulating. Thus, the current density can still remain finite and equal to the saturation current even if the field is infinite:

$$J = \sigma\,\xi$$

$$\sigma_{dep} = 0,\qquad \xi_{dep} = \infty$$

$$J = \sigma\,\xi = 0\cdot\infty = J_{sat}$$

The assumption that the depletion layer conductivity is nonexistent is too simplistic. From Sect. 1.16, we conclude that the lowest charge concentrations observable are when the Fermi level approaches the mid-gap. Thus, the depletion layer can have a conductivity similar to intrinsic silicon. This is a low but nonzero conductivity.

Fig. 1.83 Voltage, velocity, charge, and energy band diagram beyond pinch off


The finite nonzero conductivity of the depletion region causes the pinch-off point to happen earlier in the channel as Vds rises. The distance at which pinch off happens grows proportionately to the excess drain-to-source voltage. This ensures that a constant, high field exists in the depletion region. This high field always works out so that the conductivity and field together yield continuity with the saturation current. This can be used to estimate the pinch-off point in terms of the excess drain voltage:

$$V_{excess} = V_{ds} - (V_{gs} - V_{th})$$

$$\xi_{dep} = \frac{V_{excess}}{x_{pinchoff}}$$

$$J_{sat} = \sigma_{dep}\,\xi_{dep} = \sigma_{dep}\,\frac{V_{excess}}{x_{pinchoff}}$$

$$x_{pinchoff} = \frac{\sigma_{dep}\,V_{excess}}{J_{sat}}$$

This is known as channel length modulation and is an important secondary effect. It will be discussed in more detail in Sect. 1.21.

It is legitimate to ask why the excess drain voltage is not shared between the drain and the channel. In other words, in Fig. 1.83, why does Vdrop fall entirely on the drain depletion point? The alternative would be for the excess Vdrop to be shared between the channel and the drain depletion region. If this were the case, then the point at which the gate-to-channel voltage equals the threshold voltage would be pushed closer to the source. This would widen the depletion region between the edge of the channel and the edge of the drain. If we assume a perfectly insulating depletion layer, this would cause a finite voltage drop, equal to a proportion of Vdrop, to occur over a finite distance, creating a finite field. Because the conductivity of the depletion layer is null and the field is finite, the widened depletion region cannot support a current. However, the charge–velocity profile in the channel would still indicate current flow. There is a contradiction: the channel tries to pass a current, while the widened zero-conductivity depletion layer cannot support any current. The only way for this contradiction to be resolved is for the depletion layer to have zero width, and thus for the entirety of Vdrop to fall on the depletion layer.

The above treatment still does not answer one question: an infinite field suggests that electrons can flow at infinite speed:

$$v_n = \mu_n\,\xi$$

Since we are certain that drift velocity is at least limited by the velocity of light, we must be doing something wrong. The wrong assumption is that velocity remains a linear function of field for very high values of electric field. This is a false assumption; at high values of field we have to assume a nonconstant mobility. This leads directly to a discussion of velocity saturation in Chap. 10.
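The continuity argument for the finite-conductivity depletion region can be sketched numerically: the pinch-off distance adjusts so that the depletion field carries exactly the saturation current density. All values below (conductivity, current density, excess voltage) are made-up illustrations, not from the book.

```python
# x_pinchoff = sigma_dep * V_excess / J_sat  (illustrative values only)

sigma_dep = 1e-4     # S/m, assumed small near-intrinsic conductivity
j_sat = 1e6          # A/m^2, saturation current density
v_excess = 0.5       # V, drain voltage in excess of Vdssat

x_pinchoff = sigma_dep * v_excess / j_sat
e_dep = v_excess / x_pinchoff          # field across the depletion region
j_dep = sigma_dep * e_dep              # current density it supports

# Doubling the excess voltage doubles the pinch-off distance, keeping the
# field (and thus the current density) unchanged: the linearity behind
# channel length modulation.
x_pinchoff_2x = sigma_dep * (2.0 * v_excess) / j_sat
```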

1.20 Body Effect

1. Recognize the role of source-to-channel capacitance in channel formation as a cause of body effect
2. Draw quasi-equilibrium bands for the body–source junction
3. Recognize the requirements for channel formation with the source present
4. Derive a relation for threshold as a function of body voltage.

In Sect. 1.18 we discovered that the main difference between an MOS capacitor and a MOSFET is that the bottom plate voltage in the MOS cap is controlled entirely by the body, while in the MOSFET it is controlled by the source, body, and drain. This also impacts the threshold voltage, because in Sect. 1.16 we assumed all charges in the channel are brought there exclusively by the potential between the body and the gate. In Fig. 1.70, the gate could influence the charge in the inversion layer through the capacitance formed by the oxide, while the body could influence it through the capacitance formed by the depletion layer. Figure 1.84 shows the situation in a MOSFET. The body, gate, and source can all couple to the channel, and thus they can all affect the channel charge. The gate couples through Cox, which is identical to the case of the MOS cap. The body observes a capacitance formed between the substrate and the channel as conductive plates separated by the depletion layer as an insulator. This is also observed in the MOS cap. The new phenomenon in Fig. 1.84 is that the source also couples to the channel through the capacitance Csb. Notice that the source is also surrounded by a depletion layer; this is because for normal operation the source-to-body PN junction must always be reverse biased.


The source-to-body capacitance induces a charge on the channel the same way the gate-to-body potential does:

$$\Delta Q_{channel} = V_{sb}\,C_{sb} = V_{sb}\,C_{dep}$$

We are making a very strong approximation here in assuming Csb has the same value as Cdep. Typically the depth of the channel depletion layer will differ from that of the depletion layer formed by the reverse bias of the source-to-body PN junction. From Fig. 1.84, an increase in source potential would cause the source to couple electrons away from the channel. Thus the charge created by coupling from the source subtracts from the inversion charge:

$$Q_{channel} = -C_{ox}(V_{gs} - V_{th}) + V_{sb}\,C_{dep}$$

Thus, the magnitude of charge in the channel is reduced. This reduction is usually weak. In fact, the expression above significantly overestimates the amount of change, since it assumes the source can couple to the entirety of the channel as a parallel-plate capacitor. In reality, it only manages to couple to the parts of the channel closest to the source. The effect of the source-to-channel coupling is clearest in its impact on the threshold voltage. Because an increased

Fig. 1.85 Band diagram of PN junction at body–source. Top, no source voltage. Bottom, applied source voltage. The distance A is used to calculate n in the channel and it increases by qVsb in the bottom diagram

Fig. 1.84 There are three capacitors coupling three terminals to the channel

source voltage takes away some of the channel charge, the effective threshold voltage appears increased. This effect is called the body effect; it is a secondary though very important effect on MOSFET behavior.


To quantify the body effect, examine Fig. 1.85, the band diagram of the body–source interface. This interface forms a PN junction, and this junction is reverse biased for any normal operation of the MOSFET. While calculating the threshold voltage, we assumed that this PN junction is at thermal equilibrium. As shown in the top subplot of Fig. 1.85, this only happens when the source is grounded. However, under the normal operating conditions of MOSFETs, we cannot assume that all sources are grounded. The body can be assumed to be grounded, but the source can be higher than ground. Thus, in general, we have to assume the PN junction is reverse biased rather than in equilibrium, with the latter being a special case where Vsb = 0. The bottom subplot of Fig. 1.85 shows the band diagram for a reverse-biased source-to-body junction. There is no unique Fermi level; the Fermi level splits into two quasi-Fermi levels. From Sect. 1.12, the quasi-Fermi levels correspond to the Fermi levels in the neutral zones. Because there is no unique Fermi level, the mass action law cannot be used, and carrier concentrations should be calculated from the quasi-Fermi levels. The electron concentration in the p-substrate is calculated from EFn. EFn is dictated by the equilibrium level in the source, since in the neutral zone of the source the electron concentration has to remain unchanged at its n+ level. Thus the electron concentration in the substrate is

$$n_{substrate} = n_i\,e^{(E_{Fn} - E_i)/kT}$$

In the MOS capacitor, we could make the assumption that the quasi-Fermi level for electrons is the Fermi level of the substrate:

$$E_{Fn} = E_{F,substrate}$$

This leads to a threshold surface potential equal to twice the bulk voltage, which is simply the difference between the intrinsic and substrate Fermi levels (see Sect. 1.16):

$$V_s = 2\phi_b = 2(E_i - E_{F,substrate})/q$$

When a source voltage is applied in a MOSFET, this creates nonequilibrium in the PN junction, and the quasi-Fermi level for electrons in the substrate migrates to the equilibrium Fermi level of the source (Sect. 1.12):

$$E_{Fn,substrate} \rightarrow E_{Fn,source}$$

Thus the surface potential when inversion occurs becomes

$$V_s = \frac{E_i - E_{Fn,source}}{q} = \frac{E_i - E_{Fn,substrate} + E_{Fn,substrate} - E_{Fn,source}}{q}$$

$$V_s = 2\phi_b + \frac{E_{Fn,substrate} - E_{Fn,source}}{q}$$


The difference between the Fermi level in the source and the Fermi level in the substrate is proportional to the applied reverse voltage on the junction. This can be seen in Fig. 1.85:

$$\frac{E_{Fn,substrate} - E_{Fn,source}}{q} = V_{sb}$$

$$V_s = 2\phi_b + V_{sb}$$

The threshold voltage for a MOSFET can be obtained from the threshold voltage for the MOS capacitor if every instance of the surface potential, and thus of double the bulk voltage, is replaced with the value of surface potential above. Notice that the flat band voltage is also shifted by the source-to-bulk voltage, because the amount of external voltage required to flatten the MOS bands increases:

$$V_{th} = 2\phi_b + \frac{\sqrt{4qN_A\epsilon_{si}\phi_b}}{C_{ox}} - V_{fb} \;\rightarrow\; 2\phi_b + V_{sb} + \frac{\sqrt{2qN_A\epsilon_{si}(2\phi_b + V_{sb})}}{C_{ox}} - V_{fb} - V_{sb}$$

The expression for MOSFET threshold voltage is thus:

$$V_{th} = 2\phi_b + \frac{\sqrt{2qN_A\epsilon_{si}(2\phi_b + V_{sb})}}{C_{ox}} - V_{fb}$$

It is useful to restate this expression in terms of the MOS capacitor threshold voltage plus a balance component. To do this, we add and subtract the MOS capacitor oxide voltage term:

$$V_{th} = 2\phi_b + \frac{\sqrt{4qN_A\epsilon_{si}\phi_b}}{C_{ox}} + \frac{\sqrt{2qN_A\epsilon_{si}(2\phi_b + V_{sb})}}{C_{ox}} - \frac{\sqrt{4qN_A\epsilon_{si}\phi_b}}{C_{ox}} - V_{fb}$$

We then define Vth0, which is both the MOS capacitor threshold voltage and the threshold voltage of the MOSFET with zero source-to-body potential:

$$V_{th0} = 2\phi_b + \frac{\sqrt{4qN_A\epsilon_{si}\phi_b}}{C_{ox}} - V_{fb}$$

And thus, the MOSFET threshold voltage is

$$V_{th} = V_{th0} + \frac{\sqrt{2qN_A\epsilon_{si}(2\phi_b + V_{sb})}}{C_{ox}} - \frac{\sqrt{4qN_A\epsilon_{si}\phi_b}}{C_{ox}}$$

which can be restated as

$$V_{th} = V_{th0} + \frac{\sqrt{2qN_A\epsilon_{si}}}{C_{ox}}\left(\sqrt{2\phi_b + V_{sb}} - \sqrt{2\phi_b}\right)$$

and:

$$V_{th} = V_{th0} + \gamma\left(\sqrt{2\phi_b + V_{sb}} - \sqrt{2\phi_b}\right)$$


where:

$$\gamma = \frac{\sqrt{2qN_A\epsilon_{si}}}{C_{ox}}$$

Gamma is the body-effect parameter. Its unit is the square root of a volt. Ideally, it should be very small to reduce the body effect. But since gamma is an entirely technology-determined parameter, the only practical way to reduce body effect is to minimize Vsb. Because the body voltage tends to be shared between multiple transistors (Chap. 10), while the source potential is determined by circuit operation, minimizing Vsb is no easy task.

Why does the drain not couple to the channel as well? The drain is at a higher potential than the source. This creates a thicker depletion region around it, which reduces the capacitive coupling between the drain and the channel, making its impact small relative to the source. However, if the channel is very short, the depletion region formed by the drain becomes substantial relative to the length of the channel. In such a case, the drain has a fundamental impact on MOSFET behavior. This phenomenon is known as drain-induced barrier lowering and is discussed in detail in Sect. 10.4.
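The body-effect relation can be exercised numerically. The following Python sketch uses illustrative values of Vth0, γ, and φb (not from the book) to show the threshold reducing to Vth0 at Vsb = 0 and rising with Vsb.

```python
import math

# Vth = Vth0 + gamma * (sqrt(2*phi_b + Vsb) - sqrt(2*phi_b))
# Parameter values are illustrative placeholders.

def vth_body(vsb, vth0=0.5, gamma=0.4, phi_b=0.35):
    """Effective threshold voltage (V) versus source-to-body voltage."""
    return vth0 + gamma * (math.sqrt(2.0 * phi_b + vsb) - math.sqrt(2.0 * phi_b))

shift_at_1v = vth_body(1.0) - vth_body(0.0)   # threshold increase for Vsb = 1 V
```

Note the square-root shape: the first volt of Vsb shifts the threshold more than the second, since the depletion charge grows sub-linearly with reverse bias.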

1.21 Channel Length Modulation

1. Relate channel length modulation to finite depletion resistivity
2. Draw the band diagram of the channel with length modulation
3. Understand that drift increases due to channel length modulation
4. Justify the channel length modulation current model.

For most of this chapter, we assumed that depletion regions have zero conductivity. In Sect. 1.19, we observed that the main effect of finite depletion conductivity is to move the point at which the channel pinches off closer to the source as more drain voltage is applied. The width of the depletion region Xpinchoff can be estimated in terms of the excess voltage Vexcess = Vds − Vdssat, the conductivity of the depletion region, and the saturation current density as

$$x_{pinchoff} = \frac{\sigma_{dep}\,V_{excess}}{J_{sat}}$$

With zero depletion conductivity, Xpinchoff is zero and pinch off occurs only at the drain. But with nonzero depletion conductivity, we observe a nonzero Xpinchoff. This phenomenon is known as channel length modulation. It introduces some dependence of Isat on Vds. It is a secondary effect with a minor impact on long-channel MOSFETs. However, when channel dimensions are scaled down, depletion conductivity remains constant, while Jsat and Vexcess drop by the same ratio. This causes Xpinchoff to remain roughly constant as channel dimensions shrink. Thus, channel length modulation can have a dramatic effect on short-channel transistors (Chap. 10).

The question is, what kind of dependence does this introduce into the current equation? Developing an analytical model for channel length modulation is extremely difficult. However, we can answer two basic questions: is the relationship directly proportional, and is it linear? Figure 1.86 shows the effect of channel length modulation. The charge distribution in the channel is nonlinear and drops to zero at the point where the channel pinches off. If the depletion region has finite conductivity, the channel pinches off at L − Xpinchoff, where Xpinchoff is determined by Vexcess as described above. This causes the curve for Qinv to become sharper as the pinch-off point moves closer to the source. The value of Vcs at the pinch-off point is always Vdssat. Thus, as Vds increases and the pinch-off point moves closer to the source, the point at which Vcs equals Vdssat moves closer to the source. The remainder of Vds is Vexcess, and it falls over the distance Xpinchoff between the points x = L − Xpinchoff and x = L. The electric field in the depletion region is determined by Vexcess and Xpinchoff, and is equal to

$$\xi_{depletion} = \frac{V_{excess}}{x_{pinchoff}}$$

This electric field, and thus Xpinchoff, are modulated so that Jdepletion equals Jsat. Inspecting Fig. 1.86, we observe that the magnitude of the slope of the energy band diagram at L − Xpinchoff increases with larger Vexcess. This is because the total amount of band bending remains the same, but it must be completed over a shorter distance. This means that at the end of the channel the velocity of carriers increases. In fact, the velocity at every point in the channel increases as Vexcess increases, while the amount of charge at the corresponding point remains the same. Because current is the product of charge and velocity, a higher saturation current density flows in response to a higher Vds. We can also understand this by thinking of the channel as getting shorter with more Vexcess. Since the drop across the channel remains at Vdssat while its effective length decreases,

Fig. 1.86 Voltage, charge, energy, and velocity for a pinched-off channel where the depletion region has a nonzero conductance. A finite field is established in the depletion region. v1 < v2 < v3 as the slope of the curve rises with more Vexcess


this means the channel observes a higher lateral field and more drift velocity. We can also conclude that the dependence on Vds will be roughly linear, and that it should be a relatively weak relation. The linearity is due to the linear dependence of Xpinchoff on Vexcess. The model used most extensively for saturation current with channel length modulation is

$$I_{sat} = \frac{K(V_{gs} - V_{th})^2}{2}\,(1 + \lambda V_{ds})$$
K ðVgs  Vth Þ2 ð1 þ kVds Þ 2


The channel length modulation factor λ has units of 1/V. It is usually a very small number. Deriving this factor from first principles is nearly impossible. For long-channel devices, we can conclude it depends on the depletion region conductivity and the channel length. However, channel length modulation is a more important effect in short-channel devices, where its causes are significantly more complicated. Thus, curve fitting and empirical methods are almost always used to model it.
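The channel length modulation model can be sketched in a few lines of Python (parameter values illustrative, not from the book); the point is that the saturation characteristic acquires a small, linear slope in Vds rather than being perfectly flat.

```python
# Isat = (K/2) * (Vgs - Vth)^2 * (1 + lambda * Vds), illustrative values.

def isat_clm(vgs, vds, vth=0.5, k=1e-3, lam=0.05):
    """Saturation current (A) with channel length modulation."""
    return 0.5 * k * (vgs - vth) ** 2 * (1.0 + lam * vds)

i1 = isat_clm(1.5, 1.0)
i2 = isat_clm(1.5, 2.0)
slope = i2 - i1          # output slope per volt of Vds; small but nonzero
```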

2 Ratioed Logic

2.1 PMOS

1. Understand the complementary nature of NMOS and PMOS
2. Recognize the inferior properties of PMOS relative to NMOS
3. Understand current and carrier flow directions in PMOS
4. Distinguish the necessity of PMOS as opposed to pnp.

Chapter 1 focused on a MOSFET device where the substrate is p-type, while the source, drain, and channel are all n-type. Because the main charge carriers in the channel are electrons, this kind of device is called NMOS. In Sect. 1.13, while discussing the BJT, we mentioned the pnp transistor as a complementary device for the npn. However, due to the inferior mobility of holes and the specialty role of BJT, pnp transistors are rarely used. The complementary device of the NMOS is the PMOS, shown in Fig. 2.1. Although the PMOS suffers from the same mobility issues as the pnp, it is very widely used. By the end of this chapter, we will conclude that it is very difficult to design a logic family with favorable noise margins and power dissipation using only NMOS devices. Thus, PMOS plays an integral role in CMOS design (Chap. 3).

The PMOS transistor is very similar to the NMOS in Chap. 1. The only difference is that the body is n-type and the source and drain are p+. The body forms PN junctions with both the source and drain. These PN junctions must be reverse biased for proper operation. This can be guaranteed if the body of the PMOS is connected to the supply voltage Vdd. For the PMOS transistor, the flat band voltage is positive. Applying a gate-to-body (more properly gate-to-source) voltage higher than the flat band voltage attracts more electrons to the oxide–body interface; thus, the device acts in accumulation mode. For values of Vgs lower than Vfb, the PMOS starts to develop a depletion region below the oxide. The lower the

value of Vgs, the thicker the depletion region. The depth of the depletion region saturates at Vthp; Vthp is thus a negative voltage. Applying Vgs lower than Vthp attracts holes under the oxide, forming a p-channel. We can already see that the NMOS and PMOS have a similar but complementary operation. For example, a channel forms in the PMOS at low gate voltage but forms in the NMOS in response to high gate voltage. Accumulation is observed in the NMOS in response to low gate voltage but is seen in the PMOS in response to high gate voltage. Figure 2.2 contrasts the NMOS and PMOS.

Assuming a channel has formed in the PMOS, the application of a drain-to-source potential creates a lateral field, causing drift current to flow. The main charge carriers in the channel are holes. Holes flow from the higher voltage to the lower voltage. Thus, the source of the PMOS is the higher-potential terminal, while the drain is the lower-potential terminal. PMOS current flows in the same direction as the holes, and thus it flows from source to drain. In the NMOS, the source is a source of electrons. Since electrons flow from lower to higher potential, the lower-potential terminal is the source, while the higher-potential terminal is the drain. Current in the NMOS flows from drain to source. In both NMOS and PMOS, the determination of whether a terminal is a source or a drain depends entirely on the rest of the circuit and the steady-state voltages.

The threshold voltage for PMOS is a negative number and has the equation:

$$V_{thp} = -|2\phi_b| - \frac{\sqrt{2qN_D\epsilon_{si}(|2\phi_b| - V_{sb})}}{C_{ox}} + V_{fb}$$

Note that Vsb is a negative potential because the PMOS body is connected to the supply. The flat band potential is positive. The bulk potential can be calculated along the same lines as for NMOS, using ND for the bulk doping.

A PMOS drain is the lower-potential terminal. The lower the drain potential, the less hole inversion charge at the drain end of the channel. Once the drain voltage reaches a low enough

© Springer Nature Switzerland AG 2020 K. Abbas, Handbook of Digital CMOS Technology, Circuits, and Systems, https://doi.org/10.1007/978-3-030-37195-1_2


3. Recognize the problems of an imperfect off switch 4. Realize the fundamental impact of finite on-resistance on delay.

Fig. 2.1 PMOS transistor

level, the channel completely pinches off at the drain. This is the condition for saturation in PMOS and can be estimated as

$$V_{ds,sat,p} = V_{gs} - V_{thp}$$

Channel length modulation affects PMOS in the same way and for the same reason as NMOS. The channel length modulation parameter for PMOS is a negative number, and its magnitude is usually marginally larger than that of NMOS. PMOS current will be derived briefly in Sect. 2.2. However, we can already see that PMOS current is a hole drift current. From Sect. 1.5, hole mobility is noticeably lower than electron mobility. For PMOS, this translates into a lower on-current, lower transconductance, and higher on-resistance. This will require special design considerations for delay in Chap. 3.

2.2 Regions of the MOSFET

1. Summarize current equations and region of operation conditions for NMOS and PMOS
2. Understand the logic model of MOSFET as switches
3. Recognize the problems of an imperfect off switch
4. Realize the fundamental impact of finite on-resistance on delay.

Fig. 2.2 PMOS versus NMOS transistor voltages, charge flow and current. Formation of a channel in PMOS requires a negative gate voltage. In NMOS, charge flow is opposite to current flow, making the drain the higher potential. In PMOS, charge flow is in the same direction as the current, making the drain the lower potential

The derivation of MOSFET behavior in Chap. 1 is instrumental to a deep understanding of how MOSFET logic operates. It is also a requirement for understanding the many fundamental distortions that MOSFETs experience as their dimensions are shrunk (Chap. 10). However, we also need to develop a simple model for MOSFET devices that allows us to analyze complicated digital circuits in steady state and while switching. We can distinguish three regions of operation (modes) for the MOSFET:

• Cutoff. Alternatively known as off. In this region there is no channel; thus, no current will flow regardless of the applied drain-to-source potential. The resistance observed between the source and drain in this mode should ideally be infinite. This corresponds to the accumulation and depletion modes of the MOS capacitor.
• Ohmic. Alternatively known as linear or triode. This is the region of operation discussed in detail in Sect. 1.18. A channel has formed, but the magnitude of the drain-to-source potential is not high enough to cause the channel to pinch off at the drain. The current is controlled by all three terminals of the device, hence "triode". The device behaves like an ohmic resistance, with a linear characteristic for small Vds. The resistance observed between the source and drain is relatively small.
• Saturation. This is the behavior discussed in Sect. 1.19. The channel has formed under the oxide, and a large enough drain-to-source potential exists to pinch off the channel at the drain. The current saturates at its pinch-off level and is thus determined only by the gate and source terminals. The drain terminal loses (to first order) control over the current.



The current equations in the three regions of operation for NMOS are

$$I_{cutoff} = 0$$
$$I_{ohmic} = \mu_n C_{ox} \frac{W}{L}\left[\left(V_{gs}-V_{th}\right)V_{ds} - \frac{V_{ds}^2}{2}\right]$$
$$I_{sat} = \mu_n C_{ox} \frac{W}{2L}\left(V_{gs}-V_{th}\right)^2$$

Determining which region the NMOS operates in requires a test of terminal voltages. If the gate is not above the source by at least a threshold voltage, then there is no inversion layer anywhere, and the transistor is in cutoff. On the other hand, if Vgs is above the threshold voltage, then the transistor is "on" but can be either saturated or ohmic. To determine which, we have to test for pinch off by observing whether Vds is higher than Vdssat:

$$V_{gs} < V_{th} \rightarrow \text{cutoff}$$
$$V_{gs} > V_{th} \rightarrow \text{on} \begin{cases} V_{ds} < V_{gs}-V_{th} \rightarrow \text{ohmic} \\ V_{ds} > V_{gs}-V_{th} \rightarrow \text{saturation} \end{cases}$$

The threshold voltage for NMOS is a positive number, and its expression including body effect (Sect. 1.20) is

$$V_{th} = V_{th0} + \gamma\left(\sqrt{2\phi_b + V_{sb}} - \sqrt{2\phi_b}\right)$$

When channel length modulation is observed, it impacts the saturation current by adding a weak dependence on Vds:

$$I_{dssat} = \mu_n C_{ox} \frac{W}{2L}\left(V_{gs}-V_{th}\right)^2\left(1 + \lambda V_{ds}\right)$$
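The region test and current equations above can be sketched in code. This is a minimal illustration of the square-law model; the parameter values (μnCox, W/L, Vth, λ) are assumed for the example and do not correspond to any specific process.

```python
# Sketch of the NMOS region test and square-law current equations.
# All numeric parameter defaults are illustrative assumptions.

def nmos_current(vgs, vds, vth=0.5, k=200e-6, w_over_l=2.0, lam=0.1):
    """Return (region, Ids) for an NMOS; k = mu_n * Cox in A/V^2."""
    kn = k * w_over_l
    if vgs < vth:
        return "cutoff", 0.0      # no channel: no current regardless of Vds
    vov = vgs - vth               # overdrive voltage, equal to Vdssat
    if vds < vov:
        # Ohmic (triode): all three terminals control the current
        return "ohmic", kn * (vov * vds - vds**2 / 2)
    # Saturation, including channel length modulation (1 + lambda*Vds)
    return "saturation", (kn / 2) * vov**2 * (1 + lam * vds)

region, i = nmos_current(vgs=1.0, vds=1.0)
print(region, i)
```

Note how the same Vgs = 1 V device moves from ohmic to saturation as Vds crosses the overdrive voltage, exactly as the inequalities above prescribe.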

The PMOS has very similar current equations and conditions for regions of operation. However, because the charge carrier is different and the source is the higher potential, we have to make some adjustments. The simplest way to adjust for PMOS is to:

• Preserve all current equation expressions, but substitute all quantities with their proper sign. Realize that MOSFET current flows from source to drain and that the source is the higher potential.
• Use the same conditions for regions of operation as NMOS, but invert the direction of each inequality.


$$V_{gs} > V_{thp} \rightarrow \text{cutoff}$$
$$V_{gs} < V_{thp} \rightarrow \text{on} \begin{cases} V_{ds} > V_{gs}-V_{thp} \rightarrow \text{ohmic} \\ V_{ds} < V_{gs}-V_{thp} \rightarrow \text{saturation} \end{cases}$$

Recall again that all potentials, including the threshold voltage, have to be substituted with their proper sign. The flipping of the inequalities relative to NMOS is justified by the inverted charge carrier. We need a low gate voltage to attract holes and create a channel, turning the transistor on. Saturation always occurs when a large absolute potential difference exists across the channel, pinching it off. In NMOS, the drain is the higher potential; thus, pinching off the channel requires pulling the drain up if the source remains clamped. For PMOS, the drain is the lower potential, so pinching off requires pulling the drain down while the source is clamped. For PMOS, the threshold voltage is

$$V_{thp} = -|2\phi_b| - \frac{\sqrt{2qN_D\varepsilon_{si}\,(|2\phi_b| - V_{sb})}}{C_{ox}} + V_{fb}$$

where the zero source-to-body threshold voltage is

$$V_{thp0} = -|2\phi_b| - \frac{\sqrt{4qN_D\varepsilon_{si}\,|\phi_b|}}{C_{ox}} + V_{fb}$$

and the PMOS body effect constant is

$$\gamma_p = -\frac{\sqrt{2qN_D\varepsilon_{si}}}{C_{ox}}$$

The bulk potential is calculated for the n-substrate. Channel length modulation impacts PMOS identically to NMOS. However, notice that the channel length modulation constant for PMOS is negative:

$$I_{dssat} = \mu_p C_{ox} \frac{W}{2L}\left(V_{gs}-V_{thp}\right)^2\left(1 + \lambda_p V_{ds}\right)$$
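The sign bookkeeping for PMOS is easy to get wrong, so it helps to see the region test written out with every quantity carrying its proper (negative) sign, as the text prescribes. The threshold and bias values below are illustrative assumptions.

```python
# Sketch of the PMOS region test. vgs, vds, and vthp are all signed
# quantities (negative when the device is on). vthp = -0.5 V is assumed.

def pmos_region(vgs, vds, vthp=-0.5):
    """Classify PMOS operation using the NMOS conditions with inverted inequalities."""
    if vgs > vthp:           # gate not low enough to attract holes: off
        return "cutoff"
    if vds > vgs - vthp:     # |Vds| below |Vdssat|: channel not pinched off
        return "ohmic"
    return "saturation"      # Vds <= Vgs - Vthp (both negative)

# Source tied to an assumed Vdd = 1.0 V, gate at 0 V, drain at 0 V:
# vgs = 0 - 1.0 = -1.0 V, vds = 0 - 1.0 = -1.0 V
print(pmos_region(vgs=-1.0, vds=-1.0))
```

With the source clamped at the supply, pulling the drain down makes Vds more negative and eventually pinches off the channel, which is the saturation branch above.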

Figure 2.3 shows the circuit symbols for NMOS and PMOS. The MOSFET is, strictly speaking, a four-terminal device. The four terminals are namely: Source, Drain, Gate,

Thus, the PMOS current is

$$I_{cutoff} = 0$$
$$I_{ohmic} = \mu_p C_{ox} \frac{W}{L}\left[\left(V_{gs}-V_{thp}\right)V_{ds} - \frac{V_{ds}^2}{2}\right]$$
$$I_{sat} = \mu_p C_{ox} \frac{W}{2L}\left(V_{gs}-V_{thp}\right)^2$$

Notice that Vthp is a negative number, but also that Vgs and Vds are negative numbers. PMOS current will always flow from source to drain. The inequalities for the regions of operation, with all quantities substituted with their proper sign, were listed above.

Fig. 2.3 Circuit symbols for PMOS (left) and NMOS (right). Body terminals are usually not drawn and are assumed to be connected to one of the two rails


and Body. As shown in Fig. 2.3, the NMOS and PMOS are usually drawn as three-terminal devices, with the body terminal dropped. In such cases, we will always assume the body is connected to the proper rail. For NMOS, this means the body is connected to ground, while for PMOS it is connected to the supply (Vdd). In Chap. 7, we will see that this assumption is very reasonable.

In this chapter and up to Chap. 7, we will consider the use of MOSFETs to build logic gates. In all these cases, MOSFETs are used more or less as switches. Switches lend themselves readily to the design of logic. Figure 2.4 shows how two switches arranged in parallel realize OR logic, and how two switches arranged in series realize AND logic. The inputs A and B in Fig. 2.4 are control signals to the switches and are assumed active-high. Switches arranged in combinations of series and parallel can realize any arbitrary logic function (Sect. 2.12 and most of Chap. 3).

Everything up to and including Chap. 5 will be built on the assumption that MOSFETs are switches. Thus, it is important to consider how good a switch the MOSFET actually is. Figure 2.5 shows an ideal switch in the on and off states. Ideally, the switch should be a perfect open circuit when off, and a perfect short circuit when on. If we consider the three regions of operation of the MOSFET in Sect. 2.2, the cutoff region is perfectly suited for the off state of the switch. This is because the current

Fig. 2.4 Switches connected in parallel pass the input to output if either control is on, thus realizing an OR function between the two controls. Switches in series realize AND logic
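The switch abstraction of Fig. 2.4 can be captured in a few lines of code: a closed switch is True, an open switch is False, and conduction through the network is the logic function. This is only a behavioral sketch of ideal switches, not of real transistors.

```python
# Sketch: ideal switches as booleans. A parallel pair conducts if either
# control is on (OR); a series pair conducts only if both are on (AND).

def parallel(a, b):
    return a or b    # either switch closes the path from input to output

def series(a, b):
    return a and b   # both switches must be closed for the path to exist

for a in (False, True):
    for b in (False, True):
        print(a, b, "parallel:", parallel(a, b), "series:", series(a, b))
```

Nesting these two primitives in combinations of series and parallel branches is exactly how arbitrary switch-network logic functions are built up later in the chapter.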


does not flow in the MOSFET in cutoff regardless of the applied Vds. This translates into infinite resistance, which is the behavior we expect from the ideal switch in Fig. 2.5. In Chap. 10, we will discuss MOSFET leakage in detail. Leakage reduces the off-resistance of the MOSFET, leading to severe issues, especially for dynamic circuits.

We hit a fundamental issue, however, when considering the on-resistance of the MOSFET as a switch. Figure 1.80 shows that at no point on the I–V graph, in either saturation or ohmic, do we have zero resistance. In fact, the slope of the I–V curve in Fig. 1.80 is zero in saturation, indicating zero conductance. The slope of the curve increases in the ohmic region as we head toward Vds = 0. The highest possible conductance is reached at Vds = 0. This corresponds to the minimum achievable resistance for an on MOSFET switch, and it can be calculated from the slope of the I–V curve as

$$\frac{\partial I_{ds}}{\partial V_{ds}}\Big|_{V_{ds}=0} = K\left(V_{gs}-V_{th}-V_{ds}\right)\Big|_{V_{ds}=0} = K\left(V_{gs}-V_{th}\right)$$
$$R_{min} = \frac{1}{K\left(V_{gs}-V_{th}\right)}$$

where $K = \mu_n C_{ox} W/L$. This on-resistance will never be zero for finite values of Vgs. Thus we can conclude that the MOSFET can, to first order, act as an ideal off switch, providing a nearly perfect open circuit. It cannot act as an ideal on switch. The best we can do is try to operate as close as possible to the Vds = 0 point and use K to try to reduce the on-resistance.

From the above we can conclude that the two regions of operation for a steady-state MOSFET in a digital circuit are cutoff and (deep) ohmic. These give us the two extremes of resistance approximating an on and an off switch. This is not to imply that the saturation region is of no importance to digital circuits. While transistors in steady state are never in saturation, all transistors switching from on to off or vice versa pass through the saturation region. Thus the saturation region plays a fundamental role in the dynamic behavior of gates (Sect. 3.5).
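The on/off switch asymmetry above can be made concrete with numbers. The device parameters below are assumptions for illustration only.

```python
# Back-of-the-envelope minimum on-resistance near Vds = 0, from
# Rmin = 1 / (K (Vgs - Vth)) with K = mu_n * Cox * W / L.
# k and w_over_l are illustrative assumptions, not process data.

def r_on_min(vgs, vth=0.5, k=200e-6, w_over_l=2.0):
    K = k * w_over_l
    if vgs <= vth:
        return float("inf")   # off switch: ideally an open circuit
    return 1.0 / (K * (vgs - vth))

print(r_on_min(1.0))   # finite: never zero for finite Vgs
print(r_on_min(0.4))   # cutoff: infinite off-resistance (to first order)
```

Raising Vgs or K (wider device, thinner oxide) lowers the on-resistance but can never drive it to zero, which is the fundamental point of this section.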
Delay in logic circuits is a result of parasitic capacitors, which exist at all nodes, having to be charged and discharged through the nonzero resistance of transistors. As we have seen in Sect. 1.19, the highest current we will observe for a certain Vgs is the saturation current; thus the saturation current is responsible for most of the charging and discharging of capacitors. A back-of-the-envelope analysis of delay starts from the basic capacitor I–V equation:

$$I_C = C\frac{dV}{dt} \quad\Rightarrow\quad dt = \frac{C\,dV}{I_C}$$

Fig. 2.5 An ideal switch is an open circuit when off and a short circuit when on

Thus, the delay dt is heavily dependent on the maximum current Ic available to charge or discharge the capacitor. This is always the saturation current. This delay model will be developed systematically in Sect. 3.5.
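The back-of-the-envelope relation dt = C·dV/Ic can be evaluated directly. The load capacitance, voltage swing, and saturation current below are assumed values chosen only to illustrate the orders of magnitude involved.

```python
# Sketch of the rudimentary delay model: time to move a node by v_swing,
# charging c_load at a constant saturation current i_sat.
# All numeric values are illustrative assumptions.

def delay_estimate(c_load, v_swing, i_sat):
    """dt = C * dV / Ic, with Ic taken as the saturation current."""
    return c_load * v_swing / i_sat

# e.g. a 10 fF load swung by 1 V with 100 uA of saturation current
print(delay_estimate(10e-15, 1.0, 100e-6))  # -> 1e-10 s, i.e. 100 ps
```

Doubling the available current halves the delay, which is why the saturation current, the largest current the device can supply, sets the dynamic behavior of gates.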

2.3 BJT Logic

Table 2.1 Truth table of BJT inverter. The table includes both logic and electrical information about the inverter

Logic in   Vin                    Q region     Current   Logic out   Vout
"0"        Low enough for Q off   Off          Zero      "1"         Vcc
"1"        Very high, e.g., Vcc   Saturation   Nonzero   "0"         Vcesat
1. Derive the logic low output of a bipolar–resistor inverter
2. Derive the logic high output of a bipolar–resistor inverter

Consider the inverter based on a bipolar transistor, shown in Fig. 2.6. This inverter is sometimes called an RTL, or Resistor–Transistor Logic, inverter. To understand the basic operation of any inverter, first assume its input is a very low voltage and examine the output. You would naturally expect the output to be high if the inverter is functioning properly. Then assume the input is very large and examine the output, in which case you should expect the output to be low. While doing this, it is important to take note of the voltage values at the output corresponding to each of the inputs. These represent the "clean" or nominal values for our logic outputs. It is also useful to note the value of the current flowing in each of these states. As will become clear in Sect. 2.11, current flow can be used to estimate static power dissipation.

If we assume a very high input to the inverter in Fig. 2.6, then the transistor Q is certainly on. For proper operation as an inverter, the values of the load and base resistances and the input voltage must be chosen so that the transistor is in saturation. In saturation the BJT has Vce = Vcesat, a value typically around 0.2 V (Sect. 1.13). Thus, Vout will be around 0.2 V, and the high input voltage leads to a low output voltage. At this point it is important to distinguish between logic values, which we will always list in quotes, and electrical values in volts. For this inverter, for example, logic "0" at the output is nominally Vcesat. This is the cleanest value of "0" that this inverter can produce.

Fig. 2.6 Bipolar inverter, also known as RTL inverter. The base resistor is necessary to limit the base current

It is worth noting that

because Q is on and saturated, there is nonzero current flowing in both R and Q.

To examine the opposite case, assume a very low Vin. Vin should be so low that it is not enough to turn Q on. In this case, we might say that the logic value at Vin is "0". Q will be off, and no current will flow in either R or Q. Thus, there will be no drop across R and Vout = Vcc. This is again an inversion operation, with Vout = Vcc corresponding to the cleanest value of "1" that can be produced at the output. In this case, since Q is off, the current will be zero in both Q and R.

Table 2.1 is a useful way to summarize information about any logic gate. It lists the logic values at both input and output and the corresponding voltage values. It also lists the region of operation of Q and the value of current flowing. Three questions remain about the operation of the inverter:

• What electrical values would we ideally have liked to see representing "0" and "1" at the output?
• Table 2.1 lists specific values for Vin corresponding to "0" and "1". In the discussion we were not that specific about what input voltage values to use. Where did the values of Vin in the table come from?
• Will this inverter function the same way if it is part of a larger circuit? In other words, what happens when other gates are connected to the output of this inverter?

The answers to these three questions will be discussed in Sect. 2.4, leading to the conclusion that BJT logic has many detrimental aspects.
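The two cases just derived (Q off → Vout = Vcc, Q saturated → Vout = Vcesat) can be sketched as a tiny model of the unloaded inverter of Fig. 2.6. The supply and Vce(sat) values are illustrative assumptions.

```python
# Sketch of the unloaded RTL inverter's nominal outputs (as in Table 2.1).
# vcc = 5.0 V and vce_sat = 0.2 V are assumed, illustrative values.

def rtl_inverter_out(logic_in, vcc=5.0, vce_sat=0.2):
    """Nominal output voltage of the unloaded bipolar (RTL) inverter."""
    if logic_in == 0:
        return vcc       # Q off: zero current in R, no drop, Vout = Vcc ("1")
    return vce_sat       # Q saturated: Vout = Vce(sat), the clean "0"

print(rtl_inverter_out(0), rtl_inverter_out(1))
```

Note the inversion: a "0" input produces the clean "1" output (Vcc), and a "1" input produces the clean "0" output (Vcesat).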

2.4 Abandoning BJT

1. Recognize that all gate inputs come from similar gate outputs
2. Recognize the role of base current in limiting bipolar logic gate fan-out
3. Understand the need for isolation trenches in BJT circuits
4. Realize the speed advantage of BJT and its limitation.

The answer to the first question we posed at the end of Sect. 2.3 is simple enough. We would like to see 0 V corresponding to "0" and Vcc corresponding to "1" at the output of any logic gate, and the bipolar inverter in Fig. 2.6 is no exception. The only reason we stop at 0 V and Vcc is that they are the lowest and highest voltages available to the



Fig. 2.7 Acceptable ranges and clean outputs in an RTL gate

circuit. In other words, our aim is to maximize the range between nominal “0” and “1” as much as possible. Digital signals have nominal or “clean” values at the outputs of gates. However, the next gate would accept a range of inputs as valid logic “0” and logic “1”. This is one of the advantages of digital circuits over analog circuits. This will be further explored in Sect. 2.5 when discussing the concept of noise margins. But as shown in Fig. 2.7, there is a range of inputs for an RTL gate that is considered “0” and a range considered “1”. The range for acceptable “1” and “0” represents the amount of noise that the logic gate can tolerate at its input. If the clean “1” output is high, this gives more range for noise. If the clean “0” output is low, this also increases its noise margin. Thus as seen in Fig. 2.7, the ideal values for high and low output from any logic gate are Vcc and 0 V, respectively. The answer to the second question is also fairly simple. The correct electrical values to assume at Vin are the electrical values you obtain at Vout. Thus “1” at Vin should be Vcc and “0” at Vin should be Vcesat. This is because we have to assume that these inputs come from somewhere. And in practice, inputs always come from another logic gate. Thus, as shown in Fig. 2.8, the input to any stage is the output from a preceding stage. This might suggest circular logic where we need Vin to obtain Vout but need to know Vout to be able to figure out the proper value of Vin. In practice, things are much simpler. For example, even though we should strictly be using Vce as logic “0” input, the analysis will be the same if we use any value for Vin that is low enough to cause Q to be off. Thus,

Fig. 2.9 Fan-out of BJT logic. As more stages are connected, more base current is drawn, reducing the logic high output

using 0 V as an input is not "wrong", since both Vcesat and 0 V lead to the same Vout.

The answer to the third question is a little more complicated. Figure 2.9 shows a number of inverters using the output of a BJT inverter as their input. Assume there are N inverters connected to the output. Assume also that Vin is "0", causing transistor Q to be off. When the inverter was unloaded, we obtained Vout = Vcc because the current in R was equal to the current in Q, which was zero. In the case shown in Fig. 2.9, the analysis has to change. KCL at the output no longer indicates that the current in the resistor R is equal to the collector current of the transistor Q. Specifically, there are N BJT bases connected to the output node. These are the inputs of the fan-out inverters. Thus, the current in the resistor R is equal to the sum of the current in Q and the currents into the N bases. So even if the current in Q is zero, the current in R may not be zero. This has a very important connotation: it means that the nominal value of "1" at the output is no longer Vcc. In fact, the value of "1" will be a strong function of the number of connected output stages. If there is a minimum value of "1" we can tolerate, we can readily find an expression for the maximum number of stages we can connect:

$$V_{out} = V_{CC} - N I_B R$$

Fig. 2.8 The input to any logic stage comes from the output of a similar logic stage. Thus realistic inputs are equal to realistic outputs

where Ib is the base current of a single transistor and N is the number of connected output stages.



If Vmin is the minimum acceptable logic high output:

$$V_{min} = V_{CC} - N_{max} I_B R$$
$$N_{max} = \frac{V_{CC} - V_{min}}{I_B R}$$

Thus, an inverter based on bipolar transistors can provide a "1" output of Vcc only when it is unloaded. There is an upper bound on the number of outputs that can be connected, dictated by the degradation in the value of "1". Additionally, if the inverter is loaded, then there will be current flow when the output is "1", leading to power dissipation. Both complications arise because base current is drawn into the inputs of the following stages.

We can already see one reason that MOSFET-based logic has all but supplanted BJT-based logic: MOSFETs draw zero gate current, at least to first order. Thus, we can connect an unlimited number of gates to the output of a MOSFET logic gate without impacting its output values. But this is not the only reason we favor MOSFETs in digital circuits. Another important reason is that MOSFETs tend to be more compact than BJTs. To understand why, we must contrast the ways multiple MOSFETs or BJTs are fabricated in the same substrate.

Figure 2.10 shows an attempt to create two BJTs in the same silicon wafer. Because the BJT (Sect. 1.13) uses the substrate to build a large collector capable of gathering most of the emitter current, the two BJTs in Fig. 2.10 necessarily end up with a shared collector. This is certainly very limiting; the majority of our BJT connections are not common-collector. Figure 2.11 shows how BJTs with separate collectors can be created. To separate the substrate into individual collectors, regions of the opposite type are used to create rings around each BJT. The ring is shown in the cross section in Fig. 2.11 as the large p-type region separating the two collectors. If the p-type region is to properly isolate the two transistors, it must create reverse-biased PN junctions with all surrounding n regions. The safest way to achieve this is to connect the p-region to ground. This creates a large
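The fan-out bound above is easy to evaluate. The supply, minimum acceptable "1" level, base current, and load resistor values below are illustrative assumptions, not values from the text.

```python
# Sketch of the RTL fan-out limit: Vout = Vcc - N*Ib*R degrades as N grows,
# so Nmax = (Vcc - Vmin) / (Ib * R). All numeric inputs are assumed values.

import math

def max_fanout(vcc, vmin, i_base, r_load):
    """Largest N that keeps the loaded logic-high output at or above vmin."""
    return math.floor((vcc - vmin) / (i_base * r_load))

# e.g. Vcc = 5 V, minimum acceptable "1" = 4 V, Ib = 10 uA, R = 10 kOhm
n = max_fanout(5.0, 4.0, 10e-6, 10e3)
print(n)
```

Each added base steals Ib through R, so the headroom (Vcc − Vmin) is consumed in steps of Ib·R. This is exactly the loading problem that MOSFET gates, with (to first order) zero gate current, do not have.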

Fig. 2.10 Two BJT created in the same substrate. By necessity, they end up sharing the collector

Fig. 2.11 Two BJTs with non-common collectors separated into two isolation regions. The p-type ring must be biased to create a reverse PN junction with the two collectors. Connection to ground guarantees this

depletion region between the p-type region and either collector, thus isolating the two transistors.

The p-type rings used to isolate npn transistors in Fig. 2.11 consume a very large area. MOSFETs, on the other hand, can be created very close to one another, since they are "self-isolating" devices. This is illustrated in Fig. 2.12. The source and drain of any MOSFET create PN junctions with the body. These PN junctions are reverse biased for all normal purposes. This creates a depletion layer enveloping each of the source and the drain. Additionally, before a conductive inversion channel can be created between the source and the drain, the MOS vertical structure first has to build a depletion layer under the oxide. The thickness of this depletion layer increases until it reaches a maximum value at which it saturates (Sect. 1.16). As shown in Fig. 2.12, the depletion layers around the source, drain, and channel fuse together, creating a depletion layer enveloping the entire device. This depletion layer forms automatically whenever the device tries to conduct. As shown in Fig. 2.12, this allows us to create multiple NMOS devices in the same substrate, with all of them sharing the body terminal. A similar treatment can be developed for PMOS in an n-substrate. The density of integration in Fig. 2.12 is certainly much higher than that in Fig. 2.11. Thus, in general, MOSFETs can be created and integrated more efficiently than BJTs.

The above discussion is a little oversimplified. In Chap. 7, we will see that although MOSFETs are isolated from each other in normal operation, they are susceptible to a very dangerous transient phenomenon known as latch-up. In Sect. 7.7, we will in fact go back to using trenches to isolate MOSFETs from each other to solve latch-up. BJT circuits are not susceptible to latch-up.

BJT still finds some rare applications in high-speed circuits. In Sect. 2.2, we developed a very rudimentary model for delay.
We showed that delay is dependent on the amount of parasitic capacitance as well as the current available to discharge or charge such capacitance.



Fig. 2.12 MOSFETs are self-isolating devices, enveloping themselves in depletion regions before conducting current

BJT circuits are fast for two reasons. First, their parasitics are limited. The parasitics of the MOSFET will be developed in detail in Chap. 3; however, the MOS capacitive structure is a very glaring source of capacitance in the MOSFET. In the BJT, capacitance exists mainly as PN junction diffusion capacitance. Second, BJTs provide more current than MOSFETs. The best way to observe this is to notice the limitation on on-resistance for the MOSFET derived in Sect. 2.2. The BJT on-resistance in saturation (Sect. 1.13) can be much lower due to the abundance of charge carriers in the base. Thus, we still have to justify how we can abandon BJT despite the MOSFET being a much slower device. In fact, the ultimate justification for the use of MOSFETs in digital circuits, like most things in life, lies in economics. The historical scalability of the MOSFET has led to great improvement in delay performance as well as substantial investment in CMOS technology. The behavior of scaled MOSFETs will be discussed in Sect. 2.5, while all of Chap. 7 is dedicated to MOSFET technology.

Most CMOS processes actually support the fabrication of a special kind of BJT called a lateral BJT. A lateral BJT has its emitter, base, and collector lying side by side. It resembles a MOSFET without a gate or gate oxide, where the substrate plays the role of the base. Lateral bipolars have dismal properties. The geometry of the device does not allow the collector to envelop the device or the base to be narrow, thus raising base current and reducing the common-emitter current gain.

2.5 Scaling MOSFET

1. Recall the difference between BJT and MOSFET fabrication
2. Realize the suitability of MOSFET for scaling
3. Deduce the effect of scaling on MOSFET performance
4. Understand the limitations of general scaling
5. Reject fixed-voltage scaling
6. Deduce the effect of general scaling on MOSFET performance
7. Recognize the regenerative property.

The bipolar transistor does not have the capacitor structure inherent in the MOSFET. Additionally, bipolar transistors tend to carry higher currents. The delay equation demonstrates that an increase in capacitance or a decrease in the current available to charge/discharge said capacitance will lead to an increase in delay. Both observations suggest that the BJT is more favorable than the MOSFET in terms of delay. However, the self-isolating property of the MOSFET has made it more friendly to scaling down. A critical effect of scaling down dimensions is an improvement in delay. Thus, despite the inherent delay of the MOSFET, it has been able to get ahead of the BJT due to scaling, making it the transistor of choice for digital circuits.

Self-isolation means that a MOSFET can be manufactured in a much smaller area than a corresponding BJT. Additionally, self-isolation has allowed MOSFET dimensions to improve (decrease) with time. At any one point in time, the smallest transistor length (L) that can be manufactured is called the technology. Transistors can be manufactured with channel lengths greater than or equal to the technology, but not smaller. Thus, a 90 nm technology chip means that all transistor channel lengths must be greater than or equal to 90 nm, and the technology parameter (minimum channel length) will be called Lmin.

Moore's law states that transistor counts on a chip double every 1 to 2 years. One reason for this is that transistor channel lengths (a.k.a. the technology) halve roughly every 2 years. Thus, 90 nm technology evolved into 65 nm, then into 45 nm, and 45 nm into 32 nm and then 22 nm. This means that every 1 to 2 years, a new technology appears and we can migrate all our digital circuits to that new technology. This will lead to a change in all performance parameters for these

Table 2.2 Technology scaling for MOSFET. S is the ratio of scaling down dimensions. K is the ratio of scaling down potentials, if it is different

Metric                          Full scaling   Constant voltage scaling   General scaling
Dimensions W, L, tox            1/S            1/S                        1/S
Voltages (Vdd, Vds, Vgs, Vth)   1/S            1                          1/K
Area                            1/S²           1/S²                       1/S²
Cox = εox/tox ∝ 1/tox           S              S                          S
Cgate = Cox·W·L                 1/S            1/S                        1/S
I ∝ (W/L)·Cox·V²                1/S            S                          S/K²
Delay ∝ Cgate·V/I               1/S            1/S²                       K/S²
Power = V·I                     1/S²           S                          S/K³
Power density = Power/Area      1              S³                         S³/K³
Power–delay = Power·Delay       1/S³           1/S                        1/(SK²)

circuits, not because their designs have changed, but because they are implemented using a different technology.

Table 2.2 is an excellent tool to track how scaling affects MOSFET performance. The first column, with the header "full scaling", assumes all potentials and dimensions are scaled by the same ratio S. That is to say, when migrating from 90 nm to 45 nm technology, all dimensions are divided by S = (90/45) = 2; this includes not only channel lengths, but also channel widths, oxide thickness, and the dimensions of the source and drain. For the first column, we will also assume all voltages (supply and thresholds) scale by the same factor S. The scaling of Cox, Cgate, and the channel current (I) can be derived through their dependence on voltages and dimensions. For example, Cox is only inversely proportional to oxide thickness, and thus as oxide thickness scales down by S, capacitance per unit area scales up by S. Delay is directly proportional to the capacitance (in this case, assumed to be the full gate capacitance) and inversely proportional to the available current. In the first column of Table 2.2, Cgate scales down by S, current scales down by S, and voltage scales down by S. Thus, delay scales down by S. So even though the BJT is inherently faster than the MOSFET, the latter can quickly catch up due to scaling.

Power is an equally important metric to delay, because most applications of digital electronics involve battery-powered mobile platforms. Power density is a representation of how much power is dissipated in each square millimeter, which is directly related to heating in the chip. In full scaling, the power density remains the same, requiring no more cooling effort.

The power–delay product is a critically important parameter. The delay in this metric is the time the circuit takes to produce one output. Thus, the power–delay product is the energy the circuit consumes to produce one output.
Because batteries hold on to a finite amount of energy, power–delay product is an excellent indication of how fast the battery will


run out of charge. It indicates how many outputs can be obtained from the circuit on a single full battery.

However, full scaling (the first column in Table 2.2) is a theoretical concept and is not currently, nor has it been for a long time, practically usable. To understand why, we have to review why digital circuits are favorable, where their use is possible, to analog circuits:

• Digital circuits are regenerative. In an analog circuit, the signal-to-noise ratio at the input is always higher than the signal-to-noise ratio at the output. Thus an analog circuit can never improve the quality of an incoming signal. Digital circuits can regenerate input values. For example, Fig. 2.13 shows a chain of inverters where the output of the first stage is a noisy logic value. As the signal progresses through the chain, it progressively becomes "cleaner", getting closer to the nominal value. Digital circuits can actually extract the clean signal out of noise and magnify it at the output. Chapter 3 will expand on this concept, studying why CMOS circuits can do this and under what conditions.
• Binary digital circuits do not deal with continuous values. We already discussed this in Sect. 2.4, but it has to be reiterated because it is tightly related to the regenerative property. Nominally, a binary gate can only accept one of two values at its input, a logic "0" or a logic "1". However, the electrical values corresponding to each logic value are not specific voltages but rather a range of acceptable values (Fig. 2.14). Thus, a gate might expect 0 V at its input as the nominal representation of logic "0"; however, it will accept a full range of electrical values as valid "0" inputs. The range of values that a gate accepts as an input is called the noise margin. In response, the gate will produce a clean or nearly clean logic output. This is related to the regenerative property, and together they represent the incredible noise immunity that has allowed digital circuits to dominate.



Fig. 2.13 Digital circuits regenerate noisy values. After a few stages, a noisy value becomes clean. How fast regeneration happens depends on voltage transfer characteristics (Sect. 2.6)

Fig. 2.14 Digital ranges. Inputs to digital gates can cover large ranges and still be accepted. The ranges are known as noise margins, and they are the main advantage of digital circuits

In Fig. 2.14, the ranges available as valid “0” and “1” inputs would at best span the range between 0 V and Vdd. Thus, a drop in Vdd directly translates into a shrinkage of the absolute noise margins, the ranges available for “0” and “1” at the input. Notice that most sources of noise do not scale down as technology scales. Thus, a reduction in noise margins leaves us more exposed to noise. Full scaling means that supply voltage scales at the same rate as dimensions. Historically, dimensions have scaled by a factor of 2 every 18–24 months. Over the years, this adds up to a very large overall scaling value. In the early seventies, a processor with 10 µm devices might have used a supply no higher than 20 V. If supplies had scaled down at the same rate as dimensions, we would currently be using supplies of a few tens of millivolts, which is untenable. The second column in the table, headed “Constant Voltage Scaling”, shows another option for scaling where we assume all voltages remain constant. This addresses the issue of noise margin shrinkage while scaling delay even better than full scaling, because this type of scaling leads to an increase in current as well as a decrease in capacitance. Note that while power increases for constant voltage scaling, the power–delay product, which is energy, still scales down, which means there is still an improvement in battery life. However, the rise in power means that there will be a very substantial rise in power density, which reflects on heat and thus the necessity for cooling.
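The scaling relations discussed above can be sketched numerically. The following is a minimal sketch assuming the first-order square-law relations used in this chapter; the function and variable names are my own, and S = 2 is just an illustrative step:

```python
# A minimal sketch of the scaling relations in Table 2.2 under a
# first-order square-law device model. Variable names are my own.

def scaled_metrics(S, K):
    """General scaling: dimensions (W, L, tox) shrink by 1/S, voltages by 1/K.
    K = S reproduces full scaling; K = 1 reproduces constant-voltage scaling."""
    current = S / K**2                 # I ~ (W/L)(1/tox)V^2
    cap = 1 / S                        # C ~ W*L/tox
    delay = cap * (1 / K) / current    # t ~ C*V/I  -> K/S^2
    power = current * (1 / K)          # P ~ I*V    -> S/K^3
    return {
        "delay": delay,
        "power": power,
        "power_density": power * S**2,  # area shrinks by 1/S^2
        "energy": power * delay,        # power-delay product
    }

full = scaled_metrics(S=2, K=2)      # full scaling: everything improves
const_v = scaled_metrics(S=2, K=1)   # constant-voltage: faster, but hotter
print(full)
print(const_v)
```

Running this reproduces the trade-off in the text: constant-voltage scaling gives the better delay, but its power and power density grow, while both schemes reduce the power–delay product.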

Constant voltage scaling suffers from another hidden problem that makes it impractical. There are two electric fields in a MOSFET: the field across the oxide between the gate and the channel, and the field across the channel between the source and drain. The field across the oxide is a function of Vg/tox. In full scaling, this field scales by 1. In constant voltage scaling, the field scales by S, which means it increases linearly as we scale down. The same relationships hold for the lateral field in the channel. All dielectrics have a maximum field they can tolerate, beyond which the dielectric breaks down and starts conducting. Usually, this breakdown process is irreversible and spells a catastrophic loss of the device. In constant voltage scaling, we risk reaching breakdown values by keeping voltages constant. This will quickly cause the oxide to break down and can cause punchthrough in the channel (Chap. 10). The last column of Table 2.2 shows the most general scaling technique. It allows different scaling factors for dimensions and voltages. This allows voltages to avoid remaining constant, thus backing fields off from breakdown values. It also allows voltages to scale down at a slower pace than dimensions, thus preserving noise margins. Typically, K, the voltage scaling factor, is much smaller than S, the dimension scaling factor. The effect on metrics falls somewhere between full scaling and constant voltage scaling depending on the value of K. Notice that general scaling is equivalent to constant voltage scaling when K = 1, and to full scaling when K = S.

2.6 What is a Logic Family

1. Understand the conceptual definition of a logic family
2. List the metrics that characterize the behavior of a logic family
3. Recognize the importance of VTC
4. Draw the VTC of an ideal inverter
5. Draw the VTC of a realistic inverter
6. Define important points and ranges on the VTC
7. Recognize the limitations of a static model.

A logic family is a way to build logic gates. It produces gates with clearly similar architectural features. It also produces


behaviors and characteristics that are very consistent. Gates that belong to the same logic family can usually pass their outputs as inputs to gates from the same family without any intermediate conditioning circuits. Understanding the architecture and characteristics of a logic family always starts by studying the inverter of the family. Once the inverter is well understood, all concepts can be extended to other gates. The characteristics that gates in a family share include:

• Area
• Static current flow patterns and power dissipation
• Switching and dynamic current behavior/delay
• Voltage transfer characteristics (VTC), including critical voltage levels and ranges.

Area can be calculated accurately by examining the layout of a circuit (Chap. 8). However, in many cases, we will approximate the area of the circuit as the summation of the areas of its transistor gates. The area of a transistor gate is roughly W * L. Compared to the layout-based area of Chap. 8, this estimate is extremely small: it ignores wiring area, source and drain areas, design rule separations, and contacts. However, the approximation is serviceable when comparing the areas of similar gates from the same logic family. It should never be used as an indication of the absolute value of the area.

Static current is the steady-state or DC current that flows once inputs and outputs have had enough time to settle. Static current is important because it is directly proportional to static power drawn from the supply. The static power of a gate can be calculated as

Pst = Ieffective * VDD
Ieffective = effective current drawn from supply = IH * aH + IL * aL

where
IH = current flow from supply when output is high
IL = current flow from supply when output is low
aH = proportion of time output is high
aL = proportion of time output is low

Dynamic behavior considers what happens when the input, and thus the output, of a gate switches state. While switching, the gate will have to contend with capacitance at its output node. This means that the switching operation cannot be instantaneous. A dynamic current has to flow and take a finite time to charge/discharge capacitors. This leads to power dissipation and delay. Sections 3.4 through 3.7 deal with these issues in detail for CMOS circuits.

The voltage transfer characteristic, or VTC, of an inverter is a very useful tool in analyzing both its steady-state behavior and its characteristics in the transition

Fig. 2.15 VTC of an ideal inverter. The brick pattern leads to maximized noise margins and a null transition region

region. The VTC is a formalized way to represent how the voltage range is divided between “0” and “1”. It is a graph of Vout on the y-axis against Vin on the x-axis. Figure 2.15 shows the VTC of an ideal inverter. The VTC is divided into three main regions. The range between Vin = 0 V and Vin = Vm is the range of acceptable “0” inputs. For the whole range, the inverter will produce an output of Vdd. The range between Vin = Vm and Vin = Vdd is the range of acceptable “1” inputs. For the whole range, the inverter will produce an output of 0 V. The point Vin = Vm and the sharp vertical drop form the transitional region between the two input ranges. In the ideal inverter in Fig. 2.15, the noise margins for “0” and “1” divide the entire range between 0 V and Vdd, and the transition region has zero width.

Figure 2.16 is the VTC of a more realistic (though arbitrary) logic family inverter. The VTC is obviously not as sharp as that of the ideal inverter. This causes the transition region between “0” and “1” inputs to widen, squeezing the noise margins tighter. To understand any VTC in a systematic fashion, we must define important points of interest:

• Voh, the logic high output voltage, obtained when Vin is Vol. Ideally, this should be Vdd. This is the noiseless “1” output of the gate, and it occurs if the previous gate produces a perfectly “clean” logic “0”
• Vol, the logic low output voltage, obtained when Vin is Voh. Ideally, this should be 0 V. This is the noiseless “0” output produced by a gate when the previous gate produces a clean “1”
• Vm, the logic threshold (Fig. 2.17), not to be confused with the MOSFET threshold Vth. Vm is defined as the point where Vout = Vin. This point is significant because for any input point to its left Vout > Vin, and for any point to its right, Vout < Vin. Although we might be tempted to

use Vm as the boundary between input “0” and “1”, this would lead to bad behavior. A range of voltages around Vm should be banned as input voltages.
• Vil and Vih, the two points on the VTC where the slope is −1, or the two points where the VTC curve shows a discontinuity (as in the case of the ideal inverter). These two points are critical in defining the true noise margins available at the input of an inverter. They also delimit the high-gain region. All inputs higher than Vil and lower than Vih will observe absolute gains in the VTC larger than 1
• The noise margin available to a logic “0” input is given the acronym NML, for noise margin low. Similarly, the noise margin available to “1” as an input is called NMH. From the curve, it is clear that NMH = Voh − Vih and NML = Vil − Vol.

Fig. 2.16 VTC of a realistic inverter. Noise margins are reduced by imperfect noiseless outputs and a widened transition region

Fig. 2.17 Different definitions of noise margins. The unity absolute slope points are critical in defining the limits of the transition region

The definition of noise margins above is not immediately obvious. The following questions are evident:

1. If noise margins are defined at the input, why do we also use the output values Voh and Vol to calculate them?
2. The definition of noise margins leaves a range of voltages between Vil and Vih where inputs are accepted as neither “0” nor “1”. What is this range, and why are inputs in it not acceptable?
3. We defined Vm as the borderline between “0” and “1” inputs. So why did we not use it in the calculation of noise margins?

The answer to question 1 is simple. Voh is used as a measure of a “clean” “1” input. The high noise margin is the range of voltages over which the input of a gate is accepted as a “1”. Because we always assume that inputs to a gate of a certain family are outputs from a gate of the same family, if the previous gate produces a noiseless “1” output, then it is providing the current gate with an input of Voh. In this case, what is the maximum magnitude of noise that the current gate can tolerate on this “clean” “1”? Because the lowest acceptable level for “1” is Vih, the maximum magnitude of noise, or the noise margin, is Voh − Vih. Similar reasoning applies to NML.

The answers to questions 2 and 3 are related. The range between Vil and Vih is the high-gain region. It is the region where the absolute slope of the VTC is greater than 1. This region is also called the transition region. Vm, strictly speaking, delimits “0” and “1” inputs. Every input Vin < Vm produces an output larger than the input, and we can make a similar observation for every input Vin > Vm. The range Vil < Vin < Vm is thus theoretically a “0” input, while the range Vm < Vin < Vih is theoretically a “1” input. However, both ranges are considered forbidden, and inputs within them are not accepted. We do not yet have the tools to understand why we strictly avoid inputs in the transition region. However, in Sect. 3.2, we will develop an intuition about the extreme sensitivity of inputs in this range to noise. In Chap.
6, we will understand how this relates to the behavior of latches and registers. Section 13.8 gives the ultimate justification, illustrating the failures that result from giving inverters inputs in the transition region.
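As a quick numeric illustration of the noise-margin definitions above, the sketch below computes NMH and NML from the four VTC points. All voltage values are assumed for the sake of the example, not taken from the text:

```python
# Illustrative noise-margin computation from the four VTC points defined
# above. The voltage values below are assumed, not from the text.

def noise_margins(voh, vol, vih, vil):
    """NMH = Voh - Vih and NML = Vil - Vol, per the VTC definitions."""
    return {"NMH": voh - vih, "NML": vil - vol}

nm = noise_margins(voh=5.0, vol=0.2, vih=2.9, vil=2.1)
print(nm)
```

Note how imperfect output levels (Vol above ground) and a wide transition region (Vih − Vil) both eat directly into the margins, exactly as Fig. 2.16 suggests.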

The high-gain transitional range between Vil and Vih is very important. If the inverter does not have a range with gain, then its family will lose a lot of the advantages of digital circuits. The best transitional range should have very high gain (to increase regeneration) and should be narrow (to increase noise margins). The ideal inverter (Fig. 2.15) has a transition region with zero width and infinite gain. This is the yardstick against which every VTC should be measured.

2.7 Resistive Load Inverter

1. Derive the output values for a MOSFET RTL inverter
2. Fill the truth table for the resistive load inverter
3. Model the inverter as a voltage divider
4. Understand the impracticality of a passive load.

Figure 2.18 shows a MOSFET-based inverter with a passive resistive load. As discussed in Sect. 2.6, the inverter is the best representative of a logic family. The family in Fig. 2.18 is called the resistive load inverter family, or MOSFET-based RTL. The most critical points on the VTC in Fig. 2.16 are Voh and Vol. We will always start the treatment of a logic family by studying these two static values. When favorable behavior is observed for the two, we may proceed to evaluate other VTC features. It is always easier to start by evaluating Voh. By definition, Voh is the output when the input is Vol. However, we have not yet found the value of Vol. To find the value of Vol, we need the value of Voh. Thus, there is a circular dependency that must be broken. To break it, assume Vin = 0 V and proceed to find Voh. We will shortly show why using Vin = 0 V is as good as using Vin = Vol. This approach can be used for most logic families, so always start by evaluating Voh with Vin = 0 V.

Fig. 2.18 Resistive load MOSFET inverter


In Fig. 2.18, if Vin = 0 V, the transistor M is cutoff because Vgs = Vin < Vth. The current through M will be zero. The output will go to a logic gate from the same family; thus it will connect to a MOSFET gate, and it will draw no current. Thus by KCL at the output node, the current through transistor M is equal to the current through resistance R, and Voh = Vdd − IR = Vdd. Notice this is the best possible value for Voh, and it is due to the fact that no current flows in the resistance. This in turn is because the next stage draws no current. The above analysis is valid whether Vin = 0 V or Vin = Vol. In fact, the analysis is valid as long as M is cutoff, which happens if Vin < Vth. Since a properly designed inverter must have Vol < Vth, we do not need to know Vol to find Voh.

On the other hand, finding Vol requires knowing Voh. Fortunately, we have already obtained Voh = Vdd. Now assume Vin = Voh = Vdd. Since Vdd > Vth, transistor M is necessarily on. We still do not know whether it is in the saturation or ohmic region of operation. To resolve this, we normally assume the transistor is in either region, solve, then check that the terminal voltages of the transistor are consistent with the assumption. However, if we use intuition based on the nature of the circuit, we can often make a smart assumption that saves us the need to iterate. Since the gate in Fig. 2.18 is an inverter, when the input is high, the output must be low. In fact, the output should be as close to ground as possible. Notice that Vin is Vgs and Vout is Vds for M. When Vin = Vdd, Vgs is also Vdd. If we expect Vout to be low, then it certainly cannot be high enough to allow the transistor to saturate:

Vout = Vds, Vin = Vgs
For saturation: Vds > Vgs − Vth, i.e., Vol > VDD − Vth

This would be a very bad design. Thus the transistor is very unlikely to be saturated. So to find the value of Vol, we assume the transistor is ohmic.

Next, we perform KCL at the output node. This is something we will repeatedly do across all logic families to evaluate static behavior. By equating the transistor current to the resistor current, we obtain a quadratic equation in Vol:

IM|ohmic = IR
K((Vgs − Vth)Vds − Vds²/2) = (VDD − Vout)/R
K((Vin − Vth)Vout − Vout²/2) = (VDD − Vout)/R
K((VDD − Vth)Vol − Vol²/2) = (VDD − Vol)/R

This equation solves for two values of Vol. Only one value will be “accepted”, because only one solution will satisfy the three conditions:

1. The voltages cause M to be ohmic
2. Vol is above ground
3. Vol is below Vdd.
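The quadratic above can be solved numerically and the valid root selected with the three conditions. The sketch below uses assumed, illustrative device values (K, R, VDD, Vth are my own choices, not from the text):

```python
import math

# Sketch: solving the RTL inverter's quadratic for Vol and keeping the
# physically valid root. All device values below are assumed.
VDD, VTH = 5.0, 1.0   # supply and threshold (V)
K = 200e-6            # k'(W/L) transconductance parameter (A/V^2)
R = 100e3             # load resistance (ohm)

# K((VDD - Vth)Vol - Vol^2/2) = (VDD - Vol)/R, rearranged to
# (K/2)Vol^2 - (K(VDD - Vth) + 1/R)Vol + VDD/R = 0
a = K / 2
b = -(K * (VDD - VTH) + 1 / R)
c = VDD / R
roots = [(-b + s * math.sqrt(b * b - 4 * a * c)) / (2 * a) for s in (1, -1)]

# Keep the root that lies between ground and VDD and keeps M ohmic
vol = [v for v in roots if 0 <= v <= VDD and v < VDD - VTH][0]
print(round(vol, 3))  # about 0.062 V with these values
```

The rejected root lands above Vdd, violating condition 3, which is why only one solution is ever "accepted".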

Table 2.3 Truth table for the resistive load inverter. Electrical truth tables list electrical values and regions of operation as well as logic values

| Logic in | Vin       | M region | Current from supply | Logic out | Vout |
|----------|-----------|----------|---------------------|-----------|------|
| “0”      | Vol       | Cutoff   | Zero                | “1”       | Vdd  |
| “1”      | Voh = Vdd | Ohmic    | Nonzero             | “0”       | Vol  |

Table 2.3 summarizes the static characteristics of the MOSFET RTL inverter. Notice that there is no current flow when the output is “1”. There is current flow when the output is “0”, leading to finite static power dissipation. Comparing this inverter to the ideal inverter, we notice that it produces the ideal value for Voh = Vdd. However, its value for Vol is not at ground. Examining the equations, we see that we have options to manipulate the value of Vol. The two parameters in the quadratic equation available to the designer are K and R; more accurately, W/L and R. All other parameters (threshold voltage, supply voltage, capacitance per unit area, and mobility) are set by technology. The question then becomes what we do with K and R to lower the value of Vol.

Fig. 2.19 Voltage divider producing Vol. RM is the equivalent resistance of the MOSFET. R is the resistance of the load

The answer is pretty obvious if we recall that a transistor in the ohmic region behaves as a resistance, albeit a nonlinear resistance. This leads to the view of the circuit in Fig. 2.19, where the two resistances, the load and the transistor channel, form a voltage divider on the supply. Figure 2.19 shows that to lower the value of Vol, we have to either lower the value of the ohmic resistance of M, or raise the value of the load resistance R, or both. So how do we lower the ohmic resistance of the transistor? Would raising the aspect ratio W/L lower or raise resistance? We notice that increasing W/L raises K. Raising K increases the available current for the same applied voltage. Resistance is inversely proportional to the available current. Thus increasing W/L reduces resistance. This makes sense since increasing W/L involves either increasing W, or lowering L, or both. Increasing W increases

the cross section of the channel. Lowering L shortens the channel resistance. Both actions naturally reduce channel resistance. In general, when we increase the W/L of a transistor, we say that we have given that transistor more current drive capability; that is, we allow it to drive more current at the same voltage. We also say that we have made the transistor stronger; in other words, we raised its available current and reduced its resistance.

When the output is “1”, the transistor is cutoff. This denies current flow in the load and leads to zero current being drawn from the supply. Thus, when the output is “1”, power dissipation in the inverter is null. Static power is entirely due to the “0” output. If we assume that the input to the inverter is equally probable to be “0” or “1”, then the output is “0” half the time. Finding the value of average power dissipation is simple. Once Vol is found, substitute it in the current equation of either the load resistance or the transistor, thus obtaining the current drawn from the supply. This follows from the fact that the current in the supply is equal to the current through the load, which is also equal to the MOSFET current. Power can then be obtained by multiplying this current by the supply value Vdd. We will repeat this analysis in detail in Sect. 2.11.

Notice that one price we could potentially pay for lowering Vol is increased power dissipation. Lowering Vol is generally achieved by drawing more current to leave more drop on the load. This leads to more power being drawn from the supply. We will find similar trade-offs in circuit design for the rest of the book. Whenever we try to improve a metric, it is usually at the expense of another equally important metric.

The RTL inverter discussed in this section is easy to understand and helps illustrate important concepts that will be used with other logic families. However, it is not practical. We have to use a passive load.
This passive resistance also has to be large to maintain an acceptable value for Vol. Large passive resistances are impractical to fabricate on ASICs. We have to use active loads in place of R. This introduces us to three very important logic families in Sects. 2.9, 2.14, and 2.15. While the discussion above might suggest we can never fabricate passive resistors, we can, and we do. In Sect. 13.5, we will see a case where we intentionally create a passive resistor. And in most of Chaps. 12 and 13, we will deal with the consequences of parasitic resistances of silicon and metal wires. The problem is we cannot create passive resistors whose resistance is large enough in an acceptable area. Active loads are much more efficient at creating inverter loads.
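The average-power recipe described earlier in this section (substitute Vol into the load current, multiply by Vdd, and weight by the fraction of time spent at “0”) can be sketched as follows. All numeric values are assumed, and VOL stands in for a previously solved Vol:

```python
# Sketch of the average static power recipe for the RTL inverter.
# All numbers are assumed; VOL stands in for a previously solved Vol.
# Supply, load, and MOSFET all carry the same current when output is "0".
VDD = 5.0       # supply (V)
R = 100e3       # load resistance (ohm)
VOL = 0.062     # assumed solved Vol (V)

i_low = (VDD - VOL) / R    # load current when output is "0"
p_avg = 0.5 * i_low * VDD  # output is "0" half the time; zero current otherwise
print(p_avg)
```

With these illustrative values the inverter burns on the order of a hundred microwatts on average, all of it attributable to the “0” output state.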

2.8 Open-Circuited Transistor

Fig. 2.21 Open-drain NMOS

1. Distinguish a cutoff transistor from an ohmic transistor with zero current flow
2. Understand the behavior of a diode-connected MOSFET
3. Understand the behavior of an open-drain transistor

It is important to distinguish two cases where current does not flow in a transistor. The first case is when the transistor is cutoff. This requires that there be no channel between the drain and source. Current flow in this case is zero regardless of the applied drain to source voltage. The transistor is cutoff when its gate to source voltage is lower than the threshold voltage. A transistor can also have zero current flow even if it is not cutoff. In this case, the transistor has a channel and can potentially conduct current. However, there is zero source to drain voltage, and thus there is no lateral field, no drift, and no current. The two cases are very different. In the first case, there is no channel, and the drop across the transistor is defined by the external circuit. In the second case, there is a channel, and the voltage drop between source and drain must be zero, otherwise current would flow.

The transistor in Fig. 2.20 is diode connected. This means its drain is shorted to its gate. The name “diode-connected” is a holdover from BJT, where shorting the base to the collector causes the BJT to look and behave like a diode. The source of the MOSFET in Fig. 2.20 is open circuited. Because of the open circuit, there can be no current flowing through the transistor. We can also guarantee that if on, this transistor is saturated. This is because the gate and drain are shorted. Thus Vgs = Vds, and by necessity Vds > Vgs − Vth for all positive values of Vth. To find the value of Vs under these conditions:

Isat = (K/2)(Vgs − Vth)² = 0
Vgs = Vth
VDD − Vs = Vth
Vs = VDD − Vth

Fig. 2.20 Diode-connected NMOS with open source

This indicates that the source is just at the level where Vgs = Vth. If Vs were any lower, the transistor would turn on. But it cannot be any lower, because if it turned on in saturation, it would cause a finite current to flow, which the open source cannot allow. Thus, because the transistor is diode connected, it was forced to the edge of cutoff by the open circuit. If the transistor in Fig. 2.20 is a switch, with its gate being the control terminal, then it fails to pass a healthy logic “1”. Think of the Vdd on the drain as a “1” input to the switch; the switch only manages to pass Vdd − Vth to the source. This failure of the NMOS to pass a “healthy” “1” will plague the rest of this chapter and will push us to adopt complementary circuits where PMOS transistors are used.

The transistor in Fig. 2.21 has an open circuit on the drain. Its source is grounded and its gate is connected to the supply. This transistor is obviously on, since its Vgs is Vdd. However, it cannot carry any current due to the open circuit. The only way this can be true is if the transistor is ohmic with zero drop across the channel. This is the exact behavior of an open-circuited resistor. As the equations show, the nonlinearity of the MOSFET resistance has no effect on this behavior:

I = K((Vgs − Vth)Vds − Vds²/2) = 0
I = K((VDD − Vth)Vds − Vds²/2) = 0

Solved only by Vds = 0, thus Vd = Vs = 0.

Note that if we had assumed the transistor in Fig. 2.21 was saturated, then the fact that Vgs − Vth does not equal zero would have yielded a nonzero value for current. This would have immediately violated the open circuit, proving the assumption wrong. In Fig. 2.20, it is obvious that Vs cannot be any lower than Vdd − Vth, otherwise current would have to flow. But can Vs be higher than Vdd − Vth? The answer is yes. The situation in Fig. 2.20 is solved by any value of voltage on the source that causes the MOSFET to be

cutoff. Thus, the true solution is that Vs ≥ Vdd − Vth. However, when the structure in the figure is used in circuits, it will always yield Vs = Vdd − Vth. The source node Vs starts at a very low voltage, close to ground. The node can then charge up to a higher voltage. This charging takes place through the diode-connected transistor in Fig. 2.20. The transistor cannot continue to provide charging current above the voltage Vs = Vdd − Vth, and thus the node stops at this voltage.
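The stalling behavior described above can be checked with a crude numeric sketch. The following Euler integration charges an assumed node capacitance through a saturated square-law device; K, C, and the time step are all illustrative values of my own:

```python
# Crude Euler-integration sketch of a node charged through the
# diode-connected NMOS of Fig. 2.20. All numeric values are assumed.
VDD, VTH = 5.0, 1.0
K = 200e-6     # A/V^2
C = 100e-15    # node capacitance (F)
dt = 1e-12     # time step (s)

vs = 0.0
for _ in range(200000):
    vgs = VDD - vs                    # gate and drain are both at VDD
    # Saturated square-law current; cuts off once Vgs drops to Vth
    i = 0.0 if vgs <= VTH else 0.5 * K * (vgs - VTH) ** 2
    vs += i * dt / C
print(vs)   # creeps toward, but never exceeds, VDD - VTH = 4.0 V
```

Because the charging current collapses quadratically as Vgs approaches Vth, the node asymptotically approaches Vdd − Vth and never passes it, which is exactly the "weak 1" behavior of the NMOS switch.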

2.9 Enhancement Load Inverter

1. Contrast the enhancement load inverter to the resistive load inverter
2. Derive Voh and Vol for the enhancement load inverter
3. Understand the effect of transistor aspect ratios on Vol
4. Recognize the fundamental limits of ratioed logic.

Figure 2.22 shows a new inverter representing a new logic family. The family is traditionally called enhancement load logic. The family is characterized by consisting entirely of NMOS transistors, making its fabrication easy. The word “enhancement” indicates that the load is a traditional NMOS as opposed to the depletion NMOS load in Sect. 2.14. The family is sometimes also called NMOS logic or NMOS-only logic. The structure of the enhancement load inverter is very similar to the resistive load inverter. Specifically, there is a transistor called the driver, MD in the figure. The driver accepts the input Vin at its gate. What distinguishes this family from the one in Sect. 2.7 is the load. In the resistive load inverter, the load was a passive resistor. In the enhancement load family, the load is a diode-connected NMOS transistor. The way we analyze the inverter for its Voh and Vol is identical to Sect. 2.7. As discussed there, it is always easier to start by estimating Voh. This can be done by assuming Vin = 0 V.

Fig. 2.22 Enhancement load inverter. The resistive load is replaced by a diode-connected NMOS

Refer to Sect. 2.7 for a discussion of why this does not violate the definition of Voh as the output when Vin = Vol. Because Vin < Vth, MD will be cutoff. The current through MD is equal to the current through ML, and both are null. Thus the load transistor ML observes an open circuit on its source, while its drain and gate are diode connected. This is identical to the situation in Fig. 2.20, and it yields a source voltage on ML of

Vout(Vin = 0) = VDD − VthL

VthL is the threshold voltage of the load transistor ML. We mark it to indicate that the drop from Vdd is due to the load rather than the driver. In general, we can assume all NMOS transistors have the same threshold voltage. But if we take body effect into consideration (Sect. 1.20), the threshold voltage of the load transistor ML will be higher than that of the driver.

To find Vol, we assume the input is Voh and solve. We already found Voh to be Vdd − VthL. We perform KCL at the output node, which always amounts to equating the current drawn below the output to the current sourced above it:

Iup = Idown

Next, we find each of these currents in terms of transistor currents. For the inverter in Fig. 2.22, each is a single MOSFET current:

Iup = IML
Idown = IMD

Next, we assume or make reasonable conclusions about the regions of operation of the two transistors. ML is saturated by virtue of the diode connection. Its source is at Vout, and thus its current is

IML|sat = (KL/2)(VDD − Vol − VthL)²

In Sect. 2.7, we concluded that when evaluating Vol, the driver must be assumed ohmic; otherwise, the inverter would be very badly designed. The driver current in the ohmic region is

IMD|ohmic = KD((VDD − VthL − Vth)Vol − Vol²/2)

Note that for Vgs = Vin, we are using Vin = Voh = VDD − VthL. The threshold voltage of the driver is marked Vth. Equating the two currents:

(KL/2)(VDD − Vol − VthL)² = KD((VDD − VthL − Vth)Vol − Vol²/2)

This is a quadratic equation in Vol. It can be solved, giving two solutions, only one of which will satisfy:

1. MD is ohmic
2. ML is saturated
3. The solution lies between ground and Vdd.

The enhancement load inverter is superior to the resistive load inverter because it uses an active load instead of a passive load. The active load is the diode-connected ML. The quadratic equation above indicates Vol is a function of both KL and KD. This is analogous to the discussion in Sect. 2.7, where the inverter is modeled as a voltage divider. If Vol is produced by a voltage divider between ML and MD, then to lower the value of Vol, we need to raise the resistance of ML and lower the resistance of MD. As discussed in Sect. 2.7, this can be done by making MD “stronger” and ML “weaker”. Making MD stronger requires raising its W/L, which is normally done by increasing W. Making ML weaker involves lowering its W/L, which requires increasing L. In both cases, this increases the area of the gate. Thus, there is a fundamental trade-off between Vol and area.

Vol is not in fact a function of either KL or KD on their own, but rather of their ratio KL/KD. Any logic family where either or both of its nominal logic output values is a function of the ratio of K’s is called ratioed logic. In fact, we will see that any logic family whose inverter consists of a driver and a load will be a ratioed logic family. For the rest of this chapter, we will try different types of loads in an effort to simultaneously obtain Voh = Vdd and Vol = 0 V. We will conclude this is futile using a driver–load architecture.

The enhancement load inverter suffers from a major drawback: not only is its Vol > 0, but its Voh < Vdd. This is due to the Vth drop of the open-source, diode-connected load.
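The ratioed nature of Vol can be verified numerically. The sketch below solves the KCL equation by bisection (a root-finding choice of my own) under assumed, equal threshold voltages; note that scaling KD and KL together leaves Vol unchanged, while raising the KD/KL ratio lowers it:

```python
# Sketch: Vol of the enhancement load inverter depends only on KD/KL.
# Bisection of the KCL equation; all values and helper names are my own.
VDD, VTH = 5.0, 1.0   # equal driver/load thresholds assumed

def vol(kd, kl):
    """Root of (KL/2)(VDD - v - Vth)^2 = KD((VDD - 2Vth)v - v^2/2)."""
    f = lambda v: (0.5 * kl * (VDD - v - VTH) ** 2
                   - kd * ((VDD - 2 * VTH) * v - v * v / 2))
    lo, hi = 0.0, VDD - VTH          # f(lo) > 0, f(hi) < 0
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

a = vol(kd=400e-6, kl=50e-6)
b = vol(kd=800e-6, kl=100e-6)   # same KD/KL ratio -> same Vol
c = vol(kd=800e-6, kl=50e-6)    # stronger driver -> lower Vol
print(round(a, 3), round(b, 3), round(c, 3))
```

The first two calls use different absolute K values but the same ratio and return identical results, which is precisely what "ratioed logic" means.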

2.10 Enhancement Load VTC

1. Understand the general method for deriving a VTC
2. Derive the major points on the enhancement load VTC
3. Understand regions of operation for the driver in the transition region
4. Divide the VTC range by region of operation
5. Prove the peculiar behavior of enhancement load at Vil.

Deriving the VTC of any inverter follows a systematic approach. First, define the two known points on the curve: (Vin = Vol, Vout = Voh) and (Vin = Voh, Vout = Vol). Then define the regions of operation of all transistors based on Vin and Vout and mark them on the axes. Finally, connect the graph, changing its shape whenever the region of operation changes.

Fig. 2.23 VTC of enhancement load inverter. The Vout = Vin-Vth line delineates the range where MD is saturated from the range where it is ohmic

The enhancement load VTC is shown in Fig. 2.23. The known points are (0, Vdd − Vth) and (Vdd − Vth, Vol). The following is also known:

• Vin is Vgs for MD
• ML is always saturated
• At Vin = Vdd − Vth, MD is ohmic
• At Vin = 0, MD is cutoff.

MD will remain cutoff until its Vgs = Vin = Vth. The situation that produces Vout = Vdd − Vth therefore persists until Vin = Vth, so the output remains constant at Vdd − Vth from Vin = 0 to Vin = Vth. At this point, MD turns on. Thus at (Vth, Vdd − Vth), MD is cutoff and ML is saturated, while at (Vdd − Vth, Vol), MD is ohmic and ML is saturated. The question then is what happens to MD in the transition region: does it change from cutoff to saturation to ohmic, or directly from cutoff to ohmic? The curve has to be continuous. Thus, when MD initially turns on, it observes Vout = Vdd − Vth. Since Vout is also Vds for MD, and since Vdd − Vth is a high voltage, MD most likely turns on in saturation. Recall that an NMOS is in saturation when its drain is at a high voltage, causing the channel to pinch off at the drain end. The boundary at which MD turns from saturation to ohmic can be found by sketching the region-of-operation inequality. MD is saturated if Vds > Vgs − Vth, which translates to Vout > Vin − Vth. To draw the inequality, first draw the corresponding equation Vout = Vin − Vth, a straight line with unit slope and positive x-intercept of Vth. Everywhere above this line MD is saturated; below it, MD is ohmic. Thus the transistor is saturated until its VTC meets the inequality line,


at which point it turns ohmic. This allows us to sketch the entire VTC as in Fig. 2.23.

Finding points of interest (Sect. 2.6) on the curve is as easy as finding Voh and Vol. Again we equate the current above with the current below. The current above is still the current of ML and the current below is the current of MD. The only difference lies in the regions of operation. For example, to find Vm, we have to find the point at which Vin = Vout. This is the point at which the VTC intersects the line Vout = Vin. Since the line Vout = Vin lies above the line setting the boundary between saturation and ohmic, at the point Vm MD must be saturated, and thus

IMD|sat = IML|sat
KD(Vin − Vth)² = KL(VDD − Vout − VthL)²

This equation has two unknowns, Vin and Vout. However, notice that this equation does not represent the logic threshold point only; it represents the entire range where both transistors are saturated. To find the threshold point, we apply the condition Vin = Vout = Vm:

KD(Vm − Vth)² = KL(VDD − Vm − VthL)²

We can now solve the quadratic equation for Vm. Only one of the two solutions allows both transistors to be saturated while lying between supply and ground. If the two threshold voltages are equal,

Vm = (VDD + (√(KD/KL) − 1)Vth) / (√(KD/KL) + 1)

Vih is the point where the graph has a slope of −1. Since this point is by definition higher than Vm, it must be a point where MD is ohmic. The current KCL equation is thus

(KL/2)(VDD − Vout − VthL)² = KD((Vin − Vth)Vout − Vout²/2)

Again, there are two unknowns. And again there is a piece of information we are not using: Vih is not only defined by MD being ohmic and ML being sat, it is also defined by the slope being −1. Differentiating with respect to Vin:

KD(Vout + (Vin − Vth)·V′out − Vout·V′out) = −KL(VDD − Vout − VthL)·V′out

and substituting V′out = dVout/dVin = −1:

KD(2Vout − (Vin − Vth)) = KL(VDD − Vout − VthL)

This is the second equation in Vin and Vout, which can be solved simultaneously with the original equation to obtain Vin = Vih.

Vil is a curious case for the enhancement load inverter. The curve in Fig. 2.23 indicates a sharp corner at Vin = Vth. Thus, there is no point for Vin < Vm where the slope is −1. Because the magnitude of the slope of the curve for all Vin > Vth is greater than 1, we will consider Vil to be Vth for the enhancement load inverter only.

We still have to understand why the curve takes such a sharp turn in this VTC and whether this would happen in other VTCs. The sharp corner happens at Vin = Vth. At this point, the regions of operation switch from MD cutoff, ML saturated for Vin < Vth; to MD saturated, ML saturated for Vin > Vth. For Vin < Vth, the VTC is the straight line Vout = Vdd−VthL. For Vin > Vth, and as long as the driver is saturated, the curve is also linear. This can be shown by examining the current equation for the sat–sat region, written above while deriving Vm and replicated here:

KD(Vin − Vth)² = KL(VDD − Vout − VthL)²

Taking the square root of both sides shows this is not the equation of a parabola; it is the equation of two lines. Only one of these lines has a negative slope, and thus can be part of the VTC. Because the curves before and after Vin = Vth are two linear sections, the transition between the two has to be a sharp corner rather than a smooth curve.
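The derivation can be cross-checked numerically. The sketch below assumes simple square-law devices with illustrative parameter values (VDD, the thresholds, KD, and KL are assumptions of this example, not values from the text); it solves the output-node KCL by bisection for each Vin and compares the closed-form Vm against the resulting VTC:

```python
import math

# Enhancement load inverter, square-law model. Parameter values are
# illustrative assumptions, not taken from the text.
VDD, VTH, VTHL = 5.0, 1.0, 1.0
KD, KL = 200e-6, 25e-6

def nmos_ids(k, vgs, vds, vth):
    """NMOS drain current: cutoff, ohmic, or saturation."""
    if vgs <= vth or vds <= 0:
        return 0.0
    if vds >= vgs - vth:
        return 0.5 * k * (vgs - vth) ** 2            # saturation
    return k * ((vgs - vth) * vds - 0.5 * vds ** 2)  # ohmic

def vout_of_vin(vin):
    """KCL at the output node, solved by bisection (load current = driver current)."""
    lo, hi = 0.0, VDD - VTHL          # output swings between Vol and Vdd - VthL
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        i_load = nmos_ids(KL, VDD - mid, VDD - mid, VTHL)  # diode-connected load
        if i_load > nmos_ids(KD, vin, mid, VTH):           # driver sinks less -> Vout rises
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Closed-form logic threshold for equal thresholds, as derived above.
r = math.sqrt(KD / KL)
vm_formula = (VDD + (r - 1.0) * VTH) / (r + 1.0)
```

With these (assumed) numbers the numeric VTC passes through the identity line Vout = Vin exactly at the closed-form Vm, confirming that the sat-sat equation governs the crossing.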

2.11 Static Power

1. Recognize that static power must assume statistics on the inputs
2. Understand that power is always related to current drawn from supply
3. Derive the average static power dissipation in the enhancement load inverter.

Static power is the power drawn in the steady state. We generally use the term to refer to large-scale DC currents drawn from supply once inputs and outputs have had enough time to settle properly. This is a source of power dissipation of significant historical importance but is negligible in modern logic families.

A logic gate has variable inputs. The input switches between “0” and “1”. Thus to calculate static power, we have to make assumptions about the statistics at the input. In other words, we have to assume what percentage of the time the input is settled at “0” and what percentage it is settled at “1”. It is fair to assume, in the absence of additional information, that the output (and thus input) of the inverter is “0” half the time.

In Fig. 2.24, any DC current will dissipate power in both the resistance RL and the resistance of the channel of M. If we insist on calculating power by adding the power dissipated in individual resistances, we will have to deal with increasing complexity for larger gates.
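The averaging just described can be sketched numerically. Below is a minimal Python sketch with assumed, illustrative device parameters (the supply, thresholds, and K values are my own choices, not the text's); it solves the inverter's output-node KCL for Vol by bisection and then averages the two static power levels, with the input at “1” half the time:

```python
# Illustrative parameters (volts, A/V^2) -- assumed, not from the text.
VDD, VTH, VTHL = 5.0, 1.0, 1.0
KD, KL = 200e-6, 25e-6
VOH = VDD - VTHL          # input "1" supplied by an identical previous stage

def mismatch(vol):
    """Load (sat) current minus driver (ohmic) current at output voltage vol."""
    i_load = 0.5 * KL * (VDD - vol - VTHL) ** 2
    i_driver = KD * ((VOH - VTH) * vol - 0.5 * vol ** 2)
    return i_load - i_driver

lo, hi = 0.0, VDD - VTHL
for _ in range(60):                      # bisection for Vol
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if mismatch(mid) > 0 else (lo, mid)
vol = 0.5 * (lo + hi)

i_low = 0.5 * KL * (VDD - vol - VTHL) ** 2   # static current with output low
i_high = 0.0                                 # driver cutoff: no static current
p_avg = 0.5 * (i_low * VDD + i_high * VDD)   # output low half the time
```

With these assumed sizes the gate burns a few hundred microwatts on average, all of it attributable to the half of the time the output sits at “0”.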

It is much easier to use the fact that DC power dissipated is always equal to DC power drawn from supply. The power drawn from supply is the product of the current and the supply voltage:

Pdrawn = Idrawn·VDD

The current drawn depends on the value of the input voltage. In Fig. 2.24, right, the input is Vol. M is cutoff, and thus no current flows anywhere in the circuit. In this case, the power dissipation with high output PH is

PH = Idrawn·VDD = 0·VDD = 0

When the input is Voh = Vdd−VthL, the current flowing through either MD or ML (both are equal) is also the current drawn from the supply. This is a nonzero quantity:

Idrawn = IL = 0.5KL(VDD − Vol − VthL)² = KD((VDD − VthL − Vth)Vol − Vol²/2)

and the low output power dissipation PL is

PL = IL·VDD

Assuming the output is low half the time,

Pdissipated = 0.5(IL·VDD + IH·VDD) = 0.5·IL·VDD

Any current drawn from supply indicates power dissipation. Current drawn over a period of time is energy drawn from supply. This energy must go somewhere. While the energy may possibly be stored in capacitors, this storage has to be temporary; otherwise the system would become unstable as it builds up ever-increasing amounts of energy. The only way a pattern can be sustainable is if all power drawn from supply is eventually burnt off in resistors. This will be instrumental in understanding more complicated forms of power dissipation.

Fig. 2.24 Inverter current when input is Voh (left) and Vol (right)

2.12 NAND and NOR Enhancement Load

1. Recall logic-switch abstraction
2. Understand the extension of an inverter to a gate
3. Define pull-down network (PDN)
4. Analyze the behavior of the enhancement load NOR
5. Analyze the behavior of the enhancement load NAND
6. Understand the reduction of series and parallel transistors to a single equivalent
7. Fill the NAND and NOR truth tables.

Section 2.2 demonstrated a useful abstraction of logic as connections of switches. It also demonstrated how transistors can be considered switches. The resistance between the source and drain of the transistor is controlled by the voltage on its gate. The transistor can function as a nearly perfect off switch; as an on switch, however, it will always have a nonzero resistance. Parallel connections between switches were shown to lead to an OR function, and series connections to an AND function. In this section, we will demonstrate that these principles can be applied, with minor modification, to the enhancement load inverter. Thus the enhancement load inverter can be extended into any number of more complex logic gates. The same principles can be applied to any other logic family, which is why understanding the behavior of the inverter in a family is critical.

Figure 2.25 shows what a general logic gate belonging to the enhancement load family looks like. The gate consists of a single load transistor ML, whose source is the output node. There is also a network that consists of multiple NMOS transistors. This network is called the pull-down network or PDN.

Fig. 2.25 General architecture of enhancement load logic. A pull-down network (PDN) contains N NMOS transistors. There is only one diode-connected load


Table 2.4 NOR enhancement gate truth table

AB      M1       M2       ML    Current    Logic out    Vout
“00”    Cutoff   Cutoff   Sat   zero       “1”          Voh
“01”    Cutoff   Ohmic    Sat   nonzero    “0”          Vol1
“10”    Ohmic    Cutoff   Sat   nonzero    “0”          Vol2
“11”    Ohmic    Ohmic    Sat   nonzero    “0”          Vol3

Fig. 2.26 NOR enhancement gate. Even though “intuition” suggests we should observe an OR function, we see NOR

What differentiates one logic function from another is obviously the PDN. All gates in the enhancement load family have the same load. The PDN has the following characteristics:

• It consists of only NMOS
• It has at least as many NMOS as there are input variables. Strictly speaking, it has as many NMOS as there are variable instances in the logic function
• Each transistor in the PDN has an input variable at its gate
• The PDN is responsible for implementing the “0” outputs in the truth table. That is, it is responsible for pulling down the output toward “0”.

Figure 2.26 shows a gate where the PDN consists of two NMOS transistors connected in parallel. According to the intuition developed in Sect. 2.2, this should lead to an OR function at the output. When a switch has its control enabled, it passes the value at its input to its output. For M1, for example, the control is A. When A enables M1 by turning it on, M1 passes the value at its input to its output. The “value at the input” of switch M1 is the ground at its source. Thus when it is enabled, it passes the ground at its source to the output Vout at its drain. This will lead to a logic function contrary to the intuition in Sect. 2.2.

Table 2.4 is the truth table of the gate in Fig. 2.26, which turns out to be a NOR gate. It describes the situation in the four possible input cases in terms of regions of operation, current flow, and output. The table corresponds directly to Fig. 2.27, where each of the sub-figures represents a row from the table.

Fig. 2.27 The four cases of the NOR gate

The first row in the table corresponds to the top-left sub-figure in Fig. 2.27. Input is “00”, thus M1 and M2 will both be cutoff. The output will be Vdd−Vth, since ML is left with an open source. This is identical to the behavior of the inverter with its input connected to “0”.

The second row in the table corresponds to the top-right sub-figure. Input is “01”. M1 is cutoff, but M2 is on and ohmic. This is identical to the inverter with input at “1”, with M2 playing the role of the driver. The output value can be obtained by equating the currents of M2 and ML and solving to obtain a value Vol1:

KL(VDD − Vol1 − Vth)² = 2K2((Voh − Vth)Vol1 − Vol1²/2)

When the input is “10”, the circuit behaves like the lower left sub-figure. Again, this is a situation identical to an inverter with “1” input, but this time with M1 playing the role of the driver. And again, the output can be found by equating the currents of M1 and ML and solving:

KL(VDD − Vol2 − Vth)² = 2K1((Voh − Vth)Vol2 − Vol2²/2)


When inputs are “11”, the situation is as in the lower right sub-figure in Fig. 2.27. The main method of finding voltages of interest we have used so far is to equate currents by performing KCL at the output node. There is no reason this could not be applied in this case. Thus we equate the currents of M1, M2, and ML to find the output value Vol3. For AB = “11”:

IML|sat = IM1|ohmic + IM2|ohmic
0.5KL(VDD − Vol3 − Vth)² = K1((Voh − Vth)Vol3 − Vol3²/2) + K2((Voh − Vth)Vol3 − Vol3²/2)

This KCL equation can be readily solved to obtain Vol3. However, it also offers deeper insight into how transistors connected in parallel behave. Notice that the two ohmic current equations have everything in common except for K; thus the KCL equation can be restated as

0.5KL(VDD − Vol3 − Vth)² = (K1 + K2)((Voh − Vth)Vol3 − Vol3²/2)

Thus, the expression for the sum of the two ohmic transistor currents is indistinguishable from the expression for the current of one transistor whose K = K1 + K2. This is a very important result: two transistors connected in parallel can be treated as one transistor with a K equal to the sum of the individual K's, as long as the gate inputs of the two transistors are the same. This reduction of parallel transistors into one transistor makes sense if we consider the resistance of MOSFET channels. As shown in Fig. 2.28, the result can be extended to any number of transistors in parallel as long as all their gate inputs are the same. K for different transistors differs only in the aspect ratio. Thus we can restate the above conclusion as: any number of transistors in parallel that also have the same gate input value can be replaced with a single transistor whose aspect ratio is the summation of all the individual transistor aspect ratios.

Fig. 2.28 Transistors in parallel and resistors in parallel

Transistors in the ohmic region are (nonlinear) resistors, with the resistance being that of the n-type silicon in the channel. Thus, as shown in Fig. 2.28, transistors in parallel are effectively resistors in parallel. W/L is proportional to current. Current is proportional to conductance and inversely proportional to resistance. Conductances in parallel add, thus W/L values for transistors in parallel should be added.

The truth table in Table 2.4 describes a NOR gate rather than an OR gate. Why has a parallel connection resulted in NOR rather than OR? The reason is that the basic structure of the gate is derived from an inverter; thus any connection in the PDN has an additional inversion added at the output due to the inherent inverter. Stated differently, the NMOS transistors open or close a path to ground, thus activation of the parallel paths leads to realization of zeros in the truth table rather than ones. Notice that the ones of the truth table are achieved by the load when no path to ground is open.

Table 2.4 lists three different values of Vol for the NOR gate: Vol1, Vol2, and Vol3. By examining the equations from which the three voltage values are obtained, we can deduce a relationship between them. Vol1 and Vol2 are equal if and only if K1 and K2 are equal; thus Vol1 and Vol2 are equal if M1 and M2 are identically sized. Vol3 is by necessity lower than both Vol1 and Vol2. The reason is clear from Fig. 2.26: Vol3 is a result of the two transistors operating in parallel, which means their effective resistance is lower than that of either M1 or M2 individually. Because the gate behaves like a voltage divider between the load and the PDN, the lower resistance of M1||M2 must lead to a lower Vol.
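The claim that the parallel case must produce the lowest Vol can be verified numerically. This sketch assumes illustrative square-law parameters (the values are my own) and solves the load-versus-PDN KCL for a pull-down of K1 alone, then K1 + K2:

```python
# Assumed parameters: solve the NOR output low level with one branch on (K1)
# and with both branches on (K1 + K2), against the same diode-connected load.
VDD, VTH = 5.0, 1.0
KL, K1, K2 = 25e-6, 200e-6, 200e-6
VOH = VDD - VTH                    # logic "1" produced by a like gate

def solve_vol(k_pulldown):
    """Bisection on KCL: 0.5*KL*(VDD-Vol-Vth)^2 = Kpd*((Voh-Vth)*Vol - Vol^2/2)."""
    lo, hi = 0.0, VOH
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        i_load = 0.5 * KL * (VDD - mid - VTH) ** 2
        i_pdn = k_pulldown * ((VOH - VTH) * mid - 0.5 * mid ** 2)
        lo, hi = (mid, hi) if i_load > i_pdn else (lo, mid)
    return 0.5 * (lo + hi)

vol_single = solve_vol(K1)         # only M1 on -> Vol1
vol_parallel = solve_vol(K1 + K2)  # both on, treated as one device -> Vol3
```

As the voltage-divider argument predicts, `vol_parallel` comes out well below `vol_single` for any sizing where both branches conduct.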
In other words, M1||M2 is equivalent to a single transistor with K = K1 + K2. This transistor will by definition have a higher conductance and a lower resistance than either M1 or M2. Another way of seeing this is that when both M1 and M2 are on, they draw a larger current than when either is on individually. The larger current leaves a larger drop on the active load ML, leading to a lower value at the output.

Figure 2.29 shows the enhancement gate where the PDN consists of two NMOS transistors in series. Since Fig. 2.26 turned out to be a NOR gate, we can predict that Fig. 2.29 will turn out to be a NAND gate. Table 2.5 is the truth table of the circuit in Fig. 2.29. Figure 2.30 illustrates the circuit in Fig. 2.29 for the four input cases in Table 2.5.

The first row of the truth table corresponds to the leftmost sub-figure of Fig. 2.30. With AB = “00”, both M1 and M2 are cutoff. This leaves ML with an open source, leading to an output value of Vdd−VthL with no current flow in the circuit.


Fig. 2.29 NAND enhancement gate

Table 2.5 Enhancement NAND gate truth table

AB      M1       M2       ML    Current    Logic out    Vout
“00”    cutoff   cutoff   sat   zero       “1”          Vdd−Vth
“01”    cutoff   ohmic    sat   zero       “1”          Vdd−Vth
“10”    ohmic    cutoff   sat   zero       “1”          Vdd−Vth
“11”    ohmic    ohmic    sat   nonzero    “0”          Vol

The second and third rows of the truth table correspond to the middle two sub-figures in Fig. 2.30. If either A or B is “0”, then M1 or M2 will be cutoff. If either M1 or M2 is cutoff, the situation is no different from that where both were cutoff: the cutoff transistor causes an open circuit on the path to ground, so whether the other PDN transistor is on does not matter. Thus if either A or B is “0”, there will be no current flowing in the load because there is no current flowing in the PDN, and the load will produce Vdd−VthL on Vout.

The fourth row of the truth table corresponds to the rightmost sub-figure in Fig. 2.30. With AB = “11”, both M1 and M2 will be on and ohmic. This obviously creates a path to ground. The output in this case will be “0”, but to calculate the corresponding Vol, we have to find a way to deal with transistors in series. Since M1 and M2 are essentially resistors when ohmic, their appearance in series leads to them being treated like resistors in series.

In the NOR gate, transistors in parallel were reduced to a single transistor when their inputs are equal. Similarly, transistors M1 and M2 can be reduced to a single transistor if their inputs are both “1”. When connected in parallel, transistors could be reduced to a single transistor with W/L equal to the summation of individual aspect ratios. When connected in series, however, the transistors can be replaced with a transistor whose on-resistance is the summation of the two transistor resistances. If we consider conductance, the conductance of the equivalent transistor will be lower than the conductance of either of the series MOSFETs. Thus when calculating the equivalent aspect ratio of transistors in series, it is as if we are calculating the conductance of resistors in series. To summarize,

(W/L)parallel = Σ(W/L) over all MOSFETs
(W/L)series = 1 / Σ(1/(W/L)) over all MOSFETs
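The two reduction rules can be written as tiny helpers (the function names are my own); this sketch assumes all transistors in the reduced network share the same gate input, as the rules require:

```python
def wl_parallel(ratios):
    """Parallel same-input MOSFETs: conductances add, so W/L values add."""
    return float(sum(ratios))

def wl_series(ratios):
    """Series same-input MOSFETs: resistances add, so reciprocals of W/L add."""
    return 1.0 / sum(1.0 / r for r in ratios)

# Example: the two series NMOS of the NAND PDN, each sized W/L = 4,
# behave like a single pull-down device of W/L = 2.
nand_pdn = wl_series([4.0, 4.0])
```

Nesting the helpers reduces arbitrary series/parallel networks, which is how complex PDNs are collapsed to a single equivalent device.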

Fig. 2.30 The four cases of enhancement NAND



This is an important result that will be extended in Chap. 3 to account for the delay of complex logic gates.

2.13 Random Logic in Enhancement Load

1. Recall DeMorgan’s theorem
2. Distinguish using multiple gates and using a single gate to implement a function
3. Understand the effect of inherent inversion in logic
4. Follow steps to realize any random enhancement load gate
5. Interpret the static behavior of random enhancement load logic through their truth table.

There are two approaches to implementing a complex logic function: either use basic logic gates (NAND, NOR, NOT) to realize more complex gates, or use transistors to build a single stage that performs the required logic. Both approaches have practical uses (Chap. 8). To understand how to build any complex logic function using a single gate, we must review one of the basic theorems of Boolean algebra: DeMorgan’s expansion. DeMorgan states that we can expand an inversion on a bracket of binary variables by spreading the inversion to the variables and changing all AND into OR operations, and all OR into AND operations. To be specific, DeMorgan can be stated as (AB)ʹ = Aʹ + Bʹ, and (A + B)ʹ = AʹBʹ. This can then be extended to multiple levels and multiple brackets. This is best illustrated by an example:

(AʹBʹ + (C + D)Eʹ)ʹ = (AʹBʹ)ʹ·((C + D)Eʹ)ʹ = (A + B)(E + CʹDʹ)

DeMorgan’s theorem is very important to the understanding not only of enhancement load logic, but also of complementary networks in CMOS logic. An enhancement NMOS logic gate is built on the basic architecture in Fig. 2.25. But unlike the NAND and NOR gates, the PDN will use multiple nested parallel and series connections of NMOS transistors to realize the logic function. Below, we detail a simple yet useful approach to implement any logic function. This approach is contingent on realizing that series connections can represent AND, and parallel connections can represent OR, as long as we are cognizant that there will be an all-encompassing inversion at the output of the gate due to the underlying inverter. To implement any logic function:

• Obtain an expression for Fʹ instead of F, using DeMorgan to spread the inversion if necessary. The final expression of Fʹ must not contain inversions on brackets but might contain instances of variable inversions
• Construct the PDN based on the expression of Fʹ: each AND operation is a series connection, every OR operation is a parallel connection
• The output has an intrinsic inversion. Since you implemented the connections that describe Fʹ, the output will be (Fʹ)ʹ = F.

For example, to implement the function F = (AB + CD + CE)ʹ:

• First, obtain the expression of Fʹ = AB + CD + CE
• Perform any simple logic simplification to reduce the number of terms in the expression, which correlates directly with the area of the gate: Fʹ = AB + C(D + E). This step is not necessary to produce a correct gate, but is critical to produce a good gate
• The PDN reflects the expression of Fʹ directly. Figure 2.31 shows the implementation with A and B in series, and their branch in parallel with a branch where C is in series with the parallel combination of D and E
• A diode-connected enhancement load is added on top of the PDN to obtain the output F at Vout.

Fig. 2.31 Implementation of F = (AB + C(D + E))ʹ

The gate in Fig. 2.31 has N + 1 transistors, where N is the number of input instances. There are five input variables, so the truth table has 32 lines. Table 2.6 shows selected lines from the truth table. If there are paths open to ground, the output in the truth table is “0”. If no paths are open to ground, the output is “1”. In all cases, “1” at the output is Vdd−Vth, and there is zero current flow whenever the output is logic high. The electrical value of “0” varies depending on the input that caused it. The truth table shows four possible values for Vol.
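As a sanity check on the construction, the 32-row truth table of F = (AB + C(D + E))ʹ can be enumerated by evaluating the paths to ground directly (a small sketch; the helper name is my own):

```python
from itertools import product

def f_out(a, b, c, d, e):
    """Output of the Fig. 2.31 gate: '0' iff some PDN branch reaches ground."""
    path_to_ground = (a and b) or (c and (d or e))
    return 0 if path_to_ground else 1     # the load supplies the '1' rows

table = {bits: f_out(*bits) for bits in product((0, 1), repeat=5)}
```

Rows such as “11000” and “11111” indeed give “0”, matching the selected lines of Table 2.6.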


Table 2.6 Selected lines from the truth table of Fʹ = AB + C(D + E)

ABCDE      Active branches    Current    Logic out    Vout
“00000”    None               zero       “1”          Vdd−Vth
“01011”    None               zero       “1”          Vdd−Vth
“10100”    None               zero       “1”          Vdd−Vth
“11000”    AB                 nonzero    “0”          Vol1
“11110”    AB, CD             nonzero    “0”          Vol2
“11111”    AB, CD, CE         nonzero    “0”          Vol3
“00111”    CD, CE             nonzero    “0”          Vol4

Each Vol corresponds to a row in the truth table with “0” output. Solving for any Vol follows the same procedure:

• Calculate the equivalent K of the active transistors in the paths to ground
• Equate the current of the equivalent pull-down transistor with the load transistor and solve for Vol.

For example, for “11000”:

Keq = KA·KB/(KA + KB)
0.5KL(VDD − Vol − Vth)² = Keq((VDD − VthL − Vth)Vol − Vol²/2)

For any input, we solve the same current equation to obtain Vol; the only difference is always in Keq. Following the resistive voltage divider analogy, the inputs with the highest Keq will always cause the lowest Vol. The best-case Vol is always obtained when all variable inputs are “1”. This causes the largest number of branches to be active in parallel simultaneously, reducing the equivalent resistance to the minimum possible. In Fig. 2.31, the best-case Keq can be obtained as

Keq = K1 + KC·K2/(KC + K2)

where

K1 = KA·KB/(KA + KB)
K2 = KD + KE

which solves for Vol3 in the current equation. When designing a circuit, we are always required to guarantee the worst case rather than the best case. Thus when choosing the sizing of transistors, one must always consider the highest possible value of Vol and try to accommodate it. The worst case is always when there is only one branch active. In Fig. 2.31, there are three single branches: AB, CD, and CE; corresponding to input patterns “11000”, “00110”, and “00101”, respectively. Proper design guarantees that Vol for each of the three worst cases satisfies the design requirements. There is no advantage to sizing the three branches for different equivalent K. Thus, sizing should guarantee that Keq for the three cases is equal:

KA·KB/(KA + KB) = KC·KD/(KC + KD) = KC·KE/(KC + KE) = Kmin

where Kmin can be obtained from the current equation with the required Vol substituted.

2.14 Depletion Load Logic

1. Understand the fundamental limitations of enhancement loads
2. Contrast depletion NMOS to enhancement NMOS
3. Derive Voh and Vol for the depletion load inverter
4. Sketch the VTC of a depletion load inverter
5. Derive the logic threshold for the depletion load inverter
6. Derive noise margins for the depletion load inverter.

The enhancement load logic family has a major advantage: it is made entirely of run-of-the-mill NMOS transistors. This greatly simplifies the fabrication process; combining different types of MOSFET makes fabrication more difficult and opens the circuit to complications such as latch-up (Chap. 7). However, the enhancement load family suffers from a major drawback that makes its use very limited. We have already shown, and will reiterate in Sect. 2.16, that no ratioed logic family can simultaneously achieve Vol = 0 V and Voh = Vdd. The enhancement load family achieves neither, with Vol being higher than 0 V and Voh being Vdd−VthL. In this section and Sect. 2.15, we will consider two logic families that differ from the enhancement load family only in the load. Both families use non-NMOS active loads but achieve better noise margins.

Figure 2.32 shows a depletion load inverter, a representative of the depletion load family. The driver is a traditional NMOS. The load, however, is a nontypical NMOS called a depletion mode NMOS. The load transistor ML has source and gate shorted. In a traditional NMOS, a gate to source short would guarantee the transistor is always cutoff. However, the depletion mode transistor is different.

Figure 2.33 shows a depletion mode transistor with zero applied gate to source voltage. There is a channel between the source and the drain. The reason is that the depletion MOSFET has an implanted channel: a channel which is inserted by doping under the oxide. Thus, for the depletion mode MOSFET to become cutoff, its channel has to be depleted of carriers by applying a negative gate to source voltage. The depletion mode MOSFET thus has the I–V characteristics shown in Fig. 2.34. This is very similar to a normal NMOS curve except for the negative threshold voltage.


Fig. 2.32 Depletion load inverter. The driver is a traditional NMOS. The load is a depletion mode NMOS with shorted gate and source


Fig. 2.35 VTC of depletion load inverter. Note the driver behaves very similarly to the enhancement load inverter. The load cannot be cutoff because its source to gate voltage guarantees the existence of a channel

Note that this is different from PMOS, where the threshold voltage is negative and current increases with decreased gate voltage. In the depletion mode MOSFET, the threshold voltage is negative, and the current increases with more positive gate voltage (Fig. 2.34).

Fig. 2.33 Depletion mode NMOS with zero gate to source potential. The channel still exists because it is implanted by doping during manufacture

Fig. 2.34 Ids−Vgs characteristics for a depletion NMOS in saturation. Current flows at 0 Vgs because the threshold voltage is negative. Distinguish this from a PMOS transistor where threshold voltage is negative but current does not increase with Vgs

The depletion load allows us to restore Voh to Vdd. However, it requires special steps in the fabrication process to implant the channel. This complicates fabrication relative to the enhancement load family. To characterize this new family, we will:

• Obtain Vol and Voh, usually starting with Voh since it does not require prior knowledge of Vol
• Sketch the VTC of the circuit, marking all regions of operation and where they transition
• Using KCL at the output node, and substituting for the proper regions of operation, find Vm, Vil, and Vih.

To estimate Voh, we apply Vin = 0 V. For a discussion of why this is acceptable even though Voh is defined as Vout(Vin = Vol), consult Sect. 2.7. Since Vin is less than Vth, MD will be cutoff. Using KCL at the output node,

IMD|cutoff = IML|ohmic

It is fair to ask why we assumed ML to be ohmic. For ML, the threshold voltage is Vthd, a negative number. Also, Vgs = 0 V due to the short between gate and source. Thus, if ML were saturated, its current would be

2 K L KL  Vgs  Vthd ¼ ðVthd Þ2 2 2

This is a nonzero current by necessity. But since MD is cutoff, the current must be null. Thus, ML must be ohmic. Solving the KCL equation:


KL(|Vthd|(VDD − Voh) − 0.5(VDD − Voh)²) = 0

Although this has two solutions, only one satisfies the assumed regions of operation: Voh = Vdd. We have thus restored the value of Voh up to the supply. This can never be achieved using an enhancement (traditional) NMOS load, because an enhancement NMOS cannot be in ohmic mode when its drain is connected to supply.

To find Vol, apply Vin = Vdd to the gate of MD. This turns MD on. MD must also be ohmic while evaluating Vol, for the same reasons discussed in Sect. 2.7. Additionally, the low output node imposes a large Vds on ML, so ML is most likely saturated. We will confirm this assumption when we derive the VTC. The KCL equation while finding Vol is

IMD|ohmic = IML|sat

Writing the current expressions with Vin = Vdd:

(KL/2)(|Vthd|)² = KD((VDD − Vth)Vol − 0.5Vol²)

This can be solved for one acceptable value of Vol.

Equipped with the two starting points (Vol, Vdd) and (Vdd, Vol), we can start sketching the VTC of the depletion load inverter. The VTC is shown in Fig. 2.35. MD is cutoff as long as Vin < Vth; thus Vout remains at Vdd up to Vin = Vth. When MD turns on, it does so with a large drain voltage, so it turns on in saturation. Similar to the enhancement inverter, we can find the transition of MD from sat to ohmic by drawing the line Vout = Vin−Vth: all sections of the VTC above the line have MD saturated, while those below it have MD ohmic. As discussed when obtaining Voh and Vol, ML is ohmic for Vin = 0 V and saturated for Vin = Vdd. Additionally, we know that ML is never cutoff. To distinguish where ML switches from ohmic to saturation,

VDD − Vout > |Vthd|

So ML is saturated for output values (remember Vthd is negative):

Vout < VDD + Vthd

To find Vm, we have to make some assumptions about Vthd. If the absolute value of Vthd is small, then ML quickly turns to saturation as the VTC curves down. Thus, for the Vm point,

IMD|sat = IML|sat
KD(Vin − Vth)² = KL(Vthd)²

yielding

Vm = √(KL/KD)·|Vthd| + Vth

To find Vil and Vih, we have to assume ML and MD are sat–ohmic or ohmic–sat. For Vih,

IMD|ohmic = IML|sat
(KL/2)(|Vthd|)² = KD((Vin − Vth)Vout − 0.5Vout²)

We can obtain the second equation in Vout and Vin by utilizing the fact that at Vih the slope of the VTC is −1:

KD(Vout + (Vin − Vth)·V′out − Vout·V′out) = 0

Substituting V′out = −1:

KD(2Vout − (Vin − Vth)) = 0

This can be solved with the original equation to obtain Vout and Vih. A similar approach can be used to obtain a value for Vil.

2.15 Pseudo-NMOS Logic

1. Understand complications introduced by using PMOS
2. Derive Voh and Vol for the pseudo-NMOS inverter
3. Sketch the VTC of the pseudo-NMOS inverter
4. Derive the logic threshold for the pseudo-NMOS inverter
5. Derive noise margins for the pseudo-NMOS inverter.

Figure 2.36 shows the inverter of a new logic family: pseudo NMOS. The family consists of a PDN identical to that used for resistive load, enhancement load, and depletion load logic. The only distinguishing feature of pseudo NMOS is the load, which is a PMOS with a grounded gate.

Fig. 2.36 Pseudo-NMOS inverter


Using a PMOS allows us to obtain Voh = Vdd. However, it complicates fabrication relative to both enhancement load and depletion load. The introduction of PMOS opens the circuit to the possibility of latch-up (Chap. 7), requiring the use of complicated isolation techniques. However, a thorough understanding of pseudo NMOS is useful as an introduction to CMOS. To find Voh, apply Vin = 0 V. This causes MD to be cutoff, and KCL at the output IMD jcutoff ¼ IML johmic ML is assumed ohmic because when Vout is high, this leaves a small source to drain potential on the channel of the PMOS, causing it to remain ohmic. We can also check that the PMOS with grounded gate and source at Vdd can never carry zero current in saturation. Thus, the PMOS must be ohmic:    KL VDD  Vthp ðVout  VDD Þ  0:5ðVout  VDD Þ2 ¼ 0 The only possible solution for this is Voh = Vdd. To find Vol, we use Vin = Voh = Vdd. In this case, MD must be ohmic. Because a large source to drain voltage exists across the PMOS, then the load must be saturated. The KCL equation at the output is IMD johmic ¼ IML jsat    2 KD ðVDD  Vth ÞVol  0:5Vol2 ¼ 0:5KL VDD  Vthp This can be solved for Vol. The pseudo-NMOS inverter will have Voh = Vdd and Vol not equal to 0 V. This is similar to the depletion load inverter. With the two static point defined, we can start to sketch the VTC, which is shown in Fig. 2.37. The driver is unchanged, and thus its behavior and regions of operation for the VTC are identical to the depletion load, enhancement load, and resistive load inverters. The driver starts in cutoff, switching to saturation at Vin = Vth, and switching again to ohmic when the inequality line intersects the VTC. The PMOS load cannot be cutoff since it observes Vgs = −Vdd. Recall that inequalities for PMOS are reversed and that PMOS threshold voltage is negative. 
The condition for saturation of ML is thus:

Vout − VDD < −VDD − Vthp
Vout < −Vthp = |Vthp|

To obtain Vm, we can confirm from the VTC that it occurs where the driver is saturated and the load is ohmic, thus:

IMD|sat = IML|ohmic

KL[(VDD + Vthp)(VDD − Vout) − 0.5(VDD − Vout)²] = 0.5KD(Vin − Vth)²

Fig. 2.37 VTC for the pseudo-NMOS inverter

and by definition, Vout = Vin = Vm:

KL[(VDD + Vthp)(VDD − Vm) − 0.5(VDD − Vm)²] = 0.5KD(Vm − Vth)²

which can readily be solved for Vm. Finding Vil and Vih involves equating the two currents in the sat–ohmic and ohmic–sat combinations, respectively. The equalities are then differentiated to make use of the property that the slope equals −1 at these points. This creates two equations that can be solved for Vil or Vih, identical to the treatment in Sects. 2.10 and 2.14.
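To make the algebra concrete, the two KCL equations above can be solved numerically. The following Python sketch uses assumed, illustrative parameter values (VDD = 3.3 V, Vth = 0.5 V, Vthp = −0.5 V, KD = 100 µA/V², KL = 20 µA/V²); none of these numbers come from the text.

```python
import math

# Assumed illustrative parameters (not from the text)
VDD, Vth, Vthp = 3.3, 0.5, -0.5      # volts; Vthp is negative for PMOS
KD, KL = 100e-6, 20e-6               # A/V^2

# Vol: driver ohmic, load saturated:
#   KD[(VDD - Vth)Vol - 0.5 Vol^2] = 0.5 KL (VDD + Vthp)^2
# Rearranged as a quadratic: 0.5 KD Vol^2 - KD(VDD - Vth)Vol + 0.5 KL(VDD + Vthp)^2 = 0
a = 0.5 * KD
b = -KD * (VDD - Vth)
c = 0.5 * KL * (VDD + Vthp) ** 2
vol = (-b - math.sqrt(b * b - 4 * a * c)) / (2 * a)  # smaller root keeps the driver ohmic

# Vm: driver saturated, load ohmic; solve f(Vm) = 0 by bisection
def f(vm):
    load = KL * ((VDD + Vthp) * (VDD - vm) - 0.5 * (VDD - vm) ** 2)
    driver = 0.5 * KD * (vm - Vth) ** 2
    return load - driver

lo, hi = Vth, VDD                     # f(lo) > 0 (driver off), f(hi) < 0 (load off)
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
vm = 0.5 * (lo + hi)

print(f"Vol = {vol:.3f} V, Vm = {vm:.3f} V")
```

With these values the driver is five times stronger than the load, and Vol lands near 0.3 V: nonzero, as the text predicts for ratioed logic.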

2.16 Limitations of Ratioed Logic

1. Recognize that the decline in Voh in enhancement load is due to the load type, not the architecture
2. Understand the major limitation of ratioed logic in terms of the Voh–Vol span
3. Realize that ratioed logic is always associated with nonzero static power
4. Understand that a ratioed logic VTC can never approach the VTC of an ideal inverter.

Families based on a driver–load architecture (Fig. 2.38) all suffer from common problems. Enhancement load logic suffers from the loss of Vth in the value of "1". It is important to realize that this problem is due to the fact that the load is a diode-connected NMOS. In other words, this deterioration in Voh is not due to the architecture in Fig. 2.38, but rather to the particular load used. Using other loads, as in Sects. 2.7, 2.14, and 2.15, resolved this issue, restoring Voh to Vdd.


Fig. 2.38 Driver–load architecture. The PDN consists of NMOS only. There is a single load that depends on the family

However, any logic based on the architecture in Fig. 2.38, regardless of the type of load, suffers from three major problems.

First, no ratioed logic family can achieve both Vol = 0 V and Voh = Vdd; every family can achieve one or the other. This reflects directly on the noise margins of the gate: as the range between nominal "0" and nominal "1" shrinks, so does the range available for noise.

Second, the nonideal value of either Vol or Voh depends on the sizing of the transistors. This is a disadvantage since it imposes constraints on transistor sizes, and achieving a specific absolute transistor size is technologically very difficult. Fortunately, the output voltages are a function of the ratio of the load and driver sizes, and matching a size ratio is much easier than hitting a specific absolute size.

Third, the nonideal output value (Vol in all the examples in this section) is produced by a voltage divider between the driver and the load. A voltage divider by definition carries DC current. In all the examples in this chapter, a static current flows when the output is "0". Static or DC current means nonzero static power dissipation (Sect. 2.11).

In addition, ratioed logic has a VTC very dissimilar to that of the ideal inverter; in particular, the transition region is never sharp. No amount of careful sizing can alleviate this problem.

In Sect. 2.8, we considered how an NMOS behaves when connected with one open-circuited terminal, viewing the NMOS as an active load. We will now repeat the same derivation, but this time for both NMOS and PMOS, and with the realization that we are now looking at them as switches. In Fig. 2.39, both PMOS and NMOS are considered as switches. In all cases, consider the top terminal of the transistors to be an open circuit. The four sub-figures show the MOSFETs acting as switches. The bottom two cases are the switches trying to pass a logic "1"; the top two cases are the switches trying to pass a logic "0".

Fig. 2.39 PMOS (left) and NMOS (right) acting as switches passing logic “0” (top) and logic “1” (bottom)

For the top-right sub-figure:

I = Kn[(Vgs − Vthn)Vds − 0.5Vds²] = 0
Vds = 0
Vd = Vs = 0

Thus, the NMOS switch has managed to pass the logic "0" as a clean 0 V.

For the bottom-right sub-figure, the transistor must be saturated because it is diode-connected:

I = 0.5Kn(Vgs − Vthn)² = 0
Vgs = Vthn
Vs = Vg − Vthn = VDD − Vthn

Thus the top terminal of the transistor is the source. The NMOS switch failed to pass the logic "1" as a full Vdd; instead it passed a weak "1" at Vdd − Vthn.

In the top-left sub-figure, the PMOS is necessarily saturated because it is diode-connected. Thus:

I = 0.5Kp(Vgs − Vthp)² = 0
Vgs = Vthp
Vs = Vg − Vthp = −Vthp = |Vthp|

And the top terminal is the source of the PMOS. Recall that Vthp is a negative number. The PMOS failed to pass the logic "0" properly; instead, it only passes a weak "0" at |Vthp|.

For the bottom-left sub-figure,

I = Kp[(Vgs − Vthp)Vds − 0.5Vds²] = 0
Vds = 0
Vd = Vs = VDD

Thus, we can conclude that the PMOS acts as a perfect switch when passing a logic "1", passing a full Vdd. The NMOS acts as a perfect switch when passing a logic "0", passing a full 0 V. On the other hand, the NMOS only manages to pass a weak "1" and the PMOS only a weak "0". One of our main problems with ratioed logic is our inability to obtain Vol = 0 V and Voh = Vdd simultaneously. The above discussion gives us one critical conclusion: we will never be able to obtain these output values in a circuit that is either purely NMOS or purely PMOS.

In Chap. 3, we will introduce the CMOS logic family. In CMOS, there are no drivers and loads. Instead, there is a pull-up network and a pull-down network performing equal but complementary roles. Because the PMOS can pass a strong "1" while the NMOS can pass a strong "0", CMOS gates can provide the logic outputs we have sought for so long. In fact, we will find that CMOS is similar to the ideal inverter in more ways than one: it has zero static power dissipation, a very sharp VTC, excellent noise margins, and extremely robust performance.
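The four cases in Fig. 2.39 can be collected into a small helper. This is a sketch with assumed illustrative parameter values; the function name and the numbers are ours, not the book's.

```python
VDD, Vthn, Vthp = 3.3, 0.5, -0.5   # assumed illustrative values; Vthp < 0

def passed_level(device: str, logic: int) -> float:
    """Voltage that reaches the open terminal when a MOS switch passes `logic`."""
    if device == "nmos":
        # strong "0" (ohmic, Vds = 0); weak "1" (diode-connected, stops at VDD - Vthn)
        return 0.0 if logic == 0 else VDD - Vthn
    if device == "pmos":
        # weak "0" (diode-connected, stops at |Vthp|); strong "1" (ohmic, Vds = 0)
        return abs(Vthp) if logic == 0 else VDD
    raise ValueError(device)

for dev in ("nmos", "pmos"):
    for bit in (0, 1):
        print(dev, bit, passed_level(dev, bit))
```

The table this prints is exactly the argument for CMOS: only the NMOS delivers a clean 0 V and only the PMOS delivers a clean Vdd.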

3 CMOS

3.1 Basics of the CMOS Inverter

1. Recognize the lack of a driver or load in a CMOS inverter
2. Realize the positions of transistor terminals in the CMOS inverter
3. Derive Voh and Vol for the CMOS inverter
4. Conclude that the CMOS inverter has the widest possible Voh–Vol span
5. Conclude that the CMOS inverter has zero static current.

The CMOS inverter, upon which the CMOS logic family is based, is shown in Fig. 3.1. The C in CMOS stands for complementary; why we call it so will become clear in Sect. 3.8 when we examine the architecture of complex CMOS gates. The inverter contains a PMOS as well as an NMOS transistor. There is no clear driver or load role. Instead, the NMOS and PMOS play symmetrical roles, with each transistor being on for one of the two input values. That there is no load can be seen from Fig. 3.1: Vin feeds the gates of both transistors Mn and Mp, so by definition neither can be the load and both are drivers. The gates of both transistors are at Vin. The drains of both transistors are connected to the output node, Vout. The source of the NMOS is at ground, and the source of the PMOS is at Vdd.

To understand the operation of the inverter, start with Vin very high; for now, assume it is Vdd. Strictly speaking we should be using Voh as the input, but we will check later whether using Vdd changes the solution. With Vin = Vdd, the NMOS transistor Mn in Fig. 3.1 is necessarily on because Vgs = Vdd > Vthn. The PMOS transistor is necessarily cutoff because Vgs = 0 > Vthp.

It is useful to develop some intuition about the regions of operation of NMOS and PMOS. An NMOS with a high gate voltage will most probably be on, while a PMOS with a low gate voltage will most probably be on. An NMOS with a drain at a high voltage will most likely saturate since it is likely to

pinch off at the drain. An on PMOS with a low drain voltage is likely to saturate since its source is at a high voltage, and a low drain establishes a large potential across the channel.

Performing KCL at the output node always yields the sum of currents above the output equal to the sum of currents below the output (Chap. 2). For the CMOS inverter, this means the PMOS current is equal to the NMOS current:

IMn = IMp

and when Vin = Vdd, the PMOS is cutoff and the NMOS is on. Thus:

IMn = IMp|cutoff = 0

The NMOS is on. In Sect. 2.8, we discussed two cases where a transistor is on but carries null current. Here, Mn cannot be saturated: with Vgs = Vdd, the saturation current of Mn cannot be null. Thus, Mn must be ohmic:

IMn|ohmic = 0 = Kn[(Vgs − Vthn)Vds − 0.5Vds²]

For Mn, Vgs = Vin = VDD and Vds = Vout:

IMn|ohmic = 0 = Kn[(VDD − Vthn)Vout − 0.5Vout²]

There are two solutions to this equation, but only one satisfies the assumption that Mn is ohmic:

Vout = Vol = 0

To find Voh, we assume Vin is very low. We already know Vol = 0 V, so we can use it as the input value without needing to go back and check the assumption. With this input value, the NMOS transistor is necessarily cutoff since Vgs = 0 V. The PMOS, however, must be on because Vgs = −Vdd < Vthp. The situation with Vin = 0 V looks like a mirror image of the situation with Vin = Vdd, with the regions of operation exchanged. The solution is also similar. KCL at the output

© Springer Nature Switzerland AG 2020 K. Abbas, Handbook of Digital CMOS Technology, Circuits, and Systems, https://doi.org/10.1007/978-3-030-37195-1_3


For neither input value is there a current drawn from the supply, and the total static power dissipation is null.

Fig. 3.1 CMOS inverter. Mn and Mp are both “drivers”. They have shared gates and shared drains

node equates the PMOS current to the NMOS current, both null:

IMp = IMn|cutoff = 0

Again, the PMOS cannot be saturated since its saturation current cannot be null, so it must be ohmic:

IMp|ohmic = 0 = Kp[(Vgs − Vthp)Vds − 0.5Vds²]

For Mp, Vgs = Vin − VDD = −VDD and Vds = Vout − VDD. Substituting:

IMp|ohmic = 0 = Kp[(−VDD − Vthp)(Vout − VDD) − 0.5(Vout − VDD)²]

This has two solutions, but only one satisfies the assumption that Mp is ohmic:

The CMOS inverter is already starting to look like an ideal inverter, with a perfect spread between Voh and Vol leading to improved noise margins, and zero static power. We still have to examine its VTC to see how close it comes to the ideal inverter. The CMOS inverter manages to achieve all this because it has no driver–load interaction: the NMOS and PMOS each play a driver role for one of the two input values.

Notice that the behavior of CMOS is impossible to achieve with NMOS only. To understand why, review Sects. 2.8 and 2.16. An NMOS transistor with an open drain can pass a perfect ground from its source to its drain. However, an NMOS with Vdd on its drain and an open circuit at its source can only pass Vdd − Vthn. We say that the NMOS can pass a strong "0" but a weak "1". A similar analysis can be performed for the PMOS. An ohmic PMOS with an open drain and a source at Vdd passes the full Vdd to its drain. However, when the drain of the PMOS is connected to ground, it saturates, and if its source is open, the source voltage will be |Vthp|. Thus a PMOS can pass a strong "1" but only a weak "0".

The only way to build logic gates that produce Vdd and 0 V at the output, corresponding to "1" and "0", is to use both NMOS and PMOS. The switch network connecting the output node to ground must be made entirely of NMOS; the switch network connecting the output to the supply must be made entirely of PMOS.

Vout − VDD = 0
Vout = Voh = VDD

Both results, Voh and Vol, make sense. When ohmic, MOSFETs are (nonlinear) resistors, and resistors carrying zero current have zero drop. Thus, when Vin = 0 V, Mn cuts off current flow, causing Mp to short the supply to the output node. Note that while calculating Voh and Vol, no transistor parameters appear in the solution: Vth and K cancel out, and the output values are not a function of any transistor parameter. Thus CMOS achieves non-ratioed logic. In fact, the CMOS inverter combines multiple important advantages:

• Vol is GND
• Voh is Vdd
• Neither Vol nor Voh is a function of transistor sizing. In fact, all technology parameters vanish from the equations while calculating the output values
• Static current is zero because for either input value, one of the NMOS or the PMOS must be cutoff.

3.2 CMOS VTC

1. Understand the shape of the CMOS VTC
2. Mark regions of operation on the CMOS VTC
3. Recognize the role of transition gain in regeneration
4. Recognize the significance of Vil and Vih
5. Derive proper noise margins
6. Derive the logic threshold for the CMOS inverter
7. Understand why the CMOS transition is sharp.

A typical CMOS VTC is shown in Fig. 3.2. Qualitatively, the VTC looks a lot more like that of an ideal inverter than the VTC from any ratioed logic family. Particularly, the transition region looks almost ideally sharp. To understand why the VTC takes this shape and to understand its limitations, we must derive the VTC from scratch. Deriving a VTC begins by marking the two known points from the static analysis: (Vin = Vol, Vout = Voh) and (Vin = Voh, Vout = Vol). For the CMOS inverter, these two points are (0, Vdd) and (Vdd, 0). The second step is to denote the regions


Fig. 3.2 VTC of CMOS inverter. Notice the sharp transition region

of operation of the two transistors at the two points. At (0, Vdd), the NMOS is cutoff and the PMOS is ohmic. At (Vdd, 0), the NMOS is ohmic and the PMOS is cutoff. These two points and the regions of operation as well as the derivation of the rest of the VTC are shown in Fig. 3.3. The third step in deriving the VTC is to determine when the regions of operation of the two static points start to change. For example at point (0, Vdd), the NMOS is cutoff. The NMOS continues to be cutoff until Vin = Vthn. The reason is that Vin is also Vgsn. Thus, the regions of operation remain cutoff for NMOS and ohmic for PMOS until Vin = Vthn, and the output remains constant at Vdd until that point.


Thus the VTC from Vin = 0 to Vin = Vthn is a horizontal line at Vout = Vdd. The fourth step is to determine whether the NMOS turns on in ohmic or in saturation. Since the curve must be continuous, the NMOS turns on with Vout at Vdd. Since Vout is Vds for the NMOS, the NMOS turns on in saturation, because Vds = Vdd while Vgs = Vthn. The NMOS is ohmic at (Vdd, 0) and saturated at (Vthn, Vdd), so there must be a point where it switches from saturation to ohmic. To determine this point, we draw the boundary of the inequality that delimits the regions of operation as a line. The inequality is Vdsn > Vgsn − Vthn; the corresponding equality Vdsn = Vgsn − Vthn is equivalent to Vout = Vin − Vthn. The sub-plane above this line is where the NMOS is saturated; below it, the NMOS is ohmic. Thus, the NMOS switches from saturation to ohmic at the intersection of the VTC with this straight line. Note that the behavior of the NMOS is similar to that in the VTCs of the ratioed logic families in Chap. 2.

Carrying out the same analysis for the PMOS: it turns on when Vin = Vdd − |Vthp|. When the PMOS turns on, it sees Vd = 0 and Vds = −Vdd; thus it is surely saturated. To determine when the PMOS switches from saturation to ohmic, we again draw the line representing the limit of the inequality. The line is Vdsp = Vgsp − Vthp, or Vout − Vdd = Vin − Vdd − Vthp. Recalling that Vthp is negative, the line becomes Vout = Vin + |Vthp|. This is a straight line with a negative x-intercept at −|Vthp|. The area of the plane above this line is where the PMOS is ohmic. This is the opposite of the NMOS, but recall that inequalities are reversed for the PMOS.

The points of interest on the VTC are restated here. For a more thorough discussion of the significance of each point, refer to Chap. 2:

• Voh = output voltage representing "1", corresponding to a "0" input
• Vol = output voltage representing "0", corresponding to a "1" input
• Vm = logic threshold, the point at which Vin and Vout are equal
• Vil = the lower of the two input voltages at which the slope of the VTC is −1
• Vih = the higher of the two input voltages at which the slope of the VTC is −1.
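The whole VTC can also be traced numerically: for each Vin, KCL at the output has a unique solution because the NMOS current rises with Vout while the PMOS current falls with it, so a bisection on the current difference finds Vout. The square-law model and the parameter values below (VDD = 3.3 V, symmetric 0.5 V thresholds, Kn = Kp) are illustrative assumptions, not values from the text.

```python
VDD, VTH = 3.3, 0.5           # assumed symmetric supply and thresholds
KN = KP = 100e-6              # A/V^2, assumed equal for a symmetric inverter

def square_law(vgs, vds, k, vth):
    """Long-channel drain current magnitude (vgs, vds given as magnitudes)."""
    if vgs <= vth:
        return 0.0                                   # cutoff
    if vds >= vgs - vth:                             # saturation
        return 0.5 * k * (vgs - vth) ** 2
    return k * ((vgs - vth) * vds - 0.5 * vds ** 2)  # ohmic

def vtc(vin):
    """Solve KCL In(Vout) = Ip(Vout) for Vout by bisection."""
    def mismatch(vout):
        i_n = square_law(vin, vout, KN, VTH)              # NMOS pulls down
        i_p = square_law(VDD - vin, VDD - vout, KP, VTH)  # PMOS pulls up
        return i_n - i_p                                  # increases with vout
    lo, hi = 0.0, VDD
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if mismatch(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

for vin in (0.0, 1.0, 1.55, 1.75, 3.3):
    print(f"Vin = {vin:.2f} V -> Vout = {vtc(vin):.3f} V")
```

The sweep reproduces the qualitative picture of Fig. 3.3: flat rails near the supplies and a very steep drop around mid-rail.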

Fig. 3.3 Regions of operation and interesting points on CMOS VTC

Voh and Vol were already shown through static analysis to be Vdd and 0 V, respectively. To obtain Vm, or any other voltage of interest for that matter, perform KCL at the output node. What distinguishes one voltage of interest from another is the region of operation of the transistors at that point, as


well as any special conditions that apply to the point of interest. For Vm, the intersection of the line Vout = Vin with the VTC is sandwiched between the lines Vout = Vin − Vthn and Vout = Vin − Vthp. Thus, this point definitely lies in a region where both transistors are saturated. Performing KCL at the output node:

For Kn to be equal to Kp, we must choose aspect ratios that compensate for the disparity in mobility between electrons and holes:

IMn|sat = IMp|sat

We will generally consider electron mobility to be double that of holes. In Sect. 1.5, we discussed how mobility is a function of doping. At the doping levels of modern technologies, the electron-to-hole mobility ratio is actually well below 2. However, taking the ratio as 2 leads to very useful expressions and incurs little loss of generality, so we will often use it.

Kn(Vin − Vthn)² = Kp(Vin − VDD − Vthp)²

And by definition at Vm, Vin = Vout = Vm, thus:

Kn(Vm − Vthn)² = Kp(Vm − VDD − Vthp)²

If we take the square root, one answer will be outside the boundaries of the two supply rails, and the other answer is the logic threshold:

√(Kn/Kp)(Vm − Vthn) = −(Vm − VDD − Vthp)

Vm = (VDD + Vthp + Vthn√(Kn/Kp)) / (1 + √(Kn/Kp))
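The closed-form Vm can be checked numerically. The parameter values below are assumed for illustration; the factor-of-2 mobility ratio is the working approximation the text adopts.

```python
import math

VDD, Vthn, Vthp = 3.3, 0.5, -0.5   # assumed values, with Vthn = |Vthp| for symmetry
MU_N_OVER_MU_P = 2.0               # the text's working approximation

def vm(kn_over_kp):
    """Logic threshold of the CMOS inverter from the closed-form expression."""
    r = math.sqrt(kn_over_kp)
    return (VDD + Vthp + r * Vthn) / (1 + r)

# Kn = Kp puts the logic threshold at mid-rail (given Vthn = |Vthp|)
assert abs(vm(1.0) - VDD / 2) < 1e-12

# Achieving Kn = Kp requires (W/L)p / (W/L)n = mu_n / mu_p = 2
wl_p_over_wl_n = MU_N_OVER_MU_P
print(f"Vm(Kn=Kp) = {vm(1.0):.3f} V, required (W/L)p/(W/L)n = {wl_p_over_wl_n}")

# A stronger NMOS (Kn > Kp) pulls Vm below mid-rail, and vice versa
print(f"Vm(Kn=4Kp) = {vm(4.0):.3f} V, Vm(Kn=Kp/4) = {vm(0.25):.3f} V")
```

The last line shows how the Kn/Kp ratio serves as a design knob for placing Vm.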

The result shows that the value of Vm is controlled by Vdd, Vthn, and Vthp, which are all determined by technology. It is also affected by the ratio of Kn to Kp, which is a very useful design parameter. If we prefer Vm to be a certain value, we can find the ratio of Kn to Kp that achieves it. In most cases, the preferable value of Vm is Vdd/2; this assumes that "0" and "1" at the input of the inverter suffer from similar noise powers. In this case, we can obtain the following condition on the sizing of the NMOS and PMOS. First note that Vthp is negative:

Vm = (VDD + Vthn√(Kn/Kp) − |Vthp|) / (1 + √(Kn/Kp))

If we also assume that Vthn = |Vthp|:

Vm = (VDD + (√(Kn/Kp) − 1)Vthn) / (1 + √(Kn/Kp))

If Kn = Kp, we obtain Vm = Vdd/2. Let us examine Kn and Kp a little deeper. They both contain technology parameters (oxide capacitance per unit area and mobility) as well as design parameters (aspect ratios):

Kn/Kp = [μn Cox (W/L)n] / [μp Cox (W/L)p]

Kn/Kp = 1 = [μn (W/L)n] / [μp (W/L)p]

(W/L)n / (W/L)p = μp/μn = 1/2

The result says that for the PMOS and the NMOS to split the input voltage range equally, the PMOS must be larger than the NMOS by the same factor by which electron mobility exceeds hole mobility. This can be interpreted in another way: if (W/L)p is larger than (W/L)n by the mobility ratio, then Kp = Kn, which means the PMOS has the same current drive capability, resistance, and conductance as the NMOS. In other words, because the NMOS is inherently stronger than the PMOS due to its higher mobility, we allow the PMOS to compensate by increasing its size.

The second observation about Vm is that as soon as we assumed both transistors are in saturation, we obtained the following equation:

Kn(Vin − Vthn)² = Kp(Vin − VDD − Vthp)²

There is something very curious about this equation: it is a function of Vin only. Vout does not appear anywhere. Thus the two transistors are simultaneously saturated for only one value of Vin, which is Vm. The lines that delimit saturation from ohmic for the NMOS and the PMOS do not lie on top of each other, since Vthn and Vthp are not equal. Thus, there must be a region where both transistors are saturated, and as shown above, this region is limited to a single value of Vin. This means that when the two transistors are saturated, the VTC must be described by the equation Vin = Vm: a vertical line with infinite slope. This is the reason the transition region (or at least part of it) for CMOS is extremely sharp, almost as sharp as that of the ideal inverter.

In reality, the transition region will not yield a line as sharp as that in Figs. 3.2 and 3.3. There will be some finite slope in the sat–sat region, widening the range of Vin over which both transistors are saturated. This slope will be introduced if


Fig. 3.4 Vil, Vih, true noise margin, and high-gain region for CMOS

the current equations for the saturated transistors include a dependency on Vout. Vout is the drain potential of both transistors. As discussed in Sect. 1.21, the only reason a saturated transistor's current depends on its drain voltage is channel length modulation. Channel length modulation is a secondary effect, so we might expect the slope of the CMOS VTC to remain very sharp. However, in deeply scaled MOSFETs, this effect can be significant (Sect. 10.6).

Vil and Vih are two points whose significance is not as immediately clear. They are defined as the input voltages where the slope of the curve is −1. First, we need to discuss the significance of a slope of −1, then we need to understand why the inputs at these points indicate something interesting, and finally discuss how these voltages can be obtained.

Fig. 3.5 Inverter pair regenerating a corrupt logic value


Figure 3.4 shows that the region between Vil and Vih is an area where the curve has an absolute slope greater than 1. We say that in this area the curve has gain. This is critical, since it is the main reason behind the regenerative property of digital gates. Logic families whose transition region does not have an absolute slope greater than 1 have no regeneration and cannot restore corrupted logic values.

Figure 3.5 illustrates the regenerative property. In this figure, the output of the first inverter is the input of the second inverter, and the output of the second inverter is the input of the first inverter. Thus the two inverters are connected in positive feedback. The Vout axis of the first inverter is the Vin axis of the second inverter, and vice versa. It follows that the VTC of the second inverter is a reflection of the VTC of the first inverter about the Vout = Vin line.

Assume that a corrupt logic value exists at the output of the first inverter in Fig. 3.5, falling in the "high-gain region". The figure traces how this input value produces an output at inverter 1, which is then an input for inverter 2. This in turn produces an output at inverter 2, which becomes an input for inverter 1, and so on. Figure 3.5 shows that after a few trips around the loop, the corrupt value settles into one of the low-gain regions, where the output is either Vdd or 0 V. Notice that all points left of Vm will go up to Vdd, while all points right of Vm will settle down to 0 V. This is the main significance of Vm.

The importance of this discussion is not limited to two inverters connected in feedback. Note that the two inverters in Fig. 3.5 simulate a very long chain of inverters connected in cascade. Thus, the conclusion is that any chain of CMOS logic is restorative and regenerative: a corrupted value will be restored to a clean value after a number of stages. In fact, any logic family with absolute gain greater than unity in the transition region is regenerative. Thus the high-gain region is very important: the larger the gain, the fewer stages it takes to clean up the value.
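The staircase of Fig. 3.5 can be mimicked with a toy piecewise-linear VTC. The rail values, corner voltages, and gain below are invented for illustration; the only property that matters is an absolute slope greater than 1 between Vil and Vih.

```python
VDD, VIL, VIH = 3.3, 1.4, 1.9     # toy values; transition gain = 3.3/0.5 = 6.6

def vtc(v):
    """Piecewise-linear inverter: rails outside (VIL, VIH), slope -6.6 inside."""
    if v <= VIL:
        return VDD
    if v >= VIH:
        return 0.0
    return VDD * (VIH - v) / (VIH - VIL)

def regenerate(v_in, stages=10):
    """Pass a (possibly corrupt) value down a chain of inverters."""
    trace = [v_in]
    for _ in range(stages):
        trace.append(vtc(trace[-1]))
    return trace

# Vm for this toy VTC is 1.65 V. A corrupt value left of Vm regenerates so that
# the first inverter's output settles at VDD; right of Vm it settles at 0 V.
print(regenerate(1.60))   # odd entries (inverter-1 outputs) end up at VDD
print(regenerate(1.70))   # odd entries end up at 0 V
```

After only two or three stages the traces sit cleanly on the rails, which is the regeneration the figure illustrates.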


example, to obtain Vih, we note that it must lie in the range where the NMOS is ohmic and the PMOS is saturated:

IMn|ohmic = IMp|sat

2Kn[(Vin − Vthn)Vout − 0.5Vout²] = Kp(Vin − VDD − Vthp)²

There are two unknowns here, Vin and Vout. As in Chap. 2, we have to make use of the property that the slope at Vih is −1:

V′out = −1

Differentiating the KCL equation with respect to Vin:

2Kn[Vout + (Vin − Vthn)V′out − Vout·V′out] = 2Kp(Vin − VDD − Vthp)

Substituting V′out = −1:

2Kn(2Vout − Vin + Vthn) = 2Kp(Vin − VDD − Vthp)

Fig. 3.6 Input sensitivity in the transition region

However, note that the range of Vin between Vil and Vih in Fig. 3.4 is very narrow; this is a direct result of the high gain. As shown in Fig. 3.6, this means that all inputs between Vil and Vih are very close to Vm. A very small amount of noise will cause the input to cross Vm, leading to an error because the logic value is read as its opposite. Another way to see this is that the high gain in the transition region translates into high sensitivity, with a small change in input causing a large swing in the output. The large output swing, as shown in Fig. 3.6, can lead to a wrong reading at the output. Thus we should not accept inputs within the high-gain region. Inputs in the high-gain region are considered unstable and their results unpredictable; this is discussed in detail in Sect. 13.8 as the phenomenon known as metastability. Our true noise margins, the ranges of acceptable Vin, must therefore be delimited by Vil and Vih, excluding the high-gain region. A formal definition of the noise margins is:

• NMH = range allowable for "1" at the input = Voh − Vih
• NML = range allowable for "0" at the input = Vil − Vol.

Note that Voh and Vol represent the "clean" logic values as output by the previous logic gate, while Vih and Vil represent the lowest acceptable input voltage for "1" and the highest acceptable input voltage for "0", respectively, in the current logic gate.

Obtaining the values of Vil and Vih is simple enough if we equate the transistor currents and substitute the correct MOSFET regions of operation by examining the VTC. For

This creates a second, independent equation in Vin and Vout. The new equation can be solved simultaneously with the KCL equation to obtain values for Vout and Vin; only the value of Vin is significant, as Vih. Obtaining Vil is identical, but it starts from a KCL equation with the PMOS ohmic and the NMOS saturated.
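The pair of equations can be solved numerically. With assumed symmetric parameters (Kn = Kp, Vthn = |Vthp| = 0.5 V, VDD = 3.3 V, all illustrative), the slope condition gives Vout as a function of Vin, reducing the KCL equation to a single unknown that a bisection can handle; Vil and the noise margins then follow by symmetry.

```python
VDD, VTHN, VTHP = 3.3, 0.5, -0.5   # assumed symmetric inverter
KN = KP = 100e-6                   # A/V^2

def vout_on_slope(vin):
    # From 2Kn(2Vout - Vin + Vthn) = 2Kp(Vin - VDD - Vthp)
    return ((KP / KN) * (vin - VDD - VTHP) + vin - VTHN) / 2

def kcl_residual(vin):
    vout = vout_on_slope(vin)
    lhs = 2 * KN * ((vin - VTHN) * vout - 0.5 * vout ** 2)   # NMOS ohmic
    rhs = KP * (vin - VDD - VTHP) ** 2                       # PMOS saturated
    return lhs - rhs

# Bisection for Vih above mid-rail (residual goes from negative to positive)
lo, hi = VDD / 2 + 0.05, VDD - 0.5
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (lo, mid) if kcl_residual(mid) > 0 else (mid, hi)
vih = 0.5 * (lo + hi)

vil = VDD - vih            # by symmetry of this Kn = Kp, Vthn = |Vthp| inverter
nmh = VDD - vih            # NMH = Voh - Vih
nml = vil - 0.0            # NML = Vil - Vol
print(f"Vih = {vih:.4f} V, Vil = {vil:.4f} V, NMH = {nmh:.4f} V, NML = {nml:.4f} V")
```

For these numbers the margins come out equal, as expected for a perfectly symmetric inverter.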

3.3 Preliminaries of Delay

1. Distinguish propagation delay as the true definition of delay
2. Understand transients in the low-to-high transition
3. Understand transients in the high-to-low transition
4. Recognize why CMOS delay is essentially an RC delay
5. Review the step response of an RC circuit
6. Develop a linearized RC solution
7. Appreciate the effect of input slopes on propagation delay calculation.

To derive a simplified method for calculating delay, we first have to define what we mean by the term "delay". Delay does not describe the sharpness of individual signals; instead, it describes the time difference between two waveforms. A sharp or dull rise in a single waveform is a different and somewhat independent phenomenon from true "delay". The most useful definition of delay is propagation delay: the time it takes a signal to propagate through a gate and produce a waveform at its output. The left subplot in Fig. 3.7 illustrates the two propagation delays of an inverter. Propagation delay is defined as the time between the input waveform reaching 50% of its transition and the output waveform reaching 50% of its corresponding transition. For example, if the input of an inverter drops from "1" to "0", the output will rise from "0" to "1". The time between the input waveform reaching Vdd/2


Fig. 3.7 Definition of propagation delay. Left, a realistic input signal, and right, for an ideal input

and the output waveform reaching Vdd/2 is the propagation delay. We distinguish propagation delays based on the transition happening at the output. If the output is dropping from "1" to "0", we say it is making a high-to-low transition and calculate tphl. If it is rising from "0" to "1", we say it is making a low-to-high transition and calculate tplh.

Delay is a phenomenon intrinsically connected to capacitance, which is present almost everywhere in electronic circuits. The I–V equation of a capacitor is I = C dV/dt. What the equation says is that for a nonzero voltage transition dV to occur on a capacitor, the time it takes, dt = C dV/I, cannot be zero. Thus delay (dt) can only be zero if dV is zero (no transition), if C is zero (impossible in electronics due to parasitics), or if the available current is infinite. Is infinite current possible? Logically, we can immediately answer no. However, a deeper examination of this question can yield interesting results.

To study the current I more deeply, we have to begin by asking which current we are talking about. Figure 3.8 shows what happens when the input of an inverter changes suddenly from Vdd to GND. Because the input switches instantaneously, the NMOS switches from on to cutoff and the PMOS switches from cutoff to on. Both transistors switch regions instantaneously. Before the input switched (leftmost sub-figure in Fig. 3.8), the circuit was in steady state. In steady state, the parasitic capacitor on the output node acts as an open circuit. This open circuit at the NMOS drain is what allowed the NMOS to pass GND to the output, as in Sect. 3.1. However, immediately after the input switches, the voltage on the capacitor cannot change instantaneously because

we do not have infinite current, zero dV, or zero C. Thus we start from an initial state in which the PMOS gate is at GND, its drain at GND, and its source at Vdd. The PMOS is thus in saturation in the middle sub-figure of Fig. 3.8, and a large saturation current necessarily flows in it. The current through the PMOS charges C. As C charges, the output node voltage rises. This will eventually raise the drain potential of the PMOS enough to cause it to switch from saturation to ohmic. When the PMOS enters the ohmic region, it still supplies current to the capacitor; the capacitor still charges up, and the output node continues to rise. However, because the ohmic current of the PMOS depends on the drain potential, the PMOS current starts to drop. The capacitor continues to charge, the output node potential continues to rise, and the PMOS current continues to drop, until we reach a new steady state. The new steady state is shown in the right sub-figure of Fig. 3.8, with the output node at Vdd, the capacitor acting as an open circuit, the PMOS in deep ohmic, and the current equal to zero. This is how the inverter switches from the low-output static state to the high-output static state.

The opposite transition can be seen in Fig. 3.9. Here, the input switches suddenly from GND to Vdd. Before switching, the circuit is in steady state: the PMOS is on and ohmic, both its drain and source are at Vdd, and there is no current through the capacitor. Notice that this steady state is the state in which the transition in Fig. 3.8 ends. After the input transition happens, the PMOS suddenly turns off and the NMOS turns on. The NMOS turns on with the output node at Vdd; thus its gate and drain are at Vdd and its source is at GND. The NMOS is saturated


Fig. 3.8 Inverter with output switching from 0 to Vdd. A corresponding input switch causes the NMOS to switch off and the PMOS to switch on. This causes current to flow in the PMOS, charging the capacitor, until a steady state is reached with the PMOS impedance falling to its lowest level

Fig. 3.9 Inverter with output switching from Vdd to 0. Left: original steady state. Right: final steady state. Center: the transient

and will necessarily have current flow. This current gradually discharges C. As C discharges, Vout starts dropping; at some point, Vout is low enough for the NMOS to enter the ohmic regime. The NMOS continues to pass current, discharging C, while in ohmic.

Eventually, C will completely discharge and Vout will equal GND. The NMOS thus reaches a steady state where C is an effective open circuit and the NMOS is on and ohmic but carries no current, leading to zero drop across it and Vout = GND.
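The state tracing above can be condensed into a small helper. This is an illustrative sketch, not from the text; the function name and the threshold value are assumptions:

```python
# Sketch: classify an NMOS's region of operation from its terminal voltages,
# as used when tracing the transitions in Figs. 3.8 and 3.9.
# Vth = 0.3 V is an assumed illustrative threshold.
def nmos_region(vgs, vds, vth=0.3):
    if vgs <= vth:
        return "cutoff"
    # Saturation when the drain has pinched off the channel
    return "saturation" if vds >= vgs - vth else "ohmic"

VDD = 1.0
# Start of the pull-down transition in Fig. 3.9: gate and drain both at Vdd
print(nmos_region(vgs=VDD, vds=VDD))   # saturation
# End of the transition: output fully discharged
print(nmos_region(vgs=VDD, vds=0.0))   # ohmic
```

The same three-way test, mirrored for the PMOS, reproduces the sequence of sub-figures in Figs. 3.8 and 3.9.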

3.3 Preliminaries of Delay

The two steady states in Figs. 3.8 and 3.9 represent the static cases discussed in Sect. 3.1. The stages in the transition where the transistors are charging or discharging the capacitor represent the dynamic or transient behavior of the circuit, or how it behaves as it changes from one state to another. As discussed earlier, capacitance is the main cause of delay. Zero capacitance would necessarily mean zero delay. All delays observed at outputs occur because a capacitor is charging or discharging and needs a nonzero time to do so. If we see the output rising from GND to Vdd, this means the capacitor is charging, and it must be doing so through current supplied by PMOS transistors. If we observe the output dropping from Vdd to GND, then this is always because a capacitor is discharging, almost always through NMOS transistors. This explains the reasoning behind naming the NMOS circuit in Chap. 2 a pull-down network: the NMOS transistors in the PDN are responsible for pulling down the output from Vdd to GND to achieve the zeros of the truth table. By observing the I–V equation of the capacitor, we notice that delay is zero not only if C is zero, but also if I is infinite. What might cause an infinite current? The current available to charge or discharge the capacitor is essentially a MOSFET current: PMOS if we are charging and NMOS if we are discharging. Current is inversely proportional to resistance. Thus if the transistors were to provide infinite current, that would require their channel resistance to be zero. We know transistor channel resistance is nonzero (Sect. 2.2), so we know that we can never produce infinite current. Thus delay is not only a result of nonzero capacitance; it is also a result of MOSFETs being imperfect switches with nonzero on-resistance.
We can already conclude, however, that increasing W/L for a transistor gives it more current drive capability, which leads to lower resistance, which might lead to lower delay. The focus of digital design is obtaining a relatively low time-constant. To achieve this, we seek lower resistance and capacitance. However, R and C have an antagonistic relationship: trying to lower one will generally increase the other. The inverter while switching can be modeled as an RC circuit, as shown in Fig. 3.10. Notice that this represents the middle sub-figures of Figs. 3.8 and 3.9. C is the total parasitic capacitance at the output node, and R is the (effective) channel resistance. The channel resistance is that of the PMOS when calculating tplh and of the NMOS when calculating tphl. Calculating both R and C is not very straightforward


and will be discussed in detail in Sect. 3.4. For now, we will develop a model for delay assuming both values are known. Writing KCL at the Vout node in Fig. 3.10, we obtain a first-order ordinary differential equation:

(Vin − Vout)/R = C·dVout/dt

Notice that Vin for this equation is actually the Vdd on the source of the PMOS or the GND at the source of the NMOS. GND and Vdd are constant DC sources; however, if access to them is suddenly switched on by the transistors turning on, then they are seen as step inputs for the RC circuit. We are interested in the step response of this equation, where the step is a jump from 0 V to Vdd or from Vdd to 0 V at the source of the active transistor. The solution of this equation, for a step input, is an exponential:

Vout = VDD·(1 − e^(−t/RC)) = VDD·(1 − e^(−t/τ))

Being exponential, the output waveform is very nonlinear. This makes the solution of the delay equation unwieldy, especially for back-of-the-envelope calculations. Figure 3.11 shows sketches of step responses with time. Some of the curves converge on the final value (Vdd) faster than others. The exponential solution has three parameters: Vdd, R, and C. Vdd determines the final value toward which the solution converges. It has only an indirect effect on the speed of such convergence. R and C do not appear independently; they always appear in their product form RC. The product RC has a unit of time and is known as the time-constant, τ. The time-constant is the only parameter determining the speed of convergence of a curve if Vdd is unchanged. A (very) rough approximation of the nonlinear solution of the delay equation can be made: the time the curve takes to cover 50% of its range is approximately 0.69RC = 0.69τ. Thus finding the delay is as simple as figuring out the loading capacitance, finding the effective resistance of the NMOS for tphl or the PMOS for tplh, and then

Fig. 3.10 MOSFET-C circuits equivalent to RC circuit
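A short numerical sketch confirms that the 50% point of the RC step response sits at −ln(0.5)·τ ≈ 0.69·RC. The R and C values below are illustrative assumptions, not from the text:

```python
import math

# Step response of the RC delay model; R and C are assumed example values.
R = 10e3        # effective channel resistance, ohms
C = 10e-15      # load capacitance, farads
VDD = 1.0
tau = R * C

def vout(t):
    """Vout(t) = VDD*(1 - exp(-t/tau)) for a 0-to-VDD step at the source."""
    return VDD * (1.0 - math.exp(-t / tau))

# Solve VDD/2 = VDD*(1 - exp(-t/tau)) for t: the 50% point of the swing
t50 = -tau * math.log(0.5)
print(t50 / tau)   # ≈ 0.693: the 0.69·RC delay coefficient
print(vout(t50))   # 0.5: half the supply, by construction
```

The coefficient 0.693 is independent of the particular R, C, and Vdd chosen, which is why 0.69·RC can be quoted as a general rule.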


Fig. 3.11 Step response of an RC circuit. The speed of charging is dependent on a single parameter called the time-constant

tphl = 0.69·τn = 0.69·Rn·CL
tplh = 0.69·τp = 0.69·Rp·CL

The above is a very useful approximate solution because it is linear, and it allows us to make quick calculations for complex circuits. Quick calculations can give much better insight than more complex calculations that improve accuracy only marginally. There is an apparent contradiction that we have to resolve. At the beginning of this section, we defined propagation delay as the time between the input reaching 50% of its total swing and the output reaching 50% of its total swing. In Fig. 3.11, we define tphl as the time it takes the output voltage to go from 0 V to 50% of its maximum value. This sounds like the rise time of a single waveform, which contradicts the definition of propagation delay earlier in the section. However, if we examine the right sub-figure of Fig. 3.7, we find that the contradiction is resolved for ideally sharp inputs. Because the input is sharp, its 0%, 50%, and 100% points are all the same point in time. This is also the same time instant at which the output starts to drop or rise, i.e., the 0% point of the output. Thus the difference in time between the 50% point of the input and the 50% point of the output is equal to the time difference between the 0% point of the output and the 50% point of the output. This allows us to use the propagation delay equations derived above.

3.4 MOS Capacitance and Resistance

1. Observe where capacitances and resistance lie in a MOSFET
2. Understand the loading on a CMOS output loaded by another CMOS stage
3. Trace current, region of operation, and voltage as a CMOS node switches
4. Derive an approximation for MOS resistance
5. Derive an approximation for MOS gate capacitance
6. Derive an approximation for MOS diffusion capacitance.

Section 3.3 demonstrated that finding delay is as simple as finding the effective resistance and capacitance involved in output node charge and discharge. In this section, we will look at the physical sources of capacitance in a MOSFET transistor. We will also consider how to calculate the resistance of a MOSFET while switching. We will particularly be interested in how capacitance and resistance are affected by design choices and technology. Figure 3.12 is a sketch of a MOSFET structure with the intention of illustrating where capacitance and resistance stem from.

• There is a very obvious capacitance between the gate and the channel of the MOSFET. This is the gate capacitance, discussed in Sects. 1.15, 1.16, and 1.17. It is essential to the vertical switching action of the MOSFET, but it also acts as a parasitic for the purpose of delay. The main questions about this capacitance are how it is calculated, how it is affected by design choices, and between which MOSFET terminals it appears
• The resistance is the channel resistance between drain and source. This is the resistance that carries the drift current derived in Sect. 1.18. In the ohmic region, the MOSFET acts as a nonlinear resistance, while in saturation, it can only carry a finite current. Thus the transistor channel has a finite, ideally small resistance. The main challenge in calculating resistance is that it is nonlinear, and thus varies through the range of voltages seen during delay.

There is also a third, less obvious, but equally important observation.

Fig. 3.12 MOSFET sketch with capacitors and resistance. Resistance exists in the channel. There are three main capacitors: gate to channel, source to body, and drain to body

• Note from Sect. 2.4 that sources and drains in a properly used MOSFET form reverse-biased PN junctions with the body. In Sect. 1.9, we discussed how a junction capacitance arises in reverse-biased junctions. A junction capacitance exists between the source and the body, and another exists between the drain and the body. In Sect. 1.9, junction capacitance (also known as diffusion capacitance) was shown to be very nonlinear. Thus, the main challenge in modeling these capacitors is to find a method to linearize them.

Figure 3.13 is a very useful model to understand delay in CMOS circuits. It shows a CMOS inverter loaded by another CMOS inverter. This situation is very typical since gates from a family are almost always loaded by gates from the same family. To find the delay at Vout1, we have to find the total capacitance at the node as well as the resistance charging/discharging it. Node Vout1 is exposed to MOSFET drains (from Mn1 and Mp1) as well as MOSFET gates (from Mn2 and Mp2). The node will be charged/discharged through the resistances of Mp1 and Mn1, respectively. So to calculate delay, we have to develop a model for resistance, gate capacitance, and drain capacitance for NMOS and PMOS.

The main issue with resistance is that it is nonlinear. The value of resistance varies as the voltage across it changes. Thus, it is impossible to find a single value for “resistance”. However, it is possible to find an average value for this resistance over the range of switching. The instantaneous resistance of the NMOS transistor, for any value of Vout for which the NMOS is saturated, is

R_Mn(Vout) = Vout / I_Mn|sat

If we take channel length modulation into consideration,

R_Mn(Vout) = Vout / (I_Mn|sat · (1 + λ·Vout))

Fig. 3.13 Inverter loading an inverter. The node of interest is Vout1. The second inverter acts as a typical external load on the first inverter


The NMOS pulls the output node down from Vdd to 0 V. But propagation delay is defined only to the 50% point. Thus, the average value of resistance over this switching range is

Req = (1/(VDD/2)) · ∫ from V = VDD/2 to V = VDD of [ V / (I_Mn|sat · (1 + λV)) ] dV

This simplifies to

Req = (3/4) · (VDD / I_Mn|sat) · (1 − (7/9)·λ·VDD)

If channel length modulation is negligible,

Req = (1/(VDD/2)) · ∫ from V = VDD/2 to V = VDD of (V / I_Mn|sat) dV
Req = (2/(VDD·I_Mn|sat)) · [V²/2] evaluated from VDD/2 to VDD
Req = (2/(VDD·I_Mn|sat)) · (VDD²/2 − VDD²/8)
Req = (3/4) · VDD / I_Mn|sat

Expanding the expression of the NMOS saturation current,

Req = 3·VDD / (2·μn·Cox·(W/L)n·(VDD − Vthn)²)

The saturation current appears in the denominator of the resistance equation. This intuition is critical: more current means less resistance and less delay, because more current means faster charging or discharging of the capacitor. Also:

• Increasing W decreases resistance. This makes perfect sense since increasing the width of the channel increases the cross-sectional area of the conducting path, leading to a proportional decrease in resistance
• Increasing L increases resistance. Increasing the length of the channel increases the length of the conducting resistor, which increases resistance proportionately
• Resistance is inversely proportional to mobility. Thus an NMOS and a PMOS that are identically sized will not have identical resistances. NMOS resistance will be lower by as much as electron mobility is higher than hole mobility. If we assume electron mobility is double hole mobility, then an NMOS will have half the resistance of an identically sized PMOS.

Why did we use the saturation current in calculating resistance? Figure 3.14 shows a good view of current, voltages, and region of operation as Mn pulls down the output node. When Vin switches from “0” to “1”, Mn switches from cutoff to saturation. The current flowing through the NMOS, and thus C, rises suddenly from zero to the saturation current value Isat.
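As a sanity check on the derivation, the closed form Req = (3/4)·VDD/Isat can be compared against a direct numerical average of the instantaneous resistance V/Isat over the switching range, and the mobility argument above can be verified for a same-size PMOS. All device values below are illustrative assumptions:

```python
# Check the averaged-resistance result two ways (assumed example values).
VDD = 1.0
Vth = 0.3
un_cox = 200e-6          # NMOS process transconductance un*Cox, A/V^2
w_over_l = 2.0

# Square-law saturation current and the closed-form average resistance
Isat = 0.5 * un_cox * w_over_l * (VDD - Vth) ** 2
Req_closed = 0.75 * VDD / Isat

# Numerical average of R(V) = V/Isat over [VDD/2, VDD] (midpoint rule)
N = 100_000
dV = (VDD / 2) / N
integral = sum(((VDD / 2 + (i + 0.5) * dV) / Isat) * dV for i in range(N))
Req_avg = integral / (VDD / 2)

# Same-size PMOS with half the mobility: twice the resistance
Req_pmos = 0.75 * VDD / (0.5 * (un_cox / 2) * w_over_l * (VDD - Vth) ** 2)

print(Req_avg / Req_closed)    # ≈ 1.0: numeric average matches (3/4)VDD/Isat
print(Req_pmos / Req_closed)   # 2.0: PMOS penalty from lower hole mobility
```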


Calculating gate capacitance is very straightforward. Just recall that this is the same as the MOS capacitor in inversion mode (Sect. 1.17):

Cgate = Cox·W·L

Fig. 3.14 Graph of discharging capacitor. The saturation current remains constant and is responsible for most of the propagation delay. Bottom graph is the region of operation of M. Saturation dominates modern gates due to the low velocity saturation limit and the slow rise times of inputs

Meanwhile, Vout starts to drop as the current discharges the output node capacitance. Because, to first order, saturation current is not a function of drain potential, the current remains constant as Vout discharges. Because the current is constant, Vout discharges linearly (recall I = C·dV/dt). When Vout reaches Vin − Vth, Mn can no longer remain saturated. It enters the ohmic region. Current starts to drop with Vout since ohmic current is a function of Vds. Thus, Vout starts to discharge nonlinearly. The more Vout drops, the more severe the nonlinearity. Eventually, a steady state is reached where Vin = Vdd, Vout = 0 V, I = 0, Mn is ohmic, and Mp is cutoff. The figure shows that the highest current available is the saturation current. Thus the majority of discharging takes place due to saturation current. Figure 3.14 suggests that over the switching range (Vdd/2 < Vout < Vdd), the NMOS spends comparable parts of the range in saturation and in ohmic. However, in modern technologies, Vdd is low, Vth is high, and saturation is dominant for a very long range (Sect. 10.7). Thus, we can safely assume the transistor is saturated for the entire switching range.
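The trajectory just described, a constant-current linear discharge followed by a nonlinear ohmic tail, can be reproduced with a minimal Euler-method sketch. The square-law device parameters below are assumed for illustration, not taken from the text:

```python
# Euler-method sketch of the pull-down discharge in Fig. 3.14.
VDD, Vth = 1.0, 0.3
k = 200e-6        # assumed un*Cox*(W/L), A/V^2
C = 10e-15        # assumed load capacitance, F
dt = 1e-13        # time step, s

def nmos_current(vout):
    """Square-law NMOS current with the gate held at VDD during discharge."""
    vgs, vds = VDD, vout
    if vds >= vgs - Vth:              # saturation: constant current
        return 0.5 * k * (vgs - Vth) ** 2
    # ohmic: current falls with vds, so the discharge becomes nonlinear
    return k * ((vgs - Vth) * vds - 0.5 * vds ** 2)

vout, t = VDD, 0.0
while vout > 0.01 * VDD:              # run until the node is nearly at GND
    vout -= nmos_current(vout) / C * dt
    t += dt
print(t)   # total time to (almost) fully discharge the node
```

Plotting vout against t from this loop reproduces the shape of Fig. 3.14: a straight ramp while the transistor is saturated, then an exponential-looking tail in ohmic.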

This capacitance appears between the gate and the channel. However, the channel is not one of the transistor terminals. We will always assume this capacitance appears between gate and source. This is obviously an approximation. An accurate representation would have the capacitance distributed between the source and the drain depending on operating conditions. However, it is a very acceptable approximation. Firstly, the approximation is applied equally to all transistors. More importantly, for the most critical period of switching, transistors are in saturation. In saturation, the channel is pinched off at the drain, and the capacitance can be thought to exist entirely between gate and source. In fact, we can assume that all gate capacitance appears between gate and ground. This is because all NMOS sources in Fig. 3.12 are at ground. PMOS sources are at Vdd; however, the supply is a signal ground since it never sees any switching. So even PMOS sources can be represented as grounds. Gate capacitance is dependent on technology through Cox. It is dependent on design choices through both W and L, and is proportional to both. Note that similarly sized NMOS and PMOS transistors will have equal gate capacitance.

Diffusion capacitance is shown in Fig. 3.15. The drain (or source) is surrounded by a depletion region created by the reverse bias with the body. From Sect. 1.9, this capacitance is deeply nonlinear because the width of the depletion region is a function of the applied reverse bias. We can linearize this by calculating an average value over the switching range, as we did for resistance. The average value for junction (depletion) capacitance can be found per unit area. The per-unit-area value captures all the technology-dependent parameters, leaving only the area of the junction as a design parameter. Thus depletion capacitance can be calculated as

Cdrain = Cj · W · Ldrain

where Cj is the average depletion capacitance per unit area and Ldrain is the drawn length of the drain. The area of the junction is approximated as Ldrain × W. As shown in Fig. 3.15, W is the width of the channel; however, Ldrain is not the length of the channel. It is the length of the drain as drawn next to the MOSFET. To understand the rules that govern Ldrain, see Chap. 8.

Fig. 3.15 Drain diffusion capacitor. The area of the junction is determined by channel width but not channel length

Depletion capacitance exists at both the drain and the source. However, as shown in Fig. 3.13, the sources in inverters are shorted to the body, and thus the source depletion capacitance is not significant. Depletion capacitance per unit area can be different for NMOS and PMOS. Notice that this is similar to channel resistances, but different from gate capacitances.

Knowing how to calculate capacitances and resistance is different from knowing how to draw them on the circuit. Figure 3.16 shows where the capacitances exist on the circuit diagram. We already concluded Cgc should exist between gate and ground. Cdb is the diffusion capacitance at the drain. The body is at signal ground for both NMOS and PMOS, thus the other terminal of this capacitor is also ground.

Fig. 3.16 MOSFET capacitors schematic. The gate-to-channel capacitance appears between gate and source but, in general, can be approximated as between gate and ground. Drain-to-body capacitance appears to (signal) ground

3.5 Simplified Delay Model

1. Develop a simple linear model for delay at the output of an inverter
2. Understand which capacitors contribute to delay at a node
3. Derive the reason L should be minimized
4. Appreciate the role W plays as a design parameter
5. Distinguish the contradictory effect of W on capacitance and resistance
6. Relate W to the aspect ratio.

The following four capacitors load the node Vout1 in Fig. 3.13:

• Gate capacitance of Mn2
• Gate capacitance of Mp2
• Drain (junction) capacitance of Mn1
• Drain (junction) capacitance of Mp1.

Note we will use the terms drain, diffusion, depletion, or junction capacitance interchangeably since they essentially mean the same thing. All four capacitors appear between node Vout1 and ground (see Sect. 3.4 for why their other terminal is grounded). Thus all four capacitors appear in parallel at the output node. They can all be added to calculate a total load capacitance at Vout1:

Cout1 = Cgn2 + Cgp2 + Cdn1 + Cdp1

where
Cout1: total capacitance loading node Vout1
Cgn2: gate capacitance due to Mn2
Cgp2: gate capacitance due to Mp2
Cdn1: drain capacitance due to Mn1
Cdp1: drain capacitance due to Mp1

There are three important observations about the total load at Vout1:

• Both NMOS and PMOS transistors from both inverters contribute to loading
• Inverter 1 contributes loading only through drain capacitance
• Inverter 2 contributes loading only through gate capacitance

Since node Vout1 is the output of inverter 1, inverter 1 can be considered the “current” stage. Loading due to inverter 1 can be called “self-loading”. Loading due to any externally connected stages, such as inverter 2, can be called


“external loading”. The following conclusion can be generalized to all CMOS circuits:

• Self-loading is contributed only through drain capacitance
• External loading is contributed only through gate capacitance

Note that for both parts of the capacitance (internal and external), there must be a contribution from PMOS as well as NMOS transistors. This is a distinguishing feature of CMOS which we will discover is one of its few, but critical, drawbacks (see Sect. 3.11). When calculating high-to-low propagation delay, node Vout1 needs to discharge from Vdd to 0 V. This can only happen if the node discharges to ground through the resistance of the NMOS transistor Mn1. Conversely, when calculating low-to-high propagation delay, Vout1 needs to charge from 0 V to Vdd. This can only happen through the resistance of the PMOS transistor Mp1. Thus the linearized delay equations for node Vout1 can be written as

tphl = 0.69·Rn1·Cout1
tplh = 0.69·Rp1·Cout1

In Sect. 3.4, we deduced the expressions of R, Cgate, and Cdrain. We concluded that gate capacitance is proportional to both channel length and channel width. Resistance is directly proportional to channel length but inversely proportional to channel width. Drain capacitance is directly proportional to channel width. This is summarized in Table 3.1. One obvious conclusion from Table 3.1 is that there is no benefit to delay from increasing L. Channel length increases both gate capacitance and channel resistance, leading to an overall adverse effect on delay. Thus it is always in our interest to reduce L as much as possible. The lowest that L can go is the technology parameter Lmin (see Chaps. 2 and 7). Thus all transistors will always be sized with length at the technology minimum. Notice that this means that W/L now not only represents the ratio of width to length, but it also contains information about width. This is because L is already implied to be Lmin.
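As a worked example of these delay equations, with hypothetical values for the four load capacitors and the two driver resistances (none of these numbers come from the text):

```python
# Total load at Vout1 in Fig. 3.13 and the resulting linearized delays.
# All component values are assumed for illustration.
Cgn2 = 2e-15   # gate capacitance of Mn2, F
Cgp2 = 4e-15   # gate capacitance of Mp2 (a wider PMOS), F
Cdn1 = 1e-15   # drain junction capacitance of Mn1, F
Cdp1 = 2e-15   # drain junction capacitance of Mp1, F

Cout1 = Cgn2 + Cgp2 + Cdn1 + Cdp1   # all four in parallel to (signal) ground

Rn1, Rp1 = 5e3, 10e3                # assumed effective driver resistances, ohms
tphl = 0.69 * Rn1 * Cout1
tplh = 0.69 * Rp1 * Cout1
print(Cout1, tphl, tplh)            # 9 fF, about 31 ps and 62 ps
```

Note how tplh comes out double tphl here purely because the assumed PMOS resistance is double the NMOS resistance, echoing the mobility discussion in Sect. 3.4.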
Given this information, the smallest transistor that can be built is a transistor with L = Lmin and W/L = 1, and thus W = Lmin. This is called the unit transistor and is defined as the transistor whose width and length are both Lmin (and thus whose aspect ratio is unity). The unit transistor is the


smallest transistor that can be fabricated for a given technology. The unit transistor carries all the information about the technology-dependent parts of resistance and capacitance. It carries no information about sizing, since its size is normalized to the smallest possible value. If we know the resistances and capacitances of the unit transistor, we can easily figure out the resistances and capacitances of transistors with other sizes. First, recall that for all transistors, L = Lmin. Thus the only dimension that distinguishes transistors from each other is W. Let us define the NMOS drain capacitance, PMOS drain capacitance, NMOS gate capacitance, PMOS gate capacitance, NMOS resistance, and PMOS resistance of the unit NMOS and PMOS transistors, respectively, as

Cdn0, Cdp0, Cgn0, Cgp0, Rn0, Rp0

We can relate the resistance of the unit NMOS and unit PMOS assuming electron mobility is double hole mobility:

Rp0 = 2·Rn0

Because the drain and gate capacitances are directly proportional to width, we can obtain the capacitance of any MOSFET in terms of the unit transistor capacitances as

Cg = Cg0 · (W/L)

This applies to drain or gate capacitance, whether for NMOS or PMOS. This relation relies on the fact that Cg0 corresponds to W/L = 1. Similarly, we can obtain the resistance of any transistor in terms of the unit resistance:

Rn = Rn0 / (W/L)

A transistor with higher W/L is a wider transistor, since L = Lmin for all transistors. A wider transistor is also a transistor with more current drive capability and is considered a stronger MOSFET. A stronger MOSFET entails less resistance, but also incurs more capacitance. In Sect. 8.5, we will discover that it is impossible in many practical processes to create a transistor for which both the aspect ratio is unity and L = Lmin. In fact, the minimum width that can be fabricated is usually slightly higher than the minimum length. Thus,

Table 3.1 Impact of length and width on capacitance and resistance

              Cgate      Cdrain     R
L increases   Increases  N/A        Increases
W increases   Increases  Increases  Decreases


the smallest transistor that can be fabricated will have an aspect ratio higher than 1. This does not mean that it is impossible to fabricate transistors with unity aspect ratio, but such transistors have to have channel lengths higher than Lmin. In Sect. 8.5, the sample design rules will indicate that Wmin = 1.5Lmin, and that the smallest transistor has an aspect ratio of 1.5. This throws the calculations in this chapter a little bit off balance. But having a reference aspect ratio of 1 allows results to be very neat and regular, so we will tolerate this slight inaccuracy.

Strictly speaking, we should not be using W/L for scaling resistances and capacitances. Consider, for example, drain capacitance: it is proportional only to W. However, because L for all transistors is Lmin, we can interchange ratios of W for ratios of W/L. Consider two transistors M1 and M2; the capacitance of each is proportional to its width:

C1 = C0·W1
C2 = C0·W2

The ratio between the two capacitances is

C1/C2 = (C0·W1)/(C0·W2) = W1/W2

which can be extended to

C1/C2 = (W1/Lmin)/(W2/Lmin)

But L1 = L2 = Lmin, and thus

C1/C2 = (W1/L1)/(W2/L2)

Thus, when quantities are proportional to width, we can also consider them proportional to aspect ratio. This is contingent upon all transistors being minimum length.

3.6 Non-static Power

1. Understand that power dissipation cannot conceptually be null
2. Recognize the energy consumption in channel resistances during switching
3. Derive dynamic energy and power dissipation
4. Understand how power dissipation in an inverter is distributed between NMOS and PMOS
5. Recognize short-circuit (crowbar) power as a form of power dissipation
6. Derive an expression for short-circuit power.

Section 3.1 demonstrated that steady-state current in the CMOS inverter is zero when the output is equal to “0” and when the output is equal to “1”. This is a major departure from the behavior of ratioed logic (Chap. 2). Because no steady-state current flows, static power is zero. This is one of the major advantages of CMOS. However, it is obvious that total power dissipation in a CMOS gate cannot be zero. CMOS circuits are ubiquitous in all consumer electronics. All consumer electronics heat up. All consumer electronics drain and empty their batteries. Thus CMOS must be dissipating a lot of power somehow.

In Sect. 3.3, we saw that as an inverter switches from one steady state to another, it passes through a transient state where a current flows to either charge or discharge the output node capacitance. This transient condition is shown in Fig. 3.17. The current drawn to charge the capacitor translates into power dissipation, but not for the reasons explained in Chap. 2.

• In Chap. 2, we concluded that all power drawn from supply must be dissipated in resistances. This is true for static power. The type of power demonstrated in Fig. 3.17 is called dynamic power and occurs while the circuit is switching. During dynamic dissipation, capacitances play an important role. The power drawn from Vdd in Fig. 3.17 does not have to (all) be dissipated; some or all of it might be stored on the capacitance C
• Figure 3.17 necessarily shows power dissipation. The current drawn from Vdd flows through the PMOS channel. The PMOS channel is resistive. Any current that flows in a resistance will dissipate power. Thus although

Fig. 3.17 Capacitor charging through PMOS. Because a current is drawn through the PMOS channel, power is dissipated. Note that because of the presence of the capacitance, current drawn from supply does not necessarily translate into power dissipation


some of the power drawn from Vdd in Fig. 3.17 will be stored on C, some of it must be dissipated in the PMOS.

In Fig. 3.17, there are three energy-related phenomena happening. First, there is energy drawn from the supply. Second, there is energy being dissipated in the PMOS channel. Finally, there is energy being stored in the capacitor. We are interested in the energy dissipated in the resistance of the PMOS channel. This resistance is nonlinear and very difficult to calculate. Thus, rather than calculate the dissipated power directly, we can calculate it as the balance between the energy drawn from supply and the energy stored on the capacitor.

Energy drawn from the supply is the integral of the power drawn from supply over time:

Edrawn = ∫ Pdrawn dt

The instantaneous power drawn from the supply in terms of the current drawn is

Edrawn = ∫ Idrawn·VDD dt

The current drawn from supply is the same as the PMOS current, which is also the same as the capacitor current:

Idrawn = IMp = IC = C·dVout/dt

And thus, the total energy drawn is

Edrawn = ∫ C·VDD·(dVout/dt) dt

The integration has to be limited. Our limits in time start at t = 0, when switching starts. We can give the circuit as much time to finish charging as it needs, so the other limit can be infinity:

Edrawn = ∫ from t = 0 to t = ∞ of C·VDD·(dVout/dt) dt

Since dt cancels out, dVout becomes the variable of integration. The limits have to be modified from those of t to those of Vout:

t = 0 → Vout = 0
t = ∞ → Vout = VDD

When we start switching, Vout is still at ground. In steady state at time equal to infinity, Vout rises to VDD but cannot exceed this value. Thus the integration can be evaluated:

Edrawn = C·VDD · ∫ from Vout = 0 to Vout = VDD of dVout
Edrawn = C·VDD²

We still have to find the value of the energy stored on the capacitance. As always, energy is the integral of instantaneous power, and power is the product of current and potential. The current is still the same as Idrawn, but the potential across the capacitor is Vout rather than VDD:

Estored = ∫ from t = 0 to t = ∞ of IC·Vout dt
Estored = ∫ from t = 0 to t = ∞ of C·(dVout/dt)·Vout dt
Estored = C · ∫ from Vout = 0 to Vout = VDD of Vout dVout
Estored = C·VDD²/2

The energy stored in the capacitor is less than the energy drawn from the supply. The balance must be the energy dissipated in the PMOS:

Edrawn = Estored + EPMOS
C·VDD² = C·VDD²/2 + EPMOS
EPMOS = C·VDD²/2

Thus, half of the energy drawn from the supply is stored on the capacitor, and half of it is dissipated in the channel of the PMOS. Notice that the energy dissipated in the PMOS is not a function of the value of the PMOS channel resistance. This is a curious effect of the nonlinearity of the charging operation.

When the input of the inverter switches to “1”, the PMOS turns off and the NMOS turns on, discharging the capacitor from Vdd all the way down to GND. This is shown in Fig. 3.18. Because there is a current flowing in the NMOS channel, there must be power dissipation. Again, the total energy in the system is conserved. The initial energy stored on the capacitor must be equal to the final stored energy plus any balance that is burnt in the NMOS channel. In the half-cycle shown in Fig. 3.18, there is no access to supply Vdd, thus the initial energy in the system cannot increase. Energy conservation requires that

EC,initial = EC,final + ENMOS

When discharging is complete, the final voltage on the capacitor is null. Thus, the final energy on the capacitor is also null:

EC,final = 0

Fig. 3.18 Capacitor discharging. All the energy stored on the capacitor is lost. It is dissipated in the NMOS channel


And thus ENMOS ¼ EC initial The situation in Fig. 3.18 happens after the steady state is reached in Fig. 3.17. We have already derived the energy stored on the capacitor at the end of the half-cycle in Fig. 3.17. Thus, we can calculate the energy dissipated in the NMOS. ENMOS ¼ EC initial

V2 ¼ C: DD 2

In a complete “cycle” in which the input switches from “1” to “0” and then from “0” to “1”, an amount of energy is drawn from the supply as the capacitor charges. Half of this energy is stored on the capacitor and half is burnt in the PMOS. When the input switches from “0” to “1”, the half stored on the capacitor is dissipated in the NMOS. Thus, all the energy drawn from the supply is dissipated in the MOSFETs, with half going to each device.

This is necessary if the circuit is to be stable. If some of the energy drawn from the supply were not dissipated, then a net energy would be stored on the capacitor every cycle, and the capacitor would accumulate ever-increasing charge. This would eventually lead to dielectric breakdown and failure. It would happen if we did not allow enough time for charging/discharging, in other words, if we tried to operate the circuit faster than it can go.

The mechanism of power dissipation described in this section is called dynamic power dissipation. It is the power dissipated in transience while the output is dynamically changing from one value to another. This mechanism used to be dominant in CMOS technology. Its importance declined with scaling as leakage power (Sect. 10.3) increased in importance. However, modern MOSFET structures have been so successful at choking leakage current (Sect. 10.5) that dynamic power is once more critically important.

We have not yet obtained an expression for dynamic power dissipation. If we assume that the period of a “cycle” is T, then the power dissipated in a cycle is

P = (E_NMOS + E_PMOS) / T = C·V_DD² / T

And in terms of frequency, where f = 1/T,

P = C·V_DD²·f

Dynamic power is directly proportional to the amount of capacitance being switched at the outputs of logic gates. It is also a very strong (quadratic) function of the supply voltage.
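As a quick numeric sketch of this relation (the values below are illustrative round numbers, not figures for any specific process), P = C·V_DD²·f can be evaluated directly:

```python
# Illustrative check of dynamic power P = C * VDD^2 * f (the upper bound,
# with the gate output switching every cycle). All values are assumed.
c_load = 10e-15   # 10 fF of switched capacitance at a gate output
vdd = 1.0         # supply voltage, volts
f = 1e9           # 1 GHz switching frequency

p_dyn = c_load * vdd**2 * f   # 1e-05 W, i.e. 10 microwatts per switching node
```

Note the quadratic leverage of the supply: halving VDD cuts this estimate by a factor of 4.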

The expression above is in fact an upper bound on dynamic power. It assumes that the outputs of all gates switch at a frequency f. If f is the frequency of the system, then we are assuming each gate switches every cycle. This is not generally true; most gates switch only in a fraction of cycles. Thus, a better expression for dynamic power is

P = a·C·V_DD²·f

where a, called the activity factor, is less than or equal to 1. It represents the fraction of cycles in which the output of the gate switches. We will develop a better understanding of the difference between the system clock and the outputs switching in Chap. 6 when we discuss sequential logic and pipelines.

Another source of switching power dissipation is sometimes called short-circuit power. It is related to a phenomenon called short-circuit current or crowbar current. This is a very important source of power dissipation, especially in very deep submicron nodes. Short-circuit power is directly linked to finite input signal slope. Figure 3.19 shows a CMOS inverter with an input signal that has a finite slope. The output will also have a finite slope, and the output slope is gentler than the input slope. Since inputs are themselves outputs that come from a previous stage, there is no way to guarantee that input signals will be infinitely sharp.

The gentle slope of the input in Fig. 3.19 means that there will be a range of voltages where both the NMOS and the PMOS are on. This situation was not observed in Sects. 3.3 and 3.4 because the input was infinitely sharp, and thus turned off one of the transistors as soon as it turned on the other. Because the input in Fig. 3.19 now has a slope, there is a range where Vthn < Vin < Vdd − |Vthp|. This allows both MOSFETs to be on simultaneously. To make matters worse, the output waveform is also gently sloped. This means that there is a range where both transistors are simultaneously saturated, and when both are saturated, a large current flows through their channels. Thus, according to Fig. 3.19, there is a series current flowing from the supply, through the PMOS, down through the NMOS, and into ground. This current is drawn from the supply and dissipated in the MOSFET channels, and it is a significant source of power dissipation.
To find an estimate for the value of this power, we start from its instantaneous value. The maximum current drawn from the supply is the crowbar current Icb. This current is at most the lower of the saturation currents of the NMOS and the PMOS, and thus the peak power drawn from the supply is

P_peak = I_sat·V_DD

The power drawn from the supply is not at this peak value throughout the transition, so we have to find an average value that is more meaningful. The amount of energy drawn from the supply while switching is the area under one of the curves in Fig. 3.19:

E_switch = 0.5·P_peak·t_cb

where t_cb is the time during which crowbar current flows between supply and ground. Here, we assume a triangular shape for the current and power waveforms. This energy is drawn for each transition in either direction. If we assume that the inverter sees two transitions per cycle, the total energy drawn in a cycle is

E_cycle = P_peak·t_cb

and the cycle duration is T, thus the average power drawn during the cycle is

P = P_peak·t_cb / T = I_sat·V_DD·t_cb / T

As with dynamic power, this assumes two transitions every cycle, and thus should be qualified by an activity factor. Additionally, if the inverter is clocked in a pipeline, it can see at most a single transition per cycle. A more realistic expression for power is

P = a·I_sat·V_DD·t_cb / T

The maximum current I_sat should be distinguished from the peak saturation current used to estimate MOSFET resistance in Sect. 3.4. It is not the saturation current when the gate input of the MOSFET is at a full rail value. From Fig. 3.19, we can see that the gate input to the MOSFETs is actually closer to the middle of the supply range, and thus it should be calculated as

I_sat = min{ 0.5·Kn·(0.5·V_DD − Vthn)², 0.5·Kp·(0.5·V_DD − |Vthp|)² }

Fig. 3.19 Short-circuit current flow. The duration of current flow tcb depends on the input slope. Most current flows while both transistors are saturated

Obviously, the most productive way to reduce crowbar power is to reduce the crowbar time t_cb. Crowbar time is a direct result of the input slope: the sharper the input slope, the lower the crowbar time, and the lower the crowbar power. But because our input slope is the output slope of a previous stage, this is easier said than done.
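The crowbar estimate above can be sketched numerically. This is a rough sketch under the triangular-pulse assumption; the Kn, Kp, tcb, and T values below are illustrative, not taken from any particular process:

```python
# Short-circuit (crowbar) power estimate, P = a * Isat * VDD * tcb / T,
# using the mid-rail saturation currents described in the text.
def crowbar_power(kn, kp, vdd, vthn, vthp, tcb, period, a=1.0):
    i_n = 0.5 * kn * (0.5 * vdd - vthn) ** 2        # NMOS at Vin ~ VDD/2
    i_p = 0.5 * kp * (0.5 * vdd - abs(vthp)) ** 2   # PMOS at Vin ~ VDD/2
    i_sat = min(i_n, i_p)                           # the lower current limits the path
    return a * i_sat * vdd * tcb / period

# Illustrative numbers: Kn = Kp = 200 uA/V^2, VDD = 1 V, |Vth| = 0.3 V,
# 20 ps of crowbar time in a 1 ns cycle.
p_cb = crowbar_power(200e-6, 200e-6, 1.0, 0.3, -0.3, 20e-12, 1e-9)
```

Shrinking tcb (a sharper input edge) reduces the result linearly, which matches the qualitative argument above.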

3.7 CMOS NAND and NOR

1. Extend the concept of series and parallel switches to CMOS
2. Understand the significance of the word “complementary”
3. Contrast the PDN and the PUN
4. Analyze all the rows of the NAND and NOR truth tables for CMOS.

Section 2.12 showed that a series connection of NMOS transistors yields a NAND gate, while a parallel connection of NMOS transistors yields a NOR gate. Note that in Chap. 2, the NMOS formed the pull-down network or PDN, while a load was responsible for achieving the “1”s in the truth table. The same principle of combining series and parallel connections to achieve logic functions can be extended to CMOS, except that in CMOS there is no load. Instead, the


PMOS in the CMOS inverter plays an equal role to the NMOS, which is why it also accepts the same input. We will find that in CMOS circuits, the PMOS plays a complementary role to the NMOS. This is where the C in CMOS comes from. Instead of being a load, the PMOS will form a pull-up network (PUN), which complements the NMOS PDN.

The PUN is made entirely of PMOS. There will always be as many PMOS transistors in the PUN as there are NMOS in the PDN. All PMOS transistors in the PUN must have a logic input at the gate. The PUN is responsible for achieving the “1”s of the truth table. Additionally, all connections in the PUN will be complementary to those of the PDN. That is to say, a series connection in the PDN will be a parallel connection in the PUN, and a parallel connection in the PDN will be a series connection in the PUN. Table 3.2 summarizes the comparison between the PUN and the PDN.

Figure 3.20 shows a CMOS NOR gate. The PDN contains two NMOS transistors connected in parallel and is identical to the PDN of the ratioed-logic NOR gate. The PUN consists of an equal number of PMOS. Each logic input connected to an NMOS gate must also be connected to a PMOS gate. Also, the PMOS network is the complement of the NMOS network: because the two NMOS are connected in parallel in the PDN, the two PMOS are connected in series in the PUN.

There are four possible inputs to the NOR gate that form its truth table (Table 3.3). The four rows of the truth table are expanded in Fig. 3.21, which shows how the four transistors of the gate in Fig. 3.20 look for each of the inputs in Table 3.3. When the input is “00”, the two NMOS transistors are cutoff, thus there is no possible path to ground. The two PMOS are on; however, no current can flow through them because the PDN has cut off all paths to ground. This is shown in the leftmost sub-figure in Fig. 3.21. In Sect. 2.12, we demonstrated that two series transistors with the same gate input can be simplified into one transistor with a smaller aspect ratio. Thus Mp1 and Mp2 in the leftmost sub-figure of Fig. 3.21 can be reduced to a single PMOS. This PMOS has an open drain and is ohmic. This is identical to Sect. 3.1 with a static input of “0”. Thus Vout = Vdd and there is zero drop on the two PMOS transistors.

Table 3.2 Comparing the PDN and the PUN in a CMOS circuit. N is the number of input variable instances

Fig. 3.20 CMOS NOR gate. Mp1 and Mp2 form the PUN. The PDN is identical to that in ratioed logic

If the input is “11”, then both NMOS transistors are on and both PMOS transistors are cutoff. This corresponds to the rightmost sub-figure in Fig. 3.21. In this case, there is no path open to Vdd. So, again, there is zero steady-state current flow. The two NMOS transistors are on and in parallel. We have discussed how these can be reduced to one transistor whose W/L is the summation of the individual aspect ratios. Thus we end up with a situation identical to the CMOS inverter with “1” input. The NMOS in the PDN has zero current flow, but it is on. Thus it has to be ohmic with zero drop and output is 0 V. The cases for inputs “01” and “10” are very similar. In both cases, one of the two PMOS will be cutoff and one of the two NMOS will be on. Since the PMOS are in series, one of them being cutoff is enough to cut the whole path to Vdd. Similarly, only one NMOS being on is enough to open a path to the ground. With only one NMOS being on with an open drain, the output will be 0 V and is identical to Sect. 3.1 with input “1”. Table 3.4 shows the truth table for the 2-input CMOS NAND. Figure 3.22 shows the NAND CMOS gate for each of the inputs in the truth table. The NAND gate consists of two series NMOS in the PDN (similar to ratioed logic) and two PMOS in parallel in the PUN (complementary to the PDN).

                              PDN            PUN
Transistor types              NMOS           PMOS
Transistor count              N              N
Part of truth table covered   “0” outputs    “1” outputs
Role in delay                 R in tpHL      R in tpLH

Table 3.3 Truth table for 2-input CMOS NOR gate

AB     Mn1      Mn2      Mp1      Mp2      Current   Logic out   Vout
“00”   Cutoff   Cutoff   Ohmic    Ohmic    Zero      “1”         Vdd
“01”   Cutoff   Ohmic    Ohmic    Cutoff   Zero      “0”         0
“10”   Ohmic    Cutoff   Cutoff   Ohmic    Zero      “0”         0
“11”   Ohmic    Ohmic    Cutoff   Cutoff   Zero      “0”         0

Fig. 3.21 The four cases of CMOS NOR

Table 3.4 Truth table for 2-input CMOS NAND gate

AB     Mn1      Mn2      Mp1      Mp2      Current   Logic out   Vout
“00”   Cutoff   Cutoff   Ohmic    Ohmic    Zero      “1”         Vdd
“01”   Cutoff   Ohmic    Ohmic    Cutoff   Zero      “1”         Vdd
“10”   Ohmic    Cutoff   Cutoff   Ohmic    Zero      “1”         Vdd
“11”   Ohmic    Ohmic    Cutoff   Cutoff   Zero      “0”         0

Analyzing the truth table of the NAND gate is very similar to the NOR gate. When the input to the NAND gate is “00”, both NMOS are cutoff and both PMOS are on. The two PMOS appear in parallel and thus can be reduced to a single PMOS with an open drain. Thus the output is Vdd. When the input is “11”, both PMOS are cutoff, creating an open circuit at the drain end of the PDN. Both NMOS are on. They are in series with the same gate input, and thus can be reduced to a single NMOS with a lower aspect ratio. Since the PDN has an open drain, it must be ohmic, and its output must be 0 V (Sect. 3.1). When the input is either “01” or “10”, one of the NMOS is cutoff. Because the two NMOS are in series, one of them being cutoff cuts the path to ground, causing the overall circuit current to be zero. For both inputs, one of the PMOS will be on with an open drain. This is identical to Sect. 3.1 with “0” input. This forces the PMOS to be ohmic and the output to be Vdd.

Notice the following about both NAND and NOR gates:

• The steady-state current is zero for all rows in the truth table

• There must be an open circuit either on the path to ground in the PDN or on the path to Vdd in the PUN. This is a corollary of the first point
• In steady state, all transistors must be either ohmic or cutoff
• All steady-state conditions can be reduced (by parallel and series combinations) to one of the two cases of Sect. 3.1
• In steady state, there must be a path open to either supply or ground exclusively.
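These observations can be captured in a small switch-level sketch (an assumption-level model, not a circuit simulation): an NMOS conducts on “1”, a PMOS conducts on “0”, a series connection conducts only if both devices do, and a parallel connection conducts if either does.

```python
# Switch-level model of the CMOS NOR and NAND gates from this section.
# The output is "1" exactly when the PUN conducts; the assertions check
# that exactly one of the two networks conducts for every input row.
def cmos_nor(a, b):
    pdn = a or b                  # two NMOS in parallel
    pun = (not a) and (not b)     # two PMOS in series
    assert pdn != pun             # one network open, the other conducting
    return int(pun)

def cmos_nand(a, b):
    pdn = a and b                 # two NMOS in series
    pun = (not a) or (not b)      # two PMOS in parallel
    assert pdn != pun
    return int(pun)

# Exhaustive check against the Boolean definitions of NOR and NAND.
for a in (0, 1):
    for b in (0, 1):
        assert cmos_nor(a, b) == int(not (a or b))
        assert cmos_nand(a, b) == int(not (a and b))
```

The internal `assert pdn != pun` is exactly the "exclusive path" property from the bullet list: the model fails loudly if both networks ever conduct at once.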

3.8 CMOS Complex Logic

1. Understand the reduction of parallel and series transistors
2. Follow the steps to build a CMOS gate
3. Recognize branch opening patterns in CMOS gates
4. Distinguish proper CMOS operation from voltage dividers in ratioed logic
5. Develop a feel for the impact of additional inverters on the gate area


Fig. 3.22 The four cases of CMOS NAND

6. Contrast the implementation of F and Fʹ.

It was shown in Chap. 2 that transistors connected in series with the same gate input can be treated as one transistor whose reciprocal of aspect ratio is the sum of the reciprocals of the individual aspect ratios. It was also shown that any number of transistors connected in parallel, with the same gate input, can be treated as one transistor whose aspect ratio is the sum of the individual aspect ratios. That is to say,

(W/L)parallel = Σ (W/L), summed over all MOSFETs

(W/L)series = 1 / Σ [1/(W/L)], summed over all MOSFETs

To implement a logic function in CMOS, follow these steps:

• Obtain an expression for Fʹ, applying DeMorgan’s theorem if necessary
• Implement the expression of Fʹ in the PDN, with AND’s being series connections of NMOS and OR’s being parallel connections
• The PUN is the complementary network of the PDN, formed entirely of PMOS, with each series connection replaced by a parallel connection and vice versa
• The output obtained is F, not Fʹ. Because the architecture is based on an inverter, there is an implicit inversion: implementing the connections of Fʹ results in (Fʹ)ʹ = F
• An alternative way to obtain the PUN is to implement the connections in the expression of F, but replace all logic variables with their complements.

The best way to understand this is through example. To implement the function F = (AB + C(D + E))ʹ, we first obtain the expression Fʹ = AB + C(D + E). In this case, we were fortunate that the logic complement was already present on the whole expression; we will see plenty of examples where this is not the case. The PDN is a direct implementation of the resulting expression, with A and B in series in one branch, in parallel with a second branch that contains C in series with the parallel combination of D and E. This is shown in Fig. 3.23.

To obtain the PUN, there are two alternatives. The first is direct: observe the PDN and implement the complementary connections. So A and B are in parallel in the PUN instead of in series as in the PDN, and both are in series with the combination of C in parallel with the series pair D and E. The second method, which can also be used as a check, is to obtain the expression of F and expand it if necessary. By applying DeMorgan’s: F = (AB)ʹ(C(D + E))ʹ = (Aʹ + Bʹ)(Cʹ + DʹEʹ). The PUN is a direct implementation of the connections in the expression of F, with one caveat: all variables should appear in the circuit as an inversion of how they appear in the equation. Think of the inversions as telling us that we should turn on for “0” instead of “1”, which the PMOS naturally does. In other words, the inversions are a reference to the bubble on the gate of the PMOS.

The complete truth table of F is 32 rows long, but Table 3.5 shows selected entries from the table. The table follows directly from observing the gate in Fig. 3.23. Notice that the function given by the truth table is F, even though the expression implemented in the PDN was Fʹ. The columns titled “Number of paths open in PDN” and “Number of paths open in PUN” represent the number of paths open to ground and Vdd in the PDN and PUN, respectively, for the given input. The definition of a path here is any branch or sub-branch that connects the output node to either ground or Vdd.

Note that if there are any paths open to ground in the PDN, there must be zero paths open to Vdd in the PUN, in which case the output is necessarily 0 V. The opposite is also true: if there are any paths open to Vdd in the PUN, there must be zero paths open to ground in the PDN, in which case the output is Vdd. This observation is critical to proper CMOS operation, and it comes directly from the fact that connections in the PDN and the PUN are complementary and that the PMOS works on the opposite voltage to the NMOS. Because one of either the PDN or the PUN must be open-circuited, there is zero steady-state current for all entries in the truth table. This means that the other network, which has an ohmic path to Vdd or ground, will have zero voltage drop, producing either 0 V or Vdd at the output.

Fig. 3.23 The implementation of F = (AB + C(D + E))ʹ. The PDN is identical to ratioed logic

Figure 3.24 shows two cases. The one on the left is a CMOS circuit behaving properly: the PUN forms an open circuit, and there is one branch open from F to ground through the PDN. This PDN path is ohmic and has zero current, thus it will have a zero drop, and the output will be 0 V. The right sub-figure of Fig. 3.24 shows a faulty CMOS circuit: there is a path to ground through the PDN and a path to Vdd through the PUN. This case is illegal in CMOS and indicates a mistake in implementation, for two reasons:

• The existence of a path from Vdd to ground means that there will be steady-state current flow. This is impossible in CMOS
• The voltage at F will be neither 0 V nor Vdd; it will be set by a voltage divider between the impedance of the PDN and the impedance of the PUN. Again, this is not legal in steady state for CMOS. This situation is similar to ratioed logic in Chap. 2

Figure 3.25 shows the state of the circuit in Fig. 3.23 for two inputs from the truth table. There is nothing special about these two inputs, so they are representative. For input “11110”, there is a path open from F to ground through the PDN. There is no path open in the PUN. All the NMOS in the PDN can be simplified into one equivalent NMOS (by combining their W/L). So they will be ohmic with a zero drop, and the output will be 0 V. For input “01011”, there is a path open in the PUN connecting F to Vdd. All paths in the PDN are open-circuited. Thus, the PMOS in the PUN will be ohmic with zero drop on them, leading to an output of Vdd.

Now consider the implementation of F = AB + Cʹ. Unlike the previous example, the expression of F does not lend itself easily to obtaining the expression of Fʹ. Therefore, we have to use DeMorgan’s expansion to obtain an expression we can work with. Thus Fʹ = (AB + Cʹ)ʹ = (AB)ʹ·C = C(Aʹ + Bʹ). This expression is implemented in Fig. 3.26. The PDN is a direct implementation of the connections in Fʹ. The PUN is the complement of the connections in the PDN; it is also an implementation of the connections in the expression of F but with all variables inverted.

The number of transistors used to implement the main gate of F is 6. There are always 2N transistors in the gate, where N is the number of variable appearances in the literals. However, this total assumes that all input variables are available in their complement form at no additional cost. That is to say that Aʹ is itself an input variable just like


Table 3.5 Selected entries in the truth table of F = (AB + C(D + E))ʹ

ABCDE     Paths open in PDN   Paths open in PUN   Vout   MNA, MNB, MNC, MND, MNE                  MPA, MPB, MPC, MPD, MPE
“00000”   0                   2                   Vdd    All cutoff                               All ohmic
“11000”   1                   0                   0      Ohmic, ohmic, cutoff, cutoff, cutoff     Cutoff, cutoff, ohmic, ohmic, ohmic
“10110”   1                   0                   0      Ohmic, cutoff, ohmic, ohmic, cutoff      Cutoff, ohmic, cutoff, cutoff, ohmic
“11110”   2                   0                   0      Ohmic, ohmic, ohmic, ohmic, cutoff       Cutoff, cutoff, cutoff, cutoff, ohmic
“11111”   3                   0                   0      All ohmic                                All cutoff
“00011”   0                   2                   Vdd    Cutoff, cutoff, cutoff, ohmic, ohmic     Ohmic, ohmic, ohmic, cutoff, cutoff
“01011”   0                   1                   Vdd    Cutoff, ohmic, cutoff, ohmic, ohmic      Ohmic, cutoff, ohmic, cutoff, cutoff
“10001”   0                   1                   Vdd    Ohmic, cutoff, cutoff, cutoff, ohmic     Cutoff, ohmic, ohmic, ohmic, cutoff
“00001”   0                   2                   Vdd    Cutoff, cutoff, cutoff, cutoff, ohmic    Ohmic, ohmic, ohmic, ohmic, cutoff

Fig. 3.24 Improper and proper CMOS operation. Left, a proper CMOS circuit will have an ohmic path to one rail and open circuits on the path to the other. Right, having paths to both rails entails an output decided by a voltage divider and sizing ratios. This cannot be a CMOS gate

A and does not have to be derived from A. If this is not the case, then we need to insert inverters to obtain Aʹ and Bʹ. Thus, two additional CMOS inverters are needed in the circuit and are shown in Fig. 3.26. The total number of transistors is thus 6 + 2 + 2 = 10. There is an alternative to the implementation of any function. If the expression of F is implemented in the PDN and its complement in the PUN, the output will be Fʹ. F can then be obtained by adding an inverter to the output to invert Fʹ into F. To find an alternative implementation, we implement the expression of F directly to the PDN and its complement to

the PUN, as shown in Fig. 3.27. Since the CMOS architecture adds an implicit inversion, applying the expression of F to the PDN will produce Fʹ at the output. So to obtain F, we have to invert that result with an additional CMOS inverter. The total number of transistors in the alternative implementation in Fig. 3.27 is also 10. The main gate contributes six transistors. There are two transistors in the input inverter for C, and two transistors for the output inverter to obtain F from Fʹ. Note that only one inverter is used to obtain Cʹ for both the PDN and the PUN. In other words, never count variable inversions in the PDN and PUN twice.
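The path-complementarity property discussed for the first example, F = (AB + C(D + E))ʹ, can be checked exhaustively with the same switch-level view (a sketch, not a circuit simulator): the PDN conducts exactly when Fʹ = 1, the PUN conducts exactly when F = 1, and never both.

```python
from itertools import product

# PDN implements F' = AB + C(D + E); the PUN implements the complementary
# connections, equivalent by DeMorgan's to (A' + B')(C' + D'E').
def pdn(a, b, c, d, e):
    return bool((a and b) or (c and (d or e)))

def pun(a, b, c, d, e):
    return bool(((not a) or (not b)) and ((not c) or ((not d) and (not e))))

# Sweep all 32 rows of the truth table.
for bits in product((0, 1), repeat=5):
    f = not pdn(*bits)               # implicit inversion: implementing F' yields F
    assert pun(*bits) == f           # PUN conducts exactly when F = 1
    assert pdn(*bits) != pun(*bits)  # never both networks conducting at once
```

This is also a convenient desk-check when deriving a PUN by hand: if any input row fires both networks (or neither), the complementary connections were built incorrectly.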


Fig. 3.25 Logic gate F for inputs “11110” and “01011”

Fig. 3.26 CMOS implementation of F = AB + Cʹ, showing additional inverters

Fig. 3.27 Alternative CMOS implementation of F = AB + Cʹ

So when should one implement the expression of Fʹ to obtain F at the output, and when should one do the opposite? While Figs. 3.26 and 3.27 have the same number of transistors, this result cannot be generalized. But rules of thumb can be applied.

• If input complements are available at no extra cost, it is always better to implement the expression of Fʹ and obtain F directly at the output
• If input complements are not directly available, then if the expression of F has many complemented variables, it is better to implement Fʹ and obtain F directly at the output
• Otherwise, if the expression of F does not have a majority of inverted inputs, it might be better (area-wise)


to implement the expression of F, obtaining Fʹ at the output and inverting it with an additional inverter.

For example, the function F = AB + CD is certainly better off being implemented as F and then inverting. In this case, there will be eight transistors in the main gate and two transistors in the inverter used to turn Fʹ into F, for a total of 10 transistors. If we had tried to obtain the expression of Fʹ and implement it to get F directly, we would end up with Fʹ = (Aʹ + Bʹ)(Cʹ + Dʹ), which, in the absence of ready complements, requires eight transistors for the main gate and 2 × 4 transistors for input inverters, for a total of 16 transistors.

On the other extreme, the function F = AʹBʹ + CʹDʹ is better off implemented through the expression Fʹ = (AʹBʹ + CʹDʹ)ʹ = (A + B)(C + D). Implementing this expression takes only eight transistors in the main gate, and F is obtained directly at the output. If we try to implement the expression of F, it would take eight transistors for the main gate, plus eight for input inverters, and two to invert the output from Fʹ to F, for a grand total of 18 transistors. Several other examples are shown in Table 3.6. The case of F = Aʹ + AʹB is particularly interesting, since a simple logic reduction shows it is nothing but an inverter, for which the two implementations are equivalent: F = Aʹ(B + 1) = Aʹ.
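The bookkeeping behind these counts can be written down explicitly. This sketch assumes, as in the text, that complements are not free: the main gate costs 2 transistors per literal, each needed input complement costs one 2-transistor inverter (counted once per variable), and an output inverter costs 2 more.

```python
# Transistor-count bookkeeping for the two CMOS implementation styles.
def count(literals, input_inverters, output_inverter):
    return 2 * literals + 2 * input_inverters + (2 if output_inverter else 0)

# F = AB + CD: implement F and invert the output (10) vs. implement
# F' = (A' + B')(C' + D') directly, which needs four complements (16).
assert count(4, 0, True) == 10
assert count(4, 4, False) == 16

# F = A'B' + C'D': implement F' = (A + B)(C + D) directly (8) vs.
# implement F, which needs four input inverters and an output inverter (18).
assert count(4, 0, False) == 8
assert count(4, 4, True) == 18
```

The same three-term sum reproduces the totals in Table 3.6 row by row.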

3.9 Sizing, Delay, and Area

1. Understand the impact of sizing on delay
2. Recognize the difference in resistance between different input patterns
3. Calculate total loading on output nodes
4. Distinguish self-loading of a gate
5. Recognize the role of the unit inverter as a reference
6. Exercise different implementation choices for complex logic.

Calculating delay in a complex CMOS logic gate is as simple as applying the delay, capacitance, and resistance rules obtained in Sects. 3.5 and 3.6, with the following rules for combining transistor aspect ratios:

(W/L)parallel = Σ (W/L), summed over all MOSFETs

(W/L)series = 1 / Σ [1/(W/L)], summed over all MOSFETs
We will use the following rules/simplifications while obtaining delay:

• The resistance in the delay equation is the resistance of the NMOS transistors that form a path between the output node and ground when calculating high-to-low delay. It is the resistance of the PMOS transistors forming a path between the output node and supply when calculating low-to-high delay
• Capacitance in the delay equation is the capacitance of transistors connected (through their gates or drains) to the output node. Transistors that do not have a terminal connected to the output node do not load it. This is obviously a simplification; another way to state it is that we are ignoring internal node capacitance
• Capacitance used in the delay equation does not depend on whether the transistors involved are on or cutoff. Once a total load capacitance value is obtained, it is valid for any input combination
• The value of resistance used in the delay equation will differ greatly based on the input combination. This is because inputs can turn additional branches on or off, creating a higher or lower equivalent W/L and thus affecting the overall effective resistance
• We will always assume all capacitances at the output node are to ground. They are thus in parallel, and their values add up.
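The aspect-ratio reductions used in these rules are a direct transcription of the formulas above:

```python
# Series/parallel aspect-ratio (W/L) reduction; ratios combine like conductances.
def wl_series(*ratios):
    return 1.0 / sum(1.0 / r for r in ratios)

def wl_parallel(*ratios):
    return float(sum(ratios))

# Two unit transistors in series give W/L = 1/2; putting a unit transistor
# in parallel with that branch gives 1 + 1/2 = 1.5.
assert wl_series(1, 1) == 0.5
assert wl_parallel(1, wl_series(1, 1)) == 1.5
```

These two helpers are all that is needed to reduce any PDN or PUN branch structure to a single effective W/L.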

Table 3.6 Transistor count in a selection of gates in two implementations

                        Implementation through Fʹ                     Implementation through F
Logic function          Main gate  Input inv.  Output inv.  Total     Main gate  Input inv.  Output inv.  Total
F = AB                  4          4           0            8         4          0           2            6
F = AʹBʹ + C            6          2           0            8         6          4           2            12
F = (AB)ʹ + CD          8          4           0            12        8          4           2            14
F = ABC + Dʹ            8          6           0            14        8          2           2            12
F = Aʹ + AʹB            2          0           0            2         2          0           0            2
F = (ABC)ʹ + (DE)ʹ      10         0           0            10        10         10          2            22


Fig. 3.28 Logic function F = (AB + C)ʹ loaded by an inverter with all transistors minimum sized

Consider the logic function F = (AB + C)ʹ shown in Fig. 3.28. The gate is loaded by a subsequent CMOS inverter. All transistors in the overall circuit are sized W/L = 1, thus they are all unit transistors. This is the implementation of F with the smallest possible gate area. To obtain the delay at node F, we must obtain the capacitance at this node, and then the value of the resistance used to charge or discharge this capacitance.

Capacitance at the output of any gate consists of two components: internal (or self) loading and external loading. Internal loading comes exclusively from the transistor drains of the “current” stage connected to the output node. External capacitance comes from the gate capacitances of the next stage. Assuming for simplicity that Cgno = Cgpo = Cdno = Cdpo = Co, and knowing that capacitance is proportional to W/L (Sect. 3.6), we can obtain the total loading at node F as 5Co, of which 3Co come from the drain capacitances of the “current” gate (MNA, MNC, MPC), and 2Co come from the gates of the following inverter.

Resistance is a more interesting question. Generally speaking, to obtain resistance, we follow these rules:

• If we are calculating high-to-low delay, we consider only the PDN; if we are calculating low-to-high delay, we consider only the PUN
• In the PUN or PDN, apply the given input combination, then calculate the equivalent W/L for all paths that connect the output node to ground or supply
• Resistance for the PDN will be Ro divided by the equivalent W/L obtained. For the PUN, resistance will be 2Ro divided by the equivalent W/L. This discrepancy is due to the difference in mobility between electrons and holes


For example, if the input we are considering is “110”, then the function F = (AB + C)ʹ evaluates to F = (1 + 0)ʹ = “0”. Thus, this is an input for which the output ends up at “0”: we are considering a transition where the output settles at “0”, which is a high-to-low transition. To calculate resistance, we consider only the PDN. Applying “110” to the PDN, we find that only the branch formed by A and B connects F to ground. Thus, the resistance discharging the load is the resistance of this branch. It can be calculated by obtaining the equivalent W/L of the two transistors in series, which is 1/(1 + 1) = 1/2. The resistance of an NMOS with W/L = 1 is Ro, thus the resistance of the branch AB with W/L = 1/2 is Ro/(1/2) = 2Ro. The high-to-low propagation delay in this case is 0.69 × 5Co × 2Ro.

In Chap. 2, the value of Vol was dependent on the input applied. The input applied determined the equivalent W/L used in the ratioed-logic Vol equation. For CMOS, we have already demonstrated that Vol = 0 V, and we have seen that this does not depend on the inputs. Whether one, multiple, or all branches to ground are turned on, the output will be the same voltage. However, the number of active branches affects the effective resistance, and thus directly impacts delay. For example, if the input is “111”, then all branches in Fig. 3.28 are turned on, which leads to an effective aspect ratio of

(W/L)eff = (W/L)C + 1 / [1/(W/L)A + 1/(W/L)B]

Note that we combine the aspect ratios as if we were combining conductances. Because all transistors are unit sized, (W/L)eff = 1 + 1/(1/1 + 1/1) = 1.5. Figure 3.29 shows how resistance differs for five different input cases. The first three sub-figures show three different cases of PDN resistance. Note that for all three cases, the capacitance used in the delay equation is 5Co; capacitance is independent of the inputs used. Resistance for the three inputs “110”, “111”, and “001” will obviously vary. The best-case scenario is the one with the lowest resistance, and thus the lowest delay. The lowest resistance corresponds to the highest conductance, which corresponds to the highest equivalent W/L. For any circuit, the best-case pull-down resistance occurs when all inputs are “1”. So input “111” yields the best PDN resistance in Fig. 3.29. This makes sense from a physical perspective. Delay is due to finite currents discharging output node capacitors. Thus, the lowest delay occurs when the maximum current is available to discharge the capacitor, which happens when the maximum number of branches is turned on.
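The pull-down cases can be tabulated numerically. This is a sketch that normalizes Ro and Co to 1 and uses the 5Co load derived above:

```python
# High-to-low delay of F = (AB + C)' for different inputs, all unit transistors.
RO, CO = 1.0, 1.0
C_LOAD = 5 * CO                 # 3 drains at node F plus 2 inverter gate caps

def tphl(wl_eff):
    # Pull-down delay: 0.69 * C * (Ro / (W/L)_eff)
    return 0.69 * C_LOAD * (RO / wl_eff)

t_110 = tphl(0.5)               # only the series AB branch: W/L = 1/2
t_001 = tphl(1.0)               # only the C branch: W/L = 1
t_111 = tphl(1.5)               # both branches: W/L = 1 + 1/2

assert t_111 < t_001 < t_110    # more active branches -> lower delay
```

The ordering confirms the argument in the text: “111” is the best case, “001” is intermediate, and the long series branch alone (“110”) is the worst case.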


The worst case, conversely, is when a single branch is active. In the case where all transistors are sized equally, the branch with the longest series connection of transistors will have the lowest effective W/L and thus the worst delay. Thus, input “110” produces the worst delay. For an input of “001”, the C branch is active in the PDN, so we are calculating high-to-low delay again. In this case, the effective W/L is simply that of MNC, i.e., W/L = 1, and the delay is 0.69 × 5Co × Ro. This is an intermediate case between the best and worst cases.

The two rightmost sub-figures in Fig. 3.29 show resistance for inputs “000” and “100”. For both inputs, F evaluates to “1”. Thus, these inputs represent transitions where the output settles at “1”, and we are calculating low-to-high delay. Note that for both inputs there will be no paths open to ground. This is a fundamental property of CMOS gates: there cannot be paths open to ground and supply simultaneously. For input “100”, the output of the function is “1”, so we only consider the PUN. Only MPB and MPC are on, but they are enough to create a path to supply. In series, they create an effective W/L of 0.5. The resistance of a PMOS with W/L = 1 is 2Ro. This is very important to notice, since it can lead to a lot of confusion: recall that due to lower hole mobility, a PMOS of equal size to an NMOS will always have double the resistance. The low-to-high delay in this case is 0.69 × 5Co × 2Ro/0.5. Note that once calculated, the load capacitance does not need to be recalculated, regardless of the applied input combination or of whether we are considering low-to-high or high-to-low delay.

Fig. 3.29 Resistance for inputs “110”, “111”, “001”, “000”, “100”


If the input is "000", we can readily conclude that not only are we considering low to high delay, but we are also considering the best-case low to high delay with the most active number of branches in the PUN. In this case, the effective W/L is that of MPA and MPB in parallel, both in series with MPC:

(W/L)eff = 1 / [ 1/(W/L)C + 1/((W/L)A + (W/L)B) ]

When all PMOS are unit sized, (W/L)eff = 1/(1 + 0.5) = 2/3, leading to a delay of 0.69 × 5Co × 2Ro/(2/3). It is worth repeating that for the PUN (low to high delay), we divide 2Ro by the effective W/L, while for the PDN (high to low delay), we divide Ro by the effective W/L. The gate in Fig. 3.28 is sized for the minimum total gate area. However, it does not provide any special consideration for delay. A more typical sizing strategy would guarantee that the circuit has a specific minimum speed (or maximum delay). This speed or delay should be related to the requirements of the application for which the circuit is used. Thus we might want to size a circuit so that its worst-case (maximum) delay is 0.69 × 3RoCo, or 0.69 × 6RoCo, or any arbitrary maximum delay. Obviously, referring the maximum delay to Ro and Co requires a tedious expression as well as an arbitrary reference to minimum-size transistor resistance and capacitance. A better reference point is the unit inverter, shown in Fig. 3.30. It is defined as the smallest possible inverter whose low to high and high to low delays are equal. This directly leads to NMOS size 1/1 and PMOS size 2/1. The unit inverter thus has a self-loading of


3Co, of which 2Co comes from the PMOS and Co comes from the NMOS. Resistance in both pull down and pull up is Ro. The time-constant of a unit inverter is 3RoCo for both pull up and pull down.

Now assume that we want to size the gate F = (AB + C)ʹ so that its resistance in pull up and pull down is equal to that of the unit inverter. Which resistance of F do we mean? We have already seen that there is a best-case resistance, a worst-case resistance, and possibly many cases in between. However, when sizing, we have to consider the worst case. Consider this from the user's point of view. If we offer a processor and claim that it functions at 100 MHz, the user assumes it will always function at 100 MHz and will demand an output from it at 100 MHz all the time. Thus our design must accommodate 100 MHz in the worst case, as the user expects it in all cases. In the best case, the processor will outperform, and in the worst case, it will just barely deliver. From the user's perspective, the processor meets the specifications because it can always be expected to run at 100 MHz. However, if we designed for 100 MHz in the best case, then the processor would often deliver performance lower than 100 MHz. The user thus cannot operate it at 100 MHz and would have to back off to the worst case. Thus, we cannot call this a 100 MHz processor, despite its peak performance being 100 MHz.

Worst case in a CMOS gate always means a single branch. This minimizes the effective W/L and maximizes the resistance. So to size the gate F = (AB + C)ʹ properly, we consider all the possible worst cases and size the transistors accordingly. Figure 3.31 shows that in the PDN, there are two possible single-branch scenarios: either only AB or only C. We want the resistance in both cases to be Ro, thus we want the effective W/L in both cases to be 1. MNC alone must have W/L of 1. MNA and MNB in series must have an effective W/L of 1, thus both have to be sized at 2.
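A quick sanity check of the PDN sizing above. This is a sketch in normalized units; `series` is a hypothetical helper that combines W/L values like series resistances:

```python
# Each worst-case PDN branch of F = (AB + C)' must match the
# unit inverter's pull-down W/L of 1.
def series(*wl):
    # Series transistors: reciprocals of W/L add, like resistances.
    return 1.0 / sum(1.0 / x for x in wl)

wl_mnc = 1            # MNC alone forms one single-transistor branch
wl_mna = wl_mnb = 2   # MNA and MNB in series form the other branch

print(wl_mnc)                   # 1
print(series(wl_mna, wl_mnb))   # 1.0, the two-transistor branch also hits the target
```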
In the PUN, we need any single branch to have a resistance of Ro, and thus a W/L of 2. There are two possible

Fig. 3.30 The unit inverter, its resistance, and self-capacitance

3

CMOS

cases for worst case: the AC branch conducting or the BC branch conducting. If AC is conducting, then we need the effective W/L of MPA and MPC in series to be 2, so each must be 4. If we consider the branch BC, MPC is already sized at 4; for its series combination with MPB to produce 2, MPB must also be sized at 4. The load inverter in Fig. 3.31 is sized 1:2 so that it too has resistance equal to the unit inverter. Total loading capacitance at node F in Fig. 3.31 is C = 10Co, of which 7Co is internal loading (3Co from the PDN and 4Co from the PUN), and 3Co is external loading from the inverter transistor gates. In the worst case, pull-up and pull-down resistances are Ro by design, and thus the worst-case high to low and low to high delays are equal to 0.69 × Ro × 10Co. The best-case high to low delay is obtained from input "111" and the equivalent W/L = 1 + (2 × 2/(2 + 2)) = 1 + 1 = 2, leading to an equivalent high to low resistance of Ro/2 and a delay of 0.69 × 10Co × Ro/2. The best-case low to high delay is obtained from input "000", with an equivalent W/L = (4 + 4) × 4/((4 + 4) + 4) = 2.67, and a low to high delay of 0.69 × 10Co × 2Ro/2.67. Note once more that the value of load capacitance does not change regardless of the input applied. Also, recall that the value of PMOS resistance for W/L = 1 is 2Ro, not Ro. One crucial note on the implementation of F in both Figs. 3.28 and 3.31 is the choice of placement of transistors. Consider Fig. 3.32: it shows another implementation of F where the only difference is that MPC is now next to Vdd while MPA and MPB are next to the output node. There is absolutely no functional difference between the implementation

Fig. 3.31 F = (AB + C)ʹ sized for worst-case resistance equal to unit inverter


Delay at node F must be defined as either high to low or low to high. F is a 3-input NOR gate, so a high to low delay involves its input changing from F1F2F3 = "000" to any input vector where at least one bit is "1". Thus any of seven combinations will cause the output of F to switch low: from "000" to "001", "010", "011", "100", "101", "110", or "111".

Fig. 3.32 Alternative implementation of F = (AB + C)ʹ with C on top in PUN

in Figs. 3.28 and 3.32. The truth tables will look identical. The transistor count is the same. The VTC is the same. However, there is a difference in dynamic behavior because there is a difference in the drains connected to the output node. Figure 3.32 exposes the output node to more capacitance for no clear reason: the total self-loading capacitance in Fig. 3.32 is 4Co, compared to 3Co in Fig. 3.28. Thus the implementation in Fig. 3.32 is inferior, and only the implementation in Fig. 3.28 is considered "correct", despite nothing being functionally wrong with Fig. 3.32. Note that this is only true as long as we completely ignore the capacitance of internal nodes, which is an approximation.

Consider the function F = ((AB) + (CD + Eʹ) + (G(H + I))ʹ)ʹ. Since this function is complicated, we might choose to implement it as a combination of multiple CMOS gates. Figure 3.33 shows one such implementation, which uses four independent CMOS gates to realize the function. The four gates are F1 = AB, F2 = CD + Eʹ, F3 = (G(H + I))ʹ, and F = (F1 + F2 + F3)ʹ. The circuit implementations of each of the functions are shown in Fig. 3.34. All circuits are sized for both pull-up and pull-down resistance equal to the unit inverter, except F = (F1 + F2 + F3)ʹ, which is sized for pull-down and pull-up resistance half that of the unit inverter. Assume all input complements are available without additional inverters. Delay from inputs to output in this function is a little more complicated than in previous single-stage circuits.

Delay is most adversely affected by long series chains of transistors, or by large numbers of parallel transistors next to the output node. The former leads to high resistance; the latter leads to high loading capacitance. Obviously, the impact is worse when these occur in the PUN because of the inferior mobility of holes. By examining Fig. 3.33, we can see that delay from the inputs ABCDEGHI to the output F will pass through two stages. First, it will pass through F1, F2, and F3. This happens in parallel since all three gates act in parallel on independent inputs. Then the outputs of these three gates are the inputs to the F gate. Total delay then is the summation of the delay of the F gate and the maximum of the delays of the three gates F1, F2, and F3:

tphl = max{tF1,lh, tF2,lh, tF3,lh} + tF,hl

Calculating delay from the inputs changing to F switching from low to high involves all the inputs switching down to "0". This change at F can happen in response to a switch from any input combination that contains a "1" to input combination "000". However, since we are only interested in the worst case, we can again state it as

tplh = max{tF1,hl, tF2,hl, tF3,hl} + tF,lh

Fig. 3.33 Block diagram of composite implementation


Fig. 3.34 The four components of the complex CMOS function in Fig. 3.33

Thus to find delay, we have to find worst-case high to low and low to high delays for all the gates F1, F2, F3, and F. To do this, we must first find the capacitance at all these nodes and then find the resistance of the worst-case branches charging or discharging. When calculating capacitance, we must recall that F1, F2, and F3 have F as external loading. The results for C, effective W/L, and delay are summarized in Table 3.7. One observation is that the high to low and low to high worst-case aspect ratios are equal for all gates F1 through F3. This is by design, since this is how we sized the gates to behave. We also note that F has lower resistance than F1, F2, or F3; this is because it was sized larger, causing its resistance to drop but also increasing the external loading it inflicts on the previous gates. Notice that for the F1, F2, and F3 gates, the self-loading is much lower than that of F, but the total loading is higher. Thus F benefited itself by being sized larger but harmed the previous stages. Whether or not it harmed the delay of the overall chain is a question that will be addressed in Chap. 4. Note that F2 and F3 have identical delays, so either can be used in calculating overall delay.

Calculating the area of a CMOS gate is a complex affair. It involves a deep understanding of transistor layout and design rules. This will be discussed in detail in Chap. 8. However, an extremely rough approximation of the area of a transistor can be taken as W × L. Strictly speaking, this is the area of the transistor gate, which is very close to the area of the channel. In reality, the area of the channel is a small fraction of the overall area of the transistor and an even smaller fraction of the area of the circuit.

Simplified models often miss a lot of details. Some of these details can be important. However, without a simple model, we cannot quickly develop the intuition that allows us to make appropriate design decisions and develop valuable insight.
But it is important to track the assumptions we keep inheriting and making.

We have made three major assumptions while calculating delay:

• Capacitances all appear to ground. Gate capacitances should be gate to source rather than gate to ground. The value of capacitance observed to ground should be multiplied by the gain of the logic gate. Thus gate capacitances are observed doubled if we assume that a logic gate is unity gain. We can always assume that the given values Cgno and Cgpo already have this doubling folded in.

• We are assuming that only output nodes have significant capacitance. This is not always true, especially if the gate is unloaded. In Chap. 13, we develop a model for delay in networks with multiple time-constants that can be used to take internal node capacitance into consideration. Another aspect of this is that the capacitance at the output node is independent of the input combination. If internal node capacitances were taken into consideration, this would not be the case.

• We are assuming all inputs are infinitely sharp. Since inputs are outputs from earlier gates, we know for a fact this assumption must fall apart at some point. Finite input slopes increase delay by switching transistors more slowly, and add more delay by creating a short-circuit path from supply to ground. The impact is worse in short-channel transistors.

Areas not included in the channel area are the drain and source areas, the area occupied by wires connecting transistors, and the area of connections between wires and transistor terminals, as well as the compulsory spacing between all of these imposed by the design rules.
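Under the rough W × L approximation above, a gate's area can be tallied as the sum of aspect ratios times the minimum-transistor area. This is a sketch in normalized units (Lmin = 1 is assumed); the sizes 2, 2, 1, 4, 4, 4 are those of the worst-case-sized F from Fig. 3.31:

```python
# Rough gate-area estimate: Area ~ Lmin^2 * sum of aspect ratios,
# since every transistor has L = Lmin.
LMIN = 1.0  # normalized minimum channel length

def gate_area(aspect_ratios):
    # Sum of W/L values times the area of a minimum-size transistor.
    return LMIN ** 2 * sum(aspect_ratios)

# F = (AB + C)' sized as in Fig. 3.31: NMOS 2, 2, 1 and PMOS 4, 4, 4
print(gate_area([2, 2, 1, 4, 4, 4]))  # 17.0
```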

Table 3.7 Summary of delay information for composite implementation of F

Gate | Total output capacitance | Of which self-loading | Worst-case pull-up W/L | Worst-case pull-down W/L | High to low time-constant | Low to high time-constant
-----|--------------------------|-----------------------|------------------------|--------------------------|---------------------------|--------------------------
F1   | 20Co                     | 6Co                   | 2                      | 1                        | 20RoCo                    | 20RoCo
F2   | 22Co                     | 8Co                   | 2                      | 1                        | 22RoCo                    | 22RoCo
F3   | 22Co                     | 8Co                   | 2                      | 1                        | 22RoCo                    | 22RoCo
F    | 18Co                     | 18Co                  | 4                      | 2                        | 9RoCo                     | 9RoCo
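The two-stage worst-case delay can be assembled directly from the time-constants in Table 3.7. A sketch in units of RoCo; here the high to low and low to high time-constants happen to be equal for every gate, so one calculation covers both directions:

```python
# Worst-case time-constants from Table 3.7, in units of Ro*Co.
tau = {"F1": 20, "F2": 22, "F3": 22, "F": 9}

# First stage: F1, F2, F3 switch in parallel, so the slowest one dominates;
# the F gate's delay is then added on top.
t_worst = 0.69 * (max(tau["F1"], tau["F2"], tau["F3"]) + tau["F"])
print(round(t_worst, 2))  # 21.39, in units of Ro*Co
```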

The area of a logic gate can be estimated as the sum of the areas of the gates of the constituent transistors. Calculating the sum of transistor gate areas can be reduced to calculating the sum of the aspect ratios, given that we already know that L for all transistors is Lmin, the technology parameter. Total gate area:

Area ≈ Σ(over transistors) W × L

Realizing that all transistors, by necessity, have a minimum length, we can take the length as a common factor:

Area ≈ Σ W × Lmin = Lmin × Σ W

and shoe-horning in another channel length:

Area = Lmin² × Σ(over transistors) W/L

This expression is very useful. The quantity outside the summation is the area of the minimum size transistor and is a constant for the technology. The summation is a unitless quantity and is the summation of all the aspect ratios in the logic gate.

3.10 Supply and Width Scaling

1. Understand the impact of supply on power and delay
2. Recognize that delay contains an intrinsic and an extrinsic component
3. Conclude that gate sizing affects extrinsic but not intrinsic delay
4. Understand why intrinsic delay forms a lower bound on total delay.

Delay and power are the primary metrics used to evaluate digital circuit performance. The product of delay and power is energy, and energy directly correlates to battery performance. It is important to understand how delay and power can be manipulated. We will discuss power management in detail in Chap. 6. In this section, we will examine how delay can be manipulated. Delay is affected primarily by the sizing of the logic gate and its external load. But it is also affected by the power supply level. Delay is a direct function of the time-constant. The time-constant is the product of resistance and capacitance. Capacitance is not, to first order, affected by the value of supply. So to understand the impact of supply level on delay, we examine resistance. The MOSFET channel resistance is (Sect. 3.4)

Req = 3VDD / (2 μn Cox (W/L)n (VDD − Vth)²)

For large values of Vdd, the effect of supply on resistance becomes clear:

Req ≈ 3 / (2 μn Cox (W/L)n VDD)

And thus

Req ∝ 1/VDD

Thus resistance and delay are inversely proportional to the power supply. A higher supply means more voltage that has to be discharged, but it also means even more current is available to perform the discharging. But recall that dynamic power is a quadratic function of power supply. Thus raising power supply will dramatically increase dynamic power. Delay and power thus have contradictory requirements on supply. The effect of sizing on delay is a little more complex. In the last example of Sect. 3.9, we saw how raising the size of one gate will lead to an improvement in its resistance but will cause it to exercise more loading on the preceding stages. Let us assume that a certain gate has external loading of Cext, internal loading of Cint, and gate resistance of Rgate. Its time-constant will be

τ = Rgate Ctotal

where

Ctotal = Cint + Cext

τ = Rgate (Cint + Cext) = Rgate Cint + Rgate Cext

The time-constant can be divided into two components: an internal time-constant and an external time-constant. Since delay is 0.69 multiplied by the time-constant, delay can also be divided into these two components. Notice that resistance is common between the two components, but one includes only the internal capacitance, while the other includes only the external capacitance:

τ = τint + τext

where τint = Rgate Cint and τext = Rgate Cext

Now assume that all the transistors in the gate are scaled up by a factor of S. This means that the widths of all transistors, NMOS and PMOS, are raised by a factor S while L remains Lmin, thus raising the aspect ratio of all transistors by a factor S. Since resistance is inversely proportional to width, scaling reduces the resistance of the gate by a factor S. Drain capacitance is directly proportional to width, thus scaling the gate raises the internal capacitance by a factor of S. However, external capacitance is an external load. We have not stated that the external load also scales up. Thus Cext remains constant. These three facts (resistance dropping by S, Cint rising by S, and Cext remaining constant) combine to produce an interesting result:

τ = Rgate (Cint + Cext) → scale by S → (Rgate/S)(S Cint + Cext)

Thus the new time-constant after scaling is

τnew = Rgate Cint + Rgate Cext/S = τint + τext/S

If the circuit is externally unloaded, there is no benefit whatsoever in raising its size. Any improvement in resistance due to scaling would be completely offset by the increase in self-loading. However, the external component of delay improves (decreases) proportionately with the scaling factor S. Thus there might be a benefit to increasing size if there is external loading. Intrinsic delay is constant regardless of the sizing of the gate. Thus a small unloaded inverter and a large unloaded inverter will have the same delay. Intrinsic delay is only a function of architecture.

Figure 3.35 shows the graph of total delay versus S. The graph shows that there is a minimum floor for delay that is asymptotically reached as S tends to infinity. From the equation, we can readily see that this asymptotic value is the internal delay. Internal delay, or intrinsic delay, is thus the best delay a gate can ever achieve; no sizing can improve delay beyond this value, regardless of the presence or lack of external loading. Intrinsic delay is a function only of the architecture of the gate and the technology. We still have not provided a simple answer to the simple question of whether sizing up a gate improves its delay. Figure 3.35 provides a simple answer: yes, sizing up improves the delay of any loaded logic gate. There might be diminishing returns as the gate gets bigger, but there is definitely a benefit from raising the size. This simple answer is dangerously misleading. It ignores one critical fact. The current CMOS gate is also the "external load" of the previous CMOS gate, the gate that provides us with our inputs. Thus, as the current gate greedily increases its size to reduce its own delay, it increases the loading on the previous stage. The problem here is that we are using the wrong target for optimization. Our aim should not be to reduce the delay of the current stage. It should be to reduce the "total delay". This will be discussed in detail in Chap. 4.

Fig. 3.35 Graph of total delay versus gate scaling factor. A larger gate obviously has a lower delay. However, a fundamental limit is imposed by intrinsic delay
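The asymptotic behavior in Fig. 3.35 is easy to reproduce. A sketch: the values τint = 1 and τext = 3 are illustrative assumptions, not book numbers:

```python
# Total time-constant versus scaling factor S: tau(S) = tau_int + tau_ext / S.
TAU_INT, TAU_EXT = 1.0, 3.0  # illustrative normalized values

def tau_total(s):
    # Resistance drops by S while external capacitance stays fixed,
    # so only the external component shrinks with sizing.
    return TAU_INT + TAU_EXT / s

for s in (1, 2, 4, 8, 64):
    print(s, tau_total(s))
# The printed values approach TAU_INT: no amount of sizing
# can beat the intrinsic delay floor.
```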

3.11 Limitations of CMOS

1. Review the advantages of static CMOS
2. Recognize the area disadvantage of CMOS
3. Understand the effect of CMOS on external loading
4. Conclude that CMOS has a generally dismal loading behavior, internal and external.

A static CMOS gate has many advantages that make it resemble an ideal gate. CMOS logic outputs are at supply and ground. This increases the separation of clean logic values, leading to improved noise margins. The transition region of CMOS gates is very sharp, especially if the devices have little channel length modulation. This reduces the width of the transition region and widens the stable "0" and "1" input regions, further improving the noise margins. CMOS gates are also ratioless. The static outputs Voh and Vol are not a function of transistor sizes. In fact, they are not functions of any circuit parameters. Moreover, in both static cases, the steady-state current is null. Static power dissipation in CMOS is thus zero. These are all advantages of CMOS compared to ratioed logic (Chap. 2).

However, CMOS also has disadvantages. All ratioed logic gates in Chap. 2 consisted of N transistors in the PDN plus a load, thus N + 1 transistors if the load is active. In CMOS, gates contain 2N transistors, thus the area of a CMOS gate is roughly double that of the corresponding ratioed gate. In fact, the area situation is even worse. Half the transistors in CMOS are PMOS. Because of their inferior mobility, they have to be sized larger than the NMOS in the PDN. Notice that the enhancement load family contains no PMOS, while other ratioed families contain only one non-NMOS transistor. The presence of both NMOS and PMOS also introduces the risk of latch-up (Sect. 7.7), which requires more area for ground and supply contacts and isolation trenches.

Area is not very critical in modern technologies. Increasing the density of devices with technology scaling means that the price per device continues to drop. Thus, although there is a price to pay in terms of area, this is not the main disadvantage of CMOS. Instead, CMOS has fundamental issues that affect its delay. In ratioed logic, every output from a gate goes to the PDN of another gate from the same family.
This means that only one NMOS gate is seen as external loading at the output node. For CMOS, each input is provided to one NMOS and one PMOS. Thus, the output of a CMOS gate sees two transistor gates per input to next stages. For example in Fig. 3.28, the


output of the CMOS gate is an input to the inverter. The inverter imposes an NMOS and a PMOS gate as external loading on the complex gate. The fan-out loading of CMOS is thus worse than that of ratioed logic. The situation is made worse by the fact that one of the two loading transistors is a PMOS, and PMOS transistors are sized larger. In ratioed logic, there is either a PDN or a PUN, but not both.

CMOS also has worse intrinsic loading than ratioed logic. If we examine the output node of a CMOS gate, we find multiple NMOS drains connected to it in the PDN, and multiple PMOS drains connected in the PUN. A comparable ratioed logic gate would only have the PDN drains connected to the output node, plus a single drain from the load device.

In Sect. 3.6, we discussed the flow of crowbar current due to a range of voltages in which both transistors in the CMOS inverter are on. In Sect. 10.7, we will discover that this also has a bad impact on delay. The fact that a current flows in both the PDN and PUN means that only a small current will flow through the capacitive load. This is particularly concerning when the PUN and PDN currents are close in value, which happens when both transistors are saturated in the transition region. This effect is not observed in ratioed logic because the load does not observe an input, and thus does not suffer from the effects of finite input slope.

It is clear that the fact that CMOS has both a PUN and a PDN is the reason for all of its disadvantages. It is the reason for the larger area, the larger fan-out loading, the larger self-loading, and the flow of crowbar current. However, having both a PUN and a PDN is necessary to realize the advantages of CMOS. A PDN and a complementary PUN create complementary paths to ground and supply. This allows the gate to produce ratioless outputs with zero steady-state current. These advantages are very compelling, and for most applications we cannot justify the use of ratioed logic.
For delay-critical applications, particularly arithmetic (Chap. 11), we might have to use a faster logic family than CMOS. This will be introduced in Chap. 5 in the form of dynamic CMOS. Dynamic CMOS resolves all the problems of (static) CMOS, while preserving its advantages. In the process, however, it introduces a new host of issues.

4 Logical Effort

4.1 Sizing in a Chain

1. Recognize that external delay is one of the two components of delay
2. Realize that sizing a gate affects both its delay and the delay of preceding stages
3. Conclude that optimal delay must target total chain delay rather than stage delay

Section 3.10 showed that scaling up the size of a gate can have a positive effect on its total delay. We demonstrated that the intrinsic delay of a gate is unaffected by sizing it up. However, if the external load remains constant, the lowered resistance of the gate causes the external time-constant, and thus the total time-constant, to drop.

Figure 4.1 shows a chain of N logic gates indexed from 0 to N − 1. Consider an arbitrary stage j and its driver j − 1 and load j + 1. The time-constant of stage j consists of an external time-constant affected by the gate capacitance of stage j + 1 and an internal time-constant affected by the drain capacitance of stage j:

τj = τint,j + τext,j

If we increase the size of gate j, then the total delay of gate j will decrease. This is because the external delay component of gate j will decrease by the ratio S by which we raise the size of gate j:

τj,scaled = τint,j + τext,j/S

However, because the sizes of the gates of the MOSFETs in gate j have increased, they offer increased external loading on stage j − 1. This leads to an increase in the external delay of gate j − 1. In fact, the external delay of gate j − 1 increases by the same ratio that the external delay of gate j decreases:

τj−1 = τint,j−1 + τext,j−1 → Gate j scales by S → τint,j−1 + S τext,j−1

We can accept this. But if we want to maintain the total delay of stage j − 1 at its original level, we have to raise the size of gate j − 1 by a ratio S:

τj−1 = τint,j−1 + S τext,j−1 → Gate j − 1 scales by S → τint,j−1 + τext,j−1

However, this would increase loading on stage j − 2. If gate j − 2 is sized up to accommodate the increased load, it will in turn impose an increased load on gate j − 3. This leads to a ripple effect, where all stages in the circuit must be scaled up by the same ratio just to keep the original value of total chain delay. Obviously, this ad hoc approach to sizing is counterproductive. We end up increasing the total size of the chain and gaining little improvement in total delay. The correct approach is to consider the total delay of the logic chain as our objective for optimization, rather than the delay of a single stage. The optimization problem is bounded by the final loading capacitance that the chain drives. Thus the question is: if we know a chain of logic gates drives a final capacitance C, what is the optimal sizing for the gates that minimizes the total delay of the chain? This problem will be solved systematically through this short chapter.

4.2 Sizing an Inverter Chain

1. Understand that capacitance and size are directly proportional in an inverter
2. Understand a technology-based relation between drain and gate capacitance for a transistor
3. Derive the delay equation for an inverter chain
4. Define fan-out as the ratio of sizes, or capacitances
5. Optimize inverter chain delay
6. Realize that the optimal distribution of fan-out is equitable

© Springer Nature Switzerland AG 2020 K. Abbas, Handbook of Digital CMOS Technology, Circuits, and Systems, https://doi.org/10.1007/978-3-030-37195-1_4


Fig. 4.1 Increasing the size of gate j makes it easier to drive gate j + 1, but it imposes a larger load on gate j − 1, making it harder for it to drive gate j

Solving the delay problem described in Sect. 4.1 is a little complicated. A good first step is to assume the entire chain is made of inverters. Inverters are the simplest possible logic gates; this simplifies the optimization problem. Despite the simplification, inverters provide great insight, and the results of this section can easily be extended to a chain of arbitrary gates.

Consider the inverter chain shown in Fig. 4.2. We will assume the number of stages is known and is N. This is similar to Fig. 4.1 with the exception that all the stages are inverters. Also similarly to Sect. 4.1, the inverter chain is driving an external load CL. CL is known and is one of the constraints of the problem. We also assume that we know the size of the first inverter in the chain. Thus, the givens of the problem are:

• N, the number of stages
• The size of the first inverter
• CL, the external load

The required result is:

• The sizes of all the inverters in the chain that minimize total chain delay

The setup in Fig. 4.2 is more than a simplified illustrative example. Inverter chains are often used when we need to drive a large load with minimum delay. We will see this problem often in the rest of the book, particularly when considering memory accessory circuits (Sect. 12.9), clock network distribution (Sect. 13.7), and pins and off-chip communication (Sect. 14.6).

Fig. 4.2 Inverter chain driving an external capacitive load CL


We are given the size of the first inverter in the chain. Knowing the size of an inverter is equivalent to knowing its gate capacitance and vice versa. If we assume that Cgno = Cgpo = Co, then a unit inverter will have a total input (gate) capacitance of 3Co: 2Co contributed by the PMOS and Co contributed by the NMOS. If for another inverter we know that the gate capacitance is 6Co, then we know that this inverter is double the size of the unit inverter, with NMOS size 2/1 and PMOS size 4/1. This assumes that for all inverters, sizing is performed to maintain equal pull-up and pull-down resistance. We will maintain this assumption for the rest of the chapter.

There is also typically some relation between the drain and gate capacitance of a transistor, and thus the drain and gate capacitance of a logic gate. This does not mean that drain capacitance and gate capacitance for the transistor are physically connected. It means that for a certain technology, we notice that one is typically proportional to the other. Thus, we will define

Cd = γCg

Gamma is a unitless constant of proportionality, defined only by technology. Gamma is typically the same for NMOS and PMOS. For example, if for an NMOS we state that Cgno = Co and Cdno = 0.9Co, then gamma is 0.9. We can safely conclude that for PMOS in this example gamma is also 0.9. When we assume that Cgo = Cdo = Co, we are implicitly stating that gamma is 1.

In the inverter chain in Fig. 4.2, the delay of any stage j can be defined as

tpd,j = 0.69 Rinv,j (Cint,j + Cext,j)

where

Rinv,j = resistance of inverter j
Cint,j = self-loading of inverter j
Cext,j = external loading on inverter j

We can expand this delay expression, reducing it again to two delay components, an intrinsic and an external delay:

tpd,j = 0.69 Rinv,j Cint,j + 0.69 Rinv,j Cext,j = tpd,j,int + tpd,j,ext


Cint,j is the self-loading of stage j on itself. It comes entirely from the drain capacitances connected to the output node. Thus, we can restate Cint,j as Cd,j. Cext,j is the external loading on stage j. It comes entirely from the loading of stage j + 1 on stage j and is thus exclusively formed by the gate capacitances of stage j + 1. So, we will restate Cext,j as Cg,j+1.

tpd,j = 0.69 Rinv,j Cint,j + 0.69 Rinv,j Cext,j = 0.69 Rinv,j Cd,j + 0.69 Rinv,j Cg,j+1

The first term in the delay equation represents intrinsic delay. Recall that the intrinsic delay of an inverter is a constant for the technology. This was explored in detail in Sect. 3.10. So for all stages, the intrinsic delay is equal. The delay in terms of the intrinsic delay of the inverter tp0 is

tpd,j = 0.69 Rinv,j Cd,j (1 + Cg,j+1/Cd,j) = tp0 (1 + Cg,j+1/Cd,j)

The drain capacitance of stage j can be expressed in terms of its gate capacitance as

Cd,j = γ Cg,j

So stage delay is

tpd,j = tp0 (1 + Cg,j+1/(γ Cg,j))

The ratio between the gate capacitance of stage j + 1 and the gate capacitance of stage j is an important quantity. Since gate capacitance is directly proportional to inverter size, the ratio of gate capacitances is also the ratio of the sizes of the inverters. This ratio represents how much bigger stage j + 1 is than stage j, and thus how much bigger a load stage j + 1 offers to stage j. If this ratio increases, then stage j has a harder time driving stage j + 1, and vice versa. We define this ratio as fan-out and give it the symbol f. Fan-out is a unitless quantity that can be defined for every stage:

fj = Cg,j+1/Cg,j

And stage delay in terms of stage fan-out fj is

tpd,j = tp0 (1 + fj/γ)

We can also define a chain fan-out F, which can be understood if we consider the whole chain of inverters to be a black box. In this case the chain fan-out is the load capacitance driven by the entire chain divided by the gate capacitance of the very first stage:

F = CL/Cg,0

One curious result is obtained if we multiply the fan-outs of all the stages. Since most of the terms in the resulting product cancel out, we end up with the following:

∏j fj = f0 f1 f2 … fN−1 = (Cg,1/Cg,0)(Cg,2/Cg,1)(Cg,3/Cg,2) … (Cg,N−1/Cg,N−2)(CL/Cg,N−1)

If we simplify, we end up with a very important result:

∏j fj = CL/Cg,0 = F
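The telescoping cancellation is easy to verify numerically. A quick sketch with hypothetical capacitance values (not from the text):

```python
# Hypothetical gate capacitances Cg,0..Cg,3 (in units of Co) and load CL
caps = [1.0, 3.0, 9.0, 27.0]
c_load = 81.0

# Stage fan-outs f_j = Cg,j+1 / Cg,j; the last stage drives CL
nodes = caps + [c_load]
fanouts = [nodes[j + 1] / nodes[j] for j in range(len(caps))]

product = 1.0
for f in fanouts:
    product *= f

# The product telescopes to CL / Cg,0 = F
assert abs(product - c_load / caps[0]) < 1e-9
```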

Total chain delay is the summation of stage delays (remember tp0 is a constant):

tpd = Σj tpd,j = tp0 Σj (1 + fj/γ)

Recall our problem statement. We want to find the size of each stage to minimize total delay in the chain, given the load capacitance, the input stage capacitance, and the number of stages. Knowing the sizes of the inverters is equivalent to knowing their input capacitances, and knowing their input capacitances is equivalent to knowing the fan-out of each stage. This is true because we know the size, and thus the gate capacitance, of the first stage; from f0 we can find the capacitance and size of stage 1, which allows us to find the size of stage 2 from f1, and so on.

Optimizing total delay means we have to differentiate and equate to zero. The variable we are trying to determine is the size of each gate, which is equivalent to its gate capacitance; thus, we differentiate with respect to gate capacitance. Optimization means finding the maximum or minimum of a cost function. For an optimization to be meaningful, the cost metric must be subject to conflicting push and pull effects, creating the drive to find an optimal compromise. In an inverter chain, the conflict in sizing any inverter is that increasing its size reduces its own external delay but increases that of the preceding inverter. The cost function is the total chain delay, and the optimization parameter is capacitance, which is proportional to size. Since total delay is a function of N gate capacitances, the differentiation with respect to the gate capacitance of stage j has to be a partial differentiation. Fortunately, Cg,j appears in only two terms: the delay of stage j, where it appears in the denominator as self-loading; and the delay of stage j−1, where it appears in the numerator as external loading. This significantly simplifies the differentiation:

∂tpd/∂Cg,j = ∂(tpd,j−1 + tpd,j)/∂Cg,j = tp0 ∂/∂Cg,j [1 + Cg,j+1/(γCg,j) + 1 + Cg,j/(γCg,j−1)] = 0

1/Cg,j−1 − Cg,j+1/Cg,j² = 0

The most useful interpretation of this result comes from rearranging it in terms of stage fan-outs:

Cg,j+1/Cg,j² = 1/Cg,j−1

Cg,j+1/Cg,j = Cg,j/Cg,j−1

fj = fj−1

Since we never specified any conditions on j, this result is valid for all j. That is to say, to minimize total chain delay, the fan-out of any stage must equal the fan-out of the following stage; thus, all stage fan-outs must be equal. Every stage should be bigger than the stage before it by the same ratio by which the following stage is bigger than it. The chain will look like a gradually growing chain of inverters where the rate of growth from one stage to the next is constant.

This still does not tell us how to find the value of this optimal stage fan-out. However, since we know the total chain fan-out, and we know that the product of all stage fan-outs is the total chain fan-out, we can find the optimal stage fan-out if we know the number of stages. The total chain fan-out can be obtained from the problem constraints as

F = CL/Cg,0

But it is also the product of all stage fan-outs:

F = ∏j fj

If the solution is optimal, the stage fan-outs are all equal, so:

∏j fopt = fopt^N = CL/Cg,0

This allows us to find the optimal fan-out:

fopt = F^(1/N) = (CL/Cg,0)^(1/N)

The optimal chain delay can now be found as

tpd,opt = tp0 Σj (1 + fopt/γ) = N tp0 (1 + fopt/γ) = N tp0 (1 + (CL/Cg,0)^(1/N)/γ)

It makes sense that for an inverter chain the optimal solution is to have equal stage fan-outs. There is nothing that distinguishes the relationship between inverters j and j−1 from the relationship between inverters j+1 and j. Thus, any result must apply to all stages identically.

What we have done in this section is size a chain of inverters with the aim not of minimizing the delay of any one of them, but of minimizing the entire chain delay. This is the problem we were not able to address in Sect. 3.11. However, we still have to extend these results to a general chain that includes a mix of random logic gates. To do so, we will introduce the concepts of parasitic delay and logical effort.
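The inverter-chain result above can be checked with a short script (a minimal sketch with hypothetical values; the function name and the normalization tp0 = 1 are our own choices, not from the text):

```python
def size_inverter_chain(c_in, c_load, n_stages, t_p0=1.0, gamma=1.0):
    """Minimum-delay sizing of an N-stage inverter chain.

    Returns the optimal stage fan-out, the input capacitance of each
    stage, and the total chain delay (in units of t_p0).
    """
    F = c_load / c_in                  # chain fan-out, F = CL / Cg,0
    f_opt = F ** (1.0 / n_stages)      # optimal stage fan-out, f_opt = F^(1/N)
    # every stage is f_opt times larger than the one before it
    caps = [c_in * f_opt ** j for j in range(n_stages)]
    delay = n_stages * t_p0 * (1 + f_opt / gamma)
    return f_opt, caps, delay

# Driving CL = 64Co with Cg,0 = 1Co through 3 stages: f_opt = 4, stage
# input capacitances 1, 4, 16 (in Co), and delay = 3(1 + 4) = 15 tp0
f_opt, caps, delay = size_inverter_chain(1.0, 64.0, 3)
```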

4.3 Gates Versus Inverters: Preliminaries

1. Realize that a gate delay equation looks very similar to an inverter delay equation
2. Recognize that gate intrinsic delay is not a useful reference point
3. Recognize that the ratio of gate input capacitances is not a measure of fan-out or sizing
4. Manipulate the gate delay equation to extract inverter intrinsic delay as a common factor.

To a first order, the delay equation for a gate does not suggest that we should treat it any differently from an inverter. In fact, it looks like we could take the same common factor from the equation to obtain an equation normalized to intrinsic delay. The delay equation of an arbitrary logic gate is


tpd,gate = 0.69 Rgate Ctot = 0.69 Rgate (Cd,gate + Cg,next gate)

This can be expressed as the sum of an internal delay component and an external delay component:

tpd,gate = 0.69 Rgate Cd,gate + 0.69 Rgate Cg,next gate = tgate,internal + tgate,external

We can also take the internal delay as a common factor, as we did with inverter delay, leading to a very similar expression:

tpd,gate = 0.69 Rgate Cd,gate (1 + Cg,next gate/Cd,gate) = tgate,internal (1 + Cg,next gate/Cd,gate)

tpd,gate = tgate,internal (1 + Cg,next gate/(γ Cg,gate))

The similarities with the inverter equation are striking. However, they are also superficial. There are two main issues with this equation. First, the intrinsic delay of the inverter is a well-defined technology parameter, so taking it as a common factor is very useful. The intrinsic (self) delay of an arbitrary logic gate is also constant regardless of sizing, and thus constant for the technology; however, defining the intrinsic delay of every possible gate in existence and cataloguing them all is an impossible task. Therefore, we have to restate the problem in a more intuitive way, referred to a more meaningful constant.

The second issue is that the ratio of the gate capacitances does not represent only fan-out. In the inverter equation, the gate capacitance ratio was identical to fan-out because gate capacitance is proportional to sizing; thus, the ratio of gate capacitances is also the ratio of sizes, and hence fan-out. Here, however, the two gates (current and next) are not necessarily the same type of gate. The ratio of their gate capacitances therefore represents not only the ratio of their sizes (fan-out) but also information about which gate is more complicated. For example, the ratio of the gate capacitance of a 10-input XOR to the input capacitance of a 2-input NAND will probably be very large, not because the XOR is sized to be large, but because it is a more complicated gate with longer transistor chains. The XOR needs to be larger than the NAND just to achieve the same resistance.

We have to restate the gate delay equation in a way that isolates technology-dependent components of delay from architecture-dependent components. The best parameter for capturing all technology parameters is inverter intrinsic delay. In fact, for any new technology node, the inverter intrinsic delay can be used as a metric for characterization. Thus, we have to force the inverter intrinsic delay into the equation and examine the balance for what it means. If we multiply and divide the gate delay equation by the inverter intrinsic delay:

tpd,gate = 0.69 Rinv Cd,inv (Rgate Cd,gate/(Rinv Cd,inv) + Rgate Cg,next gate/(Rinv Cd,inv))

tpd,gate = tp0 (Rgate Cd,gate/(Rinv Cd,inv) + Rgate Cg,next gate/(Rinv Cd,inv))

The balance inside the bracket is two unitless ratios:

Rgate Cd,gate/(Rinv Cd,inv)   and   Rgate Cg,next gate/(Rinv Cd,inv)

The implication of these two ratios and how to calculate them will be examined in Sects. 4.4 and 4.5.

4.4 Normalizing Gate Intrinsic Delay

1. Isolate the first remainder term in the gate delay equation
2. Define the ratio of gate intrinsic delay to inverter intrinsic delay as parasitic delay
3. Develop a method to calculate parasitic delay for any logic gate
4. Realize that gate sizing is irrelevant for the purpose of calculating parasitic delay
5. Recognize that the true size of a gate is only determined when fan-out is found.

The first term left over at the end of Sect. 4.3 is

Rgate Cd,gate/(Rinv Cd,inv)

It is a unitless ratio between the intrinsic delay of the gate and the intrinsic delay of the inverter. Recall that, whether for the gate or the inverter, the intrinsic delay is constant regardless of sizing (Sect. 3.11). Thus, this leftover ratio is not a function of sizing. This first term is called parasitic delay and is given the symbol p:

p = Rgate Cd,gate/(Rinv Cd,inv) = parasitic delay

We need to figure out two things about parasitic delay: first, what it represents; and second, how to obtain its value for any gate. What p represents is obvious: it is the ratio between the intrinsic delay of the gate under consideration and that of the inverter. Thus, it represents how much harder it is for the logic gate to drive itself unloaded, relative to how easy it is for the inverter to drive itself unloaded.


The lowest value that p can have is 1, which is the value of p for the inverter. No logic gate can have p lower than that of the inverter, and only the inverter can have p = 1. The inverter is the simplest logic gate; thus, it is the logic gate that exerts the least effort driving its own unloaded output node. Notice again that p is a function only of the type of gate; it is completely unaffected by the sizing of the gate, since neither the intrinsic delay of the inverter nor the intrinsic delay of the gate is a function of how big the gate is (Sect. 3.11).

Knowing how to obtain the value of p for any gate is also simple once the facts in Sect. 3.10 are fully understood. p can be restated as a ratio of resistances multiplied by a ratio of capacitances:

p = (Rgate/Rinv) · (Cd,gate/Cd,inv)

Parasitic delay reduces to the ratio of capacitances alone if Rgate and Rinv are equal. In other words:

p = Cd,gate/Cd,inv   (when Rgate = Rinv)

Thus, p is the ratio of the total drain capacitance at the output node of the logic gate to the corresponding drain capacitance at the inverter output, but only if the resistance of the gate equals that of the inverter. At first glance, this seems restrictive, since we obtain the ratio only for a certain sizing, specifically the sizing that makes the resistances equal. However, recall from Sect. 3.11 that the product of intrinsic capacitance and resistance remains the same regardless of sizing; thus, this method incurs no loss of generality. The best way to illustrate the method and its generality is through an example. Consider the 3-input NAND gate in Fig. 4.3. To obtain p for the NAND gate:

• Size the gate for worst-case resistance equal to the unit inverter. This forces gate resistance and unit inverter resistance to be equal, reducing p to the ratio of drain capacitances (left side of Fig. 4.3)
• Calculate the total drain capacitance at the output node of the logic gate: 9Co in the case of the 3-input NAND
• p is the ratio of this total drain capacitance to the total output node capacitance of the unit inverter (always 3Co)
• p for the 3-input NAND is thus p = 9Co/3Co = 3.

Note that the specific sizing is of no concern here; we chose to size like the unit inverter for convenience. Consider the right-hand sizing in Fig. 4.3, where the gate is sized for half the resistance of the unit inverter. Going through the original definition of p, we end up with the same value:

p = (Rgate Cd,gate)/(Rinv Cd,inv) = (0.5 R0 · 18C0)/(R0 · 3C0) = 3

Thus, sizing a gate like the unit inverter is both convenient and general, since it removes the effect of resistance without restricting the results to a particular sizing. This method of obtaining p can be extended to a gate of any complexity. For example, for F = (AB + CD)ʹ in Fig. 4.4, p can be obtained by calculating the ratio of the output drain capacitances to 3Co, yielding p = 12/3 = 4. Note that, as shown in the figure, even if all sizes in the gate are scaled by an arbitrary factor S, p is still 4:

p = (Rgate Cd,gate)/(Rinv Cd,inv) = ((R0/S) · 12SC0)/(R0 · 3C0) = 4

Note that when we size the gate for resistance equal to the unit inverter, it is only for the convenience of calculating p as a ratio of capacitances. This sizing conveys no information about the actual size of the gate. The actual size of the gate can only be obtained from the fan-out and the capacitance of the preceding stage. This will be the result of the optimization in Sect. 4.7.
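The size-independence of p is easy to demonstrate numerically. A short sketch (resistances in units of R0, capacitances in units of Co; the helper name is our own):

```python
def parasitic_delay(r_gate, c_drain_gate, r_inv=1.0, c_drain_inv=3.0):
    """p = (Rgate * Cd,gate) / (Rinv * Cd,inv); the unit inverter has
    R = R0 = 1 and an output (drain) capacitance of 3Co."""
    return (r_gate * c_drain_gate) / (r_inv * c_drain_inv)

# 3-input NAND sized for the same resistance as the unit inverter:
# the output node sees 9Co of drain capacitance (left side of Fig. 4.3)
p_unit = parasitic_delay(r_gate=1.0, c_drain_gate=9.0)

# The same gate sized for half the resistance: every device doubles,
# so the output drain capacitance doubles to 18Co (right side of Fig. 4.3)
p_half = parasitic_delay(r_gate=0.5, c_drain_gate=18.0)

# p = 3 either way: it depends on the gate's topology, not its size
```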

4.5 Normalizing Gate External Delay

1. Isolate the external delay term from the gate delay equation
2. Realize the term represents the complexity of the gate relative to the inverter
3. Define logical effort and how to calculate it
4. Recognize that sizing equal to the unit inverter is convenient and general
5. Stress that final sizing is only decided once electrical effort (fan-out) is found.

The second remainder term from Sect. 4.3 is more complex than the first term, which yielded parasitic delay:

Rgate Cg,next gate/(Rinv Cd,inv)

The term includes a ratio between the gate capacitance of the next stage and the drain capacitance of the (nonexistent) inverter. This can be readily restated in terms of a gate capacitance ratio as


Fig. 4.3 A 3-input NAND gate. The parasitic delay is the same when calculated for the two sizings. Thus, we can choose any size for the gate when calculating p. The sizing that leads to the simplest calculation is the one that makes resistance equal to that of the unit inverter

Fig. 4.4 F = (AB + CD)ʹ for the purpose of obtaining p

Rgate Cg,next gate/(Rinv Cd,inv) = Rgate Cg,next gate/(Rinv γ Cg,inv)

This term corresponds to the second term in the inverter delay equation in Sect. 4.2. The inverter delay equation included fan-out, f. Thus, we have to somehow inject fan-out into this expression:

Rgate Cg,next gate/(Rinv γ Cg,inv) = (Rgate/(Rinv γ)) · (Cg,next gate/Cg,current stage) · (Cg,current stage/Cg,inv)

Here, we have injected the gate capacitance of the current stage by multiplying and dividing by it. The ratio of the gate capacitance of the following stage to that of the current stage was called fan-out in the inverter chain; in logic chains we call it electrical effort and give it the symbol h. Electrical effort represents the effort the stage expends to drive the external load:

Rgate Cg,next gate/(Rinv γ Cg,inv) = (h/γ) · (Rgate Cg,current stage)/(Rinv Cg,inv)

Note that this entire term is equal to h/γ for the inverter. Thus, the balance of the term must relate to the fact that we are dealing with a complex gate instead of an inverter. The balance of the term is

(Rgate Cg,current stage)/(Rinv Cg,inv)

This is a unitless quantity that we will call logical effort, and we give it the symbol g:

g = (Rgate Cg,current stage)/(Rinv Cg,inv)

As with p, we have to interpret the physical meaning of g as well as learn how to calculate it. Logical effort is the ratio of the gate capacitance of the complex gate under consideration to the gate capacitance of the unit inverter, given that the resistances of both are equal:

g = Cg,current stage/Cg,inv   (when Rgate = Rinv)

Logical effort represents a more complicated quantity than parasitic delay, and it is related to the capacity of a gate

to drive external loads. Logical effort represents how much harder it is for a gate to drive an external load identical to itself, relative to an inverter driving an external load identical to itself. Thus, for a 2-input NAND gate, logical effort can be found by calculating the external delay component of a 2-input NAND driving an identically sized 2-input NAND, then dividing it by the external delay component of an inverter driving an identically sized inverter.

As with parasitic delay, the minimum possible value of logical effort is 1. Logical effort is 1 only for inverters, and a logical effort of 1 always indicates we are dealing with an inverter. To calculate g, we do not actually have to calculate external delays. As shown earlier, we simply calculate the ratio of the gate capacitances of the gate and the inverter and multiply it by the ratio of resistances. As with p, increasing the size of the logic gate increases its gate capacitance and reduces its resistance by the same ratio, canceling out the effect. Thus, we can calculate g for any size and it will be the same. g, like p, is a function of the gate and its construction, not its size.

For the sake of convenience, it is best to size the gate for resistance equal to the unit inverter while calculating g. This sizing is only relevant while calculating g and does not represent the final sizing of the gate. To calculate g for the 3-input NAND in Fig. 4.3, we divide the gate capacitance of the relevant input (Sect. 4.6) by the gate capacitance of the unit inverter (always 3Co). The input capacitance of any input in Fig. 4.3 is 5Co (2Co from the PMOS and 3Co from the NMOS), yielding g = 5/3. For the logic gate in Fig. 4.4, the same method yields

g = (Rgate/Rinv) · (Cg,current stage/Cg,inv) = ((R0/S)/R0) · (6SC0/3C0) = 2

The same sensibility applies to the calculation of g as to the calculation of p. Any sizing we apply cancels out in the ratios of resistances and capacitances. Thus, we are free to choose whichever sizing is most convenient for the gate. This is always the sizing that makes the resistance equal to that of the unit inverter, since it allows us to ignore resistances in the calculation.

It is very important to realize that the sizing applied to obtain p and g is not the optimal sizing. It is also critical to understand that this sizing in no way reflects the real size of the gate and does not limit the values of p or g to any particular sizing. In fact, any sizing can be applied, including the final optimal sizing, and the same values of p and g will be obtained.

4.6 Architecture, Inputs, and Effort

1. Realize g and p are only a function of architecture
2. Understand why a gate can have multiple possible values of p
3. Understand why only the arrangement with minimum p is considered "correct"
4. Understand that g can change depending on the input under consideration.

The values of p and g represent how hard it is for a gate to drive itself and how hard it is for the gate to drive an external load, respectively. They are normalized to the unit inverter and are properties of the gate and its architecture, independent of sizing. All information about the effect of sizing is contained in the electrical effort h. However, a single logic gate might have multiple possible values for p and g. In particular, p is a strong function of the architecture, or how you choose to arrange the transistors, while g depends on which input is active in the chain under consideration. To illustrate this point, consider an asymmetric gate like Fʹ = AB + CDE in Fig. 4.5. As shown in the figure, F can be implemented in two functionally equal ways depending on the placement of the AB block in the PUN. In Chap. 3, we already commented on this choice, stating that while the two are functionally equivalent, the choice to put AB near the output node is more correct, since it exposes the output node to less capacitance, leading to lower intrinsic delay.

Fig. 4.5 Implementation of Fʹ = AB + CDE using two architectures with different p. Although both are functionally “correct”, the implementation on the right is “more correct” because its parasitic delay is lower


This can be confirmed systematically, because the two architectures in Fig. 4.5 have different values of p. If AB is next to the output node, p = 13/3. If CDE is next to the output node, p = 17/3. This explicitly represents the fact that it is harder for the second architecture to drive itself than the first. Thus, p is clearly a function of how you choose to arrange the transistors.

So which of the implementations in Fig. 4.5 is "correct"? The answer depends on what we consider "correct". If correct behavior is defined by the truth table and proper static functionality, then both are correct. But taking delay into consideration, there is no argument to support the left-hand implementation in Fig. 4.5.

Note that the above argument is contingent on a major assumption we have made so far: that only the output node capacitance has an impact. In Chap. 3, we justified this by saying that the output node also experiences external loading, which increases its capacitance. However, when we consider parasitic delay, we are considering only intrinsic loading, so this argument does not work. In Sect. 13.3, we will introduce the Elmore time-constant, a tool that allows us to estimate the delay of circuits with multiple capacitive nodes. We will observe that not all capacitances see the same resistance when evaluating the time-constant. For any CMOS gate, the output node capacitance sees the largest resistance, because it is the furthest from either ground or supply. Thus, the output node capacitance appears magnified by a larger resistance than any of the internal node capacitances, and in most back-of-the-envelope calculations, including those involving parasitic delay, it is safe to ignore the internal node capacitances.

Calculating g requires knowing which input is active in the chain we are sizing (Sect. 4.7), because g is computed from the gate capacitance of that input. Some inputs may have different gate capacitances, so we have to know for which input we are calculating the logical effort. For example, in Fig. 4.5, it makes a difference whether A and B on the one hand, or C, D, and E on the other, are involved in the chain, because each group has a different gate capacitance. If A or B is active, then g = 6/3 = 2. If C, D, or E is active, then g = 7/3. Note that g is the same for the two architectures in Fig. 4.5 but differs depending on the input. On the other hand, p depends on the architecture but is independent of the active input. In summary, a gate might have as many values of g as it has variables in its expression. It might also have several possible values of p, but usually only one represents the more intelligent design choice, the one that minimizes the intrinsic delay.
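These numbers can be tallied in a short sketch (capacitances in units of Co, using the unit-inverter-resistance sizing convention from the text; the per-device sizes below, all PMOS at 4 with NMOS at 2 in the AB branch and 3 in the CDE branch, are our reconstruction of the figure):

```python
C_INV_OUT = 3.0   # unit inverter output (drain) capacitance, 3Co
C_INV_IN = 3.0    # unit inverter input (gate) capacitance, 3Co

# AB pair next to the output node: two PMOS drains (4Co each) plus the
# topmost NMOS drain of each pull-down branch (2Co and 3Co)
p_ab_near_output = (4 + 4 + 2 + 3) / C_INV_OUT       # 13/3

# CDE trio next to the output node: three PMOS drains plus 2Co and 3Co
p_cde_near_output = (4 + 4 + 4 + 2 + 3) / C_INV_OUT  # 17/3

# g depends on the active input, not on the architecture
g_input_A = (2 + 4) / C_INV_IN   # NMOS 2 + PMOS 4 -> g = 2
g_input_C = (3 + 4) / C_INV_IN   # NMOS 3 + PMOS 4 -> g = 7/3
```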


4.7 Optimal Sizing in a Logic Chain

1. Write the equation for stage delay
2. Write chain delay in terms of parasitic delay, logical effort, and electrical effort
3. Define total stage effort
4. Define chain total effort, chain electrical effort, and chain logical effort
5. Derive the optimal value of stage total effort
6. Understand how to calculate optimal stage total effort
7. Obtain values for optimal electrical effort in a chain of random logic.

Combining the information in the previous sections, we can express the delay of stage j in a chain of gates as

tpd,j = tp0 (pj + hj gj/γ)

This is analogous to the delay equation in the inverter chain; for the inverter chain, p and g reduce to 1 for all stages.

We know the particular gates in the chain of interest. We know how they are connected. We also know the final capacitance that the chain is driving, as well as the size of the first gate in the chain. As with the inverter chain in Sect. 4.2, the optimization problem is: how do we size the gates to minimize the total chain delay? Because the problem is very similar to the inverter problem in Sect. 4.2, we follow the same steps in solving it:

• Write an expression for total delay
• Partially differentiate with respect to the gate capacitance of a certain stage and equate to zero
• Interpret the results in a meaningful way.

The total delay in the chain is the summation of the stage delays:

tpd = tp0 Σj (pj + hj gj/γ)

When we partially differentiate this with respect to the gate capacitance of stage j, only two terms are nontrivial. This is very similar to Sect. 4.2:

∂tpd/∂Cg,j = gj−1/(γ Cg,j−1) − gj Cg,j+1/(γ Cg,j²) = 0

It is critical to note that the differentiation treats g as a constant. While we might protest that g contains information about gate capacitance, this would be a misunderstanding of the meaning of g. g is a constant value for the gate (actually for a particular logic input of the gate) and is independent of the actual value of gate capacitance under any specific sizing. That is to say, the gate capacitance used to calculate g is not the real gate capacitance of the logic gate, but rather an expression of its complexity. In fact, any sizing of the gate leads to the same value of g; thus, g is a function of neither gate sizing nor gate capacitance.

Similar to the inverter problem, we can restate the result in terms of stage fan-outs:

gj−1 (Cg,j/Cg,j−1) = gj (Cg,j+1/Cg,j)

gj−1 hj−1 = gj hj

The results of optimization are very similar to those of the inverter chain (Sect. 4.2), with the major difference that logical effort is now involved. To interpret these results in a more useful way, we define a new variable f, called stage total effort, where f = h·g. f is the total effort a stage expends in driving its output. Total effort is divided between two targets: h is the electrical effort, representing effort expended to drive the external load; g is the logical effort, representing the penalty the gate faces in providing current relative to the inverter. Thus, for optimal sizing:

gj−1 hj−1 = fj−1 = gj hj = fj

We have concluded that for optimal delay, the total stage efforts must all be equal. Notice this is different from Sect. 4.2, where optimization resulted in equal stage fan-outs. Our list of definitions for stage quantities is

hj = stage electrical effort
gj = stage logical effort
pj = stage parasitic delay
fj = hj gj = stage total effort

We define chain quantities as

H = ∏j hj = chain electrical effort
G = ∏j gj = chain logical effort
F = ∏j fj = chain total effort

These chain quantities mirror the relation between stage fan-out f and chain fan-out F in inverter chains. The stage parasitic delays and stage logical efforts can be calculated for all stages according to Sects. 4.4 and 4.5; thus, we can calculate the chain logical effort G directly from the given logic chain. We can also calculate the chain electrical effort without knowing the stage fan-outs (Sect. 4.2):

H = CL/Cg,0

where both the load of the chain, CL, and the input capacitance of the first stage are givens of the problem.

The chain total effort is

F = ∏j fj

For an optimal chain, the stage total efforts are equal, so the chain total effort is

F = ∏j fopt = fopt^N

But F can also be obtained in terms of H and G:

F = ∏j fj = ∏j hj gj = (∏j hj)(∏j gj) = HG

We have already shown that we can calculate H and G from the givens of the problem. Thus, we can calculate the optimal stage total effort:

fopt = (HG)^(1/N)

Once we have a value for the optimal stage effort, we can obtain the stage electrical effort, and thus the sizing of each stage, given its logical effort:

hj = fopt/gj = (HG)^(1/N)/gj

Interpreting and understanding these results is best done through an example. Figure 4.6 shows a chain of logic where the input capacitance of the first gate is defined as Cin and the output load the chain is driving is defined as 100Cin. The ratio between the output and input capacitance of the chain is thus 100. The chain contains a variety of gates. To optimize the sizing of the chain:

• Find the logical effort g of all the gates
• Find the chain logical effort G by multiplying the individual g's
• Find the chain electrical effort H by dividing CL by Cg,0


Fig. 4.6 Chain of unsized logic, 2-input NAND, NOT, F = (A + BC)ʹ, NOT, 4-input NOR

• Find the chain total effort F as H·G
• Find the optimal stage total effort by taking the Nth root of F
• Find the stage electrical effort of each stage by dividing the optimal stage total effort by the stage logical effort.

For example, in Fig. 4.6, the chain electrical effort is readily calculated as

H = CL/Cg,0 = 100Cin/Cin = 100

The logical efforts of the 2-input NAND, the inverter, and the 4-input NOR are calculated as in Sect. 4.5:

g2NAND = 4/3
gNOT = 1
g4NOR = 3

For function F, input A is the input involved in the chain, so we calculate the logical effort for this input:

gF = 5/3

The chain logical effort is the product of the stage logical efforts:

G = g2NAND · gNOT · gF · gNOT · g4NOR = 6.67

F = HG = 666.67

There are five stages in the chain, so the optimal stage total effort is

fopt = F^(1/N) = 666.67^(1/5) = 3.67

With the optimal stage total effort known, we can find the electrical effort of every stage from its logical effort. The detailed results are shown in Table 4.1. Note the following about the results in the table:

• Because we know the input capacitance of the first stage, we know its size. Knowing the stage electrical effort means we know the input capacitance of the second stage, which means we know its size, and so on. Thus, finding all the stage electrical efforts means we can fully size the chain

• Multiplying all the stage electrical efforts together must yield the chain electrical effort H (100 in this case). Once the stage electrical efforts are found, we can check that they satisfy this condition to confirm the solution holds up
• We do not need to calculate the parasitic delays p to find the optimal sizes in the chain. Parasitic delay does not appear in any step of the optimization. Parasitic delay matters only when we need the value of the optimum delay, not the sizing that leads to it
• All stages have the same total effort. However, for each stage this total effort is split between two components: overcoming its own complexity (logical effort) and driving the external load (electrical effort)
• The higher either g or h is for a stage, the lower the other must be so that the total effort remains constant. This means that more complex gates (with higher g) must drive a smaller portion of the chain electrical effort. Simpler gates drive a higher portion of the chain electrical effort, since their logical effort is smaller
• Inverters are always assigned the highest electrical effort in the chain, because their logical effort is the lowest possible
• The highest electrical effort any stage will handle is the optimal total effort. This is assigned to inverters, and only to inverters
• The most complicated gate in a chain drives the smallest portion of the chain electrical effort
• For random logic (like gate F), we must know which input is active in the chain when calculating g

Why does p play no role in the optimization? p characterizes self-loading, which is independent of sizing. Optimization is performed to find the optimal sizing; since sizing has no impact on intrinsic delay, the latter acts as a constant in the cost metric. Total intrinsic delay represents the delay floor for the entire chain. We can only optimize the external delay, in which only g and h play a part.
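As a check, the chain optimization above can be reproduced numerically. The following Python sketch uses the worked example's values (five stages, H = 100):

```python
# Numerical sketch of the chain-sizing calculation above (worked
# example values; chain electrical effort H = 100 as given).
import math

H = 100
g = [4/3, 1, 5/3, 1, 3]        # logical efforts: 2-NAND, NOT, F, NOT, 4-NOR
G = math.prod(g)               # chain logical effort = 20/3 ~ 6.67
F = G * H                      # chain total effort ~ 666.67
f_opt = F ** (1 / len(g))      # optimal stage total effort ~ 3.67
h = [f_opt / gi for gi in g]   # stage electrical efforts (fan-outs)

# Sanity check: stage electrical efforts must multiply back to H
assert math.isclose(math.prod(h), H)
print([round(x, 2) for x in h])  # [2.75, 3.67, 2.2, 3.67, 1.22]
```

The final assertion is exactly the bullet-point check above: the product of the stage electrical efforts equals the chain electrical effort H.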


Table 4.1 Results of optimizing the logic chain in Fig. 4.6. Each stage fan-out can be found as the quotient of the stage total effort and the stage logical effort

Stage           g      f      h = f/g
2-NAND          1.33   3.67   2.75
NOT             1      3.67   3.67
F = (A + BC)ʹ   1.67   3.67   2.20
NOT             1      3.67   3.67
4-NOR           3      3.67   1.22

4.8 Logical Effort for Multiple Inputs

1. Understand that logical effort has to be calculated in a special way for circuits with multiple variable instances
2. Distinguish simple logical effort as the effort calculated throughout this chapter
3. Define total logical effort for a gate and distinguish it from f, F, or G
4. Define bundle effort and its practical usefulness.

The method used to obtain logical effort in Sect. 4.5 can be applied to gates where all inputs are independently accessible. In most practical gates, inputs can appear in more than one term. And in many gates, inputs and their complements appear. In such a case, the simple definition of logical effort has to be expanded. The prototypical gate for such a condition is the XOR gate. In the XOR (or XNOR) gate, the input and its complement must appear.

Figure 4.7 shows a 2-input XOR gate. All transistors are sized for resistances like the unit inverter. As discussed in Sect. 4.5, this sizing is only for calculation of logical effort, and does not restrict the actual size of the gate. We have to introduce three different kinds of logical effort:

• Simple effort. This is effort due to only a single input variable appearing only once at the gate of a single NMOS-PMOS pair. This is the effort calculated throughout this chapter
• Total logical effort. This is not F or f calculated in Sect. 4.7. It is not chain logical effort G, either. Instead, it is the logical effort of the gate if all of its inputs are driven simultaneously in the chain
• Bundle effort. This is an intermediate between simple and total effort. In this case only select inputs are used in the calculation. Normally, we would pick related inputs to produce a meaningful effort number.

Fig. 4.7 Two-input XOR gate for the purpose of calculating logical effort. As always, it is simplest to size like the unit inverter

The simple effort for input A in Fig. 4.7 can be calculated as (2 + 4)/(1 + 2) = 6/3 = 2. This is not a very useful measure for the XOR gate since there is no case where input A is driven while input Aʹ is not. The total effort for the gate is (2 × 4 + 4 × 4)/(1 + 2) = 24/3 = 8. This is again of little practical value since the only case where all inputs are activated from the same source is a trivial one. The bundle effort is useful for calculating effort due to related variables, multiple instances of a variable, or a variable and its complement. For example, the bundle effort A* accounts for all appearances of A and Aʹ. In Fig. 4.7, the effort for A* = (2 × 2 + 4 × 2)/(1 + 2) = 12/3 = 4. When the XOR gate is used in a chain and input A is active, this is the value of effort that should typically be used.
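The three effort calculations can be summarized numerically. The sketch below assumes the unit-inverter sizing in the text (NMOS gate capacitance 2, PMOS gate capacitance 4, reference inverter input capacitance 1 + 2 = 3):

```python
# The three effort calculations for the XOR of Fig. 4.7, assuming the
# unit-inverter sizing in the text (NMOS gate cap 2, PMOS gate cap 4,
# reference inverter input cap 1 + 2 = 3).
C_INV = 1 + 2
WN, WP = 2, 4

simple_A = (WN + WP) / C_INV       # one NMOS-PMOS pair driven by A
bundle_A = 2 * (WN + WP) / C_INV   # all appearances of A and A'
total = 4 * (WN + WP) / C_INV      # all eight transistor gates at once

print(simple_A, bundle_A, total)   # 2.0 4.0 8.0
```

Note that the total effort is simply the sum of the two bundle efforts A* and B*, since together they cover all eight transistor gates.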

5 Dynamic Logic

5.1 High-Impedance Nodes

1. Understand the definition of a low-impedance node
2. Understand the definition of a high-impedance node
3. Contrast high- and low-impedance nodes in terms of how they respond to changes in charge.

Consider the nodes X and Y in Fig. 5.1. Node X is the type of electronic node we encountered in Chaps. 2 through 4. The node is connected to supply through a finite, usually small, resistance, marked as R in the figure. This type of node is called a low-impedance node because it is connected to supply through a low-impedance path. A node connected to ground through a low-impedance path is also called a low-impedance node.

Now consider node Y. If the capacitor is initially charged to bring its voltage to Vdd, then the voltage of node Y will be Vdd. However, the node observes open circuits in all directions. Toward ground, node Y observes the steady-state open circuit of the capacitor dielectric. Toward the supply, the node observes an open circuit, usually created by an off switch. Node Y is called a high-impedance node because it lacks a low-impedance path to both ground and supply.

If we assume the capacitor is precharged to Vdd, then nodes X and Y in Fig. 5.1 will have the same voltage Vdd. However, the nature of the voltage on the two types of nodes is very different, with unexpected behavior showing up on node Y. Figure 5.2 shows that node X represents the case of a CMOS inverter with input "0". The PMOS is on and the NMOS is cutoff. The output node capacitance appears in series with the PMOS. In steady state, no DC current flows and the PMOS resistance shorts the output node to Vdd. Thus R in Figs. 5.1 and 5.2 is the channel resistance of the on and deeply ohmic PMOS. The capacitance in Fig. 5.2 is the parasitic at the output of the inverter. It consists of a self-loading component and an external component (Chap. 3). In steady state, the capacitor

holds enough charge to maintain the output voltage at Vdd. Specifically, the charge on the top plate of the capacitor is

Q = C·Vdd

The capacitor can lose some of its plate charge. This can happen due to noise or coupling, or due to a set of systematic reasons that will be explored in Sects. 5.4 through 5.6. When the capacitor loses a charge ΔQ, this leads to a drop in plate voltage of:

ΔVout = ΔQ/C

This is illustrated by the dip in the graph in Fig. 5.2. In Sect. 3.5, we discussed how the situation in this case cannot be a steady state. The PMOS no longer has zero drop across it, so current starts to flow through the PMOS. The current will charge the capacitor, copying the process that happens during switching. As in Sect. 3.5, the charging cannot and will not stop until the voltage on the capacitor reaches Vdd and the current through and voltage across the MOSFET both drop to null. Any loss of charge, and thus voltage, on the output node X will be transient. It will be compensated and the output will be restored to its correct value as long as the input to the PMOS is maintained at "0". The same applies to any low-impedance node, whether it is connected to ground or Vdd. All low-impedance nodes can restore any lost charge after a predictable delay.

Node Y in Fig. 5.3 is a very different story. If the PMOS in Fig. 5.3 has a "0" input at its gate, the circuit is identical to Fig. 5.2. It will charge the capacitor up to Vdd in a set delay. If the gate of the PMOS now has "1" applied to it, the PMOS turns off. With the PMOS cutoff, node Y becomes a high-impedance node identical to the right sub-figure in Fig. 5.1. Node Y thus has no path to ground or supply through a finite resistance. Since there is enough charge on the capacitor to form a voltage

© Springer Nature Switzerland AG 2020 K. Abbas, Handbook of Digital CMOS Technology, Circuits, and Systems, https://doi.org/10.1007/978-3-030-37195-1_5


Fig. 5.1 A low-impedance node X, and a high-impedance node Y. The “impedance” refers to the resistance on the path to either ground or supply

Fig. 5.3 Node Y is a low-impedance node when the input to the PMOS is “0”. It is a high-impedance node when the PMOS gate is at “1”. In high-impedance mode, the node cannot restore any lost charges

High-impedance nodes are sometimes called floating nodes because they appear floating among open circuits. In CMOS (Chap. 3), a low-impedance path had to exist to either supply or ground exclusively for each entry in the truth table. In ratioed logic (Chap. 2) a low-impedance path had to exist for all entries in the truth table to either or potentially both supply and ground. This chapter is our first experience with steady-state high-impedance nodes. This introduces opportunities and challenges we did not have to consider before.

Fig. 5.2 Node X is analogous to CMOS inverter with output high. If some of the charge on the capacitor is taken away, the PMOS will cause current to flow and will restore Vx to Vdd after a set delay

Vdd, we will observe Vy = Vdd. The charge has no path to escape from Y, and thus the voltage will be maintained. Assume, however, that a source of interference or noise removes some of the charge on the capacitor connected to Y (Sects. 5.4, 5.5, 5.6). The lost charge cannot be replenished, because node Y has no finite-impedance path to the supply rail. The loss of charge on Y will lead to a dip in voltage. Unlike node X, however, the dip in voltage is permanent. As seen in the graph in Fig. 5.3, Vy does not rebound from its noisy level, continuing instead at the same level until a path to Vdd is restored. High-impedance nodes like Y are much more sensitive to noise and interference than low-impedance nodes.
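The droop relation ΔVout = ΔQ/C can be illustrated with numbers. The capacitance and charge values below are illustrative assumptions, not from the text:

```python
# Numerical illustration of the droop relation dV = dQ / C on a node.
# Values are illustrative assumptions, not from the text.
C = 10e-15         # 10 fF node capacitance (assumed)
dQ = 1e-15         # 1 fC removed by a noise/coupling event (assumed)
dV = dQ / C        # resulting droop, approximately 0.1 V
# On low-impedance node X this dip is transient (the PMOS recharges it);
# on high-impedance node Y it is permanent until a supply path returns.
```

The same charge loss produces the same initial dip on both nodes; the difference is entirely in whether a low-impedance path exists to restore it.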

5.2 Dynamic CMOS and Why it is Great

1. Understand the architecture of dynamic CMOS
2. Recognize the nature of the clock signal
3. Understand the role of the precharge phase
4. Realize that the output of dynamic gates is valid only in the evaluate phase
5. Distinguish the nature of "0" and "1" in the evaluate phase of dynamic CMOS
6. Contrast static and dynamic CMOS
7. Realize how dynamic CMOS solves the issues of static CMOS.


Figure 5.4 introduces a new logic family: dynamic CMOS. This contrasts with CMOS from Chap. 3. When we use the word "CMOS" unqualified, we always mean traditional CMOS from Chap. 3, but sometimes we describe it as static CMOS to avoid confusion. Dynamic CMOS is both an independent logic family and a variation on static CMOS. In fact most circuits (Chaps. 6 and 12) come in two flavors: dynamic and static. For example, Dynamic RAM and Static RAM (Chap. 12), dynamic latch and static latch (Chap. 6), and dynamic logic and static logic (this chapter and Chap. 3).

All dynamic circuits share a few features that set them apart from static circuits. These features can be seen clearly in Fig. 5.4. First, dynamic circuits contain a clock signal, whether or not the circuit is sequential. Second, dynamic circuits depend on a capacitor for their operation. So, while the capacitor is typically the same parasitic that causes delay, it is also a functional component necessary for proper operation. The capacitor provides functionality by providing a sort of memory on a high-impedance node. We will illustrate this for logic circuits in this section, but it is also true for memories and latches.

The architecture of dynamic CMOS is very clear from Fig. 5.4. It consists of the same PDN as in static CMOS, which is incidentally the same PDN as in ratioed logic. The PDN is sandwiched between an NMOS and a PMOS transistor with a clock on both their gates. It is critical to notice that, whether or not it is drawn, there will be a capacitance at the output node of any dynamic circuit. This capacitance exists due to self-loading even if the gate is unloaded.

Figure 5.5 shows the operation of the dynamic CMOS inverter. Because a clock signal is present in the circuit, we

Fig. 5.4 Dynamic CMOS implementation of inverter (left), and F = (AB + CD)′ (right). The architecture consists of the PDN sandwiched between two clocked MOSFETs Mn and Mp


have to consider how the circuit behaves in the two phases of the clock. Notice that the clock signal is not a logic input. It is a periodic square wave that carries no information beyond its frequency and phase. However, in its two phases, it reconfigures the circuit behavior fundamentally.

When the clock is low, the transistor Mn is cutoff. The path to ground is completely blocked. Transistor Mp is on, and the path to Vdd is open. The output capacitor thus has a low-impedance path to Vdd. It will charge and will reach Vdd in the steady state. Note that in the "0" phase of the clock, the output is shorted to Vdd through the PMOS regardless of the logic inputs in the PDN. In other words, the output of the circuit in the "0" phase of the clock does not represent a true logic function. Notice also that the output node in the "0" phase of the clock is a low-impedance node: it is connected to supply through the low impedance of the ohmic PMOS. The "0" phase of the clock is called the precharge phase since the capacitor "precharges" to Vdd regardless of the logic inputs.

When the clock is high, the transistor Mp turns off. All access to Vdd is cut off. However, this always happens following a precharge phase: the clock is a periodic signal, so every high phase of the clock is preceded by a low phase. Thus the capacitor always enters the "1" phase of the clock with enough initial charge to have Vdd as its starting voltage. With the clock high, Mn is on, so there is potentially a path to ground. Whether this potential path to ground exists depends on the PDN and its inputs. For example, for the inverter, if the input is high then the PDN transistor and Mn are both on and in series. Thus they provide a path to ground and the capacitor will discharge the charge it came into the high clock phase with. This causes the output to discharge to 0 V after a set delay.
If the input to the inverter in the “1” phase of the clock is “0”, then the PDN of the inverter cuts off access to ground. In this case, the output node becomes a high-impedance node. It observes the high impedance of the cutoff PMOS on top and the open circuit of the PDN toward the bottom. Thus, it has no access to either ground or supply. Because the output node enters the “1” phase of the clock with an output capacitor charge causing an initial voltage of Vdd, then we observe a situation similar to Fig. 5.3 if the inverter input is “0”. The output node preserves a voltage of Vdd, theoretically indefinitely, albeit in a high-impedance state. Thus for the inverter in Fig. 5.5, during the “1” phase of the clock: when Vin = 0 V, Vout = Vdd. This high voltage is preserved in high impedance on the capacitor. When Vin = Vdd, Vout = 0 V. The output node is low-impedance in this case because it has a path to ground through the PDN and Mn. In the “1” phase of the clock, the circuit in Fig. 5.5 is an inverter.


Fig. 5.5 Phases and operation of dynamic inverter. When the clock is low the output node capacitor precharges to Vdd. When the clock is high, the function of the PDN is evaluated

The same applies to the circuit F = (AB + CD)′. When the clock is "0", the capacitor will charge up to Vdd. Then when the clock goes to "1", the capacitor will preserve its charge if the inputs make all paths to ground unavailable, while it will discharge to ground if a path to ground exists. A dynamic circuit has a low-impedance output node when precharging and when evaluating "0" at the output. It is only when evaluating "1" at the output that we observe a high-impedance node. Thus it is this state that will mostly concern us in terms of signal integrity. If we examine this further, the path to ground exists when the inputs evaluate to a "0" in the truth table, and the path to ground is blocked if there is supposed to be a "1" in the truth table. Thus, the circuit "evaluates" the logic function correctly in the "1" phase of the clock. We call this phase the "evaluate phase".
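The precharge/evaluate behavior described above can be captured in a small behavioral model. This is a hypothetical Python helper, not from the book; it tracks only the logical state of the output node and ignores delay and leakage:

```python
# Behavioral sketch of the dynamic gate F = (AB + CD)' in Fig. 5.4.
# Hypothetical helper (not from the book): it tracks only the logical
# state of the output node, ignoring delays and leakage.
def dynamic_gate(clk_phase, a, b, c, d, out_state):
    """Return the new output-node state after one clock phase."""
    if clk_phase == 0:
        # Precharge: Mp on, Mn off -> output node forced to Vdd ("1")
        return 1
    # Evaluate: Mp off, Mn on -> discharge iff a PDN path to ground exists
    pdn_conducts = (a and b) or (c and d)
    # With no path to ground, the high-impedance node keeps its charge
    return 0 if pdn_conducts else out_state

out = dynamic_gate(0, 1, 1, 0, 0, 0)    # precharge: out becomes 1
out = dynamic_gate(1, 1, 1, 0, 0, out)  # evaluate AB = 11: out discharges to 0
```

Note that a "1" during evaluate is never driven; the model simply keeps the state left over from precharge, mirroring the high-impedance storage on the capacitor.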

Note the following about the operation of dynamic CMOS circuits:

• The "0" phase of the clock is called the precharge phase; the "1" phase of the clock is called the evaluate phase. The clock is a periodic square wave. Every precharge must be followed by an evaluate, and every evaluate must be preceded by a precharge
• There is no logical meaning to the output value in the precharge phase. The precharge phase is concerned only with putting Vdd on the load capacitor for use in the evaluate phase
• If the capacitor discharges in the evaluate phase, the only way there can be a "1" in the next evaluate phase is if we recharge the capacitor, which is why every evaluate is followed by a precharge. Note that no path to supply can exist in the evaluate phase
• The number of transistors in the circuit is N + 2, which is much better than 2N in CMOS. It is very close to the circuit area in ratioed logic. This addresses one of the disadvantages of CMOS


• More importantly, the number of transistor gates to which each input is connected is only 1, because there is no PUN in the circuit. This means that dynamic logic faces less than half the external loading of static CMOS, making it much faster
• The output node in dynamic logic is loaded only by the PDN transistors and the precharge transistor Mp. This is less self-loading than CMOS, where the output node is loaded by both the PDN and the PUN. This further improves the delay of dynamic circuits
• Steady-state current is zero in all cases: precharge, evaluate "0", and evaluate "1". Thus, the static power advantage of CMOS is not lost
• Logic is rail-to-rail. Logic "1" is Vdd, logic "0" is ground, independent of transistor sizes, preserving another advantage of CMOS
• The logic "0" in evaluate is obtained through a low-impedance path to ground through the PDN and Mn. This "0" is qualitatively and quantitatively the same as "0" in static CMOS
• The logic "1" in evaluate is Vdd; however, it is a result of the capacitor preserving the charge obtained from the precharge phase due to high impedance observed in all directions. This is very different from "1" in static CMOS, where the "1" is obtained through the low impedance of the deeply ohmic PUN. Thus, the nature of "1" is where we obviously have to give special consideration. Section 5.1 hinted at the main difference between low-impedance and high-impedance nodes: robustness
• To reiterate, the output node of a dynamic CMOS gate can be in three steady states: at Vdd during precharge, at Vdd during evaluate, or at 0 V during evaluate. In the precharge phase and while evaluating "0", the output node has a low-impedance path to a rail. While evaluating "1", the output node is high-impedance and this is where all the weirdness can happen. This is pointed out in Fig. 5.5.

We have hinted so far that the main reason a dynamic logic gate is faster than its static counterpart is that it does not have a PUN.
And this is true. We have shown that the lack of a PUN reduces both intrinsic and extrinsic delay by reducing the number of connected drains and gates, respectively. But there are two other less obvious, though equally important, reasons for this speed. These are namely the effect that dynamic gates have on short circuit current, and the time at which output nodes start to discharge. In Sect. 5.3, we will realize that in the evaluate phase we are only concerned with high to low delay where the PDN discharges the output node. During this operation, a dynamic gate reduces crowbar current with its associated power dissipation (Sect. 3.6) and extra delay (Sect. 10.7). In static CMOS both the PUN PMOS and the PDN NMOS are turned


on and off by logic signals. Logic signals can suffer from very sluggish slopes. Sluggish input slopes are the main culprit behind increased crowbar current flow. In the dynamic gates in Fig. 5.4, the PMOS transistor is turned on and off by a clock. In Sect. 13.7, we will see that a lot of effort and special design is dedicated to distributing the clock with favorable delay. This includes reducing its rise and fall times. Thus, the clock is much sharper than typical logic signals. This reduces the duration where both the NMOS and the PMOS are on. Which allows more PDN current to be dedicated to discharging and more PMOS current to be dedicated to precharging. Both delay and power are thus reduced by reducing the duration of flow of crowbar current. In a static CMOS gate, the PMOS and NMOS are controlled by the same input. In a dynamic gate, they are controlled by independent signals. This is the reason given for the reduction of crowbar current above. But it also affects when the gate actually starts to switch state. In the dynamic inverter in Fig. 5.5, the pull-down transistors start discharging the output node as soon as they are turned on. Thus, when the input to the logic NMOS is around 2Vth, it will start to discharge the output node. Even if the input never rises above this value, the output will eventually discharge all the way down to ground. This is because the PMOS will be turned off by an independent signal, namely the clock. And thus any value of input that is enough to turn on the pull-down path is enough to switch its output. This is contrary to the CMOS inverter. In Chap. 3, the VTC of the static inverter showed that its output will not switch from “1” to “0” as soon as the NMOS turns on. In fact, for a sizable range around Vm, the output is neither “0” nor “1”. Thus, if the input to the static inverter is Vth, the output will never reach 0 V. In fact, the output will be higher than Vdd/2. 
For the CMOS inverter to produce a "good" "0" at its output, the input has to be close to Vih. This means that the dynamic inverter is more efficient than its static counterpart. After all, it starts discharging the output node unimpeded as soon as its PDN turns on. Meanwhile, the static inverter has a transitional region where the PUN and PDN fight each other and allow a short circuit current to flow.

Perhaps the above discussion would have been easier if we had contrasted the VTC of the static inverter with the VTC of the dynamic inverter. After all, the entire discussion above is about static, steady-state behavior. But the VTC of a dynamic gate is not well-defined and can be misleading. If we try to envision the VTC of the dynamic inverter in the evaluate phase, it will look like a brick wall, identical to the ideal inverter. As long as the input is lower than 2Vth, the pull-down path remains closed and the output remains at Vdd. As soon as the pull-down path is turned on, its steady-state output will be 0 V. This is great, but it is also


very misleading. The VTC is a relation between one input and one output. Thus the dynamic gate VTC ignores the impact of the clock; in other words, it assumes a perfect clock. All the advantages of dynamic logic stem from the fact that in a static gate the PDN and PUN are controlled by the same inputs, while in a dynamic gate the precharge transistor is controlled by an independent clock. But notice that in Sects. 5.4 through 5.6 we will cover fundamental problems with dynamic logic that we never had to consider in static CMOS. This will show us why static CMOS is still necessary: because it is robust and reliable. And this robustness is only possible with a structure where there is a PUN and a PDN. Thus, the tradeoff between dynamic and static gates is intractable.

5.3 Delay, Period, and Duty Cycle

1. Recognize that delay in dynamic logic is directly related to the minimum cycle of the clock
2. Realize there is no low-to-high transition in evaluate
3. Calculate precharge delay
4. Calculate evaluate delay
5. Understand that there is significance not just to the clock period, but also to how it is divided between the two phases.

The cycle of the logic circuit in Fig. 5.5 is shown in Fig. 5.6. A typical cycle consists of Mp charging the load capacitor in the precharge phase, then the PDN potentially discharging it through Mn, then Mp precharging it again at the beginning of the new cycle to get ready for a new evaluate. Delay in dynamic CMOS thus involves calculating the minimum period that properly accommodates these two events.

To calculate load capacitance for the dynamic circuit, we follow the same procedure used for static CMOS. Assuming the circuit is not externally loaded, we size it for worst-case pull-down and pull-up resistance equal to the unit inverter. Note that we take the effect of Mn in series with the PDN into consideration while doing the sizing. This is critical, since Mn imposes an additional resistance that reflects on sizing and thus on intrinsic loading. The gate F = (AB + CD)′ in Fig. 5.4 would thus have all the NMOS transistors in the PDN sized at 3 and the PMOS transistor sized at 2. Total load capacitance at the output node is thus 8Co.

In evaluate there is only one possible transition: high to low. The output value will either remain at Vdd or go down to zero. A low-to-high transition is not possible: the evaluate phase starts with the output node at Vdd obtained from the precharge phase, so there is no way for the output to begin at a low value. The worst-case pull-down (high-to-low) time-constant can readily be calculated for the circuit in Fig. 5.4 as Ro × 8Co = 8RoCo.

Dynamic Logic

In precharge, the best case would be if no high-to-low transition occurred in the prior evaluate, in which case precharge time is zero. But we need to calculate for the worst case, where the capacitor has discharged and needs to charge again through Mp to get ready for the next evaluate. The output node capacitance is still 8Co, and it charges through the resistance of the PMOS, which is Ro given the PMOS is sized at 2/1. The precharge time-constant is thus Rp × 8Co = 8RoCo.

The cycle must accommodate both precharge and evaluate. The total time-constant of the cycle is 16RoCo. This is the minimum period which must be given to the clock for proper operation. Thus, the maximum frequency at which the circuit can operate is 1/(0.69 × 16RoCo).

Now consider what would happen if Mp is sized at 4/1 instead of 2/1. The PDN is still sized for resistance equal to the unit inverter. The output node capacitance increases to 10Co. The resistance during evaluate remains Ro, so the evaluate time-constant rises to 10RoCo. The precharge resistance drops to Ro/2, so the precharge time-constant drops to 5RoCo. The total time-constant is thus 15RoCo, an improvement over the original.

The above result shows that the clock period has improved. However, precharging is not generally seen as a critical operation, so precharge transistors are usually kept small. But in all cases, special attention has to be given to the duty cycle of the clock when the precharge and pull-down resistances are not equal. If both resistances are equal, the duty cycle of the clock can safely be assumed to be 50%. However, if the resistances are unequal for any reason, then just guaranteeing the frequency of operation is not enough; an appropriate duty cycle must also be maintained.
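The time-constant bookkeeping above can be sketched in code. Assumptions: the sizing from the text, resistances in units of Ro, capacitances in units of Co, and a precharge PMOS of width w with resistance 2Ro/w (the function name is hypothetical):

```python
# Sketch of the precharge/evaluate time-constants for F = (AB + CD)'.
# Units: resistances in Ro, capacitances in Co. Assumes the sizing from
# the text: PDN NMOS at 3 (worst-case pull-down resistance Ro), and a
# precharge PMOS of width w_mp with resistance 2Ro/w_mp.
def cycle_taus(w_mp):
    c_load = 3 + 3 + w_mp            # two PDN branch drains plus the Mp drain
    tau_eval = 1.0 * c_load          # evaluate through Ro
    tau_pre = (2.0 / w_mp) * c_load  # precharge through 2Ro/w_mp
    return tau_eval, tau_pre

for w in (2, 4):
    e, p = cycle_taus(w)
    print(w, e, p, e + p)  # w=2: 8.0 8.0 16.0 ; w=4: 10.0 5.0 15.0
```

The run reproduces the tradeoff in the text: widening Mp from 2 to 4 shortens the total cycle (16RoCo to 15RoCo) but makes the two phases unequal, which is exactly when the clock duty cycle starts to matter.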

Any evaluate must be followed by a precharge. The reason is that if we evaluate “0” in the current cycle, there is no way for the next cycle to evaluate “1” unless the capacitor recharges, which only happens if we precharge first. Another way to view this is that during evaluate, there is no path open to supply. And thus the lost charge in evaluate can only be replenished in precharge.

5.4 Leakage in Dynamic Logic

1. Understand the basics of leakage
2. Recognize that leakage has a great impact on high-impedance nodes
3. Derive an equation characterizing the effect of leakage in the evaluate phase


Fig. 5.6 Dynamic CMOS cycle and delay. There are only two possible delay events: a discharged output node precharging, and a precharged output node evaluating to “0”

4. Understand the function of the charge keeper in compensating leakage
5. Realize the negative impact of the keeper while evaluating "0"
6. Deduce the preference for the logic threshold of the feedback inverter in the charge keeper.

In Chap. 3, our main concern with the MOSFET as a switch was its on-resistance. The on-resistance of the transistor is a major determinant of delay. We discussed in detail how this on-resistance can be reduced by sizing up the transistor, but why it can never be reduced to zero.

We have so far assumed the resistance of the MOSFET in cutoff was infinite. This corresponds to the MOSFET being a perfect open circuit when acting as an off switch. The current flowing through a cutoff transistor is thus zero regardless of the applied drain-to-source voltage. This assumption is safe

as long as the node to which the cutoff transistor is connected is a low-impedance node.

In modern technologies, a significant current flows through a cutoff transistor. This translates into a noninfinite off resistance for the MOSFET. The total off current of a transistor is called the leakage current. It is usually many orders of magnitude lower than the on current, but its impact on power and high-impedance nodes is significant.

There are many sources of leakage. These are discussed in detail (as well as methods to mitigate them) in Sect. 10.3. Very briefly, there are three major sources of leakage current in a MOSFET:

• Current that flows from drain to source due to the formation of a weak inversion layer. This source of leakage is called subthreshold current and can be significant if not addressed


• Gate current due to tunneling through the oxide
• Drain-to-body and source-to-body reverse saturation currents through the reverse-biased PN junctions.

For dynamic logic, we are concerned with leakage current flowing through the drains of the NMOS transistors. There is leakage through the precharge transistor, but because that transistor is small, its leakage is also small and negligible. If we ignore leakage, we represent the cutoff MOSFET as an open circuit. With leakage taken into consideration, we model the cutoff transistor as a current source, with the value of current equal to the leakage current.

Figure 5.7 highlights the impact of leakage current on a dynamic gate. In the precharge phase, the output node is connected to Vdd through a low impedance, that of the PMOS. Thus whether or not there is leakage current through the NMOS, that current is more than offset by the on PMOS transistor. Likewise, in evaluate while the output is 0 V, there is a low-impedance path to ground through the NMOS transistors, and again leakage has no effect. It is while outputting logic "1" during evaluate that the impact of leakage is clear. This is the last evaluate phase in


the graph in Fig. 5.7. In this case, the capacitor is holding onto the charge it obtained from the precharge phase to present a voltage of Vdd. Proper operation presupposes perfect open circuits in all directions. If the capacitor, however, observes leakage into the PDN, then a current is drawn from the capacitor. This leakage current discharges the capacitor gradually, leading to a gradual degradation of the value of "1" in the evaluate phase. Figure 5.7 shows a linear degradation of logic "1", which would happen if the leakage current is constant. KCL at the output node while evaluating "1":

Ileakage = C · dVout/dt

If the leakage current is constant, the differentiation turns into a linear slope:

Ileakage = C · ΔVout/Δt

Thus, the amount of output voltage degradation observed in a certain time is

Fig. 5.7 Impact of leakage current on a dynamic logic gate. Leakage has no visible effect in precharge or while evaluating “0”. While evaluating “1”, the high-impedance node fails to hold on to its charge due to the leakage


ΔVout = Ileakage · Δt / C

The more time that passes, the more signal degradation we observe. Also, the higher the leakage current, the faster the signal degrades. But note that C also appears in the slope of the V-t graph, and it appears in the denominator. Thus, the higher the load capacitance, the slower the signal degrades. This is the first time we observe a higher capacitance value playing a positive role. A larger capacitance holds more charge at the same voltage: since C = Q/V, for the same V, Q1 > Q2 if C1 > C2. Thus, a higher capacitance holds proportionately more charge to present Vdd at the output. If the leakage current is the same in both cases, leakage takes longer to degrade the voltage by the same ratio on the larger capacitance. But recall that this higher charge must have come from some source; that is to say, it had to be supplied during precharge, which means that it takes longer to precharge (and evaluate) the capacitor. This is obvious from the impact of C on the precharge and evaluate time-constants. Restating the results, we obtain an interesting new interpretation of the impact of leakage:

Δt = C · ΔVout / Ileakage

This means that for a certain acceptable degradation in output voltage ΔV, there is a maximum corresponding time that we can spend in the evaluate phase. If a shorter time than obtained above is spent in the evaluate phase, then the output value would still be acceptable as logic “1”. If we exceed this evaluate time, the output will fall below the acceptable level. This maximum acceptable ΔV is usually defined by the lowest voltage we would still accept as a logic “1”. For example, if we can only consider 0.9Vdd to be “1” and no lower, then ΔV is 0.1Vdd. Note that the lower bound for acceptable “1” should not be the lower bound for “1” used to calculate the noise margins. It should be a much higher value that would still leave an acceptable range for “1”. The maximum time corresponding to the maximum ΔV, when combined with the precharge time, forms a clock period:

T = Δt + tpre

where tpre = 0.69 · RMp · CL. The clock period has to be shorter than the value obtained above, otherwise the high output value would degrade beyond our preset ΔV, thus:

T < Δt + tpre

This corresponds to a minimum frequency. The clock period also has to be higher than values set by delay (Sect. 5.3). This puts both a lower and an upper bound on the clock period:

0.69 · Rn · CL + 0.69 · RMp · CL < T < C · ΔVout / Ileakage + 0.69 · RMp · CL

where Rn is the worst-case pull-down resistance.

Leakage also flowed into the PDN of a static CMOS circuit while the PDN was off. However, the output node of a static circuit is never in high impedance. While the PDN is off, the PUN is on, providing much higher current than leakage and canceling its effects with little impact on delay, although leakage can still be a major source of power dissipation. Leakage is the first of many signal integrity issues that affect dynamic circuits more significantly than static circuits. Despite the fact that dynamic circuits address all the issues of static CMOS while preserving its advantages, we still use static CMOS for many purposes due to its high robustness. The impact of leakage can be addressed by using a circuit called a charge keeper, shown in Fig. 5.8. The charge keeper has a very simple functionality. However, the circuit in Fig. 5.8 contains a feedback loop. Whenever we see feedback in a circuit, we should start looking for complications. The aim of the charge keeper is to preserve high output at Vdd in the face of leakage current. The reason leakage has an impact is that the output node in dynamic CMOS is a

Fig. 5.8 Charge keeper circuit. Mk is the charge keeper. The small static inverter is used to manipulate its gate enable
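The bounds on the clock period can be sketched numerically. All component values below are illustrative assumptions, not from the text:

```python
# Sketch: clock-period bounds for a dynamic gate, per
#   0.69*Rn*CL + 0.69*RMp*CL < T < C*dV/Ileak + 0.69*RMp*CL
# All numeric values are illustrative assumptions.

def max_evaluate_time(C, delta_v, i_leak):
    """Longest evaluate time before the high output droops by delta_v."""
    return C * delta_v / i_leak

def clock_period_bounds(Rn, RMp, CL, delta_v, i_leak):
    """Return (T_min, T_max) for the clock period."""
    t_pre = 0.69 * RMp * CL                                  # precharge term
    T_min = 0.69 * Rn * CL + t_pre                           # set by delay
    T_max = max_evaluate_time(CL, delta_v, i_leak) + t_pre   # set by leakage
    return T_min, T_max

# Example: CL = 10 fF, accept a 0.1 V droop, 1 nA leakage, 10 kOhm resistances
T_min, T_max = clock_period_bounds(10e3, 10e3, 10e-15, 0.1, 1e-9)
```

With these assumed values the delay bound is on the order of 0.1 ns while the leakage bound is on the order of 1 µs, illustrating how leakage sets a minimum, rather than a maximum, operating frequency.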

high-impedance node with no direct access to supply. To overcome this, we must provide a path to supply. The path to supply is provided by the charge keeper PMOS Mk. However, this transistor must only be on when the output is high (remember, when the output is low, the node is low impedance). Thus, the gate of Mk is connected to the inversion of the output. When the output is high, Mk is on and providing a path to supply. The inverter at the gate of Mk must be a static inverter, otherwise it would itself suffer from leakage. Leakage is an important effect in modern technologies, affecting not only the performance of dynamic logic and memories, but also power dissipation. Several questions are immediately obvious about the charge keeper:

• Does the inclusion of the static inverter mean that the dynamic inverter is now larger than the static inverter? Yes, but note that the overhead of the charge keeper is constant and becomes relatively small if the dynamic gate itself performs a more complex function. The number of transistors in an N-input dynamic gate including the charge keeper is (N + 2) + 1 + 2 = N + 5, while the number of transistors in a static gate is 2N. Equating N + 5 = 2N gives a break-even at N = 5; the dynamic gate is smaller for any larger N
• Should Mk be small or large? It should be sized as small as possible. A unit PMOS is preferable. Mk has to compensate a very small current (leakage); the smallest transistor can do this effortlessly as long as it is on
• What should the logic threshold Vm of the static inverter in the feedback be? For most inverters we assume the logic threshold is at Vdd/2, owing to the assumption that noise is symmetric. However, in this case an asymmetric Vm is very beneficial

Note that the charge keeper fixed the case where the output is Vdd in evaluate. However, we have not considered its effect on the case where the evaluated output is supposed to be “0”. Figure 5.9 shows the case where the dynamic gate is trying to evaluate “0”. The PDN in Fig. 5.9 is trying to discharge the output capacitor’s precharge. Normally, the PDN would be doing

Fig. 5.9 Charge keeper evaluating “0”. Because the charge keeper counteracts the PDN, this increases the time-constant. However, this only continues until the static inverter switches output


this unimpeded by any pull-up current. With the charge keeper, we enter evaluate with output at Vdd; through the static feedback inverter, this turns Mk on. Thus Mk will source a current that charges the capacitor. At the beginning of evaluate, the net current discharging the capacitor is not the PDN current, but the balance of the PDN current and the keeper current:

IC = IPDN − IMk

It is reasonable to assume there is still net current discharging the capacitor. Mk was sized very small, thus the PDN supplies much more current than Mk. However, the charge keeper still eats a portion of the current that would have discharged the output capacitor. This effect is reminiscent of crowbar current in static CMOS. Thus, the charge keeper plays a very harmful role in evaluating “0”. By decreasing the available current to discharge the capacitor, it increases delay. Note also that this additional current leads to a net current component flowing from supply to ground, which constitutes a form of transient power dissipation. Again this is very reminiscent of crowbar current, which was one of the disadvantages of static CMOS we were supposed to have overcome. This informs our choice of Vm for the static inverter. Vm should, in fact, be much closer to Vdd than to Vdd/2. As shown in the graph in Fig. 5.9, when the output reaches the logic threshold of the static feedback inverter, the output of the inverter switches. This switches the voltage at the gate of Mk from “0” to “1”. This causes Mk to cut off and stop counteracting the efforts of the PDN to discharge the capacitor. Thus, it is more useful to keep the logic threshold of the feedback inverter high to allow the keeper to turn off as soon as the PDN starts to pull down the output node.
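The transistor-count trade-off of adding a charge keeper can be checked with a short sketch. The counting scheme follows the text's bookkeeping: an N-input dynamic gate is N logic NMOS plus two clocked transistors, and the keeper adds one PMOS plus a two-transistor static inverter:

```python
# Sketch: transistor counts for an N-input gate, per the text's bookkeeping:
# dynamic gate = N + 2, keeper adds 1 + 2 (total N + 5); static CMOS = 2N.

def dynamic_gate_with_keeper(n_inputs):
    return (n_inputs + 2) + 1 + 2        # = n_inputs + 5

def static_cmos_gate(n_inputs):
    return 2 * n_inputs

# Break-even at N = 5; the dynamic gate wins for any higher fan-in.
sizes = {n: (dynamic_gate_with_keeper(n), static_cmos_gate(n)) for n in (4, 5, 6)}
# n = 4: (9, 8) static smaller; n = 5: (10, 10) tie; n = 6: (11, 12) dynamic smaller
```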


5.5 Charge Sharing

1. Understand the concept of charge sharing in a closed system of capacitors
2. Understand why charge sharing applies to dynamic gates while evaluating “1”
3. Distinguish evaluate “0” and evaluate “1” regarding charge sharing
4. Recognize the role of prior cycles in affecting charge sharing during the current cycle
5. Solve charge sharing and understand the effect of single clock design.

One secondary effect we have consistently ignored for static CMOS is the effect of parasitic capacitance in internal nodes. We had two good reasons to do this while calculating delay:

• Output node capacitance is generally larger due to the presence of loading from the next stage
• Output node capacitance appears significantly magnified in the overall time-constant relative to internal node capacitance (Sect. 13.3)

However, in dynamic gates internal node capacitance can lead to a significant signal integrity issue while evaluating “1” at the output. To understand this issue, we have to understand the concept of charge conservation. Consider the system consisting of three capacitors in Fig. 5.10. Before the switches close, each capacitor is at an independent voltage, carrying a different amount of charge. After the switches close, the charges on the upper plates of the capacitors have no path to ground, and thus can only be redistributed among the capacitors. Thus, the total amount of charge among the capacitors before and after the switches close must remain the same. This is the concept of charge conservation. Once the switches close in Fig. 5.10, the capacitors become parallel. Thus the voltage on their upper plates is the same, namely Vf. The values of the capacitances are generally different, but since they observe the same voltage, it follows that each must be carrying a different amount of charge. If we know the initial voltages (and thus charges) on the capacitors, we can find the final value of voltage once the capacitors become parallel, because we know the total charge must remain the same. Thus charge conservation states that:

Qopen switch = Qclosed switch

The charge before the switches close can be calculated by adding each capacitor’s charge:

Qopen switch = C1·V1 + C2·V2 + C3·V3

After the switches close, the capacitors appear in parallel with the same voltage Vf, thus:

Qclosed switch = Vf · (C1 + C2 + C3)

By equating the two charge quantities, we can find the voltage Vf in terms of the initial voltages:

Vf = (C1·V1 + C2·V2 + C3·V3) / (C1 + C2 + C3)

We can generalize this result for any number of capacitors:


Fig. 5.10 Charge conservation in a closed system of capacitors. The total charge in the system on the left is equal to the total charge in the system on the right even though voltages on the capacitors differ

Σi Ci·Vi = Vf · Σi Ci

Vf = (Σi Ci·Vi) / (Σi Ci)

How this relates to dynamic gates is immediately clear once we consider a gate with logic “1” output. Consider Fig. 5.11 for example. Now assume this 4-input NAND gate is outputting “1” during the evaluate phase. This “1” is a high-impedance Vdd on CL obtained by precharging CL to Vdd during the precharge phase, then offering no paths to either ground or supply in evaluate. However, during evaluate, CL may observe C1, C2, and C3 in parallel with itself. This depends on what input combination caused the “1” to be evaluated at the output. Depending on the initial voltages of C1, C2, and C3, this may cause serious charge sharing to affect the voltage that appears on CL, which is the output voltage. By applying the charge conservation principle to CL, C1, C2, and C3:

C1·V1 + C2·V2 + C3·V3 + CL·VDD = (C1 + C2 + C3 + CL) · Vf

Vf = (C1·V1 + C2·V2 + C3·V3 + CL·VDD) / (C1 + C2 + C3 + CL)
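The charge-conservation result can be captured in a few lines. The sketch below is generic; the example values are arbitrary assumptions:

```python
# Sketch: final voltage after capacitors at different initial voltages are
# connected in parallel: Vf = sum(Ci*Vi) / sum(Ci).

def shared_voltage(caps, volts):
    """Charge-weighted average voltage of the connected capacitors."""
    total_charge = sum(c * v for c, v in zip(caps, volts))
    return total_charge / sum(caps)

# Two equal capacitors, one charged to 1 V and one empty, settle halfway:
vf = shared_voltage([1.0, 1.0], [1.0, 0.0])   # -> 0.5
```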

The above calculation potentially exposes us to the worst case, because it allows the largest number of internal nodes to appear in parallel with the output capacitor. This situation occurs when ABCD = “1110”. Note that in this case the NMOS in the PDN act like the switch network in Fig. 5.10. We can make the following observations:

• An input of “1111” will not lead to any charge sharing in evaluate because “1111” will evaluate to “0”. In dynamic circuits this is achieved by discharging CL to ground through the PDN, which means that the output node is low impedance while evaluating “0”. Charge sharing affects high-impedance nodes only

Fig. 5.11 Dynamic 4-input NAND suffering from charge sharing. The system of capacitors includes the internal nodes and the output node capacitor

• Input ABCD = “110X” will lead to a less severe charge sharing scenario where only two internal node capacitors, C1 and C2, appear in the equation. Note that since C is “0”, the value of D is not important, because the path from ground to the output node is already blocked by MC
• Input ABCD = “10XX” will lead to a mild case of charge sharing where only one internal capacitor, C1, appears in


the equation. Since B is “0”, it closes the path to anything below it, thus all inputs below it are don’t cares
• Input ABCD = “0XXX” will not lead to any charge sharing, because transistor A will be cutoff, cutting access to all internal nodes. Thus all other inputs can be whatever combination of values; the output capacitor CL will still be isolated and suffer no charge sharing

With input ABCD = “1110”, the maximum number of capacitors appear in parallel with the output node. However, this does not mean that we will see an impact on the output value Vf. Whether or not this happens depends on the initial voltage on the internal capacitors before entering evaluate. The worst case occurs if all internal capacitors enter evaluate with zero charge:

V1 = 0, V2 = 0, V3 = 0

This causes Vf, which ideally should be Vdd, to fall to a fraction of Vdd:

Vf = CL·VDD / (C1 + C2 + C3 + CL)

The fraction indicates we will observe a degraded “1” at the output. However, since the internal node capacitances are normally smaller than the output node capacitance, the fraction is usually closer to 1 than it is to 0. At the other extreme, we might see the three capacitors appear in parallel with the output without having any impact on it. This happens if the internal nodes all enter evaluate with an initial voltage at Vdd:

V1 = VDD, V2 = VDD, V3 = VDD

Vf = (C1 + C2 + C3 + CL)·VDD / (C1 + C2 + C3 + CL) = VDD

Thus, whether or not we suffer due to charge sharing depends on two factors:

• How many capacitors appear in parallel with the output node. This is determined by the input during evaluate
• What initial voltages the internal capacitors enter evaluate with. This is dictated by inputs in the prior precharge and evaluate

Let us consider the 4-input NAND in Fig. 5.11 again, and try to answer the following questions:

1. What is the input combination that would potentially cause the worst-case charge sharing in evaluate?
2. If the input above happens, what inputs in the prior evaluate and precharge will ensure that the worst case does occur, and what is the output in this case?
3. What input would guarantee no charge sharing regardless of prior cycles?
4. If the worst-case input occurs, what input in the preceding precharge will guarantee charge sharing will have no effect?

Question 1 has already been answered. The input that potentially leads to worst-case charge sharing is ABCD = “1110”. Generally, the input combination that opens a path to the maximum number of internal nodes without creating a path between the output node and ground is the input that causes the worst-case charge sharing. Question 2 asks what inputs in the prior evaluate and precharge will cause the internal node capacitors to enter evaluate with zero initial voltage, leading to the worst-case output. Note that for the internal nodes to be at zero voltage, they must have discharged. The only phase where these internal nodes could possibly have had a path to ground is evaluate, thus we have to consider what happened to the internal nodes in the prior evaluate. In the prior evaluate, we want the three internal capacitors to discharge to ground. Thus we need the input to be ABCD = “X111”. This input opens transistors B, C, and D and allows their respective capacitors to discharge to ground. A is don’t care because whether or not it is “1” affects only the output capacitor. If we discharge the output capacitor it will charge up again in the following precharge. So it does not matter whether we discharge it or not, and it does not matter whether A is “1” or “0”. Immediately after this evaluate we enter precharge. During this precharge, if A is “0”, then the internal nodes will have no path to Vdd. Thus they will all remain at zero potential. Note that if A is “0”, it does not matter what the other inputs are. So the sequence that will cause the worst case in evaluate 2 is:

Evaluate 1 (“X111”) → Precharge 2 (“0XXX”) → Evaluate 2 (“1110”)

The reasoning behind these inputs is as follows:

Evaluate 1: Discharge internal node capacitors to ground.
Precharge 2: Prevent internal nodes from charging back to Vdd.
Evaluate 2: Ensure the maximum number of discharged internal nodes appear parallel to the output node.

Question 3 asks what input in evaluate 2 would guarantee no charge sharing regardless of internal capacitor status due to prior cycles. The answer is “0XXX”. This input guarantees no internal node capacitors appear parallel to the output because transistor A is cutoff and prevents all access between the output node and internal nodes. The “1” output in this


case will be a clean Vdd as far as charge sharing is concerned. The amount of charge on internal node capacitances is irrelevant. Question 4 asks: if the input is “1110” during evaluate 2, what could happen in precharge 2 to lead to no charge sharing? In other words, what could happen in precharge 2 to guarantee all internal nodes are precharged to Vdd, thus leading to output Vdd despite all the internal capacitors appearing in parallel with the output node? The input in precharge 2 that does this is “111X”. This input means that all internal nodes will have access to Vdd through Mp. Input D is don’t care, since its only effect would be to connect the capacitor at the source of D to Vdd. Since this capacitor cannot appear parallel to the output in charge sharing, it does not matter whether or not it is precharged. The reason this capacitor cannot appear in charge sharing is that if it is to appear, the input must be “1111” in evaluate, and the output will evaluate to “0”, which is a low-impedance state and suffers no charge sharing. Now consider the circuit F′ = ABCD + EG + H shown in Fig. 5.12. Assume the output node has a capacitance of CL = 10Co. Assume also that each internal node has a capacitance of Co, regardless of the number of connected transistors. Let us try to answer the same questions 1, 2, 3, and 4 above. Again we are going to consider two complete phases, thus there is a precharge 1, evaluate 1, precharge 2, and evaluate 2. We are considering the output in evaluate 2. (1) The input that could cause worst-case charge sharing in the second evaluate phase is ABCDEGH = “1110100”. This is the input that leads to the maximum number of

Fig. 5.12 Dynamic F ′ = ABCD + EG + H suffering from charge sharing


internal nodes appearing parallel to the output. Specifically, 4 internal nodes can appear in parallel to the output while still evaluating “1”. These nodes are namely the sources of MA, MB, MC, and ME. The zeros in the input are necessary to guarantee there is no path to ground, which would mean the output is “0” and there is no charge sharing. The worst-case charge sharing output is:

Vf = Cout·VDD / (4Co + Cout) = (10/14)·VDD
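A quick numeric check of this figure, using the chapter's values (CL = 10Co, four internal nodes of Co each entering evaluate discharged; Co normalized to 1 as an assumption):

```python
# Sketch: worst-case charge sharing for F' = ABCD + EG + H with input
# "1110100": CL = 10*Co precharged to Vdd shares with four empty Co nodes.

def worst_case_vf(c_load, internal_caps, vdd):
    """Charge conservation with all internal nodes entering evaluate at 0 V."""
    return c_load * vdd / (c_load + sum(internal_caps))

Co, Vdd = 1.0, 1.0
vf = worst_case_vf(10 * Co, [Co] * 4, Vdd)    # (10/14) * Vdd, about 0.714 * Vdd
```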

(2) The output above is observed in evaluate 2 if evaluate 1 provides a path to ground for all internal nodes involved in charge sharing and precharge 2 does not recharge these internal nodes. This happens if evaluate 1 is “X111X1X”. Note the don’t cares, which only concern the output node having access to ground. The output node will recharge in precharge 2 anyway, so it does not matter whether it discharges in evaluate 1. In precharge 2, we must prevent the internal nodes from precharging. This can be guaranteed by ensuring all inputs next to the output node are off; the remaining inputs can be don’t cares. Thus the input in precharge 2 is “0XXX0XX”. Note input H is an exception here. It is an input next to the output node, but it can be don’t care, because it only provides access to a node that will not be involved in charge sharing anyway. (3) The input that would guarantee no charge sharing regardless of prior cycles is the input pattern that ensures that during evaluate 2, the output node is isolated from all internal nodes. Any input where the


inputs nearest the output are off will guarantee this, i.e., “0XXX0X0”. Note that in this case H must be “0”, otherwise there would be a path to ground and we would be evaluating “0”. (4) If the input during evaluate 2 is “1110100”, then any input in precharge 2 that precharges the internal nodes to Vdd would guarantee that we do not observe the effects of charge sharing. This input is “111X1XX”, where the don’t cares would be responsible only for precharging the bottom node, which cannot be involved in charge sharing.
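The phase-by-phase reasoning for the 4-input NAND of Fig. 5.11 can be checked with a small behavioral sketch. The topology (A nearest the output, internal nodes of Co below A, B, and C) follows the text; CL = 10Co and the simplified charge/discharge rules are assumptions of this model:

```python
# Behavioral sketch of charge sharing in the dynamic 4-input NAND.
# Node i (i = 0, 1, 2) sits below transistor "ABCD"[i]; CL is the output node.
Co, CL, VDD = 1.0, 10.0, 1.0

def step(phase, inputs, v_int, v_out):
    """Advance one clock phase. inputs maps 'A'..'D' to 0/1."""
    on = [inputs[k] == 1 for k in "ABCD"]        # switch states, top to bottom
    v_int = list(v_int)
    if phase == "precharge":                     # Mp on, foot clock off
        v_out = VDD
        for i in range(3):
            if all(on[:i + 1]):                  # path up to the output node
                v_int[i] = VDD
    else:                                        # evaluate: foot clock on
        if all(on):                              # full path: evaluate "0"
            return [0.0] * 3, 0.0
        for i in range(3):
            if all(on[i + 1:]):                  # path down to ground
                v_int[i] = 0.0
        reach = [i for i in range(3) if all(on[:i + 1])]  # nodes tied to out
        v_out = (CL * v_out + sum(Co * v_int[i] for i in reach)) / \
                (CL + Co * len(reach))           # charge sharing
        for i in reach:
            v_int[i] = v_out
    return v_int, v_out

# Worst-case sequence from the text:
v_int, v_out = [VDD] * 3, VDD
v_int, v_out = step("evaluate", dict(A=0, B=1, C=1, D=1), v_int, v_out)   # "X111"
v_int, v_out = step("precharge", dict(A=0, B=0, C=0, D=0), v_int, v_out)  # "0XXX"
v_int, v_out = step("evaluate", dict(A=1, B=1, C=1, D=0), v_int, v_out)   # "1110"
# v_out is now (10/13)*VDD rather than a clean VDD
```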

Charge sharing is a serious problem for dynamic gates. One approach to manage it would be to precharge important (high loading) internal nodes as we precharge the output node. This is shown in Fig. 5.13. This guarantees internal node capacitances always have an initial voltage of Vdd during evaluate, leading to no effect from charge sharing. If we precharge all internal nodes, we tend toward a transistor count similar to static CMOS, losing one of the major advantages of dynamic CMOS. However, it is not necessary to charge all nodes, just critical ones with high capacitance. All inputs to any dynamic gate come from a dynamic gate that preceded it. What this means is that during precharge all inputs to all dynamic gates are “1”. This is because all inputs are the outputs of some other dynamic gate. Since all dynamic gates must use the same clock, then in precharge all gate outputs are at “1”. Thus a path automatically exists to precharge all internal nodes during the precharge phase, which might mean that we will never suffer from charge sharing. However, as we will see in the next section, the output during precharge must actually be reversed if we are to be able to cascade dynamic gates.

Fig. 5.13 Precharging internal nodes

5.6 Cascading Dynamic Logic

1. Understand how delay in the evaluate phase can lead to irretrievable data loss in subsequent stages
2. Realize why cascading was not an issue in static CMOS
3. Trace how domino logic solves issues in cascading
4. Understand the main drawbacks of domino logic
5. Understand how NP logic solves cascading issues
6. Recognize when the NMOS tail transistor can be removed.

Because all logic inputs to a logic gate must come from a logic gate of the same family, inputs to a dynamic gate must also come from a dynamic gate. This can lead to a very serious issue where delay in evaluating zero in a gate might lead to irretrievable loss of charge in the following stage. Consider the two cascaded inverters in Fig. 5.14. The two inverters should behave as a buffer, passing the input to out2 unchanged. This will only be valid in evaluate; in precharge both out1 and out2 will be at Vdd regardless of input. The fact that all stages are precharged by the same clock will lead to possible loss of charge in the second stage. To understand why this happens, consider the case in Fig. 5.14 in the first evaluate phase, where the input is Vdd or “1”. Out1 should evaluate to “0” and out2 should evaluate to “1” (Vdd). In precharge both out1 and out2 will charge to Vdd. In evaluate, the input of “1” will cause Mn1 to be on and the PDN of the first gate to be active, discharging the capacitor at out1, eventually reaching 0 V. However, the problem here is that the discharge operation at out1 is not instantaneous. In fact, we have already calculated the delay for this discharge operation as a high-to-low propagation delay. Thus out1 does not drop to zero at exactly the clock edge; it drops at a slope determined by the time-constant, eventually reaching 0 V. For a significant time, Mn2 will not see out1 as 0 V but will observe a high enough voltage to turn it on. Until out1 reaches roughly 2Vthn, Mn2 will remain on. Since both NMOS transistors in the second inverter are on for some time during evaluate, there will exist a path to ground, and the capacitor at out2 will discharge part of its charge. The discharge process does not continue for too long, as shown in Fig. 5.14, and the current in Mn2 will also be


Fig. 5.14 Two dynamic inverters in cascade

relatively small. But there is enough flow in Mn2 that the loss of charge in out2 will cause an unbearable deterioration in the value of “1” at the output. This leads to a significant loss in the noise margin for high output. This phenomenon also happened in static CMOS. However, we never considered it because it is of no consequence in static circuits. In static CMOS there can always be a path to Vdd if the input is “0”. Thus, in the static inverter pair, the PMOS of the second inverter will eventually turn on as its input drops, and it will restore any loss of charge at the output node. In the dynamic gate, there can be no access to Vdd during the evaluate phase, thus loss of charge is irreversible until the next precharge. Notice from Fig. 5.14 that the cascading problem does not occur when the input to the inverter pair is “0”. This is the case in the second evaluate phase in the graph in the figure, in which case out1 remains at high-impedance Vdd, causing the second inverter to properly discharge out2 for an overall steady-state buffer output of 0 V. One of the simplest solutions to this problem is a design style called domino logic. In this style, the basic building block is not the dynamic gate alone, but the dynamic gate followed by a static inverter.

All three phenomena affecting the performance of dynamic logic are also present in static CMOS circuits. However, they are much harder to observe because low-impedance connections to either ground or Vdd are always present in static circuits. Domino logic solves the cascading problem because the actual output of each stage during precharge is now at ground. This output can then either go up or stay down. In the first case, we will discharge the capacitor when the output goes high enough to turn on the NMOS of the following inverter. In the second case, where the output remains at 0 V, the NMOS of the following inverter will start evaluate cutoff, and will remain cutoff. Figure 5.15 illustrates the above solution through an example. When the input is “1”, the output of the first dynamic inverter discharges during evaluate. However, out1 is not the output of the dynamic inverter; it is the output of the following static inverter. Thus out1 during the first evaluate appears to charge from 0 V to Vdd. The second dynamic inverter will observe a “0” for some time and then will realize that it is a “1”, at


Fig. 5.15 Domino logic solving cascading

which point it will start to discharge. So we observe out2 remaining at 0 V until out1 rises to around 2Vth, at which point out2 also starts to rise. Thus, the correct value is evaluated, albeit after some additional delay. The second evaluate phase in Fig. 5.15 shows the case where the input is “0”. It is important to ensure that while solving the cascading problem for input “1”, we do not introduce a new problem with input “0”. With input “0”, the first dynamic circuit will not discharge its output. Thus its output will remain a high-impedance Vdd. Out1 is 0 V during precharge, and in the second evaluate it will remain at 0 V. Thus, the second dynamic gate observes a “0” input. Its output will remain at Vdd in the second evaluate. Out2 will also remain at 0 V. The buffer functions properly, with a Vdd input appearing as Vdd at out2 and a 0 V input appearing as 0 V at out2. The price paid for domino logic is the additional area of the static inverters. However, this overhead diminishes if the dynamic gates themselves are large. Also, logic is now non-inverting by definition for the domino gate: you can implement an AND but not a NAND. There is also an additional delay due to the static inverter. Figure 5.16 shows an alternative method to construct dynamic gates. In this case, the PUN is used as the body of the gate instead of the PDN. The output is taken on the lower node instead of the upper node. The NMOS Mn plays the role of the precharge transistor Mp in Sect. 5.2.

Fig. 5.16 PMOS-based dynamic logic. The gate consists of a PUN sandwiched between clocked transistors

When the clock is high, the output node “predischarges” to 0 V. In PMOS-based dynamic gates it is the 0 V instead of the Vdd that is impermanent, and it is the 0 V output that is high-impedance and prone to signal integrity issues.


When the clock is low, the output node is evaluated. Mp is on, and if there is a path through the PUN, then the output will be connected to Vdd through a low-impedance path. If there is no path through the PUN, the output will remain at 0 V but will be in high impedance due to the presence of open circuits in all directions. The PMOS-based dynamic gate is the complement of the traditional dynamic gate in every way. The noise-prone high-impedance outputs are reversed. The role of nodes is reversed, and transistors play the complementary role. Because the PUN is used instead of the PDN, this class of gates is inferior to NMOS-based dynamic gates. Due to the lower mobility of holes, the PMOS transistors in the PUN generally need to be made larger. This increases the time-constant of PMOS-based dynamic gates relative to their NMOS-based counterparts. However, alternating N and P dynamic gates can solve the cascading problem without the need for intervening static inverters. This is shown in Fig. 5.17. Each NMOS gate must be followed by a PMOS gate and vice versa. In the “precharge” phase all N gate outputs will be Vdd, while all P gate outputs will be 0 V. In evaluate, the outputs of N gates will either remain at Vdd or will discharge toward 0 V. All P gate outputs will either charge to Vdd or remain at 0 V. If only N or only P gates were used in the chain, then the cascading problem would still be observed. But alternating stages as in Fig. 5.17 solves it. All N gates in Fig. 5.17 will observe inputs that start at 0 V and either remain there or charge up to Vdd. As shown in Fig. 5.15 this means that the NMOS dynamic gates will never observe any irretrievable loss of charge. The P gates will also be safe since they observe inputs that start at Vdd and


either remain there or discharge to 0 V. Since the PUN in the P stages consists of PMOS, the initial Vdd during evaluate will not allow them to unintentionally lose charge. The method used in Fig. 5.17 might seem superior to that in Fig. 5.15. We do not use static inverters, thus avoiding the extra area and the extra delay. However, it is important to recall that the PMOS in the PUNs are sized larger than a corresponding PDN. Thus both area and delay suffer due to the use of P gates. Whether domino logic or NP logic is better depends on the application. In Chap. 11, we will show that where it makes sense to use dynamic logic, it often makes sense to also use domino logic. There is a hidden advantage for domino logic that we have not yet discussed. In Fig. 5.4, what is the role of Mn? This transistor, alternatively called the NMOS tail transistor or the pull-down clock transistor, has only one purpose: to stop precharge current from reaching ground. In other words, it creates an open circuit in the pull-down network to allow the precharge operation to happen properly. But this is necessary only if we are in the dark about the values of the logic inputs during the precharge phase, or if we know such values will enable the PDN/PUN. In fact, this is not the case in domino logic. By observing Fig. 5.15, we notice that the input to any domino stage is the output of a previous domino stage. During precharge, all domino stages have outputs that go to 0 V as the output of the dynamic gate charges to Vdd. Thus, all domino stages observe “0” inputs during the precharge phase. We can rest assured that the PDN will not create a path to ground during precharge, and we can readily remove all the NMOS tail transistors. Notice that these transistors play no role during the evaluate phase. In fact, they only increase the discharge delay of the gate.

Fig. 5.17 NP dynamic logic. Each NMOS stage must be followed by a PMOS stage and vice versa. This removes the cascading problem by ensuring each stage observes a safe starting point in evaluate


Tail transistors can also be removed in NP logic. In Fig. 5.17, during the precharge/predischarge phase all the outputs of N stages go to Vdd while all the outputs of P stages go to 0 V. Thus, the inputs to all PDNs are 0 V and the inputs to all PUNs are Vdd. This shuts down all PUNs and PDNs during precharge and removes the need for the tail transistors. Notice that the tail transistor that can be removed in P stages is the top PMOS.

5.7 Logical Effort in Dynamic Gates

1. Derive logical effort for a dynamic CMOS inverter
2. Derive logical effort for a random dynamic combinational gate
3. Understand how logical effort demonstrates the advantage of dynamic CMOS
4. Understand the limitations on logical effort in dynamic CMOS

Logical effort can be calculated for dynamic CMOS very similarly to static CMOS. We calculate the total input capacitance when sizing matches the unit inverter, and then divide it by the input capacitance of the unit inverter. Note that even though we are calculating the effort of a dynamic gate, we must still refer it to the input capacitance of the static CMOS unit inverter. The static CMOS inverter is the “unit inverter”; there is no dynamic unit inverter.

Because the dynamic gate does not have a pull-up network, the input capacitance of the gate is only the capacitance of the logic-input NMOS in the PDN. There is no PMOS capacitance, because there is no PUN. As we will see shortly, this is the main reason that dynamic gates are faster than static gates.

Figure 5.18 is a recreation of Fig. 5.4. It shows a dynamic CMOS inverter and a larger gate implementing the function F′ = AB + CD. In the inverter, if we assume sizing to match the unit inverter, then the two NMOS transistors are sized at 2. The input capacitance of the logic input “input” is thus 2Co, and the logical effort is 2/3. For the right sub-figure in Fig. 5.18, all the NMOS transistors must have an aspect ratio of 3. This leads to a logical effort, for any input, equal to 3Co/3Co = 1.

We are seeing two strange phenomena here. First, there is a complex logic gate with an effort of unity. Second, the inverter has an effort less than unity. This is counter to the intuition developed throughout Chap. 4, where we assumed that logical effort was at least unity, and was unity only for the inverter. That statement is true only for static gates. The dynamic inverter has a logical effort of 2/3 compared to 1 for the static inverter, and the function F′ = AB + CD has an effort of 1 in dynamic CMOS, whereas in static CMOS it would have a logical effort of 2.
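The calculation above is simple enough to sketch numerically. The following toy model uses this chapter's conventions (unit-inverter input capacitance of 3Co, gates sized to match the unit inverter); the function and variable names are mine, not the book's:

```python
# Toy sketch of logical effort for one input of a dynamic gate: only the
# NMOS driven by that input contributes capacitance, since there is no PUN.
# The unit (static) inverter has input capacitance 3Co (NMOS 1 + PMOS 2).

UNIT_INVERTER_CAP = 3  # in multiples of Co

def dynamic_logical_effort(input_nmos_size):
    """Logical effort = input capacitance / unit-inverter input capacitance."""
    return input_nmos_size / UNIT_INVERTER_CAP

# Dynamic inverter with tail transistor: logic NMOS sized 2 -> g = 2/3
print(dynamic_logical_effort(2))
# F' = AB + CD with tail transistor: every NMOS sized 3 -> g = 1
print(dynamic_logical_effort(3))
```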

Fig. 5.18 Recreation of Fig. 5.4

In fact the improvement can be even more dramatic. By the end of Sect. 5.6, we established that in any practical setting where dynamic gates can be cascaded, we can get rid of the NMOS tail transistor. This shortens the series chain in the PDN and allows us to size the NMOS smaller. For example, in the inverter the only NMOS would be the logic-input NMOS, and its size would be 1, in which case the logical effort of the inverter would be reduced to 1/3. The function F′ = AB + CD would have a logical effort of 2/3, less than the static inverter! In fact, the effort of a dynamic gate will always be lower than that of the corresponding static gate. How much lower depends on the gate, the input, and whether or not there is an NMOS tail transistor, but improvements of two- to threefold are not uncommon.

Where does this improvement come from? The answer is immediately obvious in our calculation of logical effort. We have a smaller logical effort because the input only feeds the NMOS transistor; there is no PMOS whose input capacitance we must add. In other words, logical effort has improved because there is no PUN.

Logical effort can give us a lot of intuition and is a good indicator of how a gate behaves. But it fails to capture the entire story. Dynamic gates are faster than their static CMOS counterparts for two more reasons that are just as important as getting rid of the PUN. A large problem that static CMOS gates introduce is crowbar (short-circuit) current. This was discussed in Sect. 3.6 in the context of power and in Sect. 3.11 insofar as it affects delay. In dynamic gates, the PMOS is controlled by the clock rather than a logic input. Because clock distribution networks are very well buffered, the clock can quickly shut down the PMOS during the evaluate phase. This allows the entire PDN current to discharge the capacitor, preventing the additional delay incurred by the short-circuit path.

Table 5.1 Logical effort of NAND and NOR gates in static and dynamic CMOS. All dynamic gates are assumed not to have an NMOS tail transistor

Number of inputs    2      3      4      N
Static NAND g       4/3    5/3    2      (N + 2)/3
Static NOR g        5/3    7/3    3      (2N + 1)/3
Dynamic NAND g      2/3    1      4/3    N/3
Dynamic NOR g       1/3    1/3    1/3    1/3

Another cause for the speed of dynamic gates that logical effort fails to capture is the point at which the gate switches states. In the dynamic CMOS inverter, if the input is just enough to turn on the two NMOS transistors, then the pull-down chain will eventually pull the output down to 0 V. In other words, when the input is roughly 2Vth, the gate is ready to produce a full “0” at the output. In static CMOS, the inverter will never give a “0” output until its input is closer to Vih. This is again a result of the existence of the PUN, with its contrarian action to the PDN.

One might ask why the precharge PMOS transistor does not make any appearance in this section. The gate of the PMOS transistor is fed by the clock network. Thus, it exists in the chain of the clock distribution network, and it will make an appearance while loading the buffer that feeds the network (Chap. 13). But the PMOS also makes an appearance while calculating the electrical effort of the dynamic gate. Note that dynamic gates have a single PMOS instead of an entire PUN; thus they will have lower electrical effort than their static counterparts. The PMOS thus plays a role in defining the intrinsic delay of the dynamic gates, but not in determining their optimal sizes in a chain of logic.

Logical effort can also give us some interesting insight into the choice of gates in dynamic circuits versus static circuits. Table 5.1 lists logical effort for NAND and NOR gates of various sizes, both in static and dynamic CMOS. In static CMOS, the logical effort of a NAND gate is consistently smaller than that of a NOR gate of the corresponding size, and in general we always found it preferable to use logic gates with longer pull-down chains rather than gates with longer pull-up chains.

In dynamic CMOS the results could not be more different. As expected, both NAND and NOR gates have lower effort than their static counterparts. But the surprising result is that the dynamic NOR gate has a consistently better logical effort than the dynamic NAND gate. In fact, the dynamic NOR gate has a constant logical effort that is not a function of the number of inputs.

This result can be extended and generalized. In dynamic gates, it is preferable to use gates with multiple parallel pull-down paths rather than gates with long pull-down chains. This is contrary to the intuition developed for static gates. The reason is that in dynamic gates there is no PUN, and thus the choice is not between having a series chain in the PUN or having a series chain in the PDN. The choice is between series chains in the PDN or parallel paths in the PDN. In terms of logical effort, the latter is preferable.
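The closed-form entries of Table 5.1 can be sanity-checked in a few lines. This is a sketch following the table's formulas; the function names are mine:

```python
# Closed-form logical efforts from Table 5.1 (dynamic gates assumed to
# have no NMOS tail transistor).

def g_static_nand(n):  return (n + 2) / 3
def g_static_nor(n):   return (2 * n + 1) / 3
def g_dynamic_nand(n): return n / 3
def g_dynamic_nor(n):  return 1 / 3   # parallel NMOS stay at size 1

for n in (2, 3, 4):
    # Dynamic gates always beat their static counterparts...
    assert g_dynamic_nand(n) < g_static_nand(n)
    assert g_dynamic_nor(n) < g_static_nor(n)
    # ...and, unlike in static CMOS, NOR beats NAND in dynamic CMOS.
    assert g_dynamic_nor(n) <= g_dynamic_nand(n)

print(g_dynamic_nor(100))  # still 1/3: independent of the number of inputs
```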

6 Pipelines

6.1 Sequential Versus Combinational

1. Contrast sequential and combinational circuits
2. Understand that a sequence of operations by necessity entails memory
3. Categorize circuits in terms of stability as astable, monostable, and bistable
4. Contrast dynamic versus static storage
5. Understand the role of bistability in static storage.

Chapters 3 and 4 discussed how to design a logic gate that performs a Boolean function and how to size it for optimal delay in a chain. However, a fundamental component in digital circuit design is still missing. This component was hinted at in Chap. 5 with the introduction of high-impedance nodes and clocks. In this chapter, we want to discuss this component in more detail, namely, sequential logic.

Combinational logic circuits are circuits where the output changes in response to any change in the inputs. It will take some predictable delay for the change to appear at the output, and the delay may vary from one input combination to another, but eventually all input changes in combinational circuits lead to an output response. In other words, the combinational circuit produces an output as a logic combination of its logic inputs and no other factor.

Sequential circuits are a different category of digital circuits characterized by the presence of a clock signal. The clock signal acts as a sort of “enable” for the output. When the clock is in its enabled state (see Sect. 6.2), the output will change in response to the input. When the clock is in its disabled state, the output will be disconnected from the input. In that case, the output is not undefined; rather, it keeps its latest value from before the clock went into its disabled state. Figure 6.1 shows the behavior of a combinational circuit as opposed to a sequential circuit, where the output is prompted by input transitions in the combinational circuit, and by clock transitions in the sequential circuit.

This has a very interesting implication. If the sequential circuit is to “keep” the latest value of the output when the clock goes into the disabled state, then the circuit must have some form of memory. As a consequence, sequential circuits always have a memory mechanism associated with them. Thus sequential circuits are usually identified with memory, although the fact that they must have memory is a result of their dependence on the value of the clock.

The reason sequential circuits are called sequential is that they make it possible for circuits to perform a sequence of operations. Consider, for example, the following pseudocode sequence:

T = A; A = B; B = T;

This sequence is a well-known code block used to exchange the values in locations A and B through a temporary location T. This sequence is impossible to implement using combinational circuits. Figure 6.2 shows why. Combinational circuits describe the connection of signals. There can be no memory in a combinational circuit, because by definition all outputs in a combinational circuit must be logic combinations of inputs. Thus, the pseudocode sequence implemented in a combinational circuit describes three nodes A, B, and T shorted to each other (left sub-figure of Fig. 6.2). To allow an exchange of values between A and B, there must be a way to memorize the value of A in register T and then assign T to register B, once B has been assigned to A. Two things are obvious here: a sequence of operations must happen in a specific order, and memory is needed to keep the intermediate value T. This can be very easily performed by sequential circuits (right sub-figure of Fig. 6.2).

Since all sequential elements must include a form of memory, it is worthwhile to consider what memory mechanisms may be used to store logic values. There are two broad categories of storage: dynamic and static. Dynamic storage is storage of values on capacitors. If the capacitor is then exposed to open circuits in all directions, it
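The swap sequence can be mimicked in software with clocked updates. The behavioral sketch below is mine; only the register names A, B, and T come from the pseudocode:

```python
# Behavioral sketch: registers capture new values on the clock edge, so
# the exchange works as a sequence of three clocked steps through T.

regs = {"A": 1, "B": 2, "T": 0}

def clock_tick(assignments):
    """Capture new values into the named registers on the active edge."""
    regs.update(assignments)

clock_tick({"T": regs["A"]})   # T = A
clock_tick({"A": regs["B"]})   # A = B
clock_tick({"B": regs["T"]})   # B = T
print(regs["A"], regs["B"])    # values exchanged: 2 1
```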

© Springer Nature Switzerland AG 2020 K. Abbas, Handbook of Digital CMOS Technology, Circuits, and Systems, https://doi.org/10.1007/978-3-030-37195-1_6


Fig. 6.1 Sequential circuit (left), and combinational circuit (right)

Fig. 6.2 Combinational (left) and sequential (right) realization of variable exchange pseudocode

will keep the charge, or lack thereof, on it. Dynamic storage has actually been considered in detail in Chap. 5 as part of dynamic logic. Dynamic CMOS occupies a very peculiar area between combinational and sequential circuits due to the presence of both a clock and storage. Capacitors may be used as a storage mechanism in exactly the same manner they were utilized in Chap. 5, also suffering from similar signal integrity issues (particularly leakage).

Static storage is a bit more complicated. Static storage is bulkier, faster, and more robust than dynamic storage. The main mechanism used for static storage is positive feedback. To understand how this works, let us consider the stability of different CMOS logic circuits.

A simple CMOS inverter has two stable states: either the input is “0” and the output is “1”, or the input is “1” and the output is “0”. This kind of circuit is called a bistable circuit since it has two possible stable states.

Consider the circuit in Fig. 6.3; how would you define its stability? This circuit has only one stable state: when the output is “0”. Trying to trigger a “1” at the output by changing the input will lead to the appearance of a pulse of “1” at the output, but this pulse will always return to zero after the gate delays have been exhausted, because the inputs of the NAND do not allow for any other steady-state output. This class of circuits is called monostable since they have only one stable output state.

Consider the circuit in Fig. 6.4. For this or any ring of static CMOS inverters with an odd number of inverters, the output will be constantly changing. Tracing the output on the circuit, we find that the output will always switch from “0” to “1” or from “1” to “0” after a delay equal to the total delays of the inverters in the chain. Thus, this circuit behaves as an oscillator with an oscillation frequency that is a function of the inverter delay and of the number of inverters in the chain. This class of circuits is called astable. Neither output “0” nor output “1” is stable. The circuit cannot remain


Fig. 6.3 Monostable trigger

indefinitely in a state where the logic output is either value. Whichever state the output is in, it has to leave it after a set delay.

Now consider the circuit in Fig. 6.5. This circuit has an even number of inverters connected back to back in positive feedback (in this case two). This circuit is bistable since it has two possible stable states: “1” at A or “0” at A. These values will keep reinforcing each other through the feedback loop and will be kept indefinitely unless they are changed by external intervention. Specifically, if A is “1”, then the first inverter causes B to be “0”. Since B is “0”, the second inverter causes A to be “1”. Thus, both values reinforce each other, as opposed to Fig. 6.4, where the journey of a signal through the inverter chain leads to the old value being replaced.

The positive feedback loop is resistant to noise or loss of charge. Any loss at either node will be immediately restored by one or both inverters. This form of feedback is the one used for static storage and will be used for all static latches, registers, and memories. Note that the two inverters store only one bit of data: the presence of two nodes, A and B, in the figure does not mean there are two bits of information, because one is always the inverse of the other.

Sequential circuits are defined by the presence of clocks and storage elements. So why are they called sequential circuits? Because they are necessary to implement operations that happen in sequence. Which

Fig. 6.4 Astable oscillator. The ring oscillator oscillates only if total delay through the chain is longer than an inverter delay
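As a rough numeric illustration of the astable ring: a transition must travel around the odd-length ring twice to restore the original value, giving the standard first-order period of 2·N·t_inv. The delay value below is an assumed placeholder, not a figure from the book:

```python
# First-order ring-oscillator estimate: the period is twice the total
# delay around the ring, so f = 1 / (2 * N * t_inv).

def ring_oscillator_frequency(n_inverters, t_inv):
    assert n_inverters % 2 == 1, "an even ring is bistable, not astable"
    return 1.0 / (2 * n_inverters * t_inv)

print(ring_oscillator_frequency(5, 10e-12))   # 5 stages of 10 ps -> 10 GHz
```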

Fig. 6.5 Bistable inverters in positive feedback

means that they must have memory to store intermediate results while implementing the sequence. Memory elements require a control signal to distinguish storage mode from transparent mode. This signal is the clock.

Dynamic CMOS logic is very hard to classify. On the one hand, we usually treat it as straight combinational logic, on the same footing as static CMOS; after all, in evaluate, outputs are a logic combination of logic inputs. On the other hand, dynamic logic gates have clock signals, they store data on high-impedance nodes, and they behave very differently in the two phases of the clock.

6.2 Latches, Registers, and Timing

1. Contrast latches and registers
2. Understand that standalone latches are rarely used
3. Define setup-time, hold-time, and propagation delay in a sequential element
4. Contrast delay in a combinational circuit to that in a sequential circuit
5. Understand why hold-time is not added to the “total” delay of a sequential element.

Sequential elements can be broadly categorized according to their relation to the clock as latches or registers. Latches are storage elements where storage is enabled and disabled through the level of the clock signal. Thus a latch will pass the input to the output, or keep the old value of the output, depending on whether the level of the clock is “0” or “1”. Figure 6.6 shows an active-high and an active-low latch. The active-high latch will change its output in response to its input only when the clock is “1”; when the clock is “0”, the active-high latch will preserve its most recent data at the output and will not respond to any data input changes. The active-low latch behaves in the exact opposite manner, with the latch being “transparent”, or able to pass changes on the input, when the clock is low.

Fig. 6.6 Active-high and active-low latch

Latches are specialized data elements. They have the potential to cause serious problems in the behavior of circuits. They are used carefully in a small category of circuits (Sect. 9.12), but their main use by far is as building blocks for the more ubiquitous registers.

Registers are sequential elements where the output takes the value of the input only at an active “edge” of the clock. Thus a register does not care about any changes in input that happen while the clock is stable at either “1” or “0”. It will only register inputs when the edge of the clock occurs, i.e., when the clock changes its value from “0” to “1” (positive edge-triggered register) or from “1” to “0” (negative edge-triggered register). Figure 6.7 shows both types of registers in operation. The only changes “registered” at the output are those on the active edge. In all other cases, the register keeps outputting its latest “registered” data. Registers represent the majority of sequential circuits used in practice. By Sect. 6.6 we will see that registers and combinational CMOS together form synchronous CMOS pipelines, which represent the majority of digital integrated circuit applications.
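The level-sensitive behavior of the latch can be captured in a few lines. This is a behavioral sketch of my own, not the book's implementation:

```python
# Behavioral model of a level-sensitive latch: transparent when the
# clock is at its active level, opaque (holding state) otherwise.

def latch(d, clk, state, active_high=True):
    transparent = (clk == 1) if active_high else (clk == 0)
    return d if transparent else state

q = 0
q = latch(d=1, clk=1, state=q)   # active-high, clock high: Q follows D
print(q)                          # 1
q = latch(d=0, clk=0, state=q)   # clock low: opaque, Q holds its value
print(q)                          # still 1, despite D = 0
```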


Fig. 6.7 Positive and negative edge-triggered registers

Delay in sequential circuits is more complicated than delay in combinational circuits. In combinational circuits we were only interested in propagation delay, which was the time between a change at the input(s) and a corresponding transition at the output. In sequential circuits most output changes occur not as a reaction to changes in the input, but in response to changes in the clock. Thus delay has to describe relations between three signal sets: input(s), clock, and output. Figure 6.8 shows the delay relationships between input, output, and clock. We can define three “delays”:

• Setup-time (tsu): The minimum time that the desired input must be ready and stable before the active edge of the clock arrives. If the data is ready at the input less than tsu before the edge, then the output will not be produced correctly
• Propagation delay (tcq): The closest of the three times shown in Fig. 6.8 to propagation delay in combinational circuits. It is the time after the clock edge at which the correct output will appear, provided that setup-time has been respected. Sometimes called clock-to-Q delay for obvious reasons
• Hold-time (thold): The time after the clock edge that the data must be held stable at the input to ensure that the output appears correctly

If we have to define a single figure as the “delay” of the sequential circuit, then that figure would be the summation of setup-time and propagation delay. While this is not a very useful quantity on its own, it will be a component of total clock period in a pipeline. This summation represents the time between data changing on the input and data changing on the output. However, it is not the same as propagation delay in a combinational circuit, since it is contingent on the clock arriving at the proper time. Hold-time is not added to this total delay since it happens in parallel with propagation delay, and the latter is typically longer.
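A numeric sketch of this point, with all delay values as assumed placeholders rather than figures from the book:

```python
# The register's contribution to the clock period is tsu + tcq; hold-time
# overlaps the clock-to-Q delay and so is not added on top.

t_su, t_cq, t_hold = 30e-12, 50e-12, 10e-12   # illustrative values, seconds

register_delay = t_su + t_cq         # the component that enters the period
print(round(register_delay * 1e12), "ps")   # 80 ps

# hold-time being shorter than tcq is why it adds nothing to the total
assert t_hold < t_cq
```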

6.3 The Static Register

1. Understand the MUX implementation of latches
2. Understand the master–slave architecture of registers
3. Realize the need for transmission gates as switches
4. Trace events that happen in the master latch inactive phase and calculate setup-time
5. Appreciate why a violation of setup-time will lead to unpredictable behavior
6. Trace and calculate propagation delay in a register.

A static storage mechanism by definition involves some form of positive feedback. The bistable circuit in Fig. 6.5 shows the basic structure of positive feedback used in latches and registers. Two static inverters connected back to back will keep their state indefinitely unless they are forced to change it through an external input. Note that the inverter pair is only storing one bit of information, since the two bits are complements of each other and thus the second bit carries no independent information.

Fig. 6.8 Parts of delay in sequential circuits. Each part is a relation between either the input or the output on the one hand, and the clock on the other

Figure 6.9 shows another way to visualize static storage latches. Here, the latch is viewed as a multiplexer with one of the inputs fed back from the output. The select line is the clock signal. Depending on which input of the multiplexer is connected through feedback, we realize either an active-high or an active-low latch. In Fig. 6.9, the sub-figure on the left shows an active-high latch. When the clock is “1”, the feedforward input is chosen and the latch is transparent. When the clock is “0”, the feedback input is chosen and the latch keeps its state.

Fig. 6.9 A latch based on a multiplexer. The feedback path represents storage

Multiplexer-based latches are a good way to abstract a latch. However, examining the internal structure of the multiplexer as shown in Fig. 6.10, we will always find a bistable inverter pair for storage (I2 and I3). The reason is that feeding back the output to one of the inputs will not work unless there is a storage mechanism to preserve this value of the output. Figure 6.10 consists of a connection of inverters, either as drivers or as a bistable pair, together with a set of switches.

The switches in the figure are new and require some examination. Figure 6.11 shows three ways to realize switches in CMOS: using NMOS, using PMOS, or using the new circuit element called the transmission gate. The NMOS switch can pass 0 V cleanly between source and drain. However, as discussed in Chap. 2, it cannot pass Vdd to its drain; instead it passes Vdd−Vth. Thus, we say that the NMOS can pass a strong “0” but a weak “1”. Similarly, a solitary PMOS can pass a strong “1”: it can pass a Vdd on its source cleanly to its drain as Vdd. However, a 0 V on its drain will pass as |Vthp| to its source. Thus the PMOS can pass a strong “1” but a weak “0”. This is why, in static CMOS, the PUN is made of PMOS and the PDN is made of NMOS.

The transmission gate is a switch that consists of an NMOS and a PMOS connected in parallel. While it can occupy a larger area and impose more loading than a single-transistor switch, the transmission gate has the critical advantage that it can pass both a strong “1” and a strong “0”. While the NMOS and PMOS switches had only one control terminal (the MOSFET gate), the transmission gate has two control terminals: the NMOS gate and the PMOS gate. However, it is critical to understand that for proper operation of the transmission gate these two control terminals must always be complements of each other. What this means is that either both the NMOS and the PMOS in the transmission gate are on, or they are both cutoff. The NMOS and PMOS do not exist to allow one to work while the other is cutoff; they are there to provide both a strong “1” and a strong “0” at the output.

With an understanding of transmission gates, we can now approach the latch in Fig. 6.10. The latch in the figure is active-high and represents the inner workings of a multiplexer with feedback. The multiplexing action is provided by the two transmission gates T1 and T2. All the inverters are static CMOS inverters.
I2 and I3 are the bistable pair that provides storage (provided T2 is closed). I1 provides drive and ensures that Q and D are true copies rather than complements. When clk is “1” in Fig. 6.10, T2 is open-circuited and T1 is conducting, with the on-resistance of its two transistors in parallel, which we will consider zero in steady state for simplicity. That is to say, we will consider on transmission gates to be short circuits. Thus D can pass to Q through I1 and I2. This is shown in the lower left sub-figure of Fig. 6.10. If clk is “0”, then T1 is open-circuited, completely isolating Q from D. Any change on D would not reflect on Q because the two are isolated by an open circuit. However, T2 is a short circuit and connects I2 and I3 in positive feedback.


Fig. 6.10 Structure of the multiplexer. The feedback path must have a bistable pair. Lower left is the latch in transparent mode, lower right is the latch in storage mode

Fig. 6.11 Different types of switches. The NMOS is incapable of passing a strong “1”. The PMOS is incapable of passing a strong “0”. The transmission gate must have complementary inputs to transistor gates

Thus Q will keep the last value it had before clk went to “0”. This is shown in the lower right sub-figure of Fig. 6.10. The behavior in the “0” and “1” phases of the clock describes an active-high latch. Constructing an active-low latch would just require exchanging the controls on T1 and T2. As discussed above, the control signals on the NMOS and PMOS in a single transmission gate must be complements of each other to guarantee they either both conduct or are cutoff simultaneously.

As an edge-triggered element, a register is significantly more complex than a latch. However, in synchronous pipelines, the properties of registers allow the construction of favorable and predictable circuits.
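Cascading two such latches of opposite polarity yields an edge-triggered register, which can be previewed behaviorally. This is a toy Python model of my own; the class and signal names are not from the book:

```python
# Behavioral sketch of a positive edge-triggered register built from an
# active-low master latch followed by an active-high slave latch.

def latch(d, clk, state, active_high):
    transparent = (clk == 1) if active_high else (clk == 0)
    return d if transparent else state

class PosEdgeRegister:
    def __init__(self):
        self.qi = 0   # master output
        self.q = 0    # slave (register) output

    def apply(self, clk, d):
        self.qi = latch(d, clk, self.qi, active_high=False)     # master
        self.q = latch(self.qi, clk, self.q, active_high=True)  # slave
        return self.q

reg = PosEdgeRegister()
reg.apply(0, 1)        # low phase: master captures D = 1, Q still 0
print(reg.q)           # 0
reg.apply(1, 0)        # rising edge: slave passes the stored 1 to Q,
print(reg.q)           # 1 -- even though D has already changed to 0
```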

A register is constructed by connecting two latches of opposite type in cascade, as in Fig. 6.12. The first latch is called the master latch and the second latch is called the slave. If the master latch is active-high and the slave is active-low, the resulting register is negative edge-triggered. If the master is active-low and the slave is active-high, the resulting register is positive edge-triggered. Figure 6.12 shows the structure of a positive edge-triggered register.

Figure 6.13 shows how data propagates through the positive edge-triggered register in Fig. 6.12. When the clock is low, the master latch is transparent, allowing the input to propagate from D to Qi. So any changes in D would appear on Qi during this phase. However, the slave latch is opaque. The slave thus cannot pass Qi to Q. Instead it is latching the previous value of Qi it was able to record when it was transparent. Thus Q will remain constant while the clock is “0” regardless of any changes on either D or Qi.

When the clock goes high, the master becomes opaque. It cuts off D from the rest of the register and preserves the last value of D it saw before the clock switched from “0” to “1”. Thus Qi in the high phase of the clock would not change in response to changes on D. The slave becomes transparent in the high phase of the clock. This allows the latched value of Qi to flow to Q. Since this latched value of Qi at the output of the master is the last


Fig. 6.12 Master–slave register based on MUX latches

Fig. 6.13 Propagation through master–slave register. In the low phase data passes from D to Qi. Upon the high phase arriving, Qi passes to Q

value that the master saw on D before the clock rose, we can say that Qi appears on Q when the clock rises. We can also safely say that the value of D observed just before the positive edge appears on Q just after the active edge. The circuit is thus edge-triggered.

Note that during the high phase of the clock the master is opaque, preventing a direct connection between D and Q. In the low phase of the clock, the slave is opaque, also preventing a direct connection between D and Q. This is an essential feature of registers: the input and output are never directly in contact. For D to appear on Q, two things must happen, and in order:

• D must pass to Qi, which it can only do in the low phase of the clock
• Qi must pass to Q, which it can only do in the high phase of the clock

Thus the jump from D to Q happens in two steps: D to Qi and Qi to Q. Since the first can only happen in the low phase, and the second in the high phase, a positive edge is necessary for D to appear on Q. This is the behavior of a positive edge-triggered register. The dependency arrows in Fig. 6.13 show this in more detail. Figure 6.14 shows the internal implementation of the MUX-based register in Fig. 6.12.

The discussion above of how data flows in the register does not account for delays at each step. This is because we assumed all transmission gates had zero on-resistance and all inverters had zero propagation delay. To calculate delays in a register, we must first understand what happens during each phase of the clock. Also recall that register timing consists of three different components: setup-time, hold-time, and propagation delay.

Setup-time is a relation between the active edge of the clock and the input data. Specifically, it is the time before the clock edge by which data must settle. Thus it describes events before the active edge. So for the positive edge-triggered register in Fig. 6.14, it describes events that happen when the clock is “0”. When the clock is low, T3 is off, isolating the slave latch from D completely. Since setup-time describes events that relate to D, we can already conclude that setup-time has nothing to do with the slave latch, which has no access to D before the active edge. However, T1 is on, and thus the master latch can be affected by D. Setup-time is a phenomenon that has to do with the edge of the clock, the input signal D, and the master latch.

Assume that D has settled at the input in Fig. 6.14. When the clock rises from “0” to “1”, T1 will turn off. This will cut the master latch off from D. Thus setup-time is the time that D must remain stable before T1 turns off. The master latch needs D to remain stable long enough to latch its value correctly in the feedback loop.
Thus, it is critical not to raise clk, cutting D off from the master, before D is properly latched at Qi. For the master latch to properly store D into Qi, D must pass through I1, T1, I2, and I3, i.e., the path shown in Fig. 6.14. I1, T1, and I2 are immediately obvious since they


Fig. 6.14 Setup delay through register. Setup-time is the time it takes for data to settle in the master feedback loop

represent the linear path from D to Qi. However, I3 must also be included in the path. To understand why, consider what happens in the master latch when the clock switches from “0” to “1”. As soon as the clock goes high, T1 turns off and T2 turns on. I2 and I3 enter into positive feedback since T2 completes the loop. They then preserve the last value of D on Qi, allowing the master to become opaque to the inputs.

If T2 turns on when the outputs of I2 and I3 have already settled at logic complements, the positive feedback will kick in and quickly reiterate these values, helping the inverter pair preserve the value of Qi during the high phase of the clock. This will always be the case if we allow enough time for D to flow through both I2 and I3 while the clock is low. However, if the clock rises when data has passed through I2 but not I3, then we might end up with unpredictable results. In this case, T1 disconnects D from the master latch, while T2 connects I2 and I3 in positive feedback.

Assume that before the clock edge arrives, the output of I2 was 0 V and thus the output of I3 was Vdd. If D rises, the output of I1 will drop and the output of I2 will rise. However, if the clock edge comes before the output of I3 has also managed to drop in response to the output of I2 rising, we end up with a very strange situation. Figure 6.15 shows this event, which is called a setup-time violation. The output of I2 will be Vdd, the output of I3 will also be Vdd, and yet they are connected in positive feedback. Because of the regenerative property, the inverter pair will exit this state. However, how long it takes them to exit and what state they settle to (i.e., what value of Qi) are random processes. More critically, the output of the register will take an unpredictable

delay to appear after the active edge. For a deeper discussion of setup-time violations, see Sect. 13.8. Thus, the setup-time must allow D to flow through both I2 and I3; otherwise, neither the resulting data nor its delay can be guaranteed:

t_su = t_I1 + t_T1 + t_I2 + t_I3

Propagation delay is defined as the time between the active edge of the clock and the correct value of Q appearing on the output. Figure 6.16 shows the path of this delay; the path is obvious enough from the definition. Propagation delay is essentially the time it takes Qi to appear on Q. At first glance, this should be the delays of I4, T3, and I5. However, if we consider where data would be at the active edge, we discover this delay covers only T3 and I5. At the positive edge of the clock, D is not at Qi (the output of I2); instead, as we already discussed, data would have propagated to the output of I3. If we assume that the delay of I4 is less than or equal to that of I3 (a very valid assumption given the loading of the two), and considering that propagation happens from the output of I2 through I3 and I4 in parallel, then D would be at the outputs of both I3 and I4 at the positive edge. Since the definition of propagation delay is the exact time it takes for D to appear at Q, we should not include the delay of I4:

t_CQ = t_T3 + t_I5

A legitimate question at this point would be why we did not also include the delay of I6 in propagation delay, to guarantee that data would be preserved in the slave latch properly. There are two responses to this:


6 Pipelines

Fig. 6.15 Master latch with a setup-time violation. The state where I2 and I3 have the same output cannot be maintained and will be exited randomly

Fig. 6.16 Propagation delay through register. Data is at the output of I4 by the time the positive edge arrives, thus do not add the delay of I4

• The definition of propagation delay is the exact time it takes for data to appear at the output of the register after the active edge. This definition makes no supposition about data being latched properly in the slave
• The additional delay needed for data to settle in the slave is very small relative to the delay in the rest of the pipeline. Since the period will contain a combinational delay much larger than an inverter delay, we can guarantee the high phase will be long enough to preserve data. This will become clear in Sect. 6.6, when we consider the concept of the critical path and how the clock period is calculated. For now, note that the additional delay needed in the positive phase of the clock is only an inverter delay, to account for I6. This assurance could not apply to setup-time, because setup-time occurs before the active edge of the clock while propagation delay occurs after it
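The two timing quantities derived above are just sums of element delays. A small sketch with placeholder numbers (none of these values come from the text):

```python
# Hypothetical per-element delays (arbitrary time units).
d = {"I1": 1.0, "T1": 0.5, "I2": 1.0, "I3": 1.1, "T3": 0.5, "I5": 1.0}

# Setup-time: D must traverse I1, T1, I2 and also settle through I3
# in the master feedback loop before the clock rises.
t_su = d["I1"] + d["T1"] + d["I2"] + d["I3"]

# Propagation delay: at the active edge data already sits at the outputs
# of I3 and I4, so only T3 and I5 remain on the way to Q.
t_cq = d["T3"] + d["I5"]

print(t_su)  # 3.6
print(t_cq)  # 1.5
```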

At this point, it is worth exploring what the inverter and transmission gate delays in the above equations are. All these delays are simple propagation delays that can be derived from the first principles discussed in Chap. 3. To calculate any delay:

• Define the node at which the delay occurs
• Define the resistance that charges/discharges the node
• Define the total capacitive loading at the node
• Calculate the time-constant for the node
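A minimal numeric sketch of this recipe, with made-up resistance and capacitance values (none of these numbers come from the text):

```python
# Step 2/3 inputs: hypothetical channel resistances and node capacitances.
R_n, R_p = 10e3, 15e3              # NMOS/PMOS channel resistances (ohms), assumed
caps_at_node = [1e-15, 1.5e-15,    # intrinsic drain caps of the driving inverter (F)
                0.8e-15, 0.8e-15]  # external drain caps of the loading gate (F)

# Step 3: total capacitive loading at the node.
C_L = sum(caps_at_node)

# Step 4: time constants for each transition of an inverter-driven node.
tau_hl = R_n * C_L   # high-to-low: discharge through the NMOS
tau_lh = R_p * C_L   # low-to-high: charge through the PMOS

# For a transmission gate, the driving resistance is the parallel combination
# of its two transistors, regardless of the transition direction.
R_tg = (R_n * R_p) / (R_n + R_p)
tau_tg = R_tg * C_L
```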

For example, to calculate the delay of inverter I1: the node under consideration is the output of I1. The node is charged and discharged through the MOSFET channel resistances of the transistors of I1. The total capacitive load at the output of I1 consists of an intrinsic component from the drains of the inverter


transistors, and an external component from the drains of the two transmission gate transistors. Thus, the time-constant can be calculated as

C_L = C_dn,I1 + C_dp,I1 + C_dn,T1 + C_dp,T1
τ_I1,HL = R_n,I1 · (C_dn,I1 + C_dp,I1 + C_dn,T1 + C_dp,T1)
τ_I1,LH = R_p,I1 · (C_dn,I1 + C_dp,I1 + C_dn,T1 + C_dp,T1)

Transmission gate delay calculation is only slightly different. Load capacitance is calculated the same way. The charging/discharging resistance is always the parallel resistance of the two transistors of the transmission gate, regardless of the direction of transition. For example, for transmission gate T1, self-loading is due to the drains/sources of T1, while external loading is due to the gate capacitances of inverter I2 and the drain/source capacitances of transmission gate T2. Thus the time-constant for both the low-to-high and high-to-low transitions can be calculated as

Fig. 6.17 Dynamic active-high latch. Storage takes place on capacitor C

C_L = C_dn,T1 + C_dp,T1 + C_dn,T2 + C_dp,T2 + C_gn,I2 + C_gp,I2
τ_T1 = (R_n,T1 || R_p,T1) · (C_dn,T1 + C_dp,T1 + C_dn,T2 + C_dp,T2 + C_gn,I2 + C_gp,I2)

This method can be used to calculate all the components of setup-time and propagation delays. It applies to latches and registers, whether static or dynamic.

6.4 Dynamic Registers

1. Contrast static and dynamic latches
2. Recognize the danger leakage poses to dynamic latches
3. Realize where the true position of the storage capacitor is in a dynamic register
4. Trace and calculate setup-time in a dynamic register
5. Trace and calculate propagation delay in a dynamic register.

Section 6.3 showed how registers can be constructed using static latches. In static latches the method of storage is positive feedback in a bistable pair. Bistable pairs offer very reliable and fast storage; however, they occupy significant area. An alternative is to use dynamic latches, where the main method of storage is on a capacitor. This was implicitly introduced in Chap. 5.

Figure 6.17 shows a dynamic latch. Storage takes place on the capacitor C. Transmission gate T1 controls the mode of operation of the latch: opaque or transparent. The latch in the figure is active-high. When clk is high, T1 is on, and D can directly flow to Q through T1 and I1. Even though Q is an inverted copy of D, we can still consider that the bit has shown up at the output, since the user can always expect the inversion, and it becomes part of the information.

When clk is low, T1 is off, and D is cut off from the output Q. Capacitor C observes transistor gates to the right and a cutoff transmission gate to the left. To a first order, C observes a perfect high-impedance state and should keep its charge, or lack thereof, indefinitely. Thus the input to I1 will be defined indefinitely as the last value stored on C before it went into high impedance. And since the input of I1 is defined, its output is also defined. Thus the latch in Fig. 6.17 allows D to flow to Q in the high phase of the clock, and preserves the last captured value of D on the capacitor in the low phase of the clock.

Building a dynamic register from dynamic latches follows the master–slave architecture used to build static registers from static latches. Figure 6.18 shows a positive edge-triggered dynamic register built by cascading an active-low latch followed by an active-high latch. Behavior and flow of data still follow Fig. 6.13. In other words, the high-level behavior of static and dynamic registers is the same. D is transferred to Qi in the low phase of the clock, when the master is transparent and the slave is opaque. As soon as the clock goes high, the master latches its last observed D on Qi by going into opaque mode. The slave becomes transparent, allowing Qi (which is equal to the last observed D) to flow to Q. Thus, the only way for D to pass to Q is for the clock to first be low and then go high, which happens on the positive edge of the clock.

The dynamic latch, and consequently the dynamic register, is much smaller than its static counterpart. The dynamic latch consists of 4 transistors, while a static latch consists of 10 transistors. Since registers, and thus their constituent latches, occur in significant numbers in most digital circuits, this difference can quickly add up.

However, dynamic latches suffer from the same signal integrity issues that affect dynamic logic. Particularly serious is the effect of leakage. The value stored on C when the latch is opaque is constant only if the capacitor can be expected to preserve its charge indefinitely. Leakage current will flow, both into the gates of the transistors of I1 and the drains of the transistors of T1. This


Fig. 6.18 Dynamic master–slave positive edge-triggered register

leakage current will cause the voltage on C to deteriorate with time. This imposes a lower limit on operating frequency, similar to dynamic logic. The problem is exacerbated by the fact that the clock period will be dominated by the delay of the combinational critical path (Sect. 6.6), which is significantly larger than register delays.

The structure of the latch in Fig. 6.17 raises a couple of questions:

• What is the role of I1? It seems to serve no functional role, and the latch would appear to function properly without it
• Why did we consider the storage capacitor to be at the input of I1 instead of at its output?

Both questions can be answered by observing what happens when dynamic latches are cascaded, for example when two are cascaded into a register as in Fig. 6.18:

• If the latch were constructed using only the transmission gate, then latches in cascade would consist of a cascade of transmission gates. A transmission gate can be modeled as a resistor when in on mode, and there are parasitic capacitances at each drain node. Thus a chain of transmission gates translates into an RC ladder circuit, shown in the right sub-figure of Fig. 6.19. The delay of an RC ladder grows roughly quadratically with the number of stages (Sect. 13.3). Inserting static inverters into the latches allows the RC ladder to be cut up into single RC sections, as shown in the left sub-figure of Fig. 6.19. This causes the delay to grow only linearly with the number of stages
• It is enough for the value to be stored properly at C before we switch off T1. If C has stored the proper value, then we know for a fact that after an inverter delay the output of I1 will be proper. Thus the true storage node is the input of I1 rather than its output

Setup-time in a dynamic register is very straightforward if we properly recall its definition. Setup-time is the minimum time data must be stable at the input before the active edge is observed.
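Before working through setup-time, the leakage constraint described above can be put in rough numbers: a constant leakage current I_leak discharging storage capacitor C causes a droop ΔV = I_leak·t/C, which bounds how long the latch may stay opaque. All values below are invented for illustration:

```python
# Assumed, purely illustrative values.
C = 5e-15         # storage capacitance (F)
I_leak = 10e-12   # total leakage current at the storage node (A)
dV_max = 0.3      # tolerable droop before the stored level is ambiguous (V)

# Longest time the latch may remain opaque before droop exceeds dV_max.
t_max = dV_max * C / I_leak

# If the opaque phase is half the clock period, the period may not exceed
# 2 * t_max, which translates to a minimum operating frequency.
f_min = 1.0 / (2.0 * t_max)
print(t_max)  # 0.00015 s (150 microseconds)
print(f_min)  # ~3333 Hz
```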
As with the static register, setup-time has to do with the master latch properly storing the value of D before it


becomes opaque when the active edge of the clock arrives. In Fig. 6.20, the data needs to be present, while the clock is low, for long enough for the data to be secured on C1. This means we just need to allow enough time for data to pass through T1. Setup-time does not and should not include the delay of I1. Setup-time is the bare minimum time before we can raise the clock, and it is enough for data to settle on the input of I1 before we switch off T1. There is no danger of the output of I1 settling on the wrong value if its input has already settled:

t_su = t_T1

However, the fact that we did not account for the delay of I1 in setup-time is important when calculating propagation delay. As we did with the static register, to properly calculate propagation delay we have to define where the data was when the active edge of the clock arrived. At the active edge, data is stable at the input of I1, rather than at Qi. Thus, for this data to appear on Q, it needs to pass through I1, T2, and I2, as shown in Fig. 6.21. Thus the propagation delay of the dynamic register is

t_CQ = t_I1 + t_T2 + t_I2

Why did we only have to wait for data to settle at the input of the inverter in setup-time? This stands in stark contrast to setup-time for the static register, where we had to wait for the outputs of two inverters to settle. The main difference is the presence or absence of feedback. In a static latch, storage takes place through positive feedback. Feedback is dangerous because it introduces the possibility of instability; with the positive feedback of a bistable pair, this manifests as the possibility of metastability (Sect. 13.8). The dynamic latch has only a feedforward path. Thus, the steady-state output of an inverter can be fully guaranteed by guaranteeing its input.
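Putting the static and dynamic register timings side by side, with generic placeholder delays (not values from the text):

```python
t_I, t_T = 1.0, 0.5   # placeholder inverter / transmission gate delays

# Static register (Sect. 6.3): setup spans I1, T1, I2, I3; clock-to-Q spans T3, I5.
static_tsu = 3 * t_I + t_T
static_tcq = t_T + t_I

# Dynamic register: setup is only T1; clock-to-Q spans I1, T2, I2.
dynamic_tsu = t_T
dynamic_tcq = 2 * t_I + t_T

# Per-stage register overhead is roughly tsu + tcq (see Sect. 6.6).
print(static_tsu + static_tcq)    # 5.0
print(dynamic_tsu + dynamic_tcq)  # 3.0
```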

6.5 Imperfect Clocks and Hold-Time

1. Understand why clock overlaps necessarily occur
2. Relate malfunctions in transmission gates to clock overlaps
3. Trace the need to hold data on the active edge
4. Understand racing on the trailing edge and when it would or would not occur.

When considering either the static or the dynamic register, there was always a transmission gate near the input (T1) that completely cut off D from the rest of the circuit as soon as the active edge of the clock arrived. Hold-time is defined as the time after the active edge during which data must be maintained at D. In Sects. 6.3 and 6.4, D had no access to the rest


Fig. 6.19 RC ladder versus summation of single time-constants. The RC ladder delay will increase nearly quadratically

Fig. 6.20 Setup-time in dynamic register

Fig. 6.21 Propagation delay through dynamic register

of the register after the active edge, hold-time was zero. Holding data would be both unnecessary and pointless, because T1 cuts off D from the rest of the register as soon as the active edge arrives.

However, this assumes an ideal clock; specifically, it assumes that clk' is a perfect logic inversion of clk. Since clk' is typically obtained from clk through an inverter, clk' will be delayed relative to clk by about an inverter delay. As Fig. 6.22 shows, this leads to a situation where, for short periods of time, clk and clk' have the same value. This always happens at clock transitions. On the positive edge clk and clk' will both be "1", leading to a period of "1–1 overlap". On the negative edge there will be a period of "0–0 overlap". These overlaps lead to serious issues, since we have always assumed the two signals are perfect logic complements.

Fig. 6.22 Realistic clock with clk–clk' overlaps due to inverter delay

The main impact of clock overlaps is on transmission gates. Recall that for proper operation of a transmission gate, its two constituent transistors must both be on or off simultaneously. Because we provided the gates of the NMOS and PMOS in the transmission gate with complementary controls, we never considered, nor did we see the need to consider, any scenario where only one of the transistors is on while the other is off. Because clk and clk' are delayed inverses rather than true instantaneous logic inverses, we can readily deduce that transmission gates will be affected.

Figure 6.23 shows how and when this impact happens. The top shows a circuit where two transmission gates are cascaded with opposite control signals. Two transmission gates with opposite controls should never conduct simultaneously; after all, when both transistors are on in one gate, both transistors are off in the other. This is the case in the lower left sub-figure. In the high phase of the clock the right transmission gate conducts. In the low phase of the clock, the left transmission gate conducts and the right transmission gate turns off. This is the ideal behavior of the series gates.

The lower right sub-figure in Fig. 6.23 shows how the transmission gates behave when the inverter delay of clk' is taken into consideration. Behavior is similar to the ideal case except for the periods of 1–1 overlap and 0–0 overlap.
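The misbehavior during overlaps can be captured with a toy model: a transmission gate passes (at least a degraded level) if either its NMOS or its PMOS is on. Gate A below is assumed to have its NMOS driven by clk and its PMOS by clk'; gate B has the opposite wiring:

```python
def conducts(nmos_ctrl, pmos_ctrl):
    # A transmission gate conducts (at least weakly) if either its NMOS
    # gate is high or its PMOS gate is low.
    return nmos_ctrl == 1 or pmos_ctrl == 0

def cascade_path(clk, clk_bar):
    # Gate A: NMOS on clk, PMOS on clk'; gate B: NMOS on clk', PMOS on clk.
    a = conducts(clk, clk_bar)
    b = conducts(clk_bar, clk)
    return a and b  # True when a path exists through BOTH gates

# Ideal complements: the two gates never conduct simultaneously.
print(cascade_path(1, 0))  # False
print(cascade_path(0, 1))  # False
# Overlaps: a spurious path appears through both gates.
print(cascade_path(1, 1))  # True  (1-1 overlap: both NMOS on)
print(cascade_path(0, 0))  # True  (0-0 overlap: both PMOS on)
```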


Fig. 6.23 Two opposite transmission gates in cascade. Lower left, ideal clock. Lower right, overlapping clocks

During 1–1 overlap the two PMOS transistors of the two transmission gates are off, while the two NMOS transistors are on. This leads to a very strange situation: there is a conducting path through both transmission gates even though they have opposite control signals. At the trailing edge of the clock, there is a period of 0–0 overlap. Ideally, we expect both transistors of the first transmission gate to be on, and the second transmission gate to be off. However, due to the overlap, there will be a period where both PMOS transistors are on and both NMOS transistors are off. Again, this creates a conducting path through the two transmission gates where none should exist.

In Sect. 6.3 we saw that when the active edge of a register occurred, the master instantaneously became opaque and the slave instantaneously became transparent. In Fig. 6.24 this translates to T1 turning off, T2 turning on, T3 turning on, and T4 turning off. However, if we consider the fact that there is 1–1 overlap, then at the active edge, and for the duration of the overlap, all four transmission gates can simultaneously conduct through their NMOS. This means data can flow everywhere in the register for as long as there is an overlap. This is clearly not proper operation. For example, point X in Fig. 6.24 is now under contention, with both the output of T2 and the output of T1 trying to write it. That is to say, we have a fundamental problem: during overlaps both latches (master and slave) are working in both modes (opaque and transparent).

The problem thus is that after the active edge, the master remains transparent for the duration of the overlap when it

should have been off. The easiest way to address this issue is to hold the value of D constant after the clock edge until the overlap resolves. This ensures that the data within the master latch remains unchanged. When the overlap resolves, the NMOS in T1 will turn off, the master latch will be in opaque mode, and we can stop holding D. This is the circuit-level origin of hold-time. To reiterate: hold-time exists because clocks cannot be ideal, which prevents the master latch from becoming opaque exactly at the edge of the clock. Thus data must be held for

t_hold = t_(1-1 overlap)

Notice that holding D is only necessary if a change on data manages to get into the feedback loop of the master latch. In other words, if the 1–1 overlap is shorter in duration than the summation of the delays of I1 and T1, then there is no need to hold data after the active edge, because T1 will turn off before data has the chance to propagate through it. Thus hold-time is nonzero only if

t_(1-1 overlap) > t_I1 + t_T1

0–0 overlap for a positive edge-triggered register poses the same risk of data moving beyond the point it is supposed to reach. But 0–0 overlap happens on the inactive edge. In proper operation, on the inactive edge, the master becomes transparent and the slave becomes opaque. That is to say, in Fig. 6.25, T3 should ideally switch off immediately at the trailing edge of the clock. Because there is a period of 0–0 overlap, all transmission gates are on at the trailing edge for the duration of the


Fig. 6.24 Racing on active edge overlap in a static register. Data ideally stops at T1, but due to overlap can race into the master latch

overlap. Notice that, as with the active edge, our main concern is with transmission gates that should close but fail to do so. In this case, this is transmission gate T3. So we are concerned about changes in the data input D managing to "race" through the master and pass through T3 while in overlap. This would cause Q to change on the inactive edge of the clock. The condition is called "racing" because it is a problem only if the data manages to race and beat the clk' transition all the way to T3. We do not need to worry about this issue if the delay that D faces from input to T3 is more than the 0–0 overlap; in other words, we do not suffer from trailing edge racing if

t_(0-0 overlap) < t_I1 + t_T1 + t_I2 + t_I4 + t_T3

Note that 0–0 overlap is caused by a single inverter delay, so the above condition might not be very hard to enforce: we need three inverter delays and two transmission gate delays to exceed a single inverter delay. It is important to note, however, that the clock inverter is significantly more loaded than the inverters in the register, so 0–0 overlap is not equal to an inverter delay within the register, and thus racing is not impossible.

We cannot address racing on the inactive edge by imposing a condition on D as we did with hold-time on the active edge. Recall that by the very definition of hold-time, D is released and allowed to change after the active edge by a duration equal to the hold-time. Thus we cannot hold D on the inactive edge of the clock, because D is

already long gone. Racing on the inactive edge can only be addressed by making sure the inequality above is satisfied.

Hold-time and racing are even easier to envision in a dynamic register (Fig. 6.18). On the active edge, T1 should turn off and T2 should turn on. Due to 1–1 overlap, both gates will have their NMOS conducting. The fact that T2 conducts is not an issue, since it should be conducting; however, T1 should not be. Thus we must keep holding data at the input until the overlap resolves. This is only necessary if the overlap is longer than the delays of I1 and T1; otherwise, changes on D would not have enough time to propagate through T1 and there would be no need to hold data. The conditions on hold-time for the dynamic register are thus identical to those of the static register.

On the inactive (negative) edge, the register in Fig. 6.18 should see T1 turning on and T2 turning off. Due to 0–0 overlap, both transmission gates will conduct through their PMOS. T2 should not be conducting, and the fact that it conducts will cause disastrous loss of data if D manages to race the clock through T2. That is to say, to avoid racing problems, the delay of the path from the input through T2 must be more than the duration of the 0–0 overlap, or in other words:

t_(0-0 overlap) < t_I1 + t_T1 + t_T2

Compare this result to the static register. We can see that dynamic registers are more prone to racing than static


Fig. 6.25 Racing on inactive edge overlap. Data ideally stops at T3, but due to overlap can race into the slave latch

registers. This is because a dynamic register has a shorter path from the input to the transmission gate of the slave latch.
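The hold-time and racing conditions above reduce to simple inequalities on the overlap durations, and can be checked side by side for the static and dynamic registers. All delays here are invented placeholders:

```python
# Placeholder delays (arbitrary units), purely illustrative.
t_I, t_T = 1.0, 0.5   # generic inverter and transmission gate delays

def hold_time(t_overlap_11, path_into_master):
    # D must be held only if the 1-1 overlap outlasts the time a change
    # on D needs to reach the master latch through I1 and T1.
    return t_overlap_11 if t_overlap_11 > path_into_master else 0.0

def races(t_overlap_00, path_to_slave_gate):
    # Racing occurs if data can traverse the path within the 0-0 overlap.
    return t_overlap_00 >= path_to_slave_gate

# Path from D into the master feedback loop (I1, T1).
print(hold_time(2.0, t_I + t_T))  # 2.0 -> overlap is long, D must be held
print(hold_time(1.0, t_I + t_T))  # 0.0 -> overlap resolves first, zero hold-time

# Racing paths on the trailing edge:
static_path = t_I + t_T + t_I + t_I + t_T  # I1, T1, I2, I4, T3 -> 4.0
dynamic_path = t_I + t_T + t_T             # I1, T1, T2        -> 2.0
t_overlap_00 = 3.0                         # a pessimistically long 0-0 overlap
print(races(t_overlap_00, static_path))    # False: static register is safe
print(races(t_overlap_00, dynamic_path))   # True:  dynamic register races
```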

6.6 Pipelines, Critical Path, and Slack

1. Define a synchronous pipeline
2. Understand that clock frequency is the ultimate definition of delay
3. Define and extract the critical path
4. Understand slack: positive, negative, and zero
5. Understand the cause of hold-time violations and how to solve them
6. Understand the cause of setup-time violations and how to solve them.

In a large circuit that contains multiple combinational gates connected in complicated ways, what do we mean when we use the word "delay"? In Sect. 3.10, we considered an example where we calculated the total delay through a number of combinational gates. The delay depends on how the gates are connected: when combinational gates are connected in cascade, their delays add; when gates appear in parallel, only the longer of the delays counts. For registers, we defined "delay" very roughly as the summation of setup-time and propagation delay. However, the majority of digital circuits are neither purely combinational nor purely sequential. When both types of circuits are combined, we have to define a new, systematic form of delay.

Figure 6.26 shows three combinational logic blocks (CLBs) connected in series. A CLB is just another word for a combinational gate or set of gates. Whenever trying to calculate "delay" from now on, we will assume (even if it is not stated explicitly) that there are registers at all the initial inputs and the ultimate outputs. This makes sense since the majority of digital circuits are synchronous pipelines, and also because it allows us to calculate delay according to a systematic definition. Note that registers R1 and R2 in Fig. 6.26 are not necessarily single-bit registers. They can be multiple bits wide, in which case each register consists of multiple single-bit registers in parallel.

A pipeline is a circuit that consists of combinational circuits (or CLBs) lying between registers. Pipelines allow higher throughput to be obtained from circuits, and they also allow predictable behavior. A pipeline is synchronous if there is only one clock. In other words, synchronicity assumes all registers are fed from one global clock signal.

The most meaningful measure of delay in Fig. 6.26 is the clock frequency that can be used on the registers. This will also include a measure of the delay in the combinational

Fig. 6.26 An I/O pipelined Combinational Logic Block (CLB). There are registers only on the input(s) R1 and the output(s) R2. The CLB consists of three combinational gates in cascade CLB1 through CLB3


circuits, as will become clear shortly. Clock frequency is the "speed" at which the circuit can be operated. Clocks are periodic square waves characterized by both their frequency (or period) and their duty cycle. The period of the clock is the time between two positive edges, or more generally the time between any two identical points in consecutive cycles. Frequency is the reciprocal of the period. Duty cycle is the proportion of the period during which the clock is high.

To calculate the necessary clock period in Fig. 6.26, we must define all the operations that happen between two active edges and account for their delays. The period must at least accommodate these events. Figure 6.27 shows the events that occur between two active edges of the clock, in the order that they happen:

• First, at the clock edge, data starts propagating through R1, and after a propagation delay (tcq) it appears at the output of R1
• Data must then traverse all three CLBs, which takes the summation of their propagation delays
• If the data arrives at the input of R2 exactly at the next positive clock edge, it will be registered improperly. For proper operation it must arrive a setup-time (tsu) before the clock edge, in order not to raise a setup-time violation in R2

Figure 6.27 thus shows that three things must be accommodated between the active edges: propagation through the first register, propagation through the combinational logic, and arrival early enough to allow setup-time for the second register. The summation of these three delays is a minimum for the clock period; the period can be greater than or equal to this quantity. This corresponds to a maximum frequency at which the circuit can operate. Thus for Fig. 6.26:


T ≥ t_CQ,R1 + t_CLB1 + t_CLB2 + t_CLB3 + t_su,R2

The combinational delay here consists of the summation of the three CLB delays, but in a general pipeline these three terms can be replaced by whatever maximum combinational delay exists between the two registers. Also, for most registers setup-time and propagation delay are identical across registers, so it is superfluous to mark which register is responsible for setup-time and which for propagation delay. Thus the clock period in the general case is

T ≥ t_CQ + t_pd,combinational + t_su

And the frequency of operation is

f ≤ 1 / (t_CQ + t_pd,combinational + t_su)
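The period and frequency bounds reduce to one line each; the numbers below are placeholders, not values from the text:

```python
def min_period(t_cq, t_comb, t_su):
    # T >= t_CQ + t_pd,combinational + t_su
    return t_cq + t_comb + t_su

def max_frequency(t_cq, t_comb, t_su):
    # f <= 1 / T_min
    return 1.0 / min_period(t_cq, t_comb, t_su)

# Example with assumed delays (arbitrary units): register overheads of 1 each
# and a combinational path of 8.
print(min_period(1, 8, 1))     # 10
print(max_frequency(1, 8, 1))  # 0.1
```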

The question then is: what do we mean by "combinational delay" in the context of a pipeline? In Fig. 6.26 this was straightforward: the path from input to output register passed through three combinational blocks in cascade, so the combinational delay is the sum of the individual blocks' delays. Figure 6.28 shows a significantly more complicated situation. There are multiple input and output registers, and multiple combinational paths between them. The aim is to find the minimum clock period at which the registers of the circuit must operate.

While this might seem a fairly more complicated task than in Fig. 6.26, we will find that a unified approach can be used for any circuit. The critical fact is that all registers use the same clock in a synchronous pipeline. Thus, the same events we accounted for in Fig. 6.27 must occur between any two active edges, and between any two consecutive registers, in any and all pipelines. For any two consecutive registers, the clock period must account for the register propagation delay, the combinational delay, and the register setup-time. This yields 5 conditions for the circuit in Fig. 6.28, equal to the number of paths between any two registers, marked on the figure as paths 1 through 5.

For path 1, only CLB1 is between the two registers, thus it imposes the following condition on the clock period:

T ≥ t_CQ + t_CLB1 + t_su

For path 2, CLB2 and CLB3 are in cascade. Thus, their combined delay must appear as the "combinational" delay, leading to the clock period condition:

T ≥ t_CQ + t_CLB2 + t_CLB3 + t_su

Fig. 6.27 Cycle time. Data must exit from R1, propagate through the CLB and arrive a setup-time ahead at the input of R2

Similarly for the remaining paths:



Fig. 6.28 An I/O pipelined circuit with multiple paths

T ≥ t_CQ + t_CLB4 + t_CLB5 + t_CLB6 + t_su
T ≥ t_CQ + t_CLB7 + t_CLB8 + t_CLB9 + t_su
T ≥ t_CQ + t_CLB10 + t_su

But the pipeline is synchronous, and thus there is only one clock and one clock period T. The above five conditions must collapse into a single condition. Setup-time and register propagation delay are common to all five conditions, due to the assumption that they are equal across all registers. The factor that sets the five conditions apart is the CLB delay. Thus each "path" is distinguished not by its registers, but by the combinational logic that it passes through.

The clock in Fig. 6.28 has to have a single lower bound imposed on its period. So if we have to choose one of the five conditions above to apply to the entire circuit, we must choose the strictest condition, i.e., the one that yields the maximum clock period, or the minimum clock frequency:

T ≥ t_CQ + max{t_CLB1, t_CLB2 + t_CLB3, t_CLB4 + t_CLB5 + t_CLB6, t_CLB7 + t_CLB8 + t_CLB9, t_CLB10} + t_su

Or in terms of the path delays:

T ≥ t_CQ + max{path1, path2, path3, path4, path5} + t_su
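Collapsing the per-path conditions into one is just a max over path delays. The CLB delays below are hypothetical, chosen only to exercise the five paths of Fig. 6.28:

```python
# Hypothetical CLB delays (arbitrary units), keyed by block name.
clb = {f"CLB{i}": d for i, d in enumerate([3, 2, 4, 1, 2, 3, 2, 2, 1, 5], start=1)}

# The five register-to-register paths of Fig. 6.28, as lists of CLBs.
paths = [
    ["CLB1"],
    ["CLB2", "CLB3"],
    ["CLB4", "CLB5", "CLB6"],
    ["CLB7", "CLB8", "CLB9"],
    ["CLB10"],
]

t_cq, t_su = 1, 1
path_delays = [sum(clb[b] for b in p) for p in paths]
critical = max(path_delays)        # the critical path delay
T_min = t_cq + critical + t_su     # strictest condition on the period

print(path_delays)  # [3, 6, 6, 5, 5]
print(T_min)        # 8
```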


This is a foundational result. To find the operating frequency of a synchronous pipeline, we must find the combinational path between (any) two consecutive registers with the longest propagation delay. This path is called the critical path, and it defines the maximum possible speed of the circuit.

Note that once the critical path is defined, the period it dictates will be used for the clock which feeds the entire circuit. What happens to paths other than the critical path under such a clock? These paths, by definition, have lower combinational delay than the critical path. Thus, data will propagate through them and be ready at the input of the next register earlier than it needs to be. This is not a disastrous scenario, since setup-time is a lower bound: if data arrives earlier than it needs to, it will still be registered properly. The problem occurs only when data arrives too late. The opposite is not true: if we had used a period lower than that dictated by the critical path, then at least the critical path would not be able to provide its output to the next register a setup-time before the clock edge. The data would not be correctly registered, it would be lost, and the circuit would fail.

The left sub-figure in Fig. 6.29 shows a circuit where the input and output are pipelined. The minimum clock period can be calculated using the approach discussed above:

• First find the longest combinational path between two registers; this is the critical path
• The minimum clock period is the critical path delay plus the overheads from the registers

There are two paths in the circuit, one through CLBs 1 through 3 and one through CLB4 and CLB5, thus the period can be calculated as

T ≥ t_CQ + max{(t_CLB1 + t_CLB2 + t_CLB3), (t_CLB4 + t_CLB5)} + t_su

The right sub-figure of Fig. 6.29 shows the same circuit but fully internally pipelined. By full internal pipelining we mean each CLB is sandwiched between two registers. To find the clock period, and thus the operating frequency, of the


fully pipelined circuit we use the same approach outlined above, i.e., we find the longest combinational delay between two registers. This is the critical path, and its sum with the register setup and propagation delays gives the clock period. Applied to the right sub-figure of Fig. 6.29, this gives

T ≥ tCQ + max{tCLB1, tCLB2, tCLB3, tCLB4, tCLB5} + tsu

Internal pipelining is used to increase the operating frequency of a pipeline. This is immediately obvious from comparing the two parts of Fig. 6.29. Even without knowing particulars, the critical path in the right sub-figure must be no longer than that in the left sub-figure; more specifically:

max{tCLB1, tCLB2, tCLB3, tCLB4, tCLB5} ≤ max{(tCLB1 + tCLB2 + tCLB3), (tCLB4 + tCLB5)}

However, it is hard to tell how much of an improvement this is without more information about the CLB delays. Internal pipelining works best when the delays of the individual CLBs are close to each other, and worst when one CLB has a dominant delay. For example, if the original critical path in the left sub-figure of Fig. 6.29 was CLB1 + CLB2 + CLB3, then frequency would improve by roughly a factor of 3 if the delays of CLB1, CLB2, and CLB3 are equal. However, if CLB1 has a much higher delay than CLB2 and CLB3, then frequency would hardly change, as the critical path merely moves from CLB1 + CLB2 + CLB3 to CLB1 alone. Note that this assumes that after pipelining the critical path does not migrate to the other path (CLB4 + CLB5).

A numerical example might clarify this further. All delays are given in an arbitrary unit. Assume that setup-time is 1 and register propagation delay is 1. The delays of the CLBs are as follows: CLB1 = 10, CLB2 = 11, CLB3 = 12, CLB4 = 6, and CLB5 = 7. Internal pipelining is a perfect proposition in this case. Before pipelining the critical path is max(CLB1 + CLB2 + CLB3, CLB4 + CLB5) = max(33, 13) = 33. The critical path is obviously CLB1 + CLB2 + CLB3, giving a period of 2 + 33 = 35 and an operating frequency of 1/35.
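This bookkeeping can be sketched in a few lines of Python; the path structure mirrors the left sub-figure of Fig. 6.29 and the delay values are the ones from the example above:

```python
# Minimum clock period of a synchronous pipeline:
# T >= tCQ + max(combinational path delay) + tsu
T_CQ, T_SU = 1, 1  # register overheads from the example (arbitrary units)

# Left sub-figure of Fig. 6.29: two register-to-register paths
paths = [
    [10, 11, 12],  # CLB1 + CLB2 + CLB3
    [6, 7],        # CLB4 + CLB5
]

critical = max(sum(p) for p in paths)   # critical path combinational delay
period = T_CQ + critical + T_SU

print(critical, period)  # 33 35
```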

Fig. 6.29 Internal pipelining. Left, the circuit is only input–output pipelined. Right, the same circuit is internally pipelined between each CLB


If the circuit is fully pipelined, the new critical path is max(10, 11, 12, 6, 7) = 12, for a period of 14 and a frequency of 1/14. This is an improvement of 35/14 = 2.5 times in speed. However, this significant improvement is due to the closeness of the delays of the individual CLBs.

Consider, on the other hand, the case where CLB1 = 30, CLB2 = 2, CLB3 = 1, CLB4 = 6, CLB5 = 7. The unpipelined period is still 35 and the critical path is still CLB1 + CLB2 + CLB3. However, after internal pipelining, the critical path becomes CLB1, leading to a period of T = 30 + 1 + 1 = 32, an improvement of only 35/32 = 1.09 times, which is negligible.

Improvement due to pipelining can also be stunted by the critical path migrating to another part of the circuit. For example, if CLB1 = 10, CLB2 = 11, CLB3 = 12, CLB4 = 31, CLB5 = 1, the unpipelined critical path still goes through CLB1 + CLB2 + CLB3 at a period of 35. After full pipelining, however, the critical path migrates to CLB4 with a combinational delay of 31, leading to a period of 33 and a negligible improvement in operating frequency.

What can we do if we still need an improvement in speed in cases where internal pipelining does not provide one? The answer lies in the definition of what constitutes a "CLB". We can go inside a CLB and break it up into multiple smaller CLBs, pipelining in between to get a shorter critical path. Section 6.8 expands on the definition of speed in pipelining and the tradeoffs involved in internal pipelining.

Figure 6.30 shows a circuit with three combinational paths. The graphs on the right show when the outputs of the combinational paths become available, in other words, when the inputs to the second register are ready. We can clearly see that none of the paths fails to arrive in time. Specifically for path 3, the graph shows that data changes exactly setup-time before the clock edge.
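The three scenarios above can be checked mechanically. The sketch below assumes, as in the text, that the unpipelined circuit has two paths (CLB1-3 and CLB4-5) and that full pipelining leaves each CLB between its own pair of registers:

```python
# Effect of full internal pipelining for the three CLB-delay scenarios.
# After pipelining, the critical combinational delay is the single
# largest CLB delay rather than the largest path sum.
T_CQ, T_SU = 1, 1

def unpipelined_period(groups):
    # groups: list of register-to-register paths, each a list of CLB delays
    return T_CQ + max(sum(g) for g in groups) + T_SU

def pipelined_period(delays):
    # every CLB sandwiched between registers
    return T_CQ + max(delays) + T_SU

scenarios = {
    "balanced":  [10, 11, 12, 6, 7],   # big speedup
    "dominant":  [30, 2, 1, 6, 7],     # one CLB dominates
    "migrating": [10, 11, 12, 31, 1],  # critical path moves to CLB4
}
for name, d in scenarios.items():
    before = unpipelined_period([d[:3], d[3:]])  # CLB1-3 and CLB4-5 paths
    after = pipelined_period(d)
    # e.g. balanced: 35 -> 14, a 2.5x speedup
    print(name, before, after, round(before / after, 2))
```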
This makes path 3 the critical path of the circuit, since any path with a longer delay would cause the circuit to malfunction at the shown clock period. The graph for path 2 shows that data passes through the path and arrives at the input of the register more than setup-time before the next edge of the clock. The data was ready earlier than it needed to be, but the circuit still works properly because on the clock edge the data from path 2 is registered correctly. The interval between when path 2 was ready and when it needed to be ready is called the slack for that path. The presence of (positive) slack represents waste, but it does not represent failure of the circuit. Slack will prove useful in Sect. 6.7 when we consider power reduction. To define slack more rigorously, it is the time interval between the point at which


data exits the combinational circuit in the path and the point a setup-time before the next clock edge. For path 3, slack is zero. That is to say, the data is ready exactly when it needed to be ready, which is a setup-time before the next edge. Thus, in a properly operating circuit, the slack of the critical path is zero, and all path slacks have to be either zero or positive.

The graph for path 1 shows a very large positive slack, much larger than that of path 2. However, this is not the most striking thing about this path. The graph marks the hold-time interval required after the first clock edge. As discussed in Sect. 6.4, hold-time arises after the active edge of the clock due to overlaps between the clock and its complement, caused by the delay of the complement clock inverter. The remedy for this overlap was to ensure that old data remains held at the input for the duration of the overlap, preventing racing to the output.

Path 1 violates this. Its combinational delay is very small; data propagates so fast through it that it appears at the output before a hold-time has passed. Thus, the data from the previous cycle fails to be held at the input of the second register, and is instead overwritten by new data that quickly exits through path 1. This is called a hold-time violation and is a serious failure mode for the circuit. Because data is not held at the input of the second register long enough, it is left at the mercy of clock overlaps, allowing new data to race through to the output register. Hold-time violations require designer intervention.

Avoiding hold-time violations adds a new condition, involving not the longest combinational delay but the shortest:

min{tcombinational} + tCQ > thold

Note that we do not include setup-time in this condition, because it is enough for the new data to arrive at the input of the second register to lead to a violation.
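The hold condition can be checked mechanically. A minimal sketch, with hypothetical numbers (the hold-time of 3 units is chosen only for illustration):

```python
# Hold-time condition: the shortest combinational delay plus clock-to-Q
# must exceed the hold-time, or a hold-time violation occurs.
T_CQ, T_HOLD = 1, 3   # hypothetical values for illustration

def hold_ok(min_comb_delay):
    return min_comb_delay + T_CQ > T_HOLD

def delay_to_add(path_delay):
    """Combinational delay to insert (e.g., an inverter chain) so the
    violating path just meets hold: thold - (tCQ + tpath), floored at 0."""
    return max(0, T_HOLD - (T_CQ + path_delay))

print(hold_ok(0.5), delay_to_add(0.5))  # False 1.5
print(hold_ok(5), delay_to_add(5))      # True 0
```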
If we added setup-time as well, we would make the condition more lenient by requiring that the new data also settle in the master latch. A hold-time violation indicates a path in the circuit is too short: data passes through it too fast and arrives at the input of the next register before the hold-time has expired. Hold-time violations are thus naturally associated with large positive slacks. Raising or lowering the clock period has no effect on a hold-time violation, because hold violations happen after the active edge of the clock. Increasing the clock period would just increase the time to the next active


Fig. 6.30 Condition on hold-time and hold-time violation. In paths with very small combinational delay, the new input can appear on the output register before the old input has enough time to be “held”. The graphs show the outputs of the paths

edge, which would do nothing to solve the problem. In other words, hold-time violations occur in the presence of large positive slacks, so why would a reduction in frequency solve them?

Hold-time violations can be solved as shown in Fig. 6.31. In Fig. 6.30, path 1 was the path suffering from the hold-time violation. To solve this, we add combinational delay to the path until its total delay is long enough to give the second register a chance to hold the data at its input. That is to say, we need to add enough combinational delay to make up the shortfall between the path's delay and the hold-time:

added delay = thold − (tCQ + tpath1,old)

Figure 6.31 shows this makes the data change exactly at the hold-time, removing the violation. The most

Fig. 6.31 Hold-time violation addressed through delay element (series of inverters)

common way to add combinational delay is to insert an even number of inverters in series. The number and sizing of the inverters in the chain depend on the amount of delay we need to add. Many hold-time violations can be caught and solved automatically by the CAD tool, but sometimes they are complicated by other requirements in the circuit and require intervention by the designer.

Slack has been a very useful tool for diagnosing paths in a circuit so far. A path with zero slack is a critical path, and it demands a lot of designer attention. Paths with positive slack are safe and their data will be registered properly, unless the positive slack is too large, in which case we must be wary of hold-time violations. But what if slack is negative? Negative slack means that data exits the combinational path later than a setup-time before the active edge. This is


obviously a failure. The register will not catch the data properly because the data will not settle in the master latch before the active edge. This is called a setup-time violation. Setup-time violations normally require more designer intervention to resolve. As shown in Fig. 6.32, setup-time violations are the exact equivalent of having negative slack. In the figure, both paths originally (left sub-figure) have negative slacks, with their outputs changing later than setup-time before the active edge. This makes the circuit malfunction because the second register will always capture wrong data from both paths.

One way to resolve the situation in Fig. 6.32 is to increase the clock period. Path 2 has the larger negative slack, so if the period is increased by the absolute value of path 2's slack, then path 2 will have zero slack and path 1 will have a positive slack equal to abs(path 2 slack) − abs(path 1 slack). If we cannot tolerate the deterioration in operating frequency that the increase in period entails, then we have to break the critical paths. The right side of Fig. 6.32 shows how this can be done. Path 1 is broken down into two paths, path 1a and path 1b. The same is done to path 2. "Breaking down" means inserting internal registers to create new paths. Recall that a "path" is the combinational path between two consecutive registers. Thus the right sub-figure of Fig. 6.32 has four paths, while the original left sub-figure has only two. If the paths are broken down properly, then all the new paths will have combinational delays small enough that none has a setup-time violation, as the right graph in Fig. 6.32 shows. The new critical path, with zero slack, is path 2b. All the other paths have positive slack. Note that successful resolution of setup-time violations should result in at least one path with zero slack, while the remaining paths should have positive slack.

Fig. 6.32 Negative slack addressed by further pipelining. Both paths in the original pipeline on the left had setup-time violations. Both are resolved by internal pipelining

The concepts introduced in this section are critical to understanding timing in a digital circuit. The terminology introduced here will be used extensively in Chaps. 8 and 9 when considering the design flow. Below, we reiterate the definitions discussed in this and earlier sections that help cement timing in a pipeline:

Pipeline: A digital circuit that consists of combinational logic blocks and registers. The pipeline potentially contains multiple paths.

Synchronous pipeline: A pipeline where all registers are fed the same clock signal. Most pipelines are synchronous. Multiple clocked circuits require special consideration to avoid metastable latching (Sect. 13.8).

Clock: A periodic square wave used as a control signal for registers and latches. The phase or edge of the clock defines when a latch or register is transparent or opaque. Clocks are characterized by their period and duty cycle.

Setup-time: The time before the active edge of a clock that the input to a register must have stabilized. If data changes


within the setup-time, we cannot guarantee the output of the register will be correct, leading to a setup-time violation.

Hold-time: The time after the active edge of the clock that input data must be held constant. Hold-time is a result of clock overlaps which keep transmission gates transparent longer than they should be. Not respecting hold-time leads to hold-time violations.

Hold-time violation: Occurs when the input of a register changes too soon after an active edge of the clock, due to an extremely fast path between registers. Hold-time violations cannot be resolved by reducing the operating frequency; they require the addition of combinational delay in the violating path. A hold-time violation is most likely to occur in a path with a very large positive slack.

Propagation delay: In a combinational circuit, the time between (one of) the inputs changing and the output changing. The term is also sometimes used to describe clock to Q delay in registers.

Clock to Q delay: The time after the active edge of a clock that the output of a register changes. This will only happen if setup-time was respected.

Path: The trajectory of data from the output of a register through combinational logic to the input of another register. The path must lie between only two registers. A pipeline consists of multiple paths. A synchronous pipeline is a pipeline where all path registers use the same clock signal. A pair of registers might have multiple paths in between, depending on the number of independent combinational trajectories.

Critical path: The path in a pipeline with the longest total delay. It is often reasonable to assume all registers have the same setup-times and clock to Q delays, in which case the critical path is also the path with the longest combinational delay.

Clock period: The time between two identical points in consecutive cycles of a clock. It is also the minimum time that accommodates three events: clock to Q delay, the combinational delay of the path, and setup-time. In a synchronous pipeline the clock period is determined by the delay of the critical path.

Slack: The difference in time between the delay of a path and the delay of the critical path. If setup and clock to Q times are equal for all registers, then slacks are fully determined by differences in combinational delays. The slack of the critical path is zero by definition. Any path that has a negative slack is also a path that has a setup-time violation.

Setup-time violation: A condition where the input to a register in a pipeline does not respect setup-time. This is identical to the path having negative slack. A setup-time violation means data will not be properly sampled: it fails to latch properly in the master latch, leading to metastable sampling. It is a failure mode for the circuit. Setup-time violations can be resolved by breaking up violating paths with registers, or by reducing the operating frequency. All setup-time violations are resolved when there are no longer negative slacks.

The discussion so far might suggest that hold-time violations are unimportant: if detecting and solving them is a systematic process, why should we care? CAD tools can and do readily cover hold-time conditions, and if they did not, the number of violations would be untenable. While this is mostly true, it fails in some very important cases. As shown in Chap. 3, any logic gate has both a worst-case and a best-case delay. Thus, the shortest combinational delay in a circuit does not necessarily come from a simple gate. It can come from a complicated gate in cases where the input activates a very small pull-up or pull-down resistance. In fact, it is entirely possible for both the critical path and the shortest path in a circuit to pass through the same CLB.

Correcting hold-time violations requires the insertion of combinational delay until the hold-time condition has been restored. The worst-case delay of the CLB is also affected by this additional delay, so in some cases adding enough delay to resolve the hold-time violation will create a setup-time violation. This creates a situation where solving hold-time violations creates setup-time violations and vice versa, requiring designer intervention. In Sect. 6.8 we will discuss an example that illustrates this.

Setup-time violation, negative slack, and improper clock period design are all synonymous; saying one means the others. A setup-time violation always occurs due to negative slack, and negative slack always leads to a setup-time violation. Both can be readily addressed by a change in clock period or by internal pipelining.
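As a small sanity check, the slack bookkeeping of this section can be sketched in Python. The combinational delays below echo the running example (critical combinational delay of 33 at a period of 35):

```python
# Slack of each path relative to the clock period:
# slack = T - (tCQ + t_comb + tsu). Zero -> critical path,
# negative -> setup-time violation, very large positive -> check hold.
T_CQ, T_SU = 1, 1
period = 35
comb_delays = {"path1": 10, "path2": 11, "path3": 33}

for name, d in comb_delays.items():
    slack = period - (T_CQ + d + T_SU)
    status = ("critical (zero slack)" if slack == 0
              else "setup-time violation" if slack < 0
              else "positive slack")
    print(name, slack, status)
```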

6.7 Managing Power in a Pipeline

1. Distinguish between power and energy
2. Understand the interplay of supply, power, and delay
3. Use parallelism and pipelining to achieve excessive slack and back off to create power savings


4. Comprehend the superiority of slack minimization as a power reduction technique
5. Realize that slack nullification is impossible in a practical supply setting

Digital circuits are characterized by their area, speed, and power. Technology scaling has meant that transistors have constantly grown smaller, and the cost of implementing the same logic function has consistently dropped along with the area. Area has thus receded as the most important metric for characterizing a digital circuit; speed and power remain the most critical measures of a circuit's performance.

As area became noncritical, most applications of digital electronics moved from stationary platforms to mobile, wireless, and, most critically, battery-powered platforms. The imperative of modern consumer electronics is to support new functions with each new iteration while maintaining or improving battery charge duration. Battery performance is directly related to power: how much power the circuit draws determines how long the battery lasts before needing a recharge.

When we say power, we actually mean a combination of speed and power. Consider the example in Fig. 6.33. It shows two circuits performing the same logic function. One performs the function at double the power of the other, but it also produces double the throughput. In fact, if we look inside the top-level wrapper of circuit B, we find that it consists of two parallel instances of circuit A. Which of the two circuits in Fig. 6.33 is "better" in terms of power?

We might hastily answer that circuit A is "better". However, power alone is meaningless; it has to be combined with delay to produce a meaningful figure of merit. Circuit A has half the power, but circuit B produces double the throughput. Intuitively, they should both be "the same". But how do we determine this systematically?


A common way to characterize power is to compare it at the same throughput. Another method, which uses only one metric, is to compare the power-delay product, or PDP:

PDP = Power × delay = Power / Throughput

The unit of PDP is the Joule, and physically it represents the energy consumed to produce one output sample. Energy is a more representative metric for the user experience and corresponds directly with battery life. The energies to produce one output (or PDPs) of the two circuits in Fig. 6.33 are equal, which is in line with the intuition that they are "the same".

The main goal of modern circuit design is thus to reduce power consumption while preserving or increasing throughput, or alternatively to reduce energy consumption per sample. We will consider three techniques to achieve this. All three depend on the interplay of active power, delay, and supply shown in Fig. 6.34. Active (dynamic) power is directly proportional to operating frequency, or inversely proportional to delay (Sect. 3.6). Delay is inversely proportional to supply (Sect. 3.10). Active power is directly proportional to the square of supply (Sect. 3.6). These opposing dependencies, and especially the quadratic dependence of power on supply, are where power reduction techniques find an opening. The three methods of power reduction we will describe below are:

• Parallelism
• Pipelining
• Slack minimization

Figure 6.35 shows the first method of power reduction. Here, parallelism is used to reduce power by a significant factor: we sacrifice area for lower power while preserving speed. The main idea is that if we increase speed beyond our needs, we can back power off by a larger factor than the increase in speed. If the original circuit (the left of Fig. 6.35) has an area A corresponding to capacitance C, and it operates at a frequency f at supply VDD, producing throughput S, then its active power, frequency, supply, and throughput respectively are

P = C·f·VDD², f, VDD, S
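The PDP comparison of circuits A and B from Fig. 6.33 can be written out in a couple of lines (normalized, hypothetical numbers):

```python
# Power-delay product (energy per output sample) makes circuits A and B
# comparable: B burns twice the power but delivers twice the throughput.
P_A, S_A = 1.0, 1.0              # normalized power and throughput of A
P_B, S_B = 2.0 * P_A, 2.0 * S_A  # circuit B: two parallel copies of A

pdp_A = P_A / S_A   # Joules per sample
pdp_B = P_B / S_B

print(pdp_A == pdp_B)  # True: identical energy per sample
```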

If two copies of this circuit are implemented in parallel, then each will be a copy of the original, producing the same

Fig. 6.33 Power and energy. The two circuits have different powers but actually have identical “performance”

Fig. 6.34 Interdependency of delay, power, and supply


throughput, operating at the same frequency, and consuming the same power. Thus the total throughput of the parallel circuits is double that of the original circuit, but they also consume double the power while operating at the same frequency:

P = 2·C·f·VDD², f, VDD, 2S

Each of the sub-circuits operates at f, each producing throughput S and consuming power P. In combination they produce 2S throughput and consume 2P power. Throughput requirements are defined by the application, so if we originally needed S, we are likely to still need S. Thus, we can back off the operating frequency of the two sub-circuits to f/2. Each would produce only S/2, for a total throughput of S. Power, frequency, supply, and throughput become:

P = C·f·VDD²/2 + C·f·VDD²/2 = C·f·VDD², f/2, VDD, S

This is the same power and throughput as the original circuit but at double the area! We are still missing one link. Dropping the frequency by half means we can also reduce the supply by half, because frequency is directly proportional to supply (Fig. 6.34). Thus, we finally reach a significant power reduction at the same throughput:

P = C·f·VDD²/4, f/2, VDD/2, S

The quadratic dependence of power on supply yields a significant drop in power. This can be extended by adding more parallel circuits: with three parallel circuits, power is reduced by a factor of 9; four parallel circuits provide a reduction by a factor of 16, and so on. Two factors limit this:

• Reduction in supply directly corresponds to reduction in noise margins. This leads to progressively worse noise immunity as more and more parallel circuits are added
• Even though area is not as critical as power and speed, it is still important. At some degree of parallelism the additional area of the parallel units becomes too costly
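The scaling argument above (N copies, each at f/N and VDD/N, assuming delay scales as 1/VDD) can be sketched as:

```python
# N-way parallelism at constant throughput: N copies run at f/N each,
# so the supply can drop to VDD/N. Active power P = C * f * VDD**2
# then scales as 1/N**2 relative to the original circuit.
def parallel_power_ratio(n):
    C, f, vdd = 1.0, 1.0, 1.0               # normalized original circuit
    p_orig = C * f * vdd**2
    p_new = n * C * (f / n) * (vdd / n)**2  # n copies at f/n and VDD/n
    return p_new / p_orig

for n in (2, 3, 4):
    print(n, parallel_power_ratio(n))  # ratios: 1/4, 1/9, 1/16
```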

Fig. 6.35 Parallelism and power management. Left, the original circuit. Right, the low power circuit

The second method of active power reduction uses pipelining. Pipelining is typically used to increase throughput or to solve setup-time violations when the critical path is too long. However, it can also be used to reduce power in a manner very similar to parallelism: pipelining is used to increase the possible operating frequency, the actual frequency is then backed off by enough of a factor that the original throughput is restored, and the power supply is reduced by the same factor, leading to a significant reduction in power.

Figure 6.36 shows an example of this. The top circuit is the original unpipelined combinational path. Its power is

P = C·f·VDD²

The pipelined circuit has two internal pipeline registers. If CLB1, CLB2, and CLB3 have equal delay (the best case for pipelining), then the pipelined circuit can be operated at 3f, where f is the frequency of the unpipelined circuit. In that case its power consumption would be

P = C·3f·VDD²

However, if we only need the circuit to operate at f, we can back off the supply by a factor of 3. The active power thus drops by almost a factor of 9:

P = C·f·(VDD/3)² + δ

where δ is the additional power consumed by the internal pipeline registers, which should typically be negligible. So a significant drop in power is obtained with a much lower area overhead than with parallelism. There are two limiting factors to using pipelining to reduce power:

• Latency is increased with more pipeline stages
• Reduced supply corresponds to reduced noise margins, as with parallelism

The third and most practical way to reduce power is slack minimization. The advantage of this method is that there are no penalties other than reduced noise margins. Consider for example the circuit in Fig. 6.28. Assume the following path delays in an arbitrary unit: path1 = 10, path2 = 15, path3 = 40, path4 = 30, and path5 = 20. Path3 is obviously the critical path, and the clock period is designed to accommodate it. All the other paths have positive slack; for example, path1 has a slack of 30.

The existence of positive slack in a path means that the supply for this path is higher than needed, because positive slack means the combinational path provides its output earlier than it has to. The supply to this path can therefore be reduced down until the combinational delay



Fig. 6.36 Pipelining and lower power. The top pipeline is internally unpipelined. The bottom circuit is internally pipelined but operated at the same frequency as the top pipeline and a lower supply

rises to become equal to that of the critical path, and the circuit would still function perfectly well. Thus the supply for path1 can be reduced by a ratio of path3 delay/path1 delay = 40/10. This would cause the delay of combinational path1 to rise by the same factor. However, the clocking frequency of the registers remains the same, because we are in a synchronous pipeline where the clocking frequency for everyone is determined by the critical path delay, which does not change.

Table 6.1 shows the results of slack nullification for the circuit. For each path, the ratio of its delay to that of path3 (the critical path) is the ratio of supply reduction. Power is reduced by the square of the supply reduction, so paths with higher slack enjoy more power reduction. After the supplies are reduced, all paths have the same delay as path3, so their slacks are all zero; thus the name slack nullification. Total power after supply reduction is obviously smaller than the original total power. The ratio of overall reduction is complicated and depends on how power was originally distributed among the paths.

Table 6.1 Path delays and powers after slack nullification

Path  | Original delay | Original slack | Original power | Supply reduction | New slack | New power
Path1 | 10             | 30             | P1             | 4                | 0         | P1/16
Path2 | 15             | 25             | P2             | 2.67             | 0         | 9P2/64
Path3 | 40             | 0              | P3             | 1                | 0         | P3
Path4 | 30             | 10             | P4             | 1.33             | 0         | 9P4/16
Path5 | 20             | 20             | P5             | 2                | 0         | P5/4

Figure 6.37 shows a very useful way to visualize the impact of slack nullification. The x-axis in the figure represents path delay. The y-axis represents the number of paths that have that specific delay, so the curve is a histogram of the delays of the circuit. Before slack nullification, delays have a distribution with a relatively large variance. The lower limit on the x-axis is zero, with no path having exactly zero delay. The upper bound of the curve is the critical path delay, because no path can have a delay greater than that of the critical path, otherwise it would itself be the critical path. If there is a unique critical path, then the y-axis value at the last point of the curve is 1. After slack nullification, none of the paths preserves any slack: all paths have the same delay as the critical path, and the distribution collapses into an impulse at the critical path delay with a magnitude equal to the total number of paths in the circuit.

Figure 6.38 shows the distribution of slack before and after slack nullification. This differs from Fig. 6.37 in that the x-axis is slack rather than delay. Before slack nullification, at least the critical path has zero slack, and the rest of the paths have a distribution of positive slacks. After slack nullification, all paths have zero slack, and the distribution becomes an impulse at zero slack with a magnitude equal to the number of paths in the circuit.

Slack nullification apparently has no disadvantage except the inescapable drop in noise margins corresponding to supply reduction. However, there is another limitation that comes from practical considerations regarding the supply. In an integrated circuit, not all signals are treated equally. For example, clock signals are treated with care due to the very large loading they face (Sect. 13.7). The power supply is also not an ordinary signal. Supplies have to reach all logic gates, whether combinational or sequential, and the supplies seen by gates have to be of equal value, otherwise we will have serious complications. Thus supply (and ground) are distributed in a special way that consumes a lot of resources (Sect. 8.3). The slack nullification technique described above assumes we can use any supply value demanded by each individual path. This cannot be implemented since it requires


Fig. 6.37 Distribution of path delay before and after slack nullification

Fig. 6.38 Distribution of slacks before and after slack nullification

an arbitrary number of supply values to be made available on-chip. A more practical approach is to use a finite number of supplies among which paths can choose. Typically this number is two: a high supply and a low supply. The high supply is chosen for paths that need high performance because they do not have much reserve slack. The low supply is used for paths with plenty of slack, to reduce their power.

For example, assume the same delay figures as in Table 6.1, but this time assume there are only two supplies available: a 1 V supply and a 0.6 V supply. By testing the limit each path can go down to, we can decide which supply each path uses, ending up with a lower total slack, although generally not zero slack. This is summarized in Table 6.2. The lowest possible supply is calculated as the ratio of path delay divided by critical path delay, multiplied by the


original supply (1 V). This is the value of supply which, had it been available, would have led to zero slack for the path. By definition, the lowest possible supply for the critical path is the original supply, since the critical path has zero slack.

The "Chosen supply" column indicates which supply is picked for the path. The decision is made by testing the delay of the path at the low supply (0.6 V). If the delay of the path at the low supply exceeds the critical path delay, the path must use the high supply; otherwise, the path may (and should) use the low supply. Had we had an infinite number of supplies to choose from, as in Table 6.1, the chosen supply would equal the lowest possible supply.

If the path can use the low supply, a "New delay" entry is calculated as: old delay × (High supply)/(Low supply). The new delay equals the old delay for paths that keep the high supply. A new value of slack is then calculated based on the new delay; it equals the old slack for all paths that keep the high supply. The power saving ratio is the square of (High supply)/(Low supply) for paths that use the low supply, and unity for paths that must use the high supply.

Using a finite number of supplies will generally never lead to zero total slack, so the total slack after minimization is still positive. In fact, in Table 6.2 all noncritical paths still have some slack, even those that use the low supply. However, this approach clearly reduces total slack in most cases, with the total new slack lower than the total original slack.

Figure 6.39 shows the distribution of path delays when a small number of supplies is used. While the path distribution after supply reassignment does not become an impulse, the curve tends to have its mean moved toward the critical path and its variance reduced. How much movement there is

Fig. 6.39 Distribution of path delay with a finite number of supplies

Table 6.2 Path delays and power after slack minimization. The high supply is 1 V, the low supply is 0.6 V

Path  | Original delay | Original slack | Lowest possible supply | Chosen supply | New delay | New slack | Power saving ratio
Path1 | 10             | 30             | 0.25                   | Low           | 16.67     | 23.33     | 2.78
Path2 | 15             | 25             | 0.375                  | Low           | 25        | 15        | 2.78
Path3 | 40             | 0              | 1                      | High          | 40        | 0         | 1
Path4 | 30             | 10             | 0.75                   | High          | 30        | 10        | 1
Path5 | 20             | 20             | 0.5                    | Low           | 33.33     | 6.67      | 2.78
Total |                | 85             |                        |               |           | 55        |

toward the critical path is related to how much residual slack remains. Both are related to the original distribution of slacks and to the number and values of supplies available.

The power reduction techniques discussed in this section suffer from two limitations when applied to highly scaled transistors (Chap. 10):

• The dependence of delay on supply is no longer inverse. Current is not a quadratic function of supply; instead it is closer to a linear function. This causes delay to lose much of its dependence on supply, with diminishing returns the higher the supply. Most of the techniques discussed here are contingent on this inverse relation
• Leakage is a major, sometimes dominant, source of power dissipation in many technologies. Leakage power is not a quadratic function of supply the way active (dynamic) power is

The first factor is a major limitation on power reduction techniques. The second factor is mitigated by the fact that most modern transistors are FinFETs (Chap. 10). In FinFETs leakage can be largely controlled, and dynamic power is back to being dominant. The most practical method for power reduction in modern technologies is slack minimization. It is not contingent on a quadratic power-voltage relation, and the supply reduction range still extracts some gains from the linear current.
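The assignment procedure behind Table 6.2 is mechanical enough to automate. The following sketch (illustrative, not from the text) uses the first-order models assumed in this section, delay scaling as 1/VDD and dynamic power as VDD squared; the path names and delay figures are those of Table 6.2:

```python
# Greedy dual-supply assignment, as in Table 6.2.
# Assumed first-order models: delay ~ 1/VDD, dynamic power ~ VDD^2.

V_HIGH, V_LOW = 1.0, 0.6

path_delays = {"Path1": 10, "Path2": 15, "Path3": 40, "Path4": 30, "Path5": 20}
T_CRIT = max(path_delays.values())      # critical path delay (Path3)

def assign_supply(delay):
    """Use the low supply only if the slowed path still meets the critical path."""
    slowed = delay * V_HIGH / V_LOW     # delay grows as the supply shrinks
    if slowed <= T_CRIT:
        return "Low", slowed, (V_HIGH / V_LOW) ** 2   # power saving ratio
    return "High", delay, 1.0

table = {}
for name, d in path_delays.items():
    supply, new_delay, saving = assign_supply(d)
    table[name] = (supply, round(new_delay, 2),
                   round(T_CRIT - new_delay, 2), round(saving, 2))
```

Running this reproduces the "Chosen supply", "New delay", "New slack", and "Power saving ratio" columns of Table 6.2, including a total new slack of 55.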

In slack minimization, the entire path need not be assigned a single power supply. In fact, individual standard cells in the path can be assigned the high or low supply (Chap. 8). Most modern libraries contain multiple versions of standard cells for multiple values of supply, allowing the mixing of cells in a path until slack is minimized without becoming negative.

Although we focused entirely on dynamic power in this section, some concepts can be extended to other forms of power dissipation. Short-circuit power dissipation is most effectively reduced by reducing the duration of crowbar current flow. This duration is primarily contingent upon the slopes of input signals; thus some sort of "slope management" is needed for input signals. Since input signals are output signals of a preceding stage, most slope management techniques do not focus on reducing the absolute slope of a certain input, but on equalizing slopes for all inputs and outputs in the circuit.

Power management methods that reduce supply without impacting area or delay will also reduce leakage power. Thus slack minimization also reduces leakage power; in fact, it reduces all forms of power. But slack management can be reformulated to better address leakage. In Sect. 10.3 we will find that subthreshold conduction is most effectively reduced by raising the threshold voltage. However, raising the threshold voltage increases delay by reducing charge in the channel and reducing on-current. Many libraries provide two copies of the same cell: one using a high threshold and one using a low threshold. This is similar to the choice of high-supply and low-supply cells. And as in the case of multiple supplies, we can assign high-threshold cells to noncritical paths until their slack starts to become negative. This is exactly the slack minimization technique we used in this section, but with a focus on subthreshold conduction.
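The same greedy idea carries over to dual-threshold libraries. In the sketch below, the 30% delay penalty and the 10x leakage reduction for the high-threshold version are illustrative assumptions, not figures from any particular library:

```python
# Slack-driven threshold assignment (illustrative figures, not library data).
DELAY_PENALTY = 1.3     # assumed slowdown of the high-Vt version of a path
LEAKAGE_RATIO = 0.1     # assumed high-Vt leakage relative to low-Vt

def choose_threshold(path_delay, critical_delay):
    """Use high-Vt (low leakage) cells only while slack stays non-negative."""
    slowed = path_delay * DELAY_PENALTY
    if slowed <= critical_delay:
        return "high-Vt", slowed, LEAKAGE_RATIO
    return "low-Vt", path_delay, 1.0
```

A path with delay 20 against a critical delay of 40 can absorb the penalty and switch to high-Vt; the critical path itself cannot.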

6.8 Examples on Pipelining

1. Use the N-bit adder to understand how hierarchical designs can be obtained from high-level descriptions
2. Understand the need to align signals in a pipeline
3. Recognize the need to break down CLBs if throughput is not satisfactory
4. Realize that in circuits with multiple critical paths, all have to be broken down to improve performance
5. Understand how solving hold-time violations and setup-time violations can be contradictory.

In this section, we will use three detailed examples that combine concepts of combinational and sequential design with a higher-level understanding of pipelining. In the first part of the section, we will design an N-bit adder and then fully pipeline it to obtain a higher throughput, understanding the overhead involved in aligning the inputs and outputs of the pipeline. In the second part, we will deal with a more generic logic function and understand how to approach a problem where pipelining between combinational logic blocks does not yield the required throughput, necessitating pipelining within the blocks. In the third part, we will consider a pipeline where hold-time and setup-time requirements are contradictory.

Example 1 Design and pipelining of an N-bit adder

To design an adder that performs N-bit binary addition, we first have to write down the addition operation, then figure out how it can be broken down into simpler building blocks. We can figure out how these simpler building blocks are built in CMOS, and then move up from there with a full understanding of the delay behavior of the building blocks. N-bit addition is shown below for a 4-bit example adding "0111" + "0110":

    110
    0111
  + 0110
  ------
    1101

For each bit position we do the following: add the two bits in the position together with the carry coming from the preceding position, producing a sum bit in the same position and moving a carry bit to the next position. The carry can be "1" or "0".
The result is 5 bits, since the production of a significant carry from the last bit position is a possibility. The main takeaway is that in each bit position (with the weak exception of the first bit position) we perform an identical operation. This is very suitable for a hierarchical implementation where we focus first on the architecture of

one small block, then build a much bigger circuit using repetitions of the small block. The building block in this case is the full adder, a digital circuit that accepts three bits (two input bits and a carry-in) and produces two output bits: a sum and a carry. We are using slightly misleading language here: the full adder is in fact not a single logic gate, it is two logic gates. There will always be at least one logic gate per output, and since the FA (full adder) has two outputs, it has two logic gates corresponding to two logic functions: sum (S) and carry-out (Cout). The truth table for the two functions is shown in Table 6.3. For more information on full adders and N-bit addition, consult Chap. 11.

The best way to simplify a logic function is by drawing its K-map, a visual representation of the position of minterms that allows logic minimization. Figure 6.40 shows the K-maps of S and Cout. It is immediately obvious from the K-maps that S leaves absolutely no space for logic minimization. Its logic function has to be stated as S = A′B′Cin + A′BCin′ + AB′Cin′ + ABCin. This is in fact the expression of a 3-input XOR. Cout offers plenty of space for simplification; in fact, all its minterms can be reduced so that there are only 2-variable terms in the expression: Cout = AB + ACin + BCin.

These two logic expressions can then be implemented as two CMOS circuits as shown in Figs. 6.41 and 6.42. This is done by first finding the expressions of S′ and Cout′ as S′ = (A + B + Cin′)(A + B′ + Cin)(A′ + B + Cin)(A′ + B′ + Cin′) and Cout′ = (A′ + B′)(A′ + Cin′)(B′ + Cin′), then implementing the connections in the PDN and their complements in the PUN (Chap. 3). It must be noted that a straightforward CMOS implementation of S and Cout is not necessarily the most efficient implementation. However, nonstandard CMOS is beyond the scope of this chapter.
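These expressions can be checked exhaustively. The following sketch (a verification aid, not part of the design flow) enumerates all eight input combinations and confirms that the product-of-sums forms obtained from S and Cout by De Morgan's law are indeed their complements:

```python
# Exhaustive check of the S' and Cout' product-of-sums expressions.
from itertools import product

def NOT(x):
    return 1 - x

def S(a, b, cin):                 # sum: 3-input XOR
    return a ^ b ^ cin

def Cout(a, b, cin):              # carry-out: majority function
    return (a & b) | (a & cin) | (b & cin)

def S_comp(a, b, cin):
    """S' = (A+B+Cin')(A+B'+Cin)(A'+B+Cin)(A'+B'+Cin')"""
    return ((a | b | NOT(cin)) & (a | NOT(b) | cin)
            & (NOT(a) | b | cin) & (NOT(a) | NOT(b) | NOT(cin)))

def Cout_comp(a, b, cin):
    """Cout' = (A'+B')(A'+Cin')(B'+Cin')"""
    return (NOT(a) | NOT(b)) & (NOT(a) | NOT(cin)) & (NOT(b) | NOT(cin))

forms_match = all(
    S_comp(a, b, c) == NOT(S(a, b, c)) and Cout_comp(a, b, c) == NOT(Cout(a, b, c))
    for a, b, c in product((0, 1), repeat=3)
)
```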
If we assume that branches are sized for worst-case resistance like the unit inverter, then all NMOS transistors in the S circuit have a sizing of 4, while all PMOS transistors have a sizing of 6. All NMOS transistors in the Cout circuit have a sizing of 3, while the PMOS have a sizing of 4. This leads to an intrinsic capacitance at S of (3 × 4 + 4 × 6)Co = 36Co. The intrinsic capacitance at Cout is (2 × 3 + 3 × 4)Co = 18Co. It is immediately obvious that the delay of S is much higher than that of Cout; in fact, the intrinsic delay is double.

A 4-bit adder can be constructed by cascading four full adders as shown in Fig. 6.43. This implementation directly mimics the long addition operation discussed above, with each bit position adding its two bits to the carry from the previous position, producing a sum bit at the same position and a carry-out to the next position.
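A behavioral model makes the hierarchy concrete: one full adder per Table 6.3, repeated N times as in Fig. 6.43. The helper names below (ripple_adder, to_bits, from_bits) are our own, and bits are kept LSB-first:

```python
# Behavioral model of the full adder and the N-bit ripple-carry adder.

def full_adder(a, b, cin):
    """One bit position: returns (sum, carry-out), per Table 6.3."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def ripple_adder(a_bits, b_bits, cin=0):
    """a_bits/b_bits are LSB-first bit lists; returns the (N+1)-bit sum."""
    s_bits, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        s_bits.append(s)
    return s_bits + [carry]            # N sum bits plus the final carry

def to_bits(x, n):                     # integer -> LSB-first bit list
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits):                   # LSB-first bit list -> integer
    return sum(b << i for i, b in enumerate(bits))
```

Adding "0111" (7) and "0110" (6) as in the long-addition example returns 13, i.e. "01101" with the fifth (carry) bit zero.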

Table 6.3 Truth table(s) of the full adder

A, B, Cin | S | Cout
'000'     | 0 | 0
'001'     | 1 | 0
'010'     | 1 | 0
'011'     | 0 | 1
'100'     | 1 | 0
'101'     | 0 | 1
'110'     | 0 | 1
'111'     | 1 | 1

If we assume registers at all the inputs and outputs of the 4-bit adder, then the critical path is the longest path from any input register to any output register. It is obvious that S3 will be the last output calculated, and thus the critical path must end there. For S3 to be calculated, C2 must be available; C2 needs C1 to be calculated, which in turn needs C0. Thus the following operations must be done in series: C0, C1, C2, S3. This leads to the critical path shown by the dotted line in Fig. 6.43.

Fig. 6.40 K-maps of full adder

Calculating the critical path delay in Fig. 6.43 is simple if we assume a fixed value for carry delay Tc and for sum delay Ts. In this case, total path delay is 3Tc + Ts. Note that while Ts > Tc, Tc tends to be more significant since it is multiplied by an integer factor. This factor is (N − 1), where N is the number of bits, so carry delay becomes even more significant in larger adders. Since all the S outputs directly feed the output registers, then, neglecting loading imposed by the registers, we can calculate Ts as Ts = 0.69 × Ro × 36Co, which is the sum intrinsic delay.

Tc is a little more complicated. All carry outputs from FAs except the last are loaded by the inputs of the next FA. Cout feeds the next stage as that stage's Cin, thus being loaded by multiple NMOS and PMOS gates. This external loading can be calculated by examining the Cin inputs in Figs. 6.41 and 6.42. Cin′ feeds one NMOS and one PMOS transistor in the Cout circuit; Cin also feeds two NMOS and two PMOS transistors in true form in S, as well as two NMOS and two PMOS in complement form. Cin′ means the output of the preceding FA does not observe the gates of the MOSFETs in the following FA directly, but rather the gate inputs of an inverter (which we will assume to be a unit inverter). This leads to the following calculations for external loading on Cout in the 4-bit adder:

• Loading from the Cout circuit = (1 + 2)Co = 3Co (one complemented input that gets distributed to two MOSFET gates through a unit inverter)
• Loading from the S circuit, true inputs = (4 + 4 + 6 + 6)Co = 20Co
• Loading from the S circuit, complements = (1 + 2)Co = 3Co (one complemented input that gets distributed to four MOSFET gates through a unit inverter)
• Total external loading = (3 + 20 + 3)Co = 26Co

Thus the total loading on Cout is (18 + 26)Co = 44Co, which further increases the urgency of addressing Cout rather than S.

An important part of the digital circuit design loop is to find the critical path of the circuit, determine its operating speed, and then examine whether the performance is acceptable. If the speed obtained is not acceptable, one of the most efficient ways to address the throughput gap is to break the critical path by inserting registers, thus performing internal pipelining. This creates a new critical path, which is hopefully significantly shorter than the original. For the 4-bit adder, the ultimate level of internal pipelining involves putting internal pipeline registers between each pair of FAs. This is shown in Fig. 6.44, where registers are marked as solid black circles for simplicity. The new critical path is by definition the new longest path between any two registers. There are only two types of paths


Fig. 6.41 CMOS implementation of full adder S. If transistors are sized for worst-case resistance equal to the unit inverter, total self-loading is 36Co

in Fig. 6.44: paths from inputs to S in all FAs, and paths from inputs to Cout in all FAs. Noticing that the internal registers also break the loading that FAs impose on the Cout of the preceding FA, both Tc and Ts now suffer negligible external loading. This means that the critical path is now the path from inputs to S, whose delay is Ts. Note that this path is nonunique, as there are four identical paths with the same delay in Fig. 6.44.

Pipelining is contingent on each stage in the pipelined circuit being busy in every cycle beyond an initial pipeline-filling period. The fully pipelined adder in Fig. 6.44 does not satisfy this, since it catastrophically misaligns the signals. To understand what this means, consider the example where the adder adds the two numbers "1101" and "0010" in the first cycle, followed by "0000" and "0001" in the second cycle. In the first cycle, all A's and B's pass the input registers and are ready at the inputs of the FAs. However, only the first FA can calculate properly, since it is the only FA that also managed to get its Cin. The second FA will have A1 = "0" and B1 = "1" but a trash value for C0; thus it will calculate a trash output. It will, however, get the proper


Fig. 6.42 CMOS implementation of full adder Cout. If transistors are sized for worst-case resistance equal to unit inverter total self-loading is 18Co

value of C0 from the first FA in the second cycle, but by that time it will already have received A1 and B1 for inputs “0000” and “0001”, so it will be adding the wrong C0 to the inputs. This failure will also occur in all the remaining FAs

Fig. 6.43 Critical path of 4-bit adder. The path starts in the first FA inputs, passes through all the carries, and ends at the sum of the final FA


with the gap between A, B, and Cin increasing as we head toward the MSB. This failure occurs because we did not align the signals properly after internally pipelining. Alignment is the process of ensuring that all the inputs of a CLB arrive there in the proper clock cycle, thus producing a meaningful output. Figure 6.45 shows the adder after additional input and output registers are inserted to ensure alignment.

Examine the number of registers each input must pass through before reaching the ports of FA1. C0 must pass through two registers: the input register (on Cin, A0, or B0), followed by the first internal pipeline register. Inputs A1 and B1 also pass through two input registers. Thus all signals pass through two registers to reach the ports of FA1, which means the inputs are perfectly aligned: they all arrive at the combinational inputs in the same cycle. If we examine the inputs of FA2, they must all pass through three registers, which can be a combination of input registers and internal pipeline registers. The same applies to FA3, where its Cin as well as A and B all pass through four registers.

Figure 6.45 also shows alignment registers on the outputs. This is to ensure that the output bits are aligned and the output is produced as a single readable word. For example, S3 is ready after five cycles, which we know because the path to S3 from any input passes through five registers. A3 and B3 both pass through four input registers, and there is one more output register for S3, which adds up to five registers. But A0 and B0 also reach S3 in five cycles: one input register, three internal pipeline registers between the full adders, and one output register for S3. The same applies for all other inputs. However, S2 is ready in only four cycles: any path from any input on which S2 depends passes through only four registers. S1 is ready in three cycles, and S0 in two. So to align S2 with S3, S2 must have an extra register to delay it by one cycle. S1 thus needs two extra registers, and S0 three. This is shown in Fig. 6.45.
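The alignment argument can be checked with a cycle-accurate behavioral sketch of Fig. 6.45, in which every register is a one-cycle delay and None marks a register whose contents are still trash. The register counts (bit i of A and B behind i + 1 input registers, one internal register per carry, n − i output registers on sum bit i) follow the derivation above; the code itself is illustrative:

```python
# Cycle-accurate sketch of the aligned, internally pipelined n-bit adder.
from collections import deque

def full_adder(a, b, cin):
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)

def simulate(stream, n=4):
    """stream: list of (a_bits, b_bits), LSB-first, one pair per cycle.
    Returns the word on the output port at every cycle (None = not yet valid)."""
    # Input alignment: bit i of A and B passes through i+1 input registers.
    a_del = [deque([None] * (i + 1)) for i in range(n)]
    b_del = [deque([None] * (i + 1)) for i in range(n)]
    c_reg = [None] * n                          # one pipeline register per carry
    # Output alignment: sum bit i passes through n-i output registers.
    s_del = [deque([None] * (n - i)) for i in range(n)]
    outputs = []
    for cycle in range(len(stream) + n + 1):
        a, b = stream[cycle] if cycle < len(stream) else ([0] * n, [0] * n)
        aligned = []
        for i in range(n):                      # shift the input delay lines
            a_del[i].append(a[i])
            b_del[i].append(b[i])
            aligned.append((a_del[i].popleft(), b_del[i].popleft()))
        new_c, s_now = [None] * n, [None] * n
        for i in range(n):
            ai, bi = aligned[i]
            cin = 0 if i == 0 else c_reg[i - 1]  # registered carry from FA(i-1)
            if None not in (ai, bi, cin):
                s_now[i], new_c[i] = full_adder(ai, bi, cin)
        c_reg = new_c
        out = []
        for i in range(n):                      # shift the output delay lines
            s_del[i].append(s_now[i])
            out.append(s_del[i].popleft())
        outputs.append(out)
    return outputs
```

Presenting "1101" + "0010" and then "0000" + "0001" (LSB-first) yields "1111" at cycle five and "0001" at cycle six: a latency of five cycles, then one new output per cycle.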


Fig. 6.44 Internally pipelined 4-bit adder, the critical path is now a single sum delay. There is no path between two registers that is longer

Fig. 6.45 Internally pipelined 4-bit adder with input and output shift registers for signal alignment

Now if inputs "1101" and "0010" are presented at inputs A and B, the output "1111" will appear on S five cycles later. These five cycles are the latency of the pipeline. However, if inputs "0000" and "0001" are fed into A and B in the cycle immediately following "1101" and "0010", then the output "0001" will appear on S in cycle six, the cycle immediately following output "1111". This means that after the initial latency, the pipeline produces a new output every cycle. Thus the throughput of this pipeline is 1/(critical path delay) samples per second, while its latency is 5 × (critical path delay).

This is how pipelines function. They consume a number of cycles in the beginning for the pipeline to fill. This period of time is called the latency and is normally measured in cycles. Once the pipeline fills, the first output appears. So the latency is also defined as the time between an input being given to the pipeline and the output corresponding to that particular input being produced. After the first output has been produced, a new output is generated every cycle; this is the throughput. Throughput in a pipelined design is normally much higher than in an

unpipelined design, whereas latency is an overhead of pipelining.

Example 2 Internal pipelining to solve setup-time violations

For the second example, we will consider the case of the logic function F = (ABCDEG + HJKLMN)′. A straightforward implementation of the function as a single static CMOS gate will yield a gate with two long NMOS chains in the PDN, each corresponding to one of the long series Boolean terms. There will also be two groups of parallel PMOS in the PUN, each containing six transistors in parallel. Regardless of sizing, this direct implementation will lead to a very high intrinsic delay. The long NMOS chains lead to high resistance if sized small, or to high capacitance if sized large. The parallel PMOS groups lead to very high loading on the output node due to the large number of connected transistor drains. If the circuit is sized for worst-case


resistance equal to the unit inverter, total self-loading at node F can be calculated as (2 × 6 + 6 × 4)Co = 36Co: the two NMOS transistors next to the output node contribute 6Co each, and the six PMOS transistors contribute 4Co each. The total time-constant of the circuit is thus 36RoCo.

The logic function has two large terms, which strongly suggests that it should be broken down as follows: Ft = (F1 + F2)′, F1 = ABCDEG, F2 = HJKLMN. The number of transistors in this implementation is as follows: 4 transistors for Ft, and 12 transistors each for F1 and F2. This adds up to 28 transistors, assuming input complements are available without additional inverters. If input complements need additional inverters, the total increases by 24 transistors to 52. The original single-gate implementation requires only 24 transistors and does not need any input complements. However, the second implementation, shown in Fig. 6.46, offers some interesting choices in terms of delay.

Delay from inputs to output F can intuitively be calculated as max{delay of F1, delay of F2} + delay of Ft. The delays of F1 and F2 are not added because they are in parallel. Since they have equal delay, either can be used to obtain total delay. To implement F1 we obtain the expression F1′ = A′ + B′ + C′ + D′ + E′ + G′. A straightforward CMOS implementation, assuming input complements are available with no additional hardware, yields a 6-input NOR gate for each of F1 and F2, while Ft is a 2-input NOR gate. The intrinsic time-constant of F1 is 18RoCo, of which 12RoCo is contributed by one PMOS in the PUN and 6RoCo by six NMOS in the PDN. F1 is also externally loaded by the Ft gate, leading to an external time-constant of 5RoCo. The 2-input NOR has a time-constant of 6RoCo. The total time-constant of the circuit is 29RoCo, which is an improvement over the single-gate implementation.

The time-constant calculation above assumes the critical path passes through either F1 or F2 and then Ft. This implies


there are only registers at the inputs and output of the circuit; it also ignores the loading such registers impose on the CLBs. If we also internally pipeline the circuit by adding registers between F1 and Ft and between F2 and Ft, making the same assumption about register loading to keep the comparison fair, the new critical path is now max{F1, F2, Ft} instead of F1 + Ft. The new critical path is thus 18RoCo, an improvement over the unpipelined 29RoCo. Note that this also removes the external loading of Ft on F1 and F2.

We can already observe that this improvement from 29RoCo to 18RoCo does not seem dramatic. The reason is that one pair of gates (F1 and F2) has a much higher delay than the other gate (Ft). For pipelining to be effective, the individual CLBs before pipelining have to have comparable delays.

The question is: what do we do if we need to improve speed further? The answer is always to examine the critical path and try to break it down. The critical path here is a single gate, either F1 or F2, each a 6-input NOR. Thus breaking down the critical path involves more than simply inserting registers: we have to examine and manipulate the original gates. Since F1 = ABCDEG, we can rewrite F1 as the result of two smaller gates F11 and F12 as follows: F1 = (ABC)(DEG), F1′ = ((ABC)(DEG))′ = (ABC)′ + (DEG)′ = F11 + F12. Thus F1 is a 2-input NOR gate, as shown in Fig. 6.47, while each of F11 and F12 is a 3-input NAND gate, since F11 = (ABC)′. Figure 6.47 shows the new architecture of the circuit, with the same decomposition extended to F2. Note that if we do not also decompose F2, we will not end up with a faster circuit, since the critical path would simply become F2. That is to say: when breaking a critical path, be sure to break all critical paths. The total time-constant of the unpipelined circuit in Fig. 6.47 is the summation of the time-constants of any three stages along the path, for example F11, F1, and Ft.
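The decomposition into F11, F12, F21, F22, F1, F2, and Ft can be verified exhaustively over all 2^12 input combinations (an illustrative check, not from the text):

```python
# Exhaustive equivalence check of the flat and decomposed forms of F.
from itertools import product

def f_flat(v):
    a, b, c, d, e, g, h, j, k, l, m, n = v
    return 1 - ((a & b & c & d & e & g) | (h & j & k & l & m & n))

def f_decomposed(v):
    a, b, c, d, e, g, h, j, k, l, m, n = v
    f11 = 1 - (a & b & c)        # 3-input NAND
    f12 = 1 - (d & e & g)
    f21 = 1 - (h & j & k)
    f22 = 1 - (l & m & n)
    f1 = 1 - (f11 | f12)         # 2-input NOR, so F1 = ABCDEG
    f2 = 1 - (f21 | f22)
    return 1 - (f1 | f2)         # Ft = (F1 + F2)'

equivalent = all(f_flat(v) == f_decomposed(v) for v in product((0, 1), repeat=12))
```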
When fully internally pipelined, the critical path becomes the maximum delay among all the blocks in Fig. 6.47. The critical paths thus become F11, F12, F21, and F22, with a time-constant of 9RoCo.

Example 3 Hold-time violations as a fundamental limit

Fig. 6.46 First level breakdown and pipelining

Consider the pipeline in Fig. 6.48. It consists of four combinational logic blocks, each independent from the others; in other words, none of the blocks feeds another. However, they still form a synchronous circuit because they use the same clock. This circuit helps us illustrate a major issue: how hold-time violations arise and how they should be addressed. As discussed in Sect. 6.6, hold-time violations arise because


Fig. 6.47 Second level breakdown and pipelining

there are fast paths through the circuit which cause the inputs of certain registers to change before they should. A hold-time violation can only be addressed by adding enough combinational delay to resolve the violation. This process is systematic enough that it is often performed by design tools without the designer even being aware. Consider, for example, the fully pipelined N-bit adder in Fig. 6.45. Multiple shift registers are used at the inputs and outputs to align signals. The combinational delay between registers in these shift registers is null, and a hold-time violation is guaranteed. The design tool will detect this and add enough static CMOS buffers between register pairs so that combinational delay exceeds the hold-time. Thus, all these "offending" paths will not show up in the implementation reports (Sect. 8.7) as hold-time violations.

So does this mean that hold-time violations are never a concern? No. In fact, hold-time violations can be such a severe issue that they require a total redesign. This occurs when setup-time and hold-time requirements contradict. The circuit in Fig. 6.48 helps illustrate this. To uncover setup and hold-time issues in the circuit, we must uncover all the worst- and best-case delays for all the paths. Best-case delays expose us to hold-time violations; worst-case delays expose us to setup-time violations. These delays are listed in Table 6.4.

Now assume we need the circuit to operate with a clock period equal to 0.69 × 16RoCo + Tsu + Tcq. Assume also that the hold-time is 0.69 × 2.5 × RoCo. We can see from the first row in Table 6.4 that the worst-case low-to-high

Fig. 6.48 Synchronous pipeline consisting of four independent CLBs. There are registers at all the inputs and all the outputs of all the CMOS gates

delay of the 4-input NOR has exactly the maximum delay we can accommodate. This makes the 4-input NOR the critical path. It has zero slack while transitioning from low to high. All other paths have various positive slacks, as shown in Table 6.5. The circuit thus has no setup-time violations.

Next, we examine hold-time violations. The hold-time time-constant is 2.5RoCo. The first row of Table 6.4 shows that the best-case high-to-low transition of the 4-input NOR is the only hold-time violation: the time-constant of this transition is 2RoCo, which is less than the hold-time.

What is curious about Table 6.4 is that the hold-time violation occurs in the critical path. Since setup and hold-time violations are likely to occur in the paths with the longest and shortest delays respectively, we would expect the two to occur in completely separate paths. However, this is not true. Because a single CLB has multiple possible propagation delays, it can cause both setup and hold-time violations. Thus all paths, including the critical path, need to have their best-case delays examined for hold-time violations. This is not just a curiosity; it is a very likely phenomenon, especially in large pipelines where CLBs have many potential paths.

In any case, the hold-time violation we just detected is very simple to address. Without changing logic, we can solve the problem by adding an even number of inverters to

Table 6.4 Time-constants for low-to-high and high-to-low, best case and worst case, for the four paths in Fig. 6.48

Path        | Worst-case HL | Worst-case LH | Best-case HL | Best-case LH
4-input NOR | 8RoCo         | 16RoCo        | 2RoCo        | 16RoCo
3-input NOR | 9RoCo         | 9RoCo         | 3RoCo        | 9RoCo
2-input NOR | 6RoCo         | 6RoCo         | 3RoCo        | 6RoCo
Inverter    | 3RoCo         | 3RoCo         | 3RoCo        | 3RoCo

Table 6.5 Slacks for the paths in Fig. 6.48. Slacks are calculated for both the best and worst case. The critical path time-constant is 16RoCo. Slacks are very helpful in detecting hold-time violations; larger positive slacks are more likely to cause hold-time violations

Path        | HL slack, worst case | LH slack, worst case | HL slack, best case | LH slack, best case
4-input NOR | 8RoCo                | 0                    | 14RoCo              | 0
3-input NOR | 7RoCo                | 7RoCo                | 13RoCo              | 7RoCo
2-input NOR | 10RoCo               | 10RoCo               | 13RoCo              | 10RoCo
Inverter    | 13RoCo               | 13RoCo               | 13RoCo              | 13RoCo

the output of the 4-input NOR. The smallest symmetric inverters we can add are unit inverters. The "fixed" path is shown in Fig. 6.49. To confirm that this solves the hold-time violation, we have to calculate the new best-case delay. Notice that the addition of the buffer has two effects. First, there is an additional external load on the NOR gate. Second, the buffer itself adds more delay. The best-case path time-constant in Fig. 6.49 can be calculated as:

τHL,best case = (Ro/8) × (16 + 3)Co + Ro × (3 + 3)Co + Ro × 3Co = 11.375RoCo

The new best-case delay is significantly higher than the hold-time, and thus the hold-time violation has been resolved. However, we have added combinational delay in cascade with the original critical path in Fig. 6.48. Thus, we have to check that we have not created new setup-time violations. Without calculation, we can guarantee that such violations have been created: the critical path originally had zero slack, and the addition of inevitable combinational delay in cascade will create negative slack. The question is how much, and how we solve it. The new worst-case delay in Fig. 6.49 is:

τLH,worst case = Ro × (16 + 3)Co + Ro × (3 + 3)Co + Ro × 3Co = 28RoCo

This is obviously more than the design requirement of 16RoCo. Thus, a negative slack has been created, and its value is 12RoCo. This is the main danger of hold-time violations: they often occur in paths where solving them creates setup-time violations. Note that this contradiction does not only happen when both the shortest and longest paths are in the same CLB. It occurs whenever adding enough

Fig. 6.49 Offending path from Fig. 6.48 with hold-time violations fixed

combinational delay to solve hold-time violations creates negative slack for the worst case.

So is this a hold-time or a setup-time violation? It is a hold-time violation. Even though it manifests as a setup-time violation when we try to address it, the root cause is a path through the CLB that is too fast. This contradiction is very challenging to solve; in fact, it requires deep redesign. The easiest way to resolve the contradiction is to further pipeline the path, creating a credit of positive slack. In Sect. 6.6 we clearly stated that hold-time violations can never be addressed by reducing clock frequency or further pipelining. However, in this case we are further pipelining so that, when we add combinational delay to resolve the hold-time violation, we do not end up creating new negative slacks.
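A timing check over Table 6.4 is a one-liner per violation type. Following the text, the sketch below compares time-constants directly in units of RoCo, omitting the common 0.69 factor and the register Tsu and Tcq terms:

```python
# Setup/hold screening over the Table 6.4 time-constants (units of RoCo).
delays = {
    "4-input NOR": {"worst": 16, "best": 2},
    "3-input NOR": {"worst": 9,  "best": 3},
    "2-input NOR": {"worst": 6,  "best": 3},
    "Inverter":    {"worst": 3,  "best": 3},
}
T_CLK, T_HOLD = 16, 2.5   # combinational budget and hold-time, in RoCo

# Setup: worst-case delay must not exceed the budget.
setup_violations = [p for p, d in delays.items() if d["worst"] > T_CLK]
# Hold: best-case delay must not fall below the hold-time.
hold_violations = [p for p, d in delays.items() if d["best"] < T_HOLD]
setup_slack = {p: T_CLK - d["worst"] for p, d in delays.items()}
```

The check flags the 4-input NOR alone: zero setup slack, and the only best-case delay below the hold-time.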


The solution is much simpler for the specific circuit in Fig. 6.48. In fact, the hold-time violation can be easily resolved if all the transistors in the PDN of the 4-input NOR are resized from 2 to 1. This increases the best-case delay of the 4-input NOR to 3RoCo, removing the hold-time violation. Note that this does not change the behavior of the circuit or introduce new setup-time violations. The period of the clock is determined by the low-to-high delay of the 4-input NOR, and as shown in Table 6.5, making the PDN weaker only creates further slack for that transition.
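The effect of the resize can be checked with the unit-transistor model used throughout this chapter (resistance Ro/w and drain capacitance w × Co per device, with the single series-PMOS drain at the output sized 8; the function name is ours):

```python
# Best-case (all NMOS on) pull-down time-constant of the 4-input NOR,
# in units of RoCo, before and after resizing the PDN transistors.

def best_case_tau(w_nmos, n_inputs=4, w_pmos_at_output=8):
    """Four parallel NMOS of width w; self-load only (no external fanout)."""
    r = (1.0 / w_nmos) / n_inputs                 # Ro/w per device, n in parallel
    c = n_inputs * w_nmos + w_pmos_at_output      # drain caps on the output node
    return r * c
```

With the original sizing of 2, the best-case time-constant is 2RoCo; resized to 1 it becomes 3RoCo, which clears the 2.5RoCo hold-time.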

6.9 Impact of Variations

1. Discuss sources of process variation
2. Distinguish in-chip variations from process variations
3. Understand the fast, slow, and typical corners
4. Conclude the impact of design corners on performance
5. Compare setup and hold-time violations in terms of severity of impact.

Chapters 7, 8, and 14 will demonstrate that the CMOS fabrication process is a complicated, multifaceted operation. Things can and often do go wrong. This leads to a reduction in the yield of the process (Sect. 14.1). But even when the process produces "working" die, the results often suffer from significant variability. In other words, not all die that result from the process behave the same way.

There are many reasons for die to come out differently. The process can lead to variations from day to day, wafer to wafer, or die to die. Conditions in the cleanroom (Sect. 7.1) can vary; this may be as seemingly innocuous as a slight variation in ambient humidity, but it often has a significant impact on device performance. Within the same wafer, different die can be processed differently. In Sect. 7.5, one obvious source of in-wafer variability is the photoresist coating, where the spread of the liquid resist can never be guaranteed to be even. But any step in photolithography can deviate from the norm, leading to variations between chips on the same wafer.

Variations can also occur within the same die, with logic gates behaving differently based on their location on the chip. On-die variations are often systematic; that is to say, the variability occurs due to predictable reasons and affects all chips the same way. In-die variations normally fall under one of three categories:

• Variations in clock delivery. This is a very systematic source of variations. It means that the phase of the clock reaching different registers can vary substantially. Clock variability is a topic of its own, and Sect. 13.7 is dedicated to discussing its impact and mitigation


• Variations in power supply delivery. In Sect. 8.3 we will discuss different ways to distribute the power grid to reduce variations in the supply perceived by different gates due to resistive drops over wires. However, drops are inevitable, and thus some gates will inevitably observe a lower supply than others. A lower supply translates into a lower available on-current, which often translates into longer delay
• Variations in temperature. Power density varies across the chip depending on the activity factor. Areas with higher switching activity dissipate more power, leading to more heating. Heat degrades the drift velocity of carriers, leading to a degradation in delay

As discussed above, in-die variations are more systematic and predictable than process variations. In fact, for all three causes of in-die variability, the pattern of variation can be graphed across the area of the chip. A good simulation or prototyping phase may allow these patterns to become well established and better addressed. Process variations, by contrast, are often perceived as more random and inevitable.

Whatever the source of variations, they lead to deviations in parameters of the resulting circuit, which impacts performance in one way or another. The impact can be modeled at different levels of abstraction. One useful model is to collapse the effect of variation into the mobility of carriers. Thus all variations affect the mobility of electrons, holes, both, or neither. The impact on mobility can be detrimental or positive; in other words, variability can raise or lower the mobility of electrons or holes. The value of this model is that it allows designers to simulate the impact of variations. The designer can ask: if variations lead to this much increase in electron mobility, does the circuit still function? Does its speed suffer or improve? Do any new violations arise?
In this model, the mobilities of both charge carriers are random variables. We can also safely assume that, if the process and design tools are mature, the nominal values of mobility are the means of these variables. These are called the typical values of mobility. We can also assume extreme maximum and minimum possible values for mobility: the maxima and minima that could reasonably result from the process, usually some multiple of the standard deviation of the random variables. Thus for the NMOS, we can expect its mobility to fit within a range, from the fastest (highest mobility) to the slowest (lowest mobility). In a good process, the range should be narrow, and the mean should be the nominal value used in the design library (Chap. 8). There is a similar treatment for PMOS.


The extreme cases of each device are of particular interest, because they lead to extremes in behavior. Thus for NMOS we are interested in the cases where it is fast (F), slow (S), and typical (T); and similarly for PMOS. Any combination of NMOS and PMOS variations leads to what is known as a “design corner”: a condition in which the two MOSFET types lie at certain extremes, straining the performance of the circuit in a particular way. There are four proper “corners”, namely FF, FS, SF, and SS. FS means that the NMOS devices are fast while the PMOS devices are slow. FF and SS are symmetric corners that affect both devices equally. FS and SF are skewed corners that affect the two devices differently. Colloquially, the design conditions TT, FT, ST, TS, and TF are also sometimes termed “corners”, although in reality they always represent less strenuous situations than the four true corners.

Corners can also affect different areas of the die. This follows from in-die variability as discussed above. For in-die variability, skewed corners are very rare: if temperature or power variations impact an area of the chip, they impact both device types equally.

Symmetric corners are often thought of as less dangerous than skewed corners. Because they impact both device types identically, the chip suffers in a more predictable and systematic manner. To a first order, SS chips will still be operational, albeit at a lower operating frequency than a TT chip. Similarly, an FF chip will be operational at a higher frequency than a TT chip. Commercial microchips are often graded according to a speed level, and this grading is often the result of symmetric process corners. Chips are automatically sorted into bins according to speed grade and sold at a commensurate price.

Asymmetric process corners are often considered a more major headache. The most obvious impact of a skewed corner is asymmetric delay.
Low to high and high to low delays will be skewed relative to their designed values. For register pipelines this might not lead to much beyond a change in the operating frequency. But in latch loops (Sect. 9.12), this would require an adaptation of the clock duty cycle; and even then it may throw slack borrowing off balance. In dynamic circuits (Chap. 5), skewed corners can also cause functional failure if the duty cycle is not adapted: the corner causes evaluation and precharge times to deviate from their designed values, requiring the clock to be properly redivided between the two phases.

Skewed corners are also dangerous because they impact static behavior. For ratioed logic (Chap. 2), skewed devices change the value of the ratioed output. In CMOS, the skew changes the values of Vil, Vih, and, more critically, the logic threshold. In all cases, the noise margins will be impacted, and the impact will always be undesirable.
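The skew just described can be made concrete by treating each corner as a pair of scale factors on carrier mobility. Gate delay is modeled as inversely proportional to mobility; the nominal delays and the ±20% corner shift below are illustrative assumptions:

```python
# Design corners as scale factors on carrier mobility. Delay is modeled
# as inversely proportional to mobility; nominal delays and the +/-20%
# shift are illustrative assumptions.

CORNERS = {            # corner: (NMOS mobility scale, PMOS mobility scale)
    "TT": (1.0, 1.0),
    "FF": (1.2, 1.2),  # symmetric
    "SS": (0.8, 0.8),  # symmetric
    "FS": (1.2, 0.8),  # skewed: fast NMOS, slow PMOS
    "SF": (0.8, 1.2),  # skewed: slow NMOS, fast PMOS
}

T_PHL_NOM = 100.0      # ps, pull-down (NMOS-limited) delay, illustrative
T_PLH_NOM = 120.0      # ps, pull-up (PMOS-limited) delay, illustrative

for name, (n_scale, p_scale) in CORNERS.items():
    t_phl = T_PHL_NOM / n_scale
    t_plh = T_PLH_NOM / p_scale
    print(f"{name}: tpHL={t_phl:6.1f} ps  tpLH={t_plh:6.1f} ps  "
          f"ratio={t_plh / t_phl:.2f}")
# Symmetric corners (FF, SS) shift both delays but preserve their ratio;
# skewed corners (FS, SF) distort the ratio, i.e. asymmetric delay.
```

Running the loop shows FF and SS keeping the nominal tpLH/tpHL ratio of 1.2 while FS and SF push it to 1.8 and 0.8 respectively, which is exactly the asymmetry that breaks duty-cycle assumptions in latch loops and dynamic circuits.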


But the above discussion is a little misleading. Symmetric variations are almost as concerning as skewed variations, especially in the design and prototyping phase. To explore why, we need to ask a couple of questions:

• Which design corner is most likely to cause setup-time violations? Which design corner is most likely to cause hold-time violations?
• Which is more “dangerous”: a setup-time violation or a hold-time violation?

Assume we have ensured that for nominal values, i.e., the TT corner, our design is free of setup and hold-time violations. We then have to consider which type of violation is most likely if the devices become faster or slower. The SS corner is clearly more likely to cause a setup-time violation, while the FF corner is more likely to cause a hold-time violation. The SS corner creates slower devices in both pull-up and pull-down, leading to a symmetric increase in worst-case delay. This increases the delay of the critical path and creates new negative slacks if the typical clock period is applied. The FF corner improves all delays in the circuit, creating new fast paths with larger positive slacks. These paths can create emergent hold-time violations depending on the amount of slack created.

At first glance, it might seem that setup-time violations are more dangerous than hold-time violations. Setup-time violations lead to a reduction in the available clock period and a decrease of all slacks. This requires us either to accept operating at a lower frequency, or to go in and break the critical paths until we resolve all unplanned violations. Hold-time violations are apparently very easy to solve: all that is required is the insertion of additional combinational delay, in the form of buffer stages, until enough delay is created after the clock edge to address the violations.
This is so systematic that design tools often address all hold-time violations without even reporting them. But the reality is that hold-time violations are significantly more dangerous than setup-time violations. The main reason is that setup-time violations can be diagnosed, or even “resolved”, by lowering the clock frequency. If we have a finished die that does not work properly at the typical clock frequency, the first thing we might attempt is to lower the clock frequency. If the chip works at a lower frequency, no matter how much lower, then we know both that a setup-time violation has occurred and how large a hit in operating frequency we have taken due to the violation. Hold-time violations cannot be resolved by lowering the clock frequency. Hold-time violations occur after the active


edge of a register clock. Increasing the clock period means pushing the subsequent active edge further from the current edge. But since the subsequent edge was never the problem for the hold-time violation, this solves nothing. If a hold-time violation occurs due to an FF process variation, then the circuit will never function. To solve the violation, we have to go back to the design stage, insert combinational delays, and go through fabrication again. Even worse, the finished chip with the violation will not give any indication of where the hold-time violation occurred, or even that the failure was due to hold-time violations at all. This is why it is very important to ensure before fabrication, by simulation, that the FF corner will not create emergent hold-time violations.

It is also misleading to assume that design tools can automatically address all hold-time violations. Even at the design stage, hold-time violations often require more designer involvement than setup-time violations. The misconception stems from the impression that the critical path of a pipeline will be in a certain, more complicated, combinational logic block, while hold-time violations must occur in certain, simple, combinational logic blocks. Simple and short CLBs will very often create hold-time violations. But these are exactly the kind of violations that the design tool will automatically address. The empty stages between the registers of a shift register, or logic stages containing only an inverter, will always require the tool to insert buffers to prevent hold-time violations. But hold-time violations can also occur in seemingly complicated logic blocks. Strictly speaking, a hold-time violation may even occur in the critical path (Sect. 6.8). This is no contradiction. Recall that all CMOS circuits have


multiple values of propagation delay depending on the input patterns and the number of active branches in the PDN or the PUN. Thus, a combinational logic block with a very small slack can still cause a hold-time violation. This is because slack is calculated from worst-case delay, while hold-time violations occur due to best-case delays. In fact, this is exactly the kind of situation that leads to complicated design decisions vis-à-vis hold time. It is not uncommon for the design tool to find a combinational logic block where solving hold-time violations creates setup-time violations and vice versa.

It might still not be clear why reducing the operating frequency addresses setup-time violations but not hold-time violations. Setup violations occur before the active edge. They occur because we did not allow enough time for data to exit the path and reach the input of the output register with at least a setup time to spare before the active edge. Pushing the next active edge away thus resolves the violation, because we leave extra time before the edge. Hold-time violations occur immediately after the active edge: we had to hold data for a certain time, but data exited the path too fast. Everything that has to do with hold time happens in a short duration after the active edge. The next active edge comes long after hold time and its violations have already happened. Thus pushing the next active edge further solves nothing.
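The asymmetry comes down to where the clock period T appears in the two checks. A minimal sketch, with illustrative numbers in picoseconds:

```python
# Why lowering the clock frequency "fixes" setup but not hold: only the
# setup slack contains the clock period T. All numbers (ps) are
# illustrative assumptions.

def setup_slack(T, t_cq, t_logic_max, t_setup):
    # Data launched at one edge must settle t_setup before the NEXT edge
    return T - (t_cq + t_logic_max + t_setup)

def hold_slack(t_ccq, t_logic_min, t_hold):
    # Data must stay stable for t_hold after the SAME edge: T is absent
    return t_ccq + t_logic_min - t_hold

# An FF-corner chip: the logic got faster, creating a hold violation
print(setup_slack(T=1000, t_cq=80, t_logic_max=700, t_setup=50))  # 170
print(hold_slack(t_ccq=30, t_logic_min=20, t_hold=60))            # -10
# Doubling the period improves only the setup slack:
print(setup_slack(T=2000, t_cq=80, t_logic_max=700, t_setup=50))  # 1170
print(hold_slack(t_ccq=30, t_logic_min=20, t_hold=60))            # still -10
```

No choice of T touches the hold slack; only inserting combinational delay (raising t_logic_min) can.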

7 CMOS Process

7.1 Setting and Location

1. Discuss the question of fabrication of highly miniaturized, integrated components
2. Understand the need for classified environments in semiconductor fabrication
3. Understand how clean rooms are classified
4. Describe how clean rooms achieve a clean air environment
5. Realize that classification extends beyond air quality to lighting purity.

Digital integrated circuits typically contain circuits of the CMOS logic family (Chap. 3). Some circuits combine CMOS with specialty circuits such as dynamic CMOS (Chap. 5) or mixed-signal and analog circuits. In all cases, the number of transistors in a single digital chip is typically in the millions, often in the billions. To accommodate such a large number of components on a single die, the feature size of the transistor has to be very small. Two questions arise from this:

1. How can we fabricate such microscopic components? This is the main question considered by this chapter.
2. How can we manage designs with such a large number of components? This question is answered by Chap. 8.

An Integrated Circuit (IC), alternatively called a microchip, is a circuit where all the components are integrated on a single chip. A “chip” or “microchip” is a piece of single-crystal silicon; thus all components are formed of or grown on the single crystal. Whereas most integrated circuits need to be combined with other ICs or discrete components in a larger system to function properly, the functionality of the IC itself is self-contained and complete.

ICs are made in fabs, short for IC fabrication facilities. Fabrication facilities range in size and complexity from industrial sites that fabricate ICs en masse for

commercial consumption, to small-scale educational laboratories that fabricate very low volumes. However, all fabs share a number of features.

Fabs have highly controlled and clean environments. The number of particles in the air, and the mixture of gases that form the air, have to be controlled very precisely. The reason is that in many steps of the fabrication process, the surface of the IC is exposed. The surface can react with pollutants, and since silicon conductivity is very sensitive to impurities, this reaction has to be controlled. Thus the reactive environment to which the wafer is exposed has to be managed very carefully. This includes having inert and clean air, as well as precisely controlled temperature and humidity.

A semiconductor fab is a “classified environment”: an environment where the cleanliness of the air is controlled to within certain limits. Table 7.1 illustrates different categories of clean rooms. The first column of the table indexes the different cleanroom classifications. The following columns define the maximum number of particles of a particular diameter per cubic foot of air in a room of the given classification. For example, the first row defines a class 1 clean room as a room where there are no particles of diameter greater than 5 µm in the air, at most one particle of diameter greater than 0.5 µm per cubic foot of air, at most 3 particles of diameter greater than 0.3 µm, at most 7 particles of diameter greater than 0.2 µm, and at most 35 particles of diameter greater than 0.1 µm. A class 10000 cleanroom, on the other hand, can have up to 10000 particles of diameter greater than 0.5 µm, but only 70 particles of diameter greater than 5 µm per cubic foot of air. It can have any number of particles of diameter smaller than 0.5 µm, since this class is not controlled for those particle sizes. Semiconductor fabrication must take place in class 1 clean rooms, the strictest of classifications.
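The class limits of Table 7.1 can be encoded as a small lookup, which makes the "not controlled" (N/A) entries concrete: an uncontrolled particle size simply places no constraint. A minimal sketch, with an invented air-sample measurement:

```python
# Table 7.1 as a lookup table. Counts are particles per cubic foot of
# air; a missing entry means the class does not control that size (the
# table's N/A cells). The sample measurement is invented for illustration.

LIMITS = {
    1:      {0.1: 35,  0.2: 7,  0.3: 3,  0.5: 1,      5: 0},
    10:     {0.1: 350, 0.2: 75, 0.3: 30, 0.5: 10,     5: 0},
    100:    {0.5: 100,    5: 0},
    1000:   {0.5: 1000,   5: 7},
    10000:  {0.5: 10000,  5: 70},
    100000: {0.5: 100000, 5: 700},
}

def meets_class(counts, cls):
    """True if measured counts (dict: diameter in um -> particles/ft^3)
    satisfy every limit the class controls for."""
    return all(counts.get(d, 0) <= limit for d, limit in LIMITS[cls].items())

sample = {0.1: 40, 0.2: 5, 0.3: 2, 0.5: 1, 5: 0}
print(meets_class(sample, 1))   # False: 40 particles > 0.1 um exceeds 35
print(meets_class(sample, 10))  # True: every controlled limit is met
```

Note how the same sample fails class 1 on the smallest particle size alone, while a class 1000 room would accept it trivially since it only controls the 0.5 µm and 5 µm sizes.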
However, testing and packaging of exposed wafers (Chap. 14) can take place in less strict environments, with class 100 often being enough. Board fabrication and component mounting (Sect. 14.6) can take place in class 10000 or even class 100000 rooms.

© Springer Nature Switzerland AG 2020 K. Abbas, Handbook of Digital CMOS Technology, Circuits, and Systems, https://doi.org/10.1007/978-3-030-37195-1_7

Table 7.1 Clean room class definitions. Each column represents the maximum number of particles of a particular size per cubic foot of air. N/A entries indicate the particular class does not control for the given particle size

Class     >0.1 µm   >0.2 µm   >0.3 µm   >0.5 µm   >5 µm
1         35        7         3         1         0
10        350       75        30        10        0
100       N/A       N/A       N/A       100       0
1000      N/A       N/A       N/A       1000      7
10000     N/A       N/A       N/A       10000     70
100000    N/A       N/A       N/A       100000    700

Fig. 7.1 Cross sectional side view of a clean room. Entry is through an airlock. Positive air pressure is maintained in the room. Air enters through controlled filtered inlets and exits through controlled outlets. Lighting is controlled

A clean room environment is achieved through the use of powerful air conditioning systems combined with extremely high-efficiency air filters. A cleanroom has air flowing in at a high volumetric rate. Air inflow takes place only through controlled inlets. These inlets allow cool air to flow in only through high-efficiency particulate air (HEPA) filters. Impurities that flow into a HEPA filter get trapped in its filter material, allowing only clean air to flow through. Cleanrooms have tightly sealed walls and ceilings, allowing no air to flow from the outside environment into the room. A positive air pressure is maintained relative to all surrounding rooms. Theoretically, air should only exit the room through controlled outlets; but if there are any open seams in the walls, the positive pressure ensures that clean air flows from the room into the surrounding environment, rather than dirty air flowing into the clean room from outside. Access to and from a clean room is through an airlock. The airlock allows people to enter and exit while

maintaining air cleanliness and pressure in the room. A clean room is illustrated in Fig. 7.1.

Very high class rooms (class 1, 10, and sometimes 100) also require that air flows in predictable straight-line patterns down from the inlets and into the outlets. This ensures that the clean air does not disturb materials in the room, causing particles to become airborne. This pattern of air flow is called Laminar Air Flow, or LAF for short (Fig. 7.2). Higher classification rooms also require air to be changed more often. This is characterized by the number of air exchanges per hour: the number of times the entire volume of the room is pumped in and out per hour. The higher the number of exchanges, the faster pollutants can be flushed out of the room, and the higher the classification.

Fig. 7.2 Laminar air (left) flows in regular planes. Non-laminar air (right) flows in random directions

To maintain the purity of the environment, all humans entering a clean room have to be properly dressed in clean room clothes. These overclothes completely cover the body and face, and are complemented by goggles and breathing apparatus for class 1 rooms. This protects the operator from some of the dangerous materials used in the fabrication process. But more importantly, it prevents particles from the operator from contaminating the cleanroom environment. Humans typically shed enormous amounts of particles from skin, hair, breathing, and perspiration. The overclothes are made from materials that do not shed particles and fibers the way ordinary clothing would. The flooring of a clean room is made of a non-particle-shedding material; epoxies and polymers are solid and clean enough as long as they are not scratched.

As will become clear shortly, CMOS fabrication is a process that relies on the interplay of light and chemistry. Thus the cleanroom environment also has to be photo-clean, meaning that any sources of light or electromagnetic radiation around the visible spectrum have to be tightly controlled. Cleanrooms are typically lit with dim, heavily yellow-orange-tinged lighting rather than typical light bulb illumination.

7.2 Photolithography Iteration

1. Understand the flow of the photolithography process
2. Realize the role that light plays in CMOS processing
3. Understand what a photoresist is
4. Form a preliminary idea of the role of masks
5. Comprehend that the CMOS process is composed of multiple iterations of photolithography.

Photolithography is the process by which small features are formed on an IC. As we will shortly find out, the reason we can fabricate such small features on a chip is the interplay of chemistry and optics.

The word photolithography comes from Greek. Roughly translated, it means “drawing with light on stone”. This rough phrase in fact describes what happens in photolithography very accurately: we will be using light to draw patterns, though on glass instead of stone.

All operations discussed in this section will be covered in much more detail in Sect. 7.5. This section aims to establish how a single iteration flows, stressing similarities between iterations. Section 7.6 will discuss a more realistic process with multiple iterations, stressing the differences between iterations.

Fig. 7.3 Silicon wafer, figure shows part of the side cross section

The photolithographic process starts with a wafer of a certain type of silicon. In this example, the wafer is p-type (Fig. 7.3). The wafer will eventually form the bodies of all the NMOS transistors. The wafer has to be very smooth; CMP (defined in Sect. 7.5) is often used to ensure this.

This wafer is then allowed to “grow” a layer of silicon dioxide (Fig. 7.4). Silicon dioxide forms when silicon is heated in an oxygen-rich environment. Its growth rate on silicon depends on the temperature and the concentration of oxygen. The thickness of the grown oxide can be controlled very precisely if conditions in the environment are controlled precisely.

Once the oxide is grown, a layer of a material called photoresist is spread and solidified over the entirety of the wafer. A photoresist is a material whose chemical properties are radically changed by exposure to light of a certain wavelength; hence the need for the cleanroom to be photo-controlled. To learn more about the photoresist coating step, as well as all the steps discussed in this section, see Sect. 7.5. The result of coating is shown in Fig. 7.5.

Once the photoresist solidifies, a layer of opaque material, called the photomask, is held over the wafer. Light is shined

Fig. 7.4 Oxidation, wafer is allowed to grow a covering layer of silicon dioxide (gray)


Fig. 7.5 Photoresist coating applied to wafer, fully coating the oxide

Fig. 7.7 Developed wafer with exposed photoresist stripped away by dipping in a developer bath

Fig. 7.8 Wafer with etched oxide under developed photoresist

Fig. 7.6 Exposure, the wafer is exposed to light through a “mask”. Photoresist drastically changes where exposed

through the mask. Light passes through the mask only where it is transparent, forming a pattern on the photoresist (Fig. 7.6). Notice that light shone through the mask also passes through a system of lenses below the mask, which allows the pattern that passed through the mask to be reduced by a large ratio before reaching the wafer. The main reason very small features can be realized on a wafer is one or both of:

1. The photomask itself can be drawn very precisely, using high-precision beams such as electron beams.
2. The optics between the mask and the wafer can shrink the mask pattern by a large ratio.

The wafer is then dipped in a development bath. The development liquid is typically an organic solvent that strips away any photoresist that was exposed to light. Photoresist that was protected by the mask, as well as the oxide, remains behind (Fig. 7.7).

The wafer is then dipped in a bath of a certain combination of acids. These acids “etch”, or eat away at, silicon dioxide, but they do not react with the photoresist. Thus the feature drawn on the photoresist by the light is transferred

through the oxide as an opening all the way down to the body (Fig. 7.8). The remaining photoresist is then removed using a solvent that can strip unexposed photoresist (Fig. 7.9).

If the wafer is now placed in an environment where impurities are introduced, these impurities can infiltrate and change the nature of the silicon in the areas exposed by photolithography. Areas still covered by silicon dioxide cannot react with the impurities, since the silicon is covered and the oxide is inert. Thus, for example, if donors are somehow introduced, they change the silicon into n-type only where the feature was exposed, as shown in Fig. 7.10.

What we described in this section was one iteration, realizing one feature on the wafer. The CMOS process in its entirety requires many iterations of photolithography to realize all the features of the two types of MOSFET. Thus photolithography is a cycle that has to be repeated many times (Fig. 7.11). A simple CMOS process with multiple iterations is discussed in detail in Sect. 7.6.

Fig. 7.9 Photoresist stripped away from unexposed areas
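The iteration just described, coat, expose, develop, etch, dope, can be mimicked with boolean operations on a toy 2D grid. The grid size and window geometry are invented purely for illustration; the point is only how each layer masks the next:

```python
# One photolithography iteration mimicked with boolean layers on a toy
# 8x8 grid (1 = material present). Geometry is invented for illustration.

W = 8
opaque = [[1] * W for _ in range(W)]   # photomask: 1 = opaque
for r in range(3, 5):                  # a 2x4 transparent window
    for c in range(2, 6):
        opaque[r][c] = 0

resist = [[1] * W for _ in range(W)]   # resist coat covers everything
oxide = [[1] * W for _ in range(W)]    # grown oxide covers everything

for r in range(W):
    for c in range(W):
        # Exposure + development: resist under the transparent window is
        # exposed and then stripped in the developer bath
        if not opaque[r][c]:
            resist[r][c] = 0

for r in range(W):
    for c in range(W):
        # Etch: the acid eats oxide wherever no resist protects it
        if not resist[r][c]:
            oxide[r][c] = 0

# Doping: donors reach the silicon only through the oxide opening
n_plus = [[int(not oxide[r][c]) for c in range(W)] for r in range(W)]
print(sum(map(sum, n_plus)))  # 8: the 2x4 window, transferred to silicon
```

The n+ region that results is exactly the transparent window of the mask: the pattern has propagated from mask, to resist, to oxide, to silicon.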



Fig. 7.10 Feature realized in opened area. In this case, the feature is an area of n+

The majority of photolithographic iterations are identical except for the last step: feature realization. Some of the different feature realization operations include CVD, PVD, ion implantation, and diffusion. All these operations are discussed in detail in Sect. 7.5 and utilized in an integrated flow in Sect. 7.6. Section 7.7 discusses more advanced techniques that can be combined with photolithography. Notice also that in practice the feature realization step does not realize just one feature on the wafer, i.e., it does not realize just one area of n+. Typically, each pass of photolithography realizes one type of feature at the end of the iteration; that is to say, we would realize all n+ areas on the wafer in one step.

Fig. 7.11 An iteration of photolithography

7.3 Account of Materials

1. Contrast silicon by type of impurity
2. Contrast silicon by the amount of crystallization
3. Understand the role that insulators play in CMOS
4. Understand the role that metals play in CMOS.

Figure 7.12 shows a cross section of a MOSFET, specifically an NMOS transistor. The transistor has a drain and a source, each n+, as well as a gate shown in red. There is also an area of p+ doping in the substrate. This is used to create an ohmic contact between the body and ground; see Sect. 1.14 for why this is necessary and what would happen if we created a direct metal contact to the p-substrate. Metal wires (blue) connect terminals of the device to other terminals in the circuit. In Chap. 13 we will further discuss the role of metals in creating wires. Silicon dioxide insulates all materials above the bulk from each other. To summarize, the materials we need to fabricate to obtain the device in Fig. 7.12 are:


Fig. 7.12 Cross section of an NMOS. The p+ area to the right is used to create an ohmic contact with the body and connect it to ground

• p-type silicon to form the body. This is normally the bulk of the wafer, so we do not normally “fabricate” it. However, because CMOS needs both types of devices, it will be necessary to create large n-type “wells” in which PMOS transistors are created. Thus we need to create relatively lightly doped n-type and p-type regions
• n+ silicon for the drains and sources of NMOS, but also to ohmically contact the N-well for the body connection of PMOS transistors
• MOSFET gates. As discussed in Sect. 1.15, they are made using silicon; we will understand in Sect. 7.5 why this has to be so. Chapter 1 showed that if the silicon of the gate is heavily doped, the device behaves as if the gate were metal. However, in Chap. 13 we will recognize that in terms of delay, silicon and metal are significantly different
• Metal for almost all wires
• p+ for PMOS sources and drains, and to create the body contact to ground for NMOS transistors
• A relatively thin oxide to form the MOS capacitors
• A relatively thick oxide to insulate all the materials on the wafer from each other

While it is obvious that our primary material is silicon, we have to be able to differentiate the types of silicon we use. The primary classification of silicon is by impurity type and level: intrinsic, n-type, n+, p-type, and p+. We already understand that impurities dramatically change the

conductivity of silicon, both quantitatively and in terms of the main charge carriers. Intrinsic silicon has a much lower conductivity than any form of doped silicon. Also, n-type silicon has lower conductivity than n+. n-type generally has better conductive properties than similarly doped p-type, due to the better mobility of electrons.

Intrinsic silicon has no desirable properties and is thus not used in any step of the CMOS process. In fact, in Chap. 1 we concluded that when the carrier concentrations in silicon approach intrinsic levels, we can consider the silicon depleted for all intents and purposes. n-type silicon is used to form the bodies of PMOS transistors. n+ silicon is used to form the sources and drains of NMOS transistors, as well as Vdd contacts for the bodies of PMOS (more on this later). p-type silicon is used to construct the bodies of NMOS transistors. p+ is used to form the sources and drains of PMOS transistors and the ground contact for the bodies of NMOS. This is summarized in Table 7.2.

Note that it is not desirable to heavily dope transistor bodies. Heavy doping reduces resistance and thus delay; however, impurities have a strong negative impact on mobility. Thus it is desirable to lightly dope areas where MOSFET channels will form.

Table 7.2 Silicon as classified by impurity and where it is used

Type       Conductivity                       Used for
Intrinsic  Lowest                             Not used. In Chap. 1, any silicon that approaches intrinsic levels is considered depleted
n-type     Higher than intrinsic and p-type   Bodies of PMOS
p-type     Higher than intrinsic              Bodies of NMOS
n+         Highest                            Sources and drains of NMOS, body contact for PMOS
p+         Lower only than n+                 Sources and drains for PMOS, body contact for NMOS

Fig. 7.13 Tetrahedral arrangement of silicon atoms. Left, four atoms; right, a continuation of the structure through more atoms

Silicon can also be classified according to the underlying crystal structure. Silicon atoms are bonded together by covalent bonds; each atom must form bonds with four neighboring atoms (Sect. 1.1). However, the planar

arrangement of silicon atoms in Fig. 1.5 is not realistic. Instead, the atoms are arranged in a three-dimensional tetrahedral structure. Figure 7.13 shows the crystalline 3D structure of silicon. Each atom connects to four surrounding atoms, so that the atom itself forms the center of a prism formed by the four other atoms. The right side of Fig. 7.13 also shows how more silicon atoms fit in with the four atoms, the additional atoms themselves forming prisms and continuing the crystal structure.

However, the assumption that more silicon atoms will align themselves with the core crystal structure is not always justified. In fact, depending on the conditions under which the silicon solidified, we can have radically different atomic arrangements in solid-state silicon.

The most chaotic form of silicon is amorphous silicon. In this form, there is little to no crystalline structure. Each tetrahedral structure in the solid is oriented in a different direction; we cannot assume any regularity through the material. The other extreme is monocrystalline silicon. In this form, the entirety of the material is a single crystal. The orientation of atoms is roughly the same throughout the bulk, so the regularity assumed in the right sub-figure of Fig. 7.13 continues throughout the material. The only significant imperfections are on the surface, where the crystal structure has to end abruptly.

In between the two extremes there is polycrystalline silicon, more commonly called polysilicon. Polysilicon preserves a crystal structure within small domains called grains. However, the material as a whole consists of many such grains, each of which is discontinuous with the others. Thus it is as if the material consists of multiple small crystals. Naturally, there are grades of polysilicon: the larger the grains, the closer the polysilicon is to monocrystalline silicon and the higher its grade.
Discontinuities in the silicon crystal create imperfect alignment of energy levels. This causes the band structure to form differently from the assumptions in Sect. 1.1. Thus
crystalline structure has a significant impact on both the mechanical and electrical properties of silicon. Electrical and mechanical properties are at their best for single-crystal silicon, and are unacceptably bad (for semiconductors) in amorphous silicon. In polycrystalline silicon, the more regular the crystal structure (larger and fewer domains), the better the material properties (Table 7.3). However, the more regular the crystal structure, the higher the temperature needed to fabricate it. To form monocrystalline silicon, the material has to be heated so much that it first melts into a liquid and is then resolidified (Sect. 7.4). Polysilicon can be deposited at temperatures where silicon is still a solid, but it still needs very high temperature (Sect. 7.5). For reasons that will become clear in Sect. 7.6, MOSFET gates are not made using metal; instead, they are made using polycrystalline silicon. Thus the M in MOSFET is a misnomer of sorts. If we were able to form MOSFET gates of monocrystalline silicon, we would. However, the temperature that this requires is prohibitive. Metals are very important in ICs: they are used to make the interconnects, or wires, that connect the terminals of transistors together. Metals with higher conductivity are preferable for making wires. Higher conductivity translates into lower time constants and better delay performance (Chap. 13). Older technologies used aluminum to manufacture wires. Aluminum is very easy to work with: it can be deposited very easily, and it can be patterned using dry etching (Sect. 7.5). Newer technologies use and experiment with combinations of gold and silver, but mostly use copper due to its preferable conductivity (Table 7.3). The silicon dioxide used to make the insulator in MOSFETs can be fabricated simply by heating silicon in the presence of oxygen. It can also be deposited on top of the wafer, depending on the requirements of the manufacturing step.
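The impact of conductivity on interconnect choice can be quantified with a quick resistance estimate. The following is a minimal sketch, assuming approximate bulk resistivities and a hypothetical wire geometry (thin-film resistivities in real processes run somewhat higher):

```python
def wire_resistance_ohm(resistivity_ohm_m, length_um, width_um, thickness_um):
    """R = rho * L / (W * t) for a rectangular interconnect cross-section."""
    length_m = length_um * 1e-6
    area_m2 = (width_um * 1e-6) * (thickness_um * 1e-6)
    return resistivity_ohm_m * length_m / area_m2

# Approximate bulk resistivities in ohm-meters (illustrative values)
RHO_CU, RHO_AL = 1.7e-8, 2.7e-8

# A hypothetical 1 mm long wire with a 0.2 um x 0.2 um cross-section
r_cu = wire_resistance_ohm(RHO_CU, 1000, 0.2, 0.2)
r_al = wire_resistance_ohm(RHO_AL, 1000, 0.2, 0.2)
```

For the same geometry, the copper wire has roughly 60% of the aluminum wire's resistance, which is why copper displaced aluminum despite being harder to etch (Sect. 7.7).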
Silicon nitride is a relatively stable dielectric nitrogen compound of silicon. Silicon nitride is relatively simple to form either by reaction with the silicon of the wafer or by deposition (Sect. 7.5). Silicon nitride is a dielectric with different properties from silicon dioxide. It has a nasty habit of trapping electrons, so its use as a gate dielectric is limited. But its more interesting feature is that it does not allow silicon dioxide to deposit or grow over it. Thus it is used to selectively pattern oxide growth.

7.4 Wafer Fabrication

1. Understand how silicon is extracted from silica
2. Compare raw silicon, metallurgical grade silicon, and semiconductor grade silicon

Table 7.3 List of materials used in CMOS process

Material    Variants                                                               Uses
Silicon     n, n+, p, p+, and all permutations with crystalline, polycrystalline   Gates, sources, drains, bodies, and body contacts
Metal       Aluminum, gold, silver, and copper                                     Interconnects
Insulator   Silicon dioxide and silicon nitride                                    Insulation and sacrificial layer

3. Describe the Czochralski process for obtaining a monocrystalline silicon ingot
4. Understand post-processing on silicon ingots to obtain high-quality wafers for CMOS processing.

To start making CMOS circuits, we have to first find silicon somewhere. Fortunately, silicon is one of the most abundant elements on Earth. Unfortunately, it almost never exists in its elemental form. All sand and dirt are composed mainly of silica, a powdery form of silicon dioxide. The question is how to extract silicon from silica. The oxygen content has to be removed from the silicon dioxide. Removing oxygen from a compound is a process known as reduction. For silicon dioxide, this can be done by heating silica together with carbon in a high-temperature oven in the absence of oxygen. This results in the mixture melting and the carbon absorbing the oxygen content of the silica, leaving behind silicon and carbon monoxide. Carbon monoxide is a gaseous byproduct that has to be safely disposed of since it is highly toxic. The chemical reaction is very simple:

SiO2 + 2C → 2CO + Si (with heat)

The silicon obtained by reducing silica is impure and has no crystal structure. It is allowed to cool and solidify. Because there is no crystal structure to lose, the resulting silicon is crushed into a powder of amorphous "dirty" silicon. The impurities are removed from the powder using a variety of chemical and physical purification methods. This results in a very pure but still amorphous silicon powder. This type of silicon is called metallurgical grade silicon. It is used in non-semiconductor industries, but is also useful in photovoltaics. Metallurgical grade silicon is still not pure enough for use in IC fabrication. It has to pass through another pass of chemical and physical purification before it becomes pure enough to use in CMOS. Silicon that is ready for use in ICs is called semiconductor grade silicon. The flow for obtaining silicon from silica is shown in Fig. 7.14.

Fig. 7.14 Flowchart for obtaining semiconductor grade silicon from silica

Semiconductor grade silicon is very pure. However, it is still amorphous. Photolithography (Sect. 7.2) depends on having a monocrystalline bulk upon which other features are realized. There are various processes through which amorphous silicon can be turned into monocrystalline silicon. However, the Czochralski process, shown in Fig. 7.15, is one of the simplest and most common.

Fig. 7.15 The Czochralski process for obtaining silicon crystals

In the Czochralski process, powdered amorphous silicon is put in a quartz crucible surrounded by a graphite encasing. Quartz (which is a form of silicon dioxide) has a higher heat resistance than silicon; it is also inert and will not
interact with the silicon inside and change its purity. This crucible is heated to a temperature slightly above the melting point of silicon. This guarantees that silicon will be in liquid form, but will also be at the tip of solidifying. The challenge is now to get the silicon to solidify in a regularly crystallized form. Atoms in molten silicon are freed from the solid structure. Since the silicon in the crucible is very near its melting point, it only needs to cool slightly to start solidifying. While in molten form, impurities can be added to turn the silicon into the type the wafer should be. The dopants are thoroughly mixed with the silicon melt to guarantee an even distribution throughout the resulting silicon ingot. Next, a seed is lowered just enough that it touches the surface of the doped melt. The seed is composed of a very high grade, but very small, piece of monocrystalline silicon. The seed is rotated at a regular rate as it is lowered. The crucible itself is rotated in the opposite direction to the seed. When the seed touches the melt, the top of the melt starts to solidify. This is because the melt is only slightly above the melting point. The seed is then lifted at an extremely slow rate while maintaining rotation. As it rises, the seed pulls the solidifying melt alongside, and the whole structure starts to cool. Because cooling is very gradual, the melt solidifies in a crystalline pattern. The orientation of the crystal is dictated by the seed. Rotation and lifting together ensure that the final shape of the ingot is cylindrical. As the cylinder is raised, upper parts start to cool faster, further solidifying into the regular crystal structure. All the while, lower parts touching the melt cool around the solid parts that have been lifted. Once a standard length cylinder is reached, the cylinder is rapidly lifted from the melt, allowing the tail end to taper off.

The obtained silicon ingot is not exactly cylindrical. It is monocrystalline; however, it has imperfectly pointy edges on top and bottom. The top pointed part is due to the initial seed-pulling phase; the bottom pointed part is due to the final lifting of the ingot. This is shown at the top of Fig. 7.16. The bulk of the ingot has a roughly cylindrical shape; however, there are undulations along the body due to the interaction of the rotation, lifting, and cooling. To obtain a perfectly cylindrical shape, the top and bottom are cut off, and the sides are ground and polished to make a smooth lateral surface.

Fig. 7.16 Side view of ingot edge trimming and side polishing. At the top, the pointed right side represents the taper due to final lifting

The ingot is then sliced to form disc-like units called wafers (Fig. 7.17). Wafers are the basic building block of the semiconductor industry. All steps in photolithography (with the possible exception of exposure) take place over an entire wafer or even over groups of wafers. A wafer has many die on its surface that can be identical or different. The slicing process used to obtain wafers from ingots has to be very precise. First, wafers have to be sliced at even and very small thicknesses. Second, the plane at which slicing takes place has to be nearly perfectly horizontal. This guarantees an even-thickness substrate and also preserves the crystal structure of the wafer at the surface. The surface of each wafer is then further cleaned, lapped, and polished to make it as even as possible. This is in preparation for the creation of surface features, which rely on the surface of the wafer being flat.
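Since wafers are processed as a whole, the number of die per wafer drives cost. A common first-order estimate divides wafer area by die area and subtracts an edge-loss term for partial dice at the circular perimeter; the wafer and die sizes below are hypothetical:

```python
import math

def gross_die_per_wafer(wafer_diameter_mm, die_area_mm2):
    """First-order gross die count: wafer area over die area, minus an
    edge-loss term for partial dice lost at the circular perimeter."""
    d, s = wafer_diameter_mm, die_area_mm2
    return int(math.pi * (d / 2) ** 2 / s - math.pi * d / math.sqrt(2 * s))

# Hypothetical 300 mm wafer with 100 mm^2 die
n = gross_die_per_wafer(300, 100)   # about 640 gross die
```

Note that this counts gross (fabricated) die; the number of working die is further reduced by yield, and the estimate ignores scribe-line spacing between die.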

Fig. 7.17 Wafers from ingots

7.5 Operations and Equipment

1. Understand the function of all operations used in the CMOS process
2. Distinguish which operation is used with which materials
3. Realize that most CMOS processing operations must balance contradictory requirements
4. Recognize the terminology used in every step of the CMOS process
5. Contrast operations in terms of advantages and disadvantages.

In this section, we will discuss the different operations/steps in a typical CMOS process. Each of these steps has a place in the overall flow. However, it might not be immediately obvious where it fits in as it is discussed in isolation. These processes will be combined together into an overall flow in Sect. 7.6. Some operations have already been mentioned in Sect. 7.3. The operations are not listed in any particular order.

Ion implantation and annealing
When used: To implant dopants in silicon
Advantages: Can control dopant concentration very precisely. Low temperature
Disadvantages: Can create severe damage to the crystal structure and requires annealing

Ion implantation is a way to introduce impurities into exposed parts of the bulk of a wafer. It can also be used to dope polysilicon deposited above the bulk. An ion implantation setup is shown in Fig. 7.18. The ion source produces ions of a particular material (group 15 when doping donors, group 13 when doping acceptors). These ions are accelerated through the application of a very high potential across a relatively long tube. This gives the ions a very high kinetic energy as they exit the accelerator tube. The ions then pass through a magnetic field. The field causes the ions to curve as they move. The amount of curvature is a function of the energy of the ions. More energetic ions suffer
less curvature. An exit from the magnetic field is provided only through a slit, thus only ions with a certain energy profile can exit. Lower or higher energy ions will be too curved or not curved enough and will not exit the slit. The ion stream is thus of relatively uniform energy. The high energy ion stream is directed at the surface of the substrate, the ions hit the (exposed) silicon, burying to a certain depth that depends on their energy, and then stopping. The depth distribution of the resulting ion concentration in the substrate is a normal distribution. The mean depth of implanted ions is not at the surface but instead is at a depth determined by the energy obtained from the accelerator tube. Higher energy beams have a higher mean depth. The variance of the distribution is a function of the selectivity of the magnetic field-slit combination. Ion implantation itself is a low-temperature operation. It can be highly spatially selective and can create relatively accurate doping concentration. It is also very useful in how it can create a buried layer of doped silicon. However, the process is extremely damaging to the crystal structure. The high energy ions invade the silicon crystals violently, creating micro-cracks in the crystal and worsening its electrical and mechanical properties. This is shown in Fig. 7.19 and is inevitable. To heal this damage, ion implantation is always followed by annealing. Annealing is a process where the wafer is rapidly heated, then allowed to cool extremely slowly. For annealing to work properly, heating has to be to a very high temperature, approaching but never coming too close to the melting point of silicon. This point is called the recrystallization point. This extreme heating loosens the covalent bonds between the silicon atoms as they get ready to become fluid, however, it never severs such bonds, thus maintaining all structures on the wafer. 
The loosened bonds create a situation where the invading dopants are mobile enough, and the substrate silicon is malleable enough, to allow the crystal structure to heal. Annealing requires cooling to be very slow. The temperature used in annealing is high enough to melt all metals used in CMOS. In Sect. 7.6 we will discover that after MOSFET gates are created, we have to use ion implantation to create sources and drains. Because the temperature used to anneal the wafer would melt all MOSFET gates if they were made of metal, we have to make them using polysilicon instead. One fair question to ask is why we have to fire dopant ions at the wafer; why not dopant atoms? Atoms are electrically neutral; they would not accelerate or bend in the setup in Fig. 7.18. Thus, we have to use ions. But would implanting ions cause the wafer to accumulate excess charge? Yes, the wafer would accumulate excess static charge, but only for a negligible time. As soon as the wafer is grounded, the extra charge is equalized.

Fig. 7.18 Ion beam implantation, left; and profiles of dopant concentration, right. The mean depth of implant is related to beam energy. The spread is related to magnetic field and slit design

Fig. 7.19 Dopant ions (brown) invade the silicon crystal lattice (blue) at very high energy, causing damage to the crystal structure as they displace silicon atoms

Diffusion
When used: To implant dopants in silicon, typically to create wells
Advantages: Gentle to the crystal structure
Disadvantages: Imprecise. Concentration and extent not easily controllable
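The implanted depth distribution described above can be sketched numerically. This is a minimal model of the Gaussian implant profile; the dose, projected range, and straggle values are illustrative, not taken from a real process:

```python
import math

def implant_profile(x_nm, dose_cm2, Rp_nm, dRp_nm):
    """Gaussian dopant concentration at depth x (nm), in atoms/cm^3.

    dose_cm2: implanted dose per unit area.
    Rp_nm:    projected range (mean depth), set by beam energy.
    dRp_nm:   straggle (standard deviation), set by field/slit selectivity."""
    dRp_cm = dRp_nm * 1e-7  # nm -> cm for the normalization factor
    peak = dose_cm2 / (math.sqrt(2 * math.pi) * dRp_cm)
    return peak * math.exp(-((x_nm - Rp_nm) ** 2) / (2 * dRp_nm ** 2))

# Hypothetical implant: 1e13 cm^-2 dose, 50 nm projected range, 15 nm straggle
depths = [i * 0.5 for i in range(0, 401)]  # 0 to 200 nm in 0.5 nm steps
profile = [implant_profile(x, 1e13, 50.0, 15.0) for x in depths]

# The concentration peaks at the projected range, not at the surface
peak_depth = depths[profile.index(max(profile))]

# Integrating concentration over depth recovers the implanted dose
dx_cm = 0.5 * 1e-7
recovered_dose = sum(profile) * dx_cm
```

Raising the beam energy shifts Rp deeper; widening the field/slit energy window widens dRp. Both trends match the profiles sketched in Fig. 7.18.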

Diffusion is a physical phenomenon where matter moves from one position to another along a concentration gradient. The material will move from an area where it has a high concentration to an area where it has a lower concentration until the concentration of the material is uniform. We discussed this in detail in Sect. 1.6 as it relates to carrier transport. In circuit fabrication, diffusion can be used to dope the wafer. The wafer is introduced into a quartz chamber (Fig. 7.20) filled with a gaseous form of the material we want to diffuse into the bulk. The areas where the dopant should be introduced are selectively exposed through photolithography. The rate at which the dopant diffuses into the material is a strong function of the concentration of the gas in the chamber and the temperature. Some heating is necessary to increase the rate of the process. Heating is usually applied directly to the wafers. Multiple wafers can be loaded into the diffusion chamber at once. As with all steps in CMOS fabrication, precise control over gas composition and temperature is a must.

Fig. 7.20 Diffusion setup; gases to be diffused are allowed in through gas inlets; multiple wafers are exposed at a time, loaded through the loading bay

Chemical Vapor Deposition (CVD)
When used: To build most nonmetallic surface features (polysilicon, silicon dioxide, silicon nitride, and tungsten as an exception among metals)
Advantages: Introduces uniform, high-purity films
Disadvantages: Can only be done at very high temperatures. No precursors available for most metals

CVD is a process used to grow surface features on the wafer. Contrast this with ion implantation, which is mostly used to change the properties of the bulk of the wafer. CVD can be used to grow certain materials on certain areas of the wafer selectively if combined with photolithography. In CVD, a precursor or a set of precursors is introduced into a chamber as shown in Fig. 7.21. The chamber is usually at a very low atmospheric pressure or near vacuum. The precursors can be in the form of a gas or a vapor/aerosol depending on their boiling point. Heat is a necessary catalyst for CVD. Heat is introduced either by heating the chamber itself through its walls, or by heating the wafer directly through induction coils. Heat causes the precursors to undergo a chemical reaction; this yields a new material with a high condensation point. This product of the reaction promptly proceeds to precipitate or crystallize over the wafer. There are usually undesirable (gaseous) byproducts from the reaction which must be guided out through an exhaust. The exhaust can be at the bottom due to the very low-pressure conditions maintained in the chamber. CVD is capable of producing very pure films of uniform thickness; thus it is desirable when forming most surface features. It can be used to grow high-elevation oxide, polysilicon, silicon nitride, exotic dielectrics, and even organic films. However, CVD requires very tight control compared to PVD. Additionally, no suitable precursors are available for many types of films, particularly metals. CVD also has to be performed at a very high temperature, which limits its use for any step that comes after metals have been created; otherwise metal wires would melt.

Fig. 7.21 CVD setup with inductive heating of wafer

Physical Vapor Deposition (PVD)
When used: Metallization
Advantages: Can be performed at low temperature
Disadvantages: Film has irregularities and needs to be planarized

PVD is an alternative to CVD that also creates thin films. It contrasts with CVD in that it does not require a chemical
reaction or heat. Figure 7.22 shows a sputtering setup, the most common setup for PVD. The material from which the coating is made is hit by a very high-velocity sputtering gas. The sputtering gas is an inert gas that does not react with anything in the chamber. But because the sputtering gas has high energy, when it hits the coat material it removes small pieces of it. This creates an aerosol of the coat material. The chamber has a very strong pressure gradient, with very high pressure around the coat material and very low pressure around the wafer. The pressure gradient causes the coat material aerosol to move toward the wafer. At the wafer, the low temperature and pressure conditions cause the aerosol to condense and solidify on the wafer, forming the thin film. Because no chemical reaction or heat is necessary, PVD is the only option available for building metal wires. Sputtering creates relatively regular films. However, the thickness and regularity of all PVD films are inferior to those that result from CVD. Because perfect planarity is very important for modern techniques (Sect. 7.7), all PVD steps must be followed by very careful CMP.

Fig. 7.22 Sputtering for metal coating. Pressure gradient is from left to right

Photoresist coating and baking
When used: Every pass of photolithography
Advantages: Necessary to ensure proper resist behavior
Disadvantages: Imperfections in distribution of coat

Application of photoresist is an essential step in each pass at photolithography (Sects. 7.3 and 7.6). There are multiple ways to apply photoresist, including dip coating, spray coating, and spin coating. Spin coating is relatively simple and fast, and even though it introduces imperfections to the shape of the final coat, it is the most widely used coating technique.
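The resulting film thickness is commonly modeled with an empirical spin curve, thickness ∝ (spin speed)^(−1/2), where the prefactor lumps together viscosity and solids content. The constant and spin speeds below are illustrative; real curves are calibrated experimentally per resist:

```python
def resist_thickness_um(k, rpm):
    """Empirical spin curve: film thickness falls as 1/sqrt(spin speed).
    k is a resist-specific constant determined by calibration."""
    return k / (rpm ** 0.5)

# Hypothetical resist with k = 60: spinning faster gives a thinner film
t_slow = resist_thickness_um(60.0, 1500)
t_fast = resist_thickness_um(60.0, 3000)
```

Under this model, doubling the spin speed thins the film by a factor of √2; in practice the exponent itself varies somewhat from resist to resist.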

In spin coating, a small amount of liquid (but viscous) photoresist is applied near the center of the wafer as the wafer is continuously spun (Fig. 7.23, top left). The spinning flings the liquid outward, and the resist redistributes, forming a film of the required thickness on the wafer (top right of Fig. 7.23). While spinning, the resist will form waves on top of the wafer due to the interaction of air with the spinning wafer (bottom left of Fig. 7.23). As the spin speeds up, these waves are pushed toward the outside of the wafer, forming a "bead" ring around the perimeter of the wafer (bottom right of Fig. 7.23). The remainder of the wafer is covered in a relatively uniform film. Since the perimeter of the wafer does not usually contain useful die, concentrating imperfections in the perimeter bead is acceptable. The photoresist is then "baked" to allow it to solidify. Baking causes the solvent in the photoresist to evaporate. This causes the photoresist to solidify, and it helps it attach firmly to the surface of the wafer. Baking is typically done in three steps. A soft bake is done immediately after applying the resist. It is a short bake at low temperature used to drive away most of the solvent. The second bake is applied after exposure but before development. It allows the resist to further solidify. The second bake is delayed till after exposure because the hardness of the photoresist affects the rate at which its properties are changed by exposure. A third "hard" bake is selectively applied to firmly harden the resist after development. This long, high-temperature bake is applied before etching to ensure the resist adheres firmly to the protected parts of the oxide and to make sure it is solid enough to protect against acids during etching. Baking introduces an interesting tradeoff. A very solidly baked photoresist is extremely resistant to solvents, which greatly complicates development. A more mildly baked photoresist does not attach well to the wafer, potentially causing damage to the photomask or complicating etching. There are two types of photoresist. Positive photoresist is one where the resist exposed to light becomes soluble in the development solvent. Negative photoresist is one where unexposed resist is soluble in the development solution.

Fig. 7.23 Photoresist spin coating (above), and artifact formation and concentration during spinning (below)

Oxidation
When used: To grow oxide on the surface of the wafer for isolation, protection, and separation
Advantages: Relatively simple process that is critical for proper operation
Disadvantages: Eats up some of the silicon of the bulk to form the oxide. Cannot be used once surface features have been built

Fig. 7.24 Oxidation setup. Heating is resistive. Oxygen and hydrogen combine to form water vapor. Multiple wafers can be oxidized at once. A quartz carrier is used due to its heat resistance

Silicon has a very good native oxide: silicon dioxide. When left exposed to air at room temperature silicon will form a layer of natural oxide. This layer is around 2 nm thick and is thus too thin for most applications. Thicker oxides must be grown, especially when used to isolate surface features from each other. Oxide functions as the thin oxide between the transistor gate and channel. It also forms a thick insulating oxide between transistors. Oxides also separate all surface features built on the wafer from each other (gates, wires, etc., see Sect. 7.7 and Chap. 8). Additionally, oxides and their reaction to acids are fundamental in photolithography (Sect. 7.3). Finally, all fully processed wafers must be covered by a very thick layer of oxide on top. This layer, called the passivation layer, acts as a glass protector that protects dies from scratches, dust, and contamination with impurities.
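Thermal oxide growth kinetics are commonly captured by the Deal-Grove relation x² + Ax = B(t + τ), where the linear rate constant B/A dominates early (reaction-limited) growth and the parabolic constant B dominates late (diffusion-limited) growth. A sketch follows; the rate constants are illustrative rather than a real furnace recipe, as real A and B depend strongly on temperature and on whether the ambient is dry oxygen or steam:

```python
import math

def oxide_thickness_um(t_hr, A_um, B_um2_per_hr, tau_hr=0.0):
    """Solve x^2 + A*x = B*(t + tau) for oxide thickness x (positive root)."""
    rhs = B_um2_per_hr * (t_hr + tau_hr)
    return (A_um / 2.0) * (math.sqrt(1.0 + 4.0 * rhs / A_um ** 2) - 1.0)

# Illustrative constants (um and um^2/hr)
A, B = 0.165, 0.236

x_short = oxide_thickness_um(0.01, A, B)  # early growth: x ~ (B/A) * t
x_long = oxide_thickness_um(10.0, A, B)   # late growth:  x ~ sqrt(B * t)
```

Because roughly 45% of the final oxide thickness comes from silicon consumed out of the wafer, this also quantifies how much of the bulk oxidation "eats up".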

The main method used to "grow" oxide from the wafer is thermal oxidation. In thermal oxidation, the wafer is inserted into an oven with a precisely controlled high temperature and either water vapor or oxygen (Fig. 7.24). The surface of the silicon wafer starts reacting and forms an oxide layer. The thickness of the grown oxide and the rate at which it grows depend on the temperature and the gas mixture and concentration in the oven. Thus, precise control must be maintained. Oven startup and cool-down have to be very gradual to avoid introducing damage to the crystal structure. Oxidation takes place at temperatures between 800 and 1200 °C. If further oxide needs to be grown after initial oxidation, CVD is typically used. This is both to avoid the high temperatures used in oxidation, and because the bulk silicon is already buried under many features at that point and thus cannot easily oxidize. The term oxidation is usually used only when the oxide is grown from the silicon of the wafer. If oxide is formed using CVD, we will use the term "deposited".

Fig. 7.25 Developed positive photoresist, left, with exposed parts removed. Developed negative photoresist, right, with unexposed parts removed

Photoresist development
When used: Every pass of photolithography
Advantages: Exposes parts of the oxide to be selectively etched
Disadvantages: Requires precise control of concentration, baking, and time

Development is the process of stripping away the parts of photoresist exposed to light (in the case of positive photoresist). The resulting wafer will have light-exposed parts with oxide exposed, and mask-protected parts with oxide covered by photoresist. Development is performed by immersing the wafer or covering it in a solvent that can
dissolve only light-exposed photoresist (positive photoresist) (Fig. 7.25). However, the process is complicated by several factors. First, the rate at which photoresist is removed is a function of time and solvent concentration; thus enough solvent and time must be given for the photoresist to be completely removed. However, the assumption that exposed photoresist dissolves while unexposed photoresist is completely insoluble is an oversimplification. The reality is that unexposed photoresist dissolves at a lower rate than exposed photoresist; however, it still dissolves. Thus the duration of development and the concentration of the solvent must be adjusted so that exposed resist is fully removed while unexposed resist is not too degraded. Additionally, strongly baked exposed resist is less soluble than soft resist. We can imagine a situation where the photoresist was "too" baked, and thus the exposed resist could not be effectively removed by the solvent without degrading the unexposed resist. This is the reason resist baking is divided into soft and hard phases, with only soft baking performed before development.

Etching (wet and dry)
When used: To remove any material from the surface or bulk. Typically oxide, metals, or silicon
Advantages: Wet etch is simple; dry etch is precise and directional
Disadvantages: Wet etch is imprecise and nondirectional; dry etch is complex and does not work for all materials

Etching is the process of selectively removing materials from the surface of a wafer. The material used to perform the etching on the wafer is called the etchant. Different etchants
have different properties. The ideal etchant would be materially selective, i.e., it would remove only the target material but leave other materials intact. It should also be directional, removing material only in the intended direction. Etching is mostly used to remove silicon dioxide during photolithography, but it is also used to pattern silicon or metals. There are two main types of etching: wet and dry. In wet etching, the etchant is a liquid. The wafer is covered in the etchant or dipped in it. This is normally done at low temperature. The amount of time the wafer is dipped decides the depth of etching. Different etchant solutions might selectively remove one material but not another, or might remove all materials they come in contact with. Wet etching is fairly simple; however, it requires washing the wafer after etching, and it is also very imprecise, unable to create fine features. Wet etchants eat into the target material in all directions, as shown in Fig. 7.26, leading to patterns that do not perfectly match the mask. They typically over-etch laterally as the etchant tries to reach the target depth.

Fig. 7.26 Wet etching, to the left, can eat under the resist. Dry etching, to the right, is more directional

Wet etching is often used to remove oxide, the assumption being that exposed oxide will react with the etchant while oxide still covered with photoresist will remain intact. Other than the problem of lateral etching, there is another complication. Photoresist will often react with the etchant, albeit at a much lower rate than the oxide. The harder the photoresist, the more resistant it is to the etchant. This is why hard baking is sometimes performed on the photoresist before etching. Dry etching uses etchants in the plasma or ionic phase. These etchants do not condense; thus the wafer does not become wet and does not require washing. The plasma is directed at the wafer and etches the material very directionally. This can lead to much more precise features than in wet etching. However, dry etchants are often not as materially selective as wet etchants.
They also require processing at elevated temperatures, which might damage the crystal structure of the wafer. Because a plasma is used, dry etching also leads to the accumulation of significant static charge. This complicates the antenna effect. Dry etching is the preferred method for patterning metals. Its selectivity is very attractive, and we can overcome the antenna effect with proper design decisions. However, in some cases no suitable dry etchant can be found for the target material. This is particularly true when copper is used in metallization (Sect. 7.7).

Photomask manufacturing
When used: After the design is finished and before fabrication starts
Advantages: Necessary to realize small features on chips
Disadvantages: Very complex and time-consuming process

Photomasks are the masks used to pattern light before it hits the photoresist. Figure 7.6 showed how a mask can be used in conjunction with optics to pattern the photoresist. Photomasks are derived from the layout of the circuit, which is the final result of the design flow (Chap. 8). They are manufactured by highly specialized labs, and their fabrication is a precise and demanding process. Historically, masks corresponded directly to the pattern that needs to be realized on the wafer: areas that need to be etched are transparent, and areas that should not be etched are opaque (if positive photoresist is used). Photomasks are made on a transparent substrate; silica (glass) is often transparent enough. The opaque pattern is drawn on the transparent substrate using a material that absorbs and reflects the frequency of light used in photolithography. Chrome is the metal of choice for relatively long channels. The pattern on the photomask in modern technologies does not correspond one-to-one with the feature to be realized. Guide features are introduced to take into account refraction and interference patterns and to allow very small features to be realized accurately. Thus the pattern seen on the photomask is at best somewhat similar to the pattern that ends up on the wafer. The pattern to be drawn on the photomask is stored in a computer file. It is obtained directly from the layout produced by the designer. The silica substrate is covered entirely in chrome using PVD. It is then covered in
photoresist. An electron beam is guided according to the stored coordinates in the mask file to draw the pattern on the photoresist. Electron beams can create much finer features than visible light. The beam is focused, transmitted, or blocked by micromirrors that further allow the features to be refined. The exposed parts of the photoresist have their properties changed. They can then be stripped away through development, along the same lines as photolithography. The exposed chrome under the developed photoresist can then be removed using a suitable etchant. The remaining photoresist is washed away. The photomask is covered with a transparent film. The film is called a pellicle, and is used to protect the mask from particles that might settle on it and change its transparent pattern. The pellicle is separated from the mask by enough distance that particles that settle on the pellicle do not significantly change the mask pattern. Mask manufacturing is a long and expensive process. The features drawn on the mask are typically very small, usually only four times larger than the features realized on the wafer. This requires precise electron beam equipment and a very slow and deliberate drawing process. However, since only one set (or a few sets) of masks is used per circuit, the cost and time of making the photomasks is a one-off cost that becomes insignificant when divided by the volume of chips produced. We cannot, however, use these electron-beam “printers” to pattern wafers directly; photolithography makes much more economic sense.

CMP
When used: To prepare the wafer, and after steps that cause irregularities. Critical in modern metallization
Advantages: Can create incredibly planar and smooth surfaces
Disadvantages: Dirty and dirtying, needs to be quarantined

Chemical Mechanical Planarization (or Chemical Mechanical Polishing) is a process in which both a mechanical action and a chemical reaction are used to polish the surface of the wafer.
The wafer is mounted on a circular carrier that is larger than the wafer (Fig. 7.27). The wafer is held firmly in place through a chuck and vacuum suction. An abrasive polishing head is brought into contact. A corrosive chemical slurry is applied to the surface. The polish pad starts moving in a noncircular pattern while the carrier plate oscillates. The interaction between the pad motion, particles in the slurry, and the corrosive chemicals causes the surface of the wafer to be polished to incredible smoothness. Because of the slurry that forms during polishing, CMP is seen as an inherently dirty process that could contaminate


Fig. 7.27 CMP, the wafer is held to a mounting pad while the chuck/pad performs the polishing

the environment with particles. It is thus usually kept isolated from all other equipment in the fab that needs a cleanroom environment. CMP was originally only important in initial wafer preparation after the ingot is sliced. The importance of CMP changed dramatically when copper supplanted aluminum as the metal of choice. CMP is also in favor due to its use in Shallow Trench Isolation (STI). Both topics will be discussed in detail in Sect. 7.7. In modern CMOS processes, a large number of metal layers are often used (Chap. 8). CMP also becomes very important because of this: more metal layers lead to more buildup of irregularities in film and oxide thickness, which can potentially lead to wire and via opens. Thus, CMP must be used after every metallization step even if the metal used is not copper.

Alignment, stepping, and exposure
Alignment is an extremely important step. It involves making sure that the mask is properly placed above the wafer before exposure. The main issue arises if the masks for different steps are misaligned over the wafer relative to each other, not in absolute terms. To understand how far-reaching this can be, examine the scenario shown in Fig. 7.28. In this figure, a contact is supposed to be manufactured between a polysilicon gate shown in red and a metal wire shown in blue. However, if the masks for the metal and polysilicon steps are misaligned, the area in which the two layers contact may be reduced (middle sub-figure). This increases the resistance of the contact, adding unaccounted delay. If the misalignment is bad enough, the contact might be completely missed, creating an open (right sub-figure). This can happen in relation


7 CMOS Process

Fig. 7.28 Misalignment of masks can lead to disastrous failure of the circuit

to any pair of layers, meaning that alignment must be very accurately managed once the first mask has been used. Global (coarse) alignment of the wafer can be managed in the first photolithography step by aligning notches that are cut out to define artificial “corners” in the wafer. This is shown in Fig. 7.29. This initial alignment can be made roughly; in fact, it can be done manually. Initial alignment is not critical; what is critical and must be managed successfully is alignment from layer to layer. To guarantee alignment between layers, dummy features are created on the wafer by earlier masks. These dummy patterns also exist in all other masks. If the dummy pattern on the mask is matched to the dummy pattern on the wafer, then we can guarantee decent placement from step to step. This is illustrated in Fig. 7.30. In step n of photolithography, a cross pattern from the mask is realized on the wafer. In step n+1, the same cross exists on the photomask in the same location. If the mask and the wafer are aligned so that the cross on the wafer is visible through the cross on the mask, then alignment is complete.
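Matching two cross marks fixes the wafer's x-shift, y-shift, and rotation at once. The sketch below recovers these three quantities from two measured mark positions; the coordinates, units, and the rigid-transform recovery are illustrative assumptions, not a description of a real stepper's algorithm.

```python
import math

def estimate_alignment(mask_marks, wafer_marks):
    """Recover (dx, dy, dtheta) that maps mask marks onto measured wafer marks.

    Assumes a pure rigid transform (rotation + translation) and exactly two
    marks on opposite sides of the field -- the minimum needed to fix the
    x-shift, y-shift, and rotation simultaneously.
    """
    (mx0, my0), (mx1, my1) = mask_marks
    (wx0, wy0), (wx1, wy1) = wafer_marks
    # Rotation: compare the angles of the vectors joining the two marks.
    dtheta = math.atan2(wy1 - wy0, wx1 - wx0) - math.atan2(my1 - my0, mx1 - mx0)
    # Translation: rotate the first mask mark, then measure the residual shift.
    c, s = math.cos(dtheta), math.sin(dtheta)
    dx = wx0 - (c * mx0 - s * my0)
    dy = wy0 - (s * mx0 + c * my0)
    return dx, dy, dtheta

# Two crosses on opposite sides of a hypothetical 100 mm field; the wafer is
# shifted by (0.5, -0.2) mm and rotated by 1 milliradian relative to the mask.
theta = 0.001
mask = [(-50.0, 0.0), (50.0, 0.0)]
wafer = [(math.cos(theta) * x - math.sin(theta) * y + 0.5,
          math.sin(theta) * x + math.cos(theta) * y - 0.2) for x, y in mask]

dx, dy, dt = estimate_alignment(mask, wafer)
print(dx, dy, dt)  # recovers the applied 0.5, -0.2, and 0.001
```

Two marks suffice because the vector between them fixes the rotation, after which either mark fixes the translation.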

Fig. 7.29 The notches on the wafer are used to align the first mask

There are three degrees of freedom that need to be adjusted to guarantee a wafer is properly placed: its position along the x-axis, its position along the y-axis, and its rotation. A cross pattern is a good dummy structure for fixing all three degrees. Two cross patterns on opposite sides of the wafer/mask are sufficient for proper orientation. Classical exposure systems used a mask that exposed the entire wafer in one step. Modern exposure systems use masks that expose a subsection of the wafer, or even a single die, at a time. The mask is moved in steps across the wafer, and in each step enough time is given for exposure so that the underlying photoresist reacts properly. The machine that moves the mask across the wafer is called a stepper. This is an advanced machine that moves the mask with very high precision. Alignment markers may need to be matched for each step; thus alignment markers may exist on every subsection of the wafer. Exposing only a small area of the wafer at a time is necessary to realize the fine features of modern CMOS processes. This will become clear shortly. The way light is projected onto the wafer in the exposure step can vary between three major methods, Fig. 7.31:
• Contact exposure: In this method, the mask is brought into contact with the wafer. This method is fairly simple. It allows light to fall on the wafer with reasonable sharpness. Optical artifacts like diffraction and refraction will only happen at the edges of the mask. However, features are realized 1:1, thus features have to be very small on the mask. Additionally, the mask coming into contact with the wafer has a destructive effect on the wafer, but more importantly on the mask. As the mask is aligned, comes into contact, and is lifted off the wafer, it can be contaminated by the still-fluid photoresist, and its surface can be scratched.
A contaminated or scratched mask may become opaque where it is supposed to be transparent, and transparent where it is supposed to be

Fig. 7.30 A dummy feature created in the first mask can be used to align masks in next steps

Fig. 7.31 Contact exposure (top left), proximity exposure (top right), and projection exposure (bottom)




opaque. This would render the mask unusable in the future. Note that wafers are supposed to be mass-produced, so the mask must survive many exposures.
• Proximity exposure: In this method, the mask is close to the wafer but does not come into contact with it. This prevents damage to the mask, but features are still realized 1:1. Additionally, light is not as focused as in contact exposure, and features may not be realized as sharply. Because the mask is only throwing a shadow over the resist, optical phenomena have to be taken into account. The gap might allow us to play with the minimum feature size, as will become clear shortly.
• Projection exposure: Light is shone on the wafer through a system of lenses. This necessitates that only a small part of the wafer is exposed at a time. It becomes necessary to use a stepper, and exposure becomes a more time-consuming process. However, the lenses can bring light on the wafer into very sharp focus, allowing very fine features to be realized. Additionally, features do not have to be 1:1 between the mask and the wafer, allowing mask features to be larger. Projection exposure is universally used in modern CMOS processes.
The Critical Dimension (CD) of an exposure method, especially projection exposure, is defined as the minimum feature size that can be realized on the wafer. This typically corresponds to the minimum channel length, or the technology parameter (Chap. 8). CD can be defined as

CD = k · λ / NA

where k is a unitless constant used to account for all technology variables. It is typically around 0.4 in modern processes. NA is the numerical aperture of the lens as seen at the wafer. Numerical aperture is defined in Fig. 7.32. λ is the wavelength of light used. There are obviously three ways to improve CD: reduce k, reduce the wavelength of light used, and increase NA.

Fig. 7.32 Numerical aperture is a representation of the narrowness of the cone of light that a lens projects on a surface


The constant k can be reduced by optimizing the fabrication process, and for most mature manufacturers it is usually as low as it can be. The wavelength of light used is the most direct way to improve the resolution of realized features. This is why modern CMOS processes typically use deep ultraviolet (DUV) light to expose wafers. Cleanrooms are usually lit with a very distinct yellowish tinge. The reason is that the “light” used in photolithography is ultraviolet, and the photoresist has some sensitivity to visible light of the colors nearest ultraviolet. The room should certainly not be lit in violet or blue shades, or the photoresist may not develop properly. The best light to use is at wavelengths as far away from ultraviolet as possible, toward the red end of the visible spectrum. Trying to improve feature size by using wavelengths shorter than DUV can be very difficult because photoresists for these spectra are very difficult to manage. There are, however, some exposure techniques that use X-rays or even electron beams to perform some form of lithography. However, these are usually so slow that they are only valuable for creating photomasks. An open area of improvement for feature size is the numerical aperture. NA is a measure of how narrow or wide the cone of light emitted from the lens is perceived at the wafer. NA has the form

NA = n · sin θ

where θ is the half-angle of the cone and n is the index of refraction of the material between the lens and the wafer. This has led to experimentation with different materials separating the lens and the wafer. Air (or more accurately vacuum) has the worst index of refraction at 1. Other fluids have better indices. Pure water has an index of 1.33, while some oils have even higher indices. Thus immersion of the entire system in a fluid is a valid method to improve CD, as long as proper cleaning and drying can be performed. Improving (reducing) CD conflicts with another requirement, namely maintaining depth of focus.
Depth of focus is defined as the acceptable range within which the wafer can be placed to still have light in focus from the lens. Depth of focus can be evaluated as

DF = g · λ / NA²

where g is another miscellaneous technology parameter. Increasing NA to decrease CD will lead to an even steeper decrease in depth of focus. This gives us much less tolerance on where the wafer can be placed. We have to be very accurate about how far away the wafer is from the lens. We even have to flatten the wafer more precisely by CMP to guarantee the same depth is seen over its entire surface.



Table 7.4 Summary of processes in photolithography

Ion implantation
When used: Introduce dopants, usually in the bulk
Advantages: High precision, can create buried layers
Disadvantages: Requires annealing

Diffusion
When used: Introduce dopants, usually to surface features and wells
Advantages: No damage to crystal structure
Disadvantages: Low accuracy in concentration and features

CVD
When used: To build the majority of (nonmetal) surface features
Advantages: Uniform, high-purity films
Disadvantages: Requires high temperatures; no precursors for some materials, especially metals

PVD
When used: To build some surface features (mainly metallization)
Advantages: Low temperature
Disadvantages: Creates irregular films that need CMP

Photoresist coating and baking
When used: Every pass of photolithography
Advantages: Necessary
Disadvantages: Imperfections in coat uniformity; coat hardness can hamper development

Oxidation
When used: Initial oxide layer growth (FOX, TOX)
Advantages: Simple and easy to control
Disadvantages: Consumes substrate; cannot be used over surface features, must use CVD instead

The sensitivity is so high that the thickness of deposited materials and photoresist might affect the placement of the wafer. Tables 7.4 and 7.5 summarize the operations discussed in this section. The processes are not listed in any particular order. To understand how they can be combined into one pass of photolithography, consult Sect. 7.3. Section 7.6 gives a more comprehensive account of an overall CMOS process combining all these steps.
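As a reminder of how these operations chain together, the list below sketches one simplified pass of photolithography; the step names and ordering are a rough summary of Sect. 7.3, not an exhaustive recipe, and the mask list is a hypothetical subset.

```python
# One simplified pass of photolithography (illustrative ordering only).
PASS_STEPS = [
    "deposit or grow the layer to be patterned",
    "spin-coat photoresist",
    "soft-bake the resist",
    "align the mask to the previous layer's marks",
    "expose through the mask",
    "develop the resist",
    "etch (or implant through) the exposed layer",
    "strip the remaining resist",
]

def describe_flow(masks):
    """A CMOS flow is roughly one such pass per mask (plus anneals, CMP, ...)."""
    for i, mask in enumerate(masks, 1):
        print(f"Pass {i} ({mask} mask): " + " -> ".join(PASS_STEPS))

# A hypothetical subset of the LOCOS masks discussed in Sect. 7.6:
describe_flow(["N-well", "active", "poly", "p+", "n+", "contact", "metal1"])
```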

7.6 LOCOS

1. Understand that different CMOS flows combine iterations of photolithography differently
2. Understand the role that field oxidation plays in isolating devices
3. Understand what a self-aligned transistor means for MOSFET gates
4. Understand how every layer in the wafer is fabricated
5. Examine the relation between the mask, top view, and cross section of the wafer

LOCOS stands for LOCal Oxidation of Silicon. LOCOS is one type of CMOS fabrication flow. In LOCOS the wafer is divided into two areas: active and non-active. Every part of the wafer must be one or the other. Active areas indicate all areas where a transistor exists; this includes the source, the drain, the area of the poly covering the channel, and contacts to body and well. Non-active areas are everywhere else on the wafer. In LOCOS, active areas are covered in a thin layer of oxide, called thin oxide, or TOX for short. Non-active areas are covered by a much thicker oxide called the Field OXide, or FOX for short. FOX is used to introduce isolation between different components on the wafer; it prevents cross talk between different devices, and it also mitigates a major

issue called latch-up that will be discussed in Sect. 7.7. The distinction of what constitutes an active area, as well as the difference between thin and field oxides, will become clear as we examine the LOCOS flow. LOCOS is very easy to understand, but it differs substantially from modern fabrication technologies in several key areas. Some of these differences are discussed in Sect. 7.7. LOCOS and other processes consist of multiple applications of the photolithographic process, thus a thorough understanding of photolithography (Sect. 7.3) is necessary before attempting to understand the overall CMOS flow. Each pass of photolithography will employ some of the operations discussed in detail in Sect. 7.5, thus that section also needs to be understood first. For all steps, we will assume positive photoresist (Sect. 7.5). For all steps, a figure will show both a cross section of the wafer at the current step and a top view. The top view is not of the wafer, but rather of the mask used in the exposure step for that feature. For all steps, we will assume a form of exposure where the ratio of features on the mask to features on the wafer is 1:1 and there are no optical artifacts. This is only for illustration; more realistic exposure mechanisms are discussed in Sect. 7.5. We start the process with a wafer of p-type silicon. This p-type material will act as the substrate. A layer of oxide is grown on top using oxidation. Photoresist is then applied to the surface of the oxide. The three layers are shown in Fig. 7.33. Figure 7.34 shows the first mask aligned above the wafer, with light shining through it on the photoresist below. We will always show the wafer in cross section. The mask is shown both in top view and cross section. Dark areas of the mask are shown in black. The pattern drawn on the photoresist is that of the N-well. All NMOS transistors can be built in the substrate since their bodies are p-type. PMOS

Table 7.5 Summary of processes in photolithography (continued)


Development
When used: Every pass of photolithography
Advantages: Necessary
Disadvantages: Contradictory requirements in baking, solvent concentration, and development time

Etching
When used: Every pass of photolithography
Advantages: Wet etch is simple and works for many materials; dry etch is precise, clean, and directional
Disadvantages: Wet etch is nondirectional, dirty, and can eat under the resist; dry etch is complex, requires heat, and does not work for critical materials

Photomask fabrication
When used: Once before fabrication
Advantages: Necessary
Disadvantages: Complex and time-consuming

CMP
When used: Initially to prepare the chip, and after steps that cause significant irregularity; necessary in modern metallization
Advantages: Very smooth and precise surfaces; selective, can stop at a certain material
Disadvantages: Very dirty, creates a lot of particles, must be separated from other steps; corrosive

Exposure and alignment
When used: Necessary
Advantages: Depends on method
Disadvantages: Depends on method

transistors need an n-type area to serve as their bodies. This is the N-well. The photoresist is developed, exposing the oxide above the area that will become the N-well. The wafer is bathed in an etchant, usually a wet etchant. The oxide reacts with the etchant wherever it is exposed. Thus areas of the oxide below developed photoresist are etched away. Wet etching can be used because oxides dissolve well in liquid solvents. The inaccuracies caused by lateral etching in wet etching are tolerable because the minimum dimensions of wells are large by virtue of design rules (Chap. 8). The result is shown in Fig. 7.35. We are creating the N-well in this step. This can be achieved by exposing the area of the p-substrate uncovered by the etched oxide to donors. A well covers a large area by definition. Its doping level is low, and the need for precise dimensions and doping concentrations is not high. Thus diffusion can be used as the method of doping. Note that no structures have been built on the wafer so far, so the heat used in diffusion cannot cause harm. The result is shown in Fig. 7.36, with the yellow area indicating the area where silicon has turned from p-type to n-type.

Fig. 7.33 Wafer substrate (p) at the bottom. A layer of oxide is grown on top of the substrate (middle in gray) and photoresist is applied on top of the oxide

At this point, we are done with the first mask. The N-well mask is removed and the second mask is applied. The second mask defines the “active” regions. Active regions are all areas with n+, p+, or MOSFET channels. More specifically, the mask defines areas where the oxide will be thin, namely every part of every transistor as well as any n+ and p+ areas created outside transistors. The second step starts by completely removing all of the remaining oxide in Fig. 7.36. The bare wafer then has a layer of silicon nitride (SiN) deposited using CVD. Photoresist is applied on top of the nitride. The active mask is used to expose the photoresist to the active pattern. The result with the patterned positive photoresist is shown in Fig. 7.37. Figure 7.38 shows the wafer with the silicon nitride stripped away from the areas where the photoresist is developed. Wet etching can be used in this step. Note that the active mask has a “reversed” sense, with the active areas being opaque rather than transparent; this is the opposite of the sense used in the well step. The reason for this will become clear shortly. The remaining photoresist is removed. Oxidation is performed at a high temperature for a long time. Silicon nitride does not allow oxidation to happen because it protects the substrate below it. Only areas where the nitride was stripped away will grow the oxide. The oxide grown in this step is fairly thick (Fig. 7.39). This is the FOX layer used to isolate devices from each other. Notice, then, that if positive photoresist is used, the sense of the mask has to be reversed, because in the oxidation process we are growing oxide on areas which will not be active.


Fig. 7.34 (Top) Top view of mask used to expose N-well. (Middle) Cross section of the mask on the top along the dotted lines. (Bottom) Cross section of the wafer with photoresist developed away selectively from areas exposed through the mask shown on the top and in the middle

The remaining SiN is removed using a SiN-specific wet etchant, Fig. 7.40. Note that this step did not actually realize any p+ or n+ regions; it just distinguished areas where transistors exist from those where there are none. The active mask is removed and the third step starts. The mask used for the third step is often called the poly mask, and it represents polysilicon areas where MOSFET gates will be created. First the wafer is allowed to oxidize for a short time. In areas where FOX exists, this additional oxidation will cause a negligible increase in the thickness of the oxide. In areas where we had just stripped away the SiN, a thin oxide is grown, the TOX, Fig. 7.41. Using CVD, polysilicon is deposited over the entire wafer. The polysilicon is deposited in a relatively thick layer. Despite the accuracy of CVD, the resulting poly is very uneven. Poly is higher in areas where there is FOX and lower in areas where there is TOX, Fig. 7.42. If we leave these irregularities, they can accumulate and magnify through

the steps, causing defects in the wafer. Accumulated irregularities are particularly dangerous to metallization, where they can cause wire resistance to grow around corners. CMP (Sect. 7.5) must be used to polish the polysilicon. CMP can achieve incredibly smooth surfaces. However, the process should be calibrated so that the polysilicon is not overpolished. Ideally, we should stop polishing well before reaching the field oxide. A good result is shown in Fig. 7.43. Positive photoresist is then applied to the entire wafer and the wafer is exposed through the mask. The resist is developed. The majority of the resist is removed by the solvent, as shown in Fig. 7.44. This leaves photoresist only in areas where MOSFET gates will remain. A polysilicon-specific etchant is then applied to the wafer. It eats away all the polysilicon except for areas still covered by the photoresist. The red regions left in Fig. 7.45 are the MOSFET gates.


Fig. 7.35 Oxide etched away from areas where the photoresist is developed

It might seem wasteful to deposit polysilicon all over the wafer and then remove most of it, but this is necessary. We have no mechanism to selectively deposit polysilicon; however, we do have a way to selectively strip it away. This is different from oxide growth in FOX, where SiN could be used to selectively stop oxidation. An oxide-specific etchant is applied. We allow it to act only for enough time to remove the thin oxide. Because FOX is much thicker than TOX, most of the FOX remains after the etching. The only areas where a thin oxide survives are under the polysilicon. The polysilicon does not react with the wet etchant and thus protects the underlying oxide. The oxides under the polysilicon are the oxides of the MOS capacitors. The p+ mask is applied at this point. This is shown in Fig. 7.46. With the p+ mask applied, we cover the wafer in a thick layer of photoresist (Fig. 7.47). The p+ mask is used to realize areas where p+ silicon will be created. However, the

mask does not have to correspond exactly to those areas; it just has to include them. As we will shortly see, p+ will exist wherever all three of the following conditions hold:
1. The p+ mask exists.
2. The active mask exists.
3. There is no poly.
The photoresist is developed, exposing areas where the mask was transparent. Notice that the exposed portions include areas of different nature: areas where FOX is exposed, areas where the well or substrate are exposed, and areas where polysilicon is exposed (Fig. 7.48). Ion implantation is then used to introduce dopants to create p+ areas (Fig. 7.49). In the N-well the silicon will change from n-type to p+. This creates the sources and drains of all PMOS transistors. In the substrate, the silicon will change from p-type to p+. This is necessary to create the


Fig. 7.36 Dopants are introduced to create the N-well. The well is realized as the yellow area. Yellow thus denotes lightly doped n-type silicon

ground contact for the substrate. Recall that for proper MOSFET operation we have to assume that all bodies are at ground, or at least lower than either source or drain. This is achieved by contacting the substrate at the surface and connecting it to ground using wires made of metal. If the metal wires contact the substrate directly, a rectifying contact called a Schottky diode is created. Schottky diodes are created when metals contact low-doping silicon. As the doping of silicon is increased, there is a threshold doping level above which the contact switches from rectifying to ohmic. This is why we need to create heavily doped regions in the substrate (Sect. 1.14). Note also that there are FOX and poly areas that will be exposed to ion implantation. The oxide will mostly block the ions and protect the areas below. The polysilicon will actually benefit, since the implant improves its conductivity; but it will block the ions from the substrate/well below. Thus the p-select mask can be

drawn relatively liberally as long as it includes all p+ active areas and does not overlap the n+ mask. The next mask is the n+ mask. Its flow is identical to that for the p+ mask. The photoresist remaining from the p+ mask is washed away. Photoresist is applied again and exposed through the n+ mask. It is then developed, and implantation creates the n+ regions for the sources and drains of NMOS as well as the N-well contacts for PMOS. This is shown in Fig. 7.50. Note that only one of the n+ and p+ masks carries useful information, because one of them can be taken to be the complement of the other. In other words, the n+ mask can be considered the inverse of the p+ mask, where all that is dark is transparent and all that is transparent is dark. The reason is the very liberal definition of these masks. The n+ mask can actually be anything that is not the p+ mask, because the n+ mask will only cause doping in areas where it overlaps with active regions. And active


Fig. 7.37 Active mask. The oxide from the well step was removed and a layer of nitride (light blue) was deposited directly on top of the substrate. Photoresist is spread over the nitride, and is exposed and developed

regions are split between n+, p+, and polysilicon. In fact, if we use negative photoresist, the p+ mask can be used for the n+ step. Since ion implantation is used to create all the n+ and p+ regions, annealing must follow (Sect. 7.5). Annealing requires heating the wafer to a very high temperature then allowing it to cool slowly. Note that this happens after the MOSFET gates have been created. This is the reason that MOSFET gates are created using polysilicon rather than metal. The temperature needed to anneal silicon is so high that it would melt metal. Thus gates have to be made using silicon. The question then is why are gates made using polysilicon rather than monocrystalline silicon? The reason is that the only way we can create single crystal silicon is by melting it first (Sect. 7.4). Thus the best silicon we can grow is polycrystalline.
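The select-mask logic above can be written as boolean algebra over layout regions: doping happens only where the select mask, the active area, and the absence of poly coincide, and the n+ select is simply the complement of the p+ select. The tiny one-dimensional strip below is an illustrative stand-in for real layout data.

```python
# Eight cells standing in for a 1-D strip of layout; each mask is the set
# of cells where that mask is "present".
cells = set(range(8))

active = {1, 2, 3, 5, 6}    # transistor and contact areas (assumed pattern)
poly   = {2}                # gate crossing the active area
p_plus = {0, 1, 2, 3}       # liberally drawn p+ select mask

# The n+ select mask carries no extra information: it is the complement.
n_plus = cells - p_plus

# Doping forms only where select AND active overlap, minus the gate shadow.
p_doped = (p_plus & active) - poly
n_doped = (n_plus & active) - poly

print(sorted(p_doped))   # -> [1, 3]: p+ on either side of the gate at cell 2
print(sorted(n_doped))   # -> [5, 6]: n+ elsewhere in the active area

# The liberal p+ mask may cover non-active cells (e.g. FOX at cell 0)
# harmlessly, since no doping results there:
assert 0 in p_plus and 0 not in p_doped
```

Note how the gate cell splits the p+ active region into two symmetric diffusions, which is the self-alignment property in miniature.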

But why did we create the MOSFET gates before sources and drains? The design flow logically moves from lower structures to higher structures. It would have made more sense to create sources, drains, and contacts first, anneal them, and then create gates. This would also have allowed us to use metals for MOSFET gates. The answer is that most CMOS processes are self-aligned. The left side of Fig. 7.51 shows how the process would be if sources and drains were created first. If the mask for poly is slightly misaligned relative to the n+ mask, then the gate would not cover the entire channel. This causes the gate to lose electrostatic control on the channel, causing a departure from normal operating threshold voltage and on-current. If the misalignment is significant enough, the gate can completely miss the channel, thus causing a complete loss of transistor action. In modern technologies channels are so


Fig. 7.38 Nitride stripped away selectively

short that a misalignment of a few nanometers can be significant. In a self-aligned process, shown on the right of Fig. 7.51, the gate is created first above the TOX. The gate then acts as a barrier against ions during ion implantation. This allows symmetrical sources and drains to form on either side of the gate. Even if the doping extends slightly under the oxide, the extension would be equal in both the source and the drain. Note that while the gate gets exposed to the dopants, this is not harmful; it only increases the conductivity of the gate. What matters is that the gate stops dopants from getting into the channel. If there is a misalignment of masks, the gate will still cover the channel, and the balance of the misalignment affects the drawn area of the source or drain, increasing one and decreasing the other. While this might have some ramifications through source/drain resistance, it will not immediately lead to disastrous loss of transistor action.
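The argument can be put in numbers with a one-dimensional sketch (all coordinates are made up for illustration): in a non-self-aligned flow, shifting the gate mask shrinks the gate/channel overlap, while in a self-aligned flow the channel is defined by the gate footprint itself, and the shift only trades source area against drain area.

```python
def overlap(a, b):
    """Length of the overlap between two 1-D intervals (lo, hi)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

# Non-self-aligned: the 30 nm channel was doped first, then the gate drawn.
channel = (0.0, 30.0)
for shift in (0.0, 5.0, 40.0):          # gate-mask misalignment, nm
    gate = (0.0 + shift, 30.0 + shift)
    print(f"shift {shift:>4.0f} nm: gate covers {overlap(gate, channel):.0f}/30 nm")

# Self-aligned: the implant window is opened around the gate, and the gate
# blocks the ions, so the channel is always fully covered by construction.
window = (-50.0, 80.0)                  # region opened for implant (assumed)
gate = (5.0, 35.0)                      # gate placed with a 5 nm shift
source_len = gate[0] - window[0]        # implanted length left of the gate
drain_len = window[1] - gate[1]         # implanted length right of the gate
print(f"self-aligned: S/D lengths {source_len:.0f}/{drain_len:.0f} nm (nominal 50/50)")
```

At a 40 nm shift the non-self-aligned gate misses the channel entirely (coverage 0), the "complete loss of transistor action" case; the self-aligned version merely ends up with asymmetric 55/45 nm source/drain regions.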

The next step (and mask) is to create contacts. Contacts are openings that allow poly and active regions to be contacted by the first layer of metal wires. The first step involves depositing oxide using CVD over the entire wafer, Fig. 7.52. The oxide created in Fig. 7.52 cannot be grown using oxidation since by this point the bulk silicon is buried beneath many structures and would not oxidize easily. The deposited oxide is uneven mainly due to the fact that some areas already had FOX below them. Thus CMP has to be used to flatten the oxide as shown in Fig. 7.53. This is important since without a flat oxide the following photolithography pass would create very unreliable structures. At this point the entire wafer is effectively contained in a thick oxide. Photoresist is applied and exposed through the contacts mask. The resist is developed, leaving areas exposed where the contacts will be created, Fig. 7.54.


Fig. 7.39 Photoresist washed away and FOX grown

Wet etching is used to open contacts into the oxide as shown in Fig. 7.55. Since the oxide can be fairly thick at this point, the etch is challenging. Wet etching is used; thus a lot of lateral etching can happen if the wafer is to be dipped in the etchant long enough for the contacts to be created. In Chap. 8 we will thus find that design rules for contacts are very strict. Contacts and vias that fail to open or open too little are also a known failure mechanism for microchips (Chap. 14). It is now necessary to move very quickly toward the next step in a controlled environment. If the wafer is left exposed, the contacts will react with air at room temperature, and form a thin layer of native oxide. This oxide layer would substantially increase the contact resistance between the metal layer and the diffusion or poly layers. A metal layer is spread all over the wafer. This is usually achieved through PVD, particularly by using sputtering (Sect. 7.5). The metal layer is thick and uneven due to

unevenness in the wafer. One reason for unevenness is the presence of contacts. The metal layer is planarized using CMP, Fig. 7.56. The metal layer covers the entire surface. To transfer patterns that correspond to true metal traces, photoresist is first spread across the wafer. The wafer is exposed through the metal mask. The metal mask exposes parts of the layout where metal should not exist, rather than those that should exist. Figure 7.57 shows the situation after the resist is developed. Dry etching is then used to realize the wire pattern. Dry etching eats through the metal down to the oxide in areas where photoresist was removed. This leaves metal traces that correspond to wiring at this level, Fig. 7.58. The photoresist is then washed away and a layer of oxide is deposited and planarized over the metal, Fig. 7.59. Aluminum is the metal of choice for this technique of metal patterning. Other metals have better conductivity, but

7.6 LOCOS

Fig. 7.40 Nitride removed, field oxide grown

aluminum reacts very well to dry etching. Copper, which has far superior conductivity, cannot be dry etched. Copper is nevertheless the metal of choice in deep submicron technologies; techniques for patterning it are discussed in Sect. 7.7. Most practical CMOS processes do not use a single metal layer for wiring, because a single metal layer makes the task of placement and routing (Sect. 8.7) much more challenging. Consider Fig. 7.60: module 3 needs to connect to module 4. However, since modules 1 and 2 are already connected using wires, routing between 3 and 4 has to take a much longer route around the circuit. If, as shown in Fig. 7.61, we can “cross over” wires without them making contact, the routing can be made much shorter. Figure 7.62 shows how a crossover can be realized in practice. The top of the figure shows a vertical line and a

horizontal line crossing over. The bottom is a cross section along the horizontal line showing what happens at the crossover. The vertical line runs along the first metal layer we laid in Fig. 7.59. The horizontal line also runs for a while from the left along this metal layer. To cross over the vertical line, the horizontal line moves to a higher metal layer separated from the original metal layer by oxide; this layer is shown in violet in the figure. To connect the higher layer to the lower layer and complete the wire, there must be an area of contact between the two. The metal layer in blue in Fig. 7.62 is called Metal1, the one in violet Metal2. CMOS processes usually have even more metal layers, numbered starting from 1, with 1 being the lowest layer and each layer above the one before it. Contacts between metal layers are critical for routing. Only Metal1 can contact poly and active areas. Metal2 can only contact Metal1 and Metal3. Metal3 can only contact Metals 2


Fig. 7.41 Thin oxide grown. Poly (transistor gate) mask applied

and 4. Thus, layer Metalk can only contact Metals k+1 and k−1, with Metal1 being an important exception since it can also contact poly and active. Connections between two metal layers are not termed contacts; only a connection between Metal1 and poly or active is called a contact. Metal-to-metal contacts are called vias. The Via1 mask, for example, creates contacts between Metal1 and Metal2; Viak creates all contacts between Metals k and k+1. The next mask is Via1. It creates openings the same way, and using the same flow, as the contact mask (Figs. 7.54 and 7.55). Photoresist is applied over the oxide and exposed through the Via1 mask. The resist is then developed and a wet etchant is used to open the vias, Fig. 7.63. We move to the next step quickly before native oxide develops.
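As a quick illustrative sketch, the connectivity rule above can be expressed in a few lines of Python. The layer names and the `can_connect` helper are hypothetical, invented here for illustration; real design-rule decks encode this information differently.

```python
# Illustrative sketch of the stacking rule: Metal_k may only connect to
# Metal_(k+1) and Metal_(k-1); Metal1 is the exception, since it can also
# contact poly and active. Layer names here are our own convention.

def can_connect(a: str, b: str) -> bool:
    """Return True if layers a and b may be joined by a contact or via."""
    base = {"poly", "active"}

    def metal_level(layer):
        # "metal3" -> 3; poly/active -> None
        return int(layer[5:]) if layer.startswith("metal") else None

    la, lb = metal_level(a), metal_level(b)
    if la is None and lb is None:      # poly and active never contact each other
        return False
    if la is None or lb is None:       # one side is poly or active
        metal = la if la is not None else lb
        other = a if la is None else b
        return metal == 1 and other in base   # only Metal1 reaches down
    return abs(la - lb) == 1           # adjacent metal layers only

print(can_connect("metal1", "poly"))     # a "contact"
print(can_connect("metal1", "metal2"))   # "Via1"
print(can_connect("metal1", "metal3"))   # must route through Metal2 instead
```

The same check generalizes to any stack height: `can_connect("metal4", "metal5")` is allowed, while `can_connect("metal2", "poly")` is not.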

Next, metal is applied using PVD to form the Metal2 layer. CMP is used to planarize the deposited metal. Photoresist is applied. The Metal2 mask is then used to pattern away all areas where Metal2 should not exist. This is exactly the same process used to create Metal1 traces. The result is shown in Fig. 7.64. Dry etching is then used to pattern the deposited metal into Metal2 wires. Note that the exact same steps used to realize contacts and Metal1 were used to create Via1 and Metal2. The result is shown in Fig. 7.65. The steps in Figs. 7.64 and 7.65 used to create Via1 and Metal2 are repeated as many times as necessary to create all the higher metal layers. Note that as more layers are built up, irregularities get magnified, hence the importance of CMP.


Fig. 7.42 Wafer with polysilicon (red) deposited all over. The poly layer is extremely uneven due to the hills and valleys created by the FOX and TOX

The cross section of the finished chip is shown in Fig. 7.66. Once fabrication ends, a final layer of very thick oxide is deposited on top of the wafer and polished. This layer is alternatively called the passivation or overglass layer. It protects the chip against scratching and contaminants. All the layers of the chip lie deep below the passivation layer, except for the final metal layer, which is used to create metal pads and protrudes from the passivation layer. In Chap. 13 we will see how these pads are used to connect the finished die to its IC package pins.

7.7 Advanced Issues in CMOS Processing

1. Understand how bipolar parasitics can cause the wafer to latch-up
2. Understand how to address latch-up through various techniques
3. Understand the limitations of LOCOS isolation
4. Recognize the role of STI and how it is fabricated
5. Realize that copper patterning cannot be managed using etching
6. Understand the copper patterning model.


Fig. 7.43 CMP is used to polish the wafer creating a planarized layer of polysilicon

Latch-up One major issue facing CMOS chips is that inherent bipolar structures inevitably arise. These bipolar structures are parasitics whose presence is neither needed nor intended. At first glance, these parasitic bipolars seem impossible to turn on. Further analysis shows that they can indeed turn on due to transient effects. Once on, they are very difficult to turn off, and will almost always lead to catastrophic and often irreversible failure.

The most important parasitic structure that can arise is a bipolar pair that forms between ground and supply through the n-well and substrate. Figure 7.67 shows where these BJTs originate. The first bipolar is a pnp transistor whose p+ emitter terminal is the source of a PMOS transistor. Its base is the n of the n-well, and its collector is the p of the substrate and the p+ contact. The second transistor is an npn whose n+ emitter is the source of an NMOS, its base the p of the substrate and its collector is the n of the n-well and the n+ of the well contact.


Fig. 7.44 Photoresist development for the poly step

These two bipolar transistors are tightly paired by their very nature. An equivalent circuit is shown in Fig. 7.68. The base of one is always connected to the collector of the other. For example, Pb is the same as Nc because both are formed by the N-well. There are two resistors in the circuit: Rw and Rs. Rw is the resistance of the N-well from the point that forms Pb and Nc to the supply contact. Rs is the resistance of the substrate from the point that forms Nb and Pc to the ground contact. Only the well and the substrate present significant resistance, because only they are lightly doped.

The two parasitic transistors are not just paired through terminal connections; they are paired by virtue of their bases and collectors being one and the same. For example, the n-well is the base of the pnp and the collector of the npn. Thus, the structure can also be drawn as in Fig. 7.69. This structure forms a device called a Silicon Controlled Rectifier (SCR) between the PMOS source and the NMOS source through the parasitic path. The device contains three PN junctions, marked J1 through J3 in Fig. 7.69. The junctions are formed between the PMOS source–N-well–substrate–NMOS source.


Fig. 7.45 Polysilicon etched away from exposed areas

At first glance, the SCR device seems inherently incapable of conduction. If we examine Fig. 7.70, we can see that junctions J1 through J3 can never all be on together. While J1 and J3 can easily be forward biased together if enough forward bias is applied, J2 is connected cathode to cathode with J1 and anode to anode with J3, and is thus incapable of turning on simultaneously with either. This might encourage us to ignore the parasitic bipolars altogether. In steady state, we assume that no current flows in the bipolars. This leads to zero drops across Rw and Rs, which in turn ensures that the npn base is grounded and the pnp base is at supply in Fig. 7.68. Thus, the PN junctions formed by the base–emitter in both bipolar transistors will have zero

voltage across them and will be off. This ensures that both parasitic transistors remain in cutoff. However, the above treatment ignores two major factors: the presence of drains, and the effect of transients. Drains add another emitter terminal to both the npn and the pnp, as shown in Fig. 7.71. Thus, the parasitic BJTs become four-terminal devices. If we assume the drains are shorted together, as is the case when the PMOS and NMOS form a CMOS inverter (Chap. 3), then the two new emitters form a single new output terminal. Assume that the drain emitter of the npn (Ne2) falls below ground; this could lead to a situation where the structure in Fig. 7.69 starts to conduct. It is impossible for a


Fig. 7.46 Exposed thin oxide removed, resist washed, and p+ select mask applied

node to fall below ground; however, in some situations ground itself can bounce up from 0 V, leaving nodes below it (Sect. 13.6). The drain node (Ne2, Pe2) has no low-impedance paths except through the MOSFET channels. When the circuit starts up, this node is initially in high impedance. Once the transient has passed, the node will develop a low-impedance path to one of the rails. Depending on startup conditions, the node might resolve to any voltage level, including 0 V. If the drain is grounded (0 V) at startup, there will still be no issue as long as the base of the npn transistor (Nb) in Fig. 7.71 is also grounded.

However, during the transient response, the ground rail has to sink a huge amount of current. This current comes from internal nodes that were holding charge and now need to discharge to start operating properly. These large currents leave a voltage drop over the metal line supplying the ground connection to all transistors, but more significantly, they leave a drop over the metal-to-substrate contact (see the p+ mask in Sect. 7.6). This causes ground to “bounce” up for a short period. The bounce eventually settles, since the process is transient. The danger of the above situation is that the “ground” that the npn sees at Nb will temporarily rise above the nominal value


Fig. 7.47 Photoresist applied over entire wafer

of ground. The drain emitter (Ne2, Pe2), however, might be at a “true ground” since its capacitance might be devoid of charge. This can yield enough base–emitter voltage to turn the npn base–emitter junction on. This can happen any time the drain is at a low voltage and ground bounces up, but it is most likely to occur at startup due to the large transient current into ground. If the base–emitter junction of the npn manages to turn on, it will cause a current to flow. Once current, however small, flows anywhere, a catastrophic sequence of events ensues. Figure 7.72 traces why this happens:
1. The ground bounces up.
2. The drain (additional emitter) Ne2 is lower than ground; the BE junction of the parasitic npn turns on.

3. An emitter current flows through the BE junction of the parasitic npn.
4. Most of the emitter current flows into the collector of the npn.
5. Since the well resistance Rw is relatively high, very little of the npn collector current flows through it.
6. Most of the npn collector current flows into the base of the pnp, starting the real trouble.
7. The pnp base current is multiplied by the common emitter current gain of the pnp, βp, to create a much larger pnp collector current.
8. The pnp collector current sees a large resistance through Rs, the substrate resistance. Most of the pnp collector current thus flows into the npn base.


Fig. 7.48 Photoresist developed, removed from areas where p+ should be created if active

9. The npn base current is multiplied by the npn common emitter current gain to create a new, larger npn collector current, closing the loop.
The above situation is called latch-up. It is a serious and dangerous failure mode for the chip. There is a positive feedback loop, where an initial small current anywhere in the latch causes ever-magnifying currents to flow. It is also self-sustaining: even if the transient that triggered it ends, the latch-up will not resolve. Latch-up causes enormous currents to flow from supply to ground, in essence shorting the two. This will

only stop once the chip has been destroyed, or once either parasitic transistor has had its current forced to null. The latter typically happens only if the chip is completely turned off. Note that the latch structure can also be triggered from the supply side: if the supply seen at the pnp base (Pb) drops enough relative to the drain emitter (Pe2), the drain–emitter to base junction of the pnp turns on and the same positive feedback loop is triggered. Again, this happens if the emitter drain terminal of the pnp bounces above supply, but in reality it is more likely to


Fig. 7.49 Ion beam implantation creates p+ areas (light green). Note that not all exposed areas are doped

happen because supply momentarily drops due to large transient current drawn through contact resistance Rw. Latch-up is not a phenomenon that can be managed once it has occurred. It cannot and will not stop until it destroys the chip. And if it occurs once, it is certain to occur again if the same circumstances are repeated. The right approach toward latch-up is to stop it from occurring in the first place. Several approaches can be used to reduce the likelihood of latch-up: • Increasing the number of ground contacts in the substrate, and supply contacts in the well. As discussed in

Sect. 1.14, these heavily doped areas are necessary to make the contacts ohmic. Increasing the number of contacts to ground and supply reduces the contact resistance substantially; thus Rw and Rs are reduced. This causes less voltage drop during transients, reducing the likelihood that the emitter–base junctions of the bipolars will turn on. It also provides a low-impedance path for the emitter current in Fig. 7.72 to flow, potentially causing the base currents in the positive feedback to diminish. Most design rules (Chap. 8) stipulate a minimum ratio between the number


Fig. 7.50 n+ areas (dark green) created using the n+ mask (above)

of transistors and the number of contacts, going as far as stipulating one contact per transistor.
• Reducing transient currents. This can be achieved by ensuring that internal nodes do not need to be charged up simultaneously and suddenly, specifically at turn-on. There are different techniques to achieve this. For example, the circuit can be partitioned and supply can be turned on for one section at a time. Alternatively, supplies can be ramped up slowly until they reach their final value, allowing internal nodes to charge up using smaller currents over a longer time. Both approaches reduce the instantaneous total current sourced to ground or drawn

from supply, thus reducing drops on both and reducing the likelihood of enough bounce to start the latch.
• Reducing bipolar current gains. If the product of the current gains of the two bipolar transistors is less than unity, the feedback loop is no longer regenerative, and transient currents dampen and decay. However, this requires design decisions such as reducing drain doping, which increases drain resistance and can hamper MOSFET operation.
• Of all the design decisions that can reduce or eliminate latch-up, proper isolation of transistors gives the most effective results. By isolating we mean inserting


Fig. 7.51 Self-aligned process (right) and n+ first process (left)

insulators between transistors to prevent the BJT parasitics from forming. The paragraphs below compare LOCOS isolation with more advanced techniques that allow better density.
LOCOS versus STI The most effective way to reduce latch-up is to isolate MOSFET transistors from each other. This introduces significantly increased resistances that effectively cut up the latch structure. For example, if effective insulation is introduced between the PMOS and NMOS in Fig. 7.67, an open circuit appears between the base of the pnp and the collector of the npn, completely removing any possibility of latching. Even imperfect isolation greatly degrades the base–emitter junctions of the bipolar parasitics, leading to much higher turn-on voltages and much lower current gains.
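The feedback condition and the value of extra contacts can be made concrete with a small numeric sketch. All numbers below (current gains, contact resistance, seed current) are invented for illustration and do not come from any real process.

```python
# Hedged sketch: betas, resistances, and the seed current are assumed values.

def loop_current(beta_npn, beta_pnp, seed=1e-9, trips=10):
    """Circulating current after N trips around the npn-pnp feedback loop.

    Each round trip multiplies the current by the loop gain
    beta_npn * beta_pnp, mirroring the latch-up sequence in the text.
    """
    i = seed
    for _ in range(trips):
        i *= beta_npn * beta_pnp
    return i

# Loop gain > 1: a 1 nA disturbance grows without bound (latch-up sustains).
print(loop_current(10, 5))
# Loop gain < 1: the same disturbance decays away (the "reduce gains" fix).
print(loop_current(0.9, 0.9))

# More substrate/well contacts act as parallel resistors, shrinking the
# transient voltage drop that forward-biases a parasitic base-emitter junction.
def bounce_voltage(i_transient, r_contact, n_contacts):
    return i_transient * r_contact / n_contacts

print(bounce_voltage(0.1, 50.0, 1))    # one contact: large ground bounce
print(bounce_voltage(0.1, 50.0, 20))   # many contacts: well below junction turn-on
```

The two functions correspond directly to the first and third mitigation bullets: keep the gain product below unity, or keep the transient drop below the base–emitter turn-on voltage.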

In LOCOS, the method of isolation is the FOX. As discussed in Sect. 7.6, while patterning the active layer, non-active areas of the wafer have a very thick oxide layer grown over them. This thick oxide is called field oxide, or FOX. The main function of FOX is to isolate active areas (and thus transistors) from each other, breaking up parasitic bipolar structures. Using FOX for isolation is very limiting, making its use in modern technologies impractical. To provide effective isolation, the FOX layer has to be grown very thick, which involves oxidation for a very long time. However, we are limited in how thick FOX can be grown by what happens at the periphery with the silicon nitride. Recall from Sect. 7.6 that SiN was used to mask oxide growth when growing FOX. In areas where SiN is not etched, oxide growth is prevented and a thin oxide, marking an active area, is grown instead of the FOX. The


Fig. 7.52 Contact mask applied and thick oxide deposited using CVD

problem is that the border between active and non-active areas is not very clear cut. As shown in Fig. 7.73, as the FOX grows thicker, it starts putting mechanical stresses on the SiN. These stresses eventually grow so much that the FOX starts lifting the SiN and growing into the active area. This creates a tapered structure, commonly known as the bird’s beak. The length and height of the bird’s beak are a complex function of the temperature and time of oxidation, and might be difficult to control. What the bird’s beak phenomenon certainly means, however, is that the thickness to which FOX can be grown is very

limited. As we try to grow FOX thicker, it eats more into the active area, reducing the fineness of the features we can create and degrading transistor performance. This is particularly true for modern technologies, where dimensions are short to begin with and not a lot of the real estate in the active area can be wasted on bird’s beaks. Shallow Trench Isolation, or STI, is an alternative technique used to insulate MOSFETs from each other more effectively. In STI, a preliminary step is performed before any other layer is patterned; this preliminary step replaces the active layer in LOCOS. First, a layer of oxide is grown and photoresist is applied over it. The wafer is then exposed


Fig. 7.53 Oxide flattened using CMP

through a trench mask. The trench mask exposes non-active areas, or more accurately, areas where we will create isolation trenches. A wet etchant capable of eating through both the oxide and the substrate silicon is then applied. Etching is allowed to proceed until trenches of acceptable depth for isolation are created, as shown in Fig. 7.74. These trenches are then filled with oxide using CVD, and CMP is used to planarize the result, Fig. 7.75. We now have isolation trenches that can be made very sharp, and deep enough to combine effective isolation with high-density transistor packing. STI has a huge impact on

some other steps of the process; for example, diffusion can no longer be used to create the well, and high-energy ion implantation is usually used instead. This necessitates more rigorous annealing and damages the crystal structure more fundamentally. Thus, the process becomes more complex and expensive than LOCOS. However, modern technologies cannot function without STI. Indeed, STI is not good enough for CMOS technologies at the cutting edge of the state of the art, where more advanced techniques that create deeper trenches and more exotic transistors are now necessary. This is particularly true due to the need to combat the contradictory requirements of leakage and latch-up (Chap. 10).


Fig. 7.54 Exposure through the contact mask

Copper patterning In Sect. 7.6, we discussed how photolithography can be used to pattern aluminum wires. However, aluminum is not the metal of choice for any modern technology. Gold, silver, and particularly copper have significantly better conductivity. The effect of interconnect resistance on overall circuit delay has become prominent due to the differential scaling of interconnects and gates (Chap. 13). Thus, copper metal layers are necessary in modern technologies. However, copper is a very difficult metal to work with because it is very difficult to etch. In fact it reacts

very slowly to dry etching, leading to situations where it is etched as slowly as the surrounding oxide. Although some solvents that attack copper exist, wet etching is inappropriate for metal layers in modern technologies because it yields significant lateral etching under the mask. This leads to awkward wire profiles, hard-to-predict conductance, low wire packing density, and large minimum width (see Chap. 8 for a deeper discussion of wire design rules). Copper patterning therefore uses a Damascene process instead of typical photolithographic patterning. The first step in this process is to grow a layer of oxide over the underlying structure as shown in Fig. 7.76.
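To see why the resistivity gap matters, here is a rough back-of-the-envelope sketch. The bulk resistivities are standard room-temperature values; the wire geometry is an assumed example, not taken from any particular process.

```python
# Standard room-temperature bulk resistivities (real values; thin-film
# resistivities in an actual process are somewhat higher).
RHO_AL = 2.65e-8   # ohm*m, aluminum
RHO_CU = 1.68e-8   # ohm*m, copper

def wire_resistance(rho, length, width, thickness):
    """R = rho * L / (W * t) for a rectangular wire cross section."""
    return rho * length / (width * thickness)

# Assumed example geometry: a 1 mm wire, 100 nm wide, 200 nm thick.
L, W, T = 1e-3, 100e-9, 200e-9
r_al = wire_resistance(RHO_AL, L, W, T)
r_cu = wire_resistance(RHO_CU, L, W, T)
print(round(r_al), "ohm for Al,", round(r_cu), "ohm for Cu")
```

For the same geometry, copper cuts the wire resistance by roughly a third, which directly reduces the RC delay of the interconnect (Chap. 13).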


Fig. 7.55 Etched contacts

This layer of oxide is covered in photoresist and the photoresist is patterned by exposure. Parts where the wiring will exist are developed and removed, as shown in Fig. 7.77. Etching is then used to create trenches where the wires will lie. The etchant eats into the exposed oxide, leaving photoresist-covered parts intact. Etching is not maintained all the way to the substrate; instead, it only creates basins for the wires, as shown in Fig. 7.78. The photoresist is removed and PVD is used to spread copper over the entire wafer. Enough copper is added to overfill all the trenches, as shown in Fig. 7.79. CMP is used to polish off all the excess copper. CMP can be managed well enough to stop at the oxide interface. This

leaves copper tracks only where wires should exist, as shown in Fig. 7.80. A variation called the dual Damascene process observes that both the via (or contact) and the metal above it can be created in the same step, saving a processing step and a mask. Note that Damascene is very sensitive to the planarity of the wafer. The oxide must be planar before metal is deposited, and the polished metal must be very planar before a new oxide is deposited to start work on a higher metal layer. CMP is thus integral to copper patterning and must be of very high quality; non-planarity can easily lead to wires shorting or opening.


Fig. 7.56 Sputtered and planarized metal (blue). Metal mask is applied

Multi well and SOI In Sect. 7.6, we assumed that the bulk of the wafer forms the bodies of all NMOS transistors. However, this does not correspond to realistic processes. In a more realistic process, the bodies of the NMOS transistors are built in a relatively thin p-type layer on the top side of the wafer, called the epitaxial layer. The bulk of the wafer is more heavily doped than the epitaxy, being p+ instead of p, Fig. 7.81. A lightly doped epitaxy increases surface mobility, and this gradation in doping improves behavior in terms of leakage and parasitics. If all NMOS transistors are built in

the epitaxy, the process is called a single well process. Single well here does not mean that there will necessarily be a single well, but rather that a single type of well is used: typically N-wells containing the PMOS transistors. Single well processes are prone to latch-up due to the inherent parasitic bipolars. Thus, significant area has to be invested in creating contacts to ground and supply in the body and well. Additionally, all NMOS transistors are forced to share one body, meaning the body effect on each transistor is beyond designer control. Using the body effect to control threshold voltage is sometimes necessary for power management in technologies where leakage power is dominant.
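The designer's loss of control over the body can be quantified with the standard long-channel body-effect expression from Sect. 1.20, VT = VT0 + γ(√(2φF + VSB) − √(2φF)). The parameter values in the sketch below are assumed for illustration only.

```python
import math

# Assumed illustrative parameters: VT0 = 0.45 V, gamma = 0.4 V^0.5,
# 2*phi_F = 0.88 V. None of these come from a specific process.
def vth(vsb, vt0=0.45, gamma=0.4, two_phi_f=0.88):
    """VT = VT0 + gamma * (sqrt(2*phi_F + V_SB) - sqrt(2*phi_F))."""
    return vt0 + gamma * (math.sqrt(two_phi_f + vsb) - math.sqrt(two_phi_f))

print(round(vth(0.0), 3))   # source tied to body: VT stays at VT0
print(round(vth(0.5), 3))   # source 0.5 V above the shared body: VT rises
```

With a single shared body, every NMOS whose source sits above ground suffers this threshold increase, whether the designer wants it or not; per-transistor wells (or SOI, below) restore that control.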


Fig. 7.57 Photoresist applied and developed through the metal mask

An alternative approach to CMOS is Silicon On Insulator (SOI). SOI transistors are immune to latch-up and have low leakage. In SOI technologies, the transistors are built in an epitaxial layer on top of an insulator instead of a semiconductor body. We discuss the fabrication of such transistors in this section; in Chap. 10, we discuss how SOI improves leakage. There are two major methods by which SOI can be realized, shown in Figs. 7.82 and 7.83. The approach in Fig. 7.82 is called SIMOX, or Separation by IMplantation of OXygen. In this method, high-energy ion implantation is used to introduce an oxygen-rich layer in the middle of the wafer.

The wafer is then heated in an oven, causing the middle oxygen-rich layer to oxidize and form a layer of silicon dioxide: an insulating base upon which the transistors can be built. In the second approach, Fig. 7.83, two silicon wafers are used. The first wafer has oxide grown, or some other insulator deposited, on its surface. This wafer is then flipped and bonded onto an unmodified silicon wafer. This leads to the same result as SIMOX: an insulating layer sandwiched between silicon. The main challenges of SOI are in creating and maintaining the insulating layer. In the SIMOX process, for


Fig. 7.58 Dry etched metal pattern

Fig. 7.59 Metal wires created, resist washed and oxide grown and polished
Fig. 7.60 Routing using a single metal layer

example, ion bombardment and heating can cause defects to the crystal structure of the epitaxy. In wafer bonding, the way the two wafers bond and then cool together can create problems. In both cases, the differential expansion and shrinking of the silicon and the insulator can cause mechanical stresses in the wafer that have to be addressed. Once these issues are addressed, however, the rest of the

Fig. 7.61 Routing if we can “crossover” wires

Fig. 7.62 Crossover using two metal layers. Top is a schematic of the two wires, bottom is a cross section along the horizontal wire

Fig. 7.63 Via1 created


CMOS process can be used without modification. This is very important for foundries, because they can adopt SOI technology with minor modifications. An SOI wafer has several advantages relative to a single well approach. First, latch-up is essentially eliminated, since no parasitic bipolar transistors exist. Second, there is no need for well contacts. Most importantly, parasitic capacitances are significantly reduced, since the bodies of the transistors are completely isolated from the bulk of the wafer. We can also create NMOS transistors in individual wells and use the body effect on each independently. Above all, SOI allows a great reduction in total leakage, further explored in Chap. 10. Figure 7.84 shows two transistors realized in an SOI wafer. The SOI structure obviously simplifies device isolation, reduces parasitics due to the thick buried oxide, and


Fig. 7.64 Metal2 (violet) sputtered on the wafer and photoresist patterned using the Metal2 mask. CMP has to be used to planarize

removes the fear of latch-up. However, one major complication is that the bodies of the transistors are now inaccessible, and thus their potential is not controllable. This leads to what is called the “floating body effect”: the potential of the body floats, and is thus dependent on initial conditions and the history of past potentials. It also depends on charge accumulation in the channel and the rate of charge recombination. By observing Fig. 7.84 we can reiterate the following advantages of SOI fabrication:

• Susceptibility to latch-up is much lower; in fact it is basically nonexistent. The two transistors have no semiconductor paths between them; they are fully separated by insulators on all sides. There are no bipolar parasitic structures, and no need for complicated or area-consuming isolation trenches
• Leakage current (Chap. 10) is much lower. To understand why, we have to explore the tradeoffs involved in subthreshold conduction, which we will do in detail in Sect. 10.5


Fig. 7.65 Metal etched away using dry etching

Fig. 7.66 Final cross section of the two metal layer circuit



Fig. 7.67 Parasitic bipolar pairs. In a finished chip there are billions of such pairs
Fig. 7.70 The parasitic path contains junctions that apparently can never switch on together

• Parasitics from the transistor drains and sources to the bulk are reduced. Whereas these capacitances only had to contend with the depletion layer in bulk devices, here they have to contend with the buried insulator. This leads to a much larger capacitor plate separation and a much lower capacitance value.
Fig. 7.68 Equivalent circuit of the parasitic bipolar pair in Fig. 7.67. The second emitters will be explored below
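The capacitance reduction can be estimated with a simple parallel-plate model. All dimensions below are assumed for illustration (a ~0.1 µm depletion region versus a ~0.4 µm buried oxide under a 1 µm² drain); real values vary by process.

```python
# Parallel-plate sketch of why the buried oxide shrinks junction parasitics.
EPS0 = 8.854e-12            # F/m, vacuum permittivity
K_SI, K_SIO2 = 11.7, 3.9    # relative permittivities of Si and SiO2

def plate_cap(k, area, separation):
    """C = k * eps0 * A / d."""
    return k * EPS0 * area / separation

area = 1e-12                              # assumed 1 um^2 drain area
c_bulk = plate_cap(K_SI, area, 0.1e-6)    # thin depletion layer in a bulk device
c_soi = plate_cap(K_SIO2, area, 0.4e-6)   # thick buried oxide in SOI
print(c_bulk / c_soi)                     # prints 12.0: an order-of-magnitude drop
```

Both the larger plate separation and the lower permittivity of the oxide work in the same direction, which is why drain and source parasitics fall so sharply in SOI.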

Fig. 7.69 The parasitic bipolars form an SCR. Spmos is the source of the PMOS. Snmos is the source of the NMOS

There are two types of SOI transistors: Fully Depleted SOI (FDSOI) and Partially Depleted SOI (PDSOI). The difference between the two is in the “body” area of the transistors. In PDSOI, the body of the transistor is partially depleted of carriers; that is, part of the channel is depleted of carriers while another part is relatively rich in them. These transistors have characteristics very similar to traditional bulk MOSFETs, with one major exception: the body potential. Remember that MOSFETs are four-terminal devices. In bulk MOSFETs, the bulk or substrate is connected to a known potential, which we assume by default is ground for NMOS and supply for PMOS. This hard-wired connection ensures the potential of the body terminal of the MOSFET is always known. In PDSOI, the body is the area marked as “bodies” in Fig. 7.84. This area is surrounded on all sides by insulators. This leaves the body with an undefined potential, thus the


Fig. 7.71 A second emitter is added by the drains. In a static CMOS inverter, the two drains can be assumed to be shorted and thus Pe2 and Ne2 are the same node

Fig. 7.72 Bipolar pair latching up


Fig. 7.73 Bird’s beak patterns in LOCOS
Fig. 7.78 Etching wire trenches

Fig. 7.74 Trench etching

Fig. 7.75 Deposited and planarized oxide over trenches

Fig. 7.76 Oxide growth as a first step in copper patterning

Fig. 7.77 Pattern development for copper wires

Fig. 7.79 Copper sputtering and overfilling

Fig. 7.80 CMP down to oxide to create copper wire patterns


• The rate at which charges can recombine in the channel. If all charges recombine quickly, the channel can return to a “neutral” state faster, where its potential can be known

Fig. 7.81 Wafer with epitaxial p+ to p gradation and an N-well (single well process)

body is “floating”. This means that the potential of the body is defined not only by current conditions, but also by the history of how the body was affected. The body potential is thus affected by three factors:

• The history of body potentials
• The current conditions on all other terminals
• The rate at which charges can recombine in the channel (the factor listed above)

Fig. 7.82 Buried oxide wafer using SIMOX

Another type of SOI transistor is the FDSOI transistor. In this transistor, the channel is fully depleted; that is, the channel is naturally poor in charges. If an inversion layer is created, it is almost certainly because of charge attraction from the drain. This has the disadvantage that it creates devices with severe short channel effects (Chap. 10). However, FDSOI devices have two major advantages owing to the dearth of native carriers:

• Recombination is much faster. The device is “hungry” for carriers and will thus recombine them much faster, causing the channel to return to a reset state faster. This improves the speed of the device
• Because recombination is fast, the potential of the body is relatively well known


Fig. 7.83 Wafer bonding

Fig. 7.84 An NMOS and a PMOS created in SOI

7.8 Account of Layers

1. Realize that masks describe processes on common real estate
2. Understand that when combined, masks must be color coded to distinguish each layer
3. Recognize that the “layout”, the color-coded combination of masks, represents a top view of the chip.

Figure 7.85 shows a compilation of all the masks used in the different steps described in Sect. 7.6. They are ordered from left to right and from top to bottom in the order in which they are used.

Figure 7.86 shows a cross section of the circuit realized in Sect. 7.6. It is also the result of applying the masks in Fig. 7.85. The cross sectional area shows the different “layers” of the design. The layers are arranged vertically from deeper to higher, which is why they are called layers. Table 7.6 lists the layers in order from the deepest to the highest, with notes on where and why each layer is used. The layers are built one on top of another. Thus, in general, deeper layers must be created first. The only exception to the order of creation of layers is that poly has to be created before dopants are implanted into active areas. This is due to the need to preserve a “self-aligned” process (see Sect. 7.6). Note, however, that


Fig. 7.85 Masks used in LOCOS in the order of their use. From top left to bottom right: N-well, active, poly, p+ select, n+ select, contact, Metal1, Via1, Metal2

Fig. 7.86 Cross section of circuit created using masks in Fig. 7.85

the active regions are created before poly. It is the p+ and n+ masks that have to be applied after poly. Now the masks in Fig. 7.85 obviously describe how the shared area of the wafer is divided among the different “layers”. If we combine all these masks together in a single drawing, the result would be unreadable. However, if we first color-code each mask before combining them, we get something very interesting, as shown in Fig. 7.87. The legend for the color coding is listed in Table 7.6.

Figure 7.87 is the “layout” of the circuit. The layout is significant and interesting in at least two ways. First, the layout represents a top view of the circuit. With a little training, one can immediately tell what circuit a certain layout represents just by looking at it. The second point of interest in a layout is that it represents the totality of the masks used to fabricate the chip. In Chap. 8, we will discover that the layout is not the result of combining the different masks. Rather, the masks are derived from the layout. The layout is the final result of the design flow and the start of the fabrication process. The layout is often automatically generated by a computer-aided design tool, but can also be hand drawn. Chapter 8 will use the layout both as the starting point and the ending point of answering the question of how a design involving billions of transistors can be managed by a designer.


Table 7.6 Layers and color codes

| Layer number and color | Layer name | Use | Notes |
|---|---|---|---|
| 1 Yellow | N-well | Defines the area where PMOS transistors will be created, representing their bodies | In an N-well process. In a P-well process the well is p-type; in a double-well process there can be two types of wells |
| 2 Green | Active, also known as diffusion | Defines areas where thin oxide will be grown: all source and drain areas as well as MOSFET channels, plus contacts for bodies and wells | Does not actually create the doped regions. It defines areas where thin oxide will be grown; all other areas will grow a thick field oxide. This layer must extend under poly where there is a transistor |
| 3 Red | Poly | Defines MOSFET gates and connections between gates | The intersection of active and poly defines a MOSFET |
| x Dotted light/dark green | n+, p+ select | Areas where n+ and p+ will be doped | This is variably numbered. It need only cover those active areas that will be doped, but can overextend them. Only selected active areas will be doped. The two masks are complementary; one can be deduced from the other because their union covers the entire layout |
| 4 Black | Contact | Connections to Metal1 from either active or poly | |
| 5 Blue | Metal1 | Wires made at the Metal1 layer | |
| 6 Black | Via1 | Connections from Metal1 to Metal2 | |
| 7 Violet | Metal2 | Wires made at the Metal2 layer | |
| 8 Black | Via2 | Connections from Metal2 to Metal3 | Metals and vias continue interchangeably depending on the number of metal layers in the technology |

Fig. 7.87 Color-coded combination of masks

8 Design Flow

8.1 What Is a Layout

1. Interpret a layout
2. Extract main features from a layout
3. Translate a layout into a circuit
4. Draw a cross section of a layout
5. Understand the order of layers in a layout
6. Understand the peculiarity of the poly-diffusion intersection due to self-alignment.

Figure 8.1 shows the layout of the LOCOS process from Sect. 7.6. Let us begin by asking what circuit this layout represents. The cardinal rule for reading layouts is that the intersection of an active region and a poly region is a transistor. Note that even though the active is actually drawn below the poly, there will be no dopants in this intersection area. Thus the intersection area between the poly and active layers is the MOSFET channel. To understand why dopants are not inserted into the active region under the poly, review the self-aligned process in Sect. 7.6. Figure 8.2 shows the intersection of an active and a poly layer. This is a transistor. Now, is this an NMOS or a PMOS, and how can we extract its W and L from the layout? If the intersection of poly and active occurs in the N-well, the transistor is PMOS; if it occurs in a p-substrate or P-well, it is NMOS. The channel is the rectangular area of the intersection of the two layers, as shown in Fig. 8.2. The channel length is the distance between the source and the drain, and thus it is the width of the poly line. The channel width is the distance that the poly runs over the active. In the layout in Fig. 8.3, we can identify two poly-active intersections, and thus two MOSFETs. One is in the N-well and the other is outside it. Thus, one is PMOS and the other is NMOS. The gates of the two transistors are shorted since they are both made using a single track of poly. One terminal (it is unclear at this point whether source or drain) of each transistor is shorted to one terminal of the other transistor using a Metal1 wire near the center of the layout.
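The W/L reading rule can be sketched as a small rectangle-intersection routine. The coordinate convention (a vertical poly track crossing a horizontal active strip) and all dimensions are assumptions for illustration:

```python
# Rectangles are (x1, y1, x2, y2) in layout grid units (assumed convention).
def intersect(a, b):
    """Overlap rectangle of two rectangles, or None if they do not overlap."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return (x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None

def channel_dims(poly, active):
    """L = width of the poly line, W = extent of poly running over active."""
    ch = intersect(poly, active)
    if ch is None:
        return None               # no poly-active overlap: no transistor
    L = ch[2] - ch[0]             # horizontal: across the vertical poly track
    W = ch[3] - ch[1]             # vertical: along the poly track over active
    return W, L

poly = (4, 0, 6, 10)      # a 2-unit-wide vertical poly track (assumed)
active = (0, 3, 10, 7)    # a 4-unit-tall horizontal active strip (assumed)
print(channel_dims(poly, active))   # (4, 2): W = 4, L = 2
```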

Each transistor also has its other terminal connected to an independent Metal1 track. The resulting circuit is shown in Fig. 8.4. The circuit is obviously a CMOS inverter, and we can conclude that the two terminals of the transistors shorted using metal are the drains. The source of the NMOS is connected to a metal line that will certainly be connected to ground, and the PMOS source should be connected to supply. This can be confirmed by the fact that these two lines then run to the body and well contacts of the opposite transistor. We conclude that the following steps should be followed to extract a circuit from a layout:

1. Each poly-active intersection is a transistor.
2. Each intersection in an N-well is a PMOS.
3. Each intersection in a p-substrate or P-well is an NMOS.
4. Connect terminals using metal lines.
5. Connect terminals using poly lines.
6. Different metal layers that run over each other are crossovers.
7. Only contact or via locations represent real connections between different layers.
8. A p+ area in the substrate or P-well is a body contact for NMOS transistor(s).
9. An n+ area in an N-well is the body contact for PMOS transistor(s).
10. If not stated, examine the resulting circuit and deduce the position of supply and ground.
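Steps 1–3 of this procedure can be sketched as a toy classifier. The rectangle representation and the simple containment test are assumptions for illustration, not a real extraction tool:

```python
# Rectangles are (x1, y1, x2, y2); a channel inside an N-well is a PMOS,
# anywhere else it sits in the p-substrate and is an NMOS (assumed model).
def inside(rect, region):
    """True if rect lies entirely within region."""
    return (region[0] <= rect[0] and region[1] <= rect[1] and
            rect[2] <= region[2] and rect[3] <= region[3])

def classify(channels, nwells):
    """Label each poly-active intersection as PMOS or NMOS."""
    return ["PMOS" if any(inside(ch, w) for w in nwells) else "NMOS"
            for ch in channels]

nwell = [(0, 8, 20, 16)]                   # one N-well (assumed coordinates)
chans = [(3, 10, 5, 14), (3, 2, 5, 6)]     # one channel in the well, one outside
print(classify(chans, nwell))              # ['PMOS', 'NMOS']
```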

Figure 8.5 is the cross section of the circuit from the layout in Fig. 8.3 along the dotted line. To obtain the cross section:

• Realize layers from bottom to top, beginning with the N-well and ending with the highest metal layer
• Layers should be in order except for poly and active. Active is realized in order only if we recognize that it does not indicate doping, just the growth of FOX
• The order of layers is: Well, FOX, Poly, n+ and p+, Contacts, Metal1, Via1, Metal2, Via2, etc.
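The ordering rule, with its single self-alignment exception, can be sketched as follows. The layer list mirrors the one in the chapter, while the function and its name are illustrative assumptions:

```python
# Vertical stack from deepest to highest; the implants physically sit
# below poly, but are fabricated after it (self-aligned process).
PHYSICAL_ORDER = ["Well", "FOX", "n+/p+", "Poly", "Contact",
                  "Metal1", "Via1", "Metal2"]

def fabrication_order(layers):
    """Deepest-first order, except the n+/p+ implant moves after poly."""
    order = [layer for layer in layers if layer != "n+/p+"]
    order.insert(order.index("Poly") + 1, "n+/p+")
    return order

print(fabrication_order(PHYSICAL_ORDER))
# ['Well', 'FOX', 'Poly', 'n+/p+', 'Contact', 'Metal1', 'Via1', 'Metal2']
```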

© Springer Nature Switzerland AG 2020 K. Abbas, Handbook of Digital CMOS Technology, Circuits, and Systems, https://doi.org/10.1007/978-3-030-37195-1_8


Fig. 8.1 Layout from Chap. 7

8.2 Stick Diagrams

1. Understand the advantage of drawing stick diagrams
2. Realize the limitations of stick diagrams
3. Learn the rules of drawing stick diagrams
4. Use Euler paths to minimize diffusion strips in a stick diagram.

In Sect. 8.1 we saw how we can extract a schematic from a layout. In reality, we almost always want to do the opposite: draw layouts from schematics. Stick diagrams are an incredibly useful intermediate step for obtaining layouts from schematics. While drawing stick diagrams, we encode some information about the relative positions of different layers. We also encode a lot of information about routing, intersections, and overpasses. Stick diagrams can be very useful in the preliminary design of standard cells (Sect. 8.3). However, stick diagrams are not layouts. For example, they do not necessarily contain information about the size of different tracks. They also do not define precise positions of different layers, or any information about separation. Thus the stick diagram does not contain the information needed to decide whether or not the layout will be successful (Sect. 8.5). The stick diagram encodes the following:

• The rough location of different layers
• The rough location of contacts and vias
• Optionally, the aspect ratio of transistors

Fig. 8.2 Poly-active intersection in layout and in cross-sectional realization due to the self-aligned process. The intersection is always a transistor


Fig. 8.3 Layout with distinguishing features pointed out

Fig. 8.6 Legend for stick diagram color code. Contacts and vias can alternatively be indicated by black circles. Some conventions distinguish contacts from vias

Fig. 8.4 Circuit implemented by layout in Fig. 8.3

It does not encode:

• The precise location of layers or vias
• The width or precise length of tracks

The stick diagram is a parallel representation of the schematic. But instead of drawing solid monochromatic lines, we draw color-coded lines indicating different layers. Figure 8.6 shows the color code used to indicate the different layers. Note that lines are used instead of rectangles, thus no information is carried about width. The main layers are: green for n-diffusion, light green for p-diffusion, blue for metal layers, and red for polysilicon. Contacts and vias are indicated by black crosses, circles, squares, or diamonds. Optionally, a dashed line is drawn across the diagram. This line indicates the demarcation between the well and the substrate. This helps take the design closer to the layout by insisting that all PMOS lie on one side of the line while all NMOS lie on the other. The following rules are used to draw stick diagrams:

Fig. 8.5 Cross section of layout in Fig. 8.3 along dashed line


Fig. 8.9 CMOS inverter and associated stick diagram

Fig. 8.7 This figure does not show any connections. It does, however, show one transistor

Fig. 8.8 Everything here is a connection

• Any intersection between two different layers (except diffusion and poly) is considered a crossover (Fig. 8.7)
• Any intersection between two lines from the same layer is a connection, without the need to indicate the presence of a contact (Fig. 8.8)
• Any intersection between n-diffusion and poly is an NMOS, and it must lie on the substrate side of the demarcation line
• Any intersection between p-diffusion and poly is a PMOS, and it must lie on the well side of the demarcation line
• Any intersection between two different layers that represents a connection must be indicated by a via or contact (Fig. 8.8)

Figure 8.9 shows a CMOS inverter converted into its stick diagram. There are no specific steps that must be followed while drawing the stick diagram. However, the following are general guidelines (Fig. 8.10):

Fig. 8.10 Steps in drawing the stick diagram

1. Vdd and GND are drawn as Metal1 horizontal lines at the top and bottom.
2. Draw the dashed well-substrate line roughly in the middle of the figure. Note that all locations are rough, so do not spend a lot of time fine-tuning the location of the dashed line.
3. Draw n- and p-diffusions wherever there are transistors. These can be, for now, positioned where the transistors are on the schematic.
4. Draw poly lines across the diffusions.


5. Connect the poly lines of the NMOS and PMOS, and extend the poly lines towards the left as input ports (see Sect. 8.3 for an explanation why). Poly lines cannot cross over each other; recall that if any lines from the same layer cross, it is an automatic connection.
6. If possible, move the poly inputs to Metal1 before extending them to the left side of the figure; again, see Sect. 8.3 for an explanation why.
7. Connect drains using Metal1 lines. Connect sources to drains or to supply rails using Metal1.
8. If a crossover in metal is needed, use Metal2. Avoid resorting to higher metal layers.
9. Extend outputs as Metal1 lines towards the right of the figure (see Sect. 8.3 for a reason why).
10. Review that all cross-layer connections are made using a contact/via.
11. Review that no multi-level contact vias are attempted. That is to say, Metal2 only contacts Metal1 and Metal3; it cannot contact poly or via to Metal4. Note that some processes allow “stacked vias”, which would make it legal for any layer to connect to any other layer.
12. Optionally, mark the aspect ratios of transistors on poly-diffusion intersections.

Fig. 8.11 NOR gate and stick diagram

Figure 8.11 shows the result of applying the above steps to a NOR gate. Note that the stick diagram looks very similar to the schematic; this is because in step 3 above, we assumed active layers were drawn similar to the schematic. The above approach does not allow a systematic positioning of transistors. If transistors are placed in their positions in the schematic, we have to assume that each will have its own diffusion region. If two transistors are in series, they can share the diffusion region across a source/drain (see Fig. 8.12). Parallel transistors will also share diffusion because they share both terminals. In fact, any transistors connected at any node can share a diffusion. Because all


transistors in a CMOS circuit are connected to another transistor, we can share a lot of diffusion regions. Sharing diffusion regions saves not only the overhead of having two independent diffusions but, more importantly, the overhead involved in connecting the two diffusions through the metal layer. The metal layer has a wide minimum width (Sect. 8.5), but it is the contact that has a devastating effect on the area. Contacts are large, and they require a large enclosure by the metal and the diffusion (Sect. 8.5). Since we need two contacts to connect two diffusions, the impact on area can be significant. The Euler path approach allows us to find out whether a single diffusion can be used to build all transistors of the same type. It is one tool that can be used to design efficient layouts. The Euler path is a path through a graph that visits every edge only once. When applied to CMOS circuits, an Euler path is a path that passes through every transistor exactly once and visits every node at most twice. If an Euler path can be found through a CMOS circuit, then that circuit can be implemented using a single n-diffusion and a single p-diffusion. Using horizontal diffusions and vertical polys, this allows for very regular and compact stick diagrams. Notice that there is no way to guarantee that the Euler path approach yields a circuit with minimum area. It only yields a regular, relatively efficient layout. Hand-drawn and optimized layouts often yield better area results. The Euler path approach is described in detail below:

1. Trace a path through either the PDN or the PUN. The path can have any starting point and ending point. However, it must cover all transistors only once and can visit any node (including ground, supply, and output) at most twice. If a path cannot be found, try another starting point.


Fig. 8.12 Shared diffusion versus individual diffusion for series transistors

2. Trace a path through the remaining network, PUN or PDN, following the same sequence of transistors as found in step 1. If this path is an Euler path, proceed; if not, go back to step 1.
3. Arrange poly sticks vertically from left to right in the order of the Euler path; the polys should extend across the well delimiting line.
4. Draw one n-diffusion and one p-diffusion across the polys.
5. Connect points on the diffusions to each other, supply, and ground to realize the circuit.

This is best illustrated using an example. Consider the circuit in Fig. 8.13. The figure shows an Euler path traced across the PDN and the PUN. Notice that some nodes (in this case ground and node X) may be visited twice but not more than twice. Figure 8.14 shows the steps described above being implemented to draw the stick diagram. There are three inputs to the circuit, so three vertical poly lines are drawn across the delimiting line. Two active lines are drawn, one in the substrate and one in the well. The inputs are marked, and the different terminals are connected using metal lines to realize the circuit in Fig. 8.13. Neither the Euler path nor the order in which the inputs are written on the stick diagram is unique. For example, the path in Fig. 8.13 dictates a path where the order of inputs is CAB. However, the stick diagram in Fig. 8.14 follows the sequence CBA, which also corresponds to a valid Euler path on the schematic. Figures 8.15 and 8.16 show another example of finding the Euler path and drawing the stick diagram. If a single Euler path cannot be found through the entire circuit, then multiple actives will have to be used. But we can extend the above approach: if we can cover the entire schematic using K Euler paths, then the stick diagram can be realized using K active strips. For example, the function Out′ = A(B + C) + D(E + F) cannot be covered using a single Euler path. Two paths are chosen: one covering BADF and one covering EC. In Fig. 8.17 we manage to cover both the PDN and PUN using two paths.
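Finding an Euler path can be sketched as a brute-force search over the transistor graph. The edge representation (each transistor as a node pair plus gate label) is an assumption for illustration, and the exhaustive search is only practical for the handful of devices in a cell:

```python
def euler_path(edges):
    """Gate sequence of a path using every edge exactly once, or None."""
    def search(node, remaining, gates):
        if not remaining:
            return gates
        for i, (a, b, g) in enumerate(remaining):
            if a == node:
                nxt = b
            elif b == node:
                nxt = a
            else:
                continue                      # edge not incident to node
            found = search(nxt, remaining[:i] + remaining[i + 1:], gates + [g])
            if found:
                return found
        return None

    nodes = {n for e in edges for n in e[:2]}
    for start in nodes:                       # try every possible start node
        path = search(start, list(edges), [])
        if path is not None:
            return path
    return None

# PDN of F' = AB + C (Fig. 8.13): A and B in series, C in parallel.
pdn = [("out", "x", "A"), ("x", "gnd", "B"), ("out", "gnd", "C")]
print(sorted(euler_path(pdn)))   # ['A', 'B', 'C']: all three gates covered
```

A real tool would also check the matching path through the complementary network, as step 2 requires.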

Fig. 8.13 The function F′ = AB + C with Euler path traced

Figure 8.18 shows the stick diagram corresponding to the two paths in Fig. 8.17. Two diffusion regions need to be used in both the PDN and the PUN. Metal lines can then be used to route signals properly. In circuits complex enough to require multiple Euler paths, we often need to resort to Metal2 to find a solution for routing.

8.3 Standard Cells

1. Realize the need for a regular design when the number of transistors is large
2. Understand the restrictions on standard cells
3. List library entries for a standard cell


Fig. 8.14 Steps to draw Euler stick diagram of F′ = AB + C

Fig. 8.16 Stick diagram of Euler implementation of F′ = AB + CD

4. Distinguish layers used for routing in a standard cell IC
5. Recognize the need for power line scaling for long rows
6. Compare power routing techniques.

Fig. 8.15 Euler paths for F′ = AB + CD

There are two rough categories of digital integrated circuits: full-custom and semi-custom. We can divide each category


Fig. 8.17 Schematic of Euler paths in function Out′ = A(B + C) + D(E + F). The PDN and PUN are drawn individually, but in reality the “Out” node is shared. There can be no single Euler path through both networks. However, two paths are enough to cover the entire network

Fig. 8.18 Stick diagram of Out from Fig. 8.17 using two diffusion strips per network

into even finer sub-categories, but at the end of the day, all circuits are either full-custom or semi-custom. The distinguishing feature between the two is how the layout is drawn. In full-custom circuits, the layout is manually drawn. CAD tools may be used to aid in the design, and pre-made blocks might be used, but direct designer involvement is necessary to complete the layout. Because a full-custom layout is handcrafted for the application at hand, it has the capacity to yield excellent area and power-delay performance. However, this approach to design is only possible when a small number of transistors is used. Thus full-custom in its strictest sense is most often used in analog designs. Analog circuits include a small number of components. But

fine-tuning these components and dealing with their parasitics and characteristics is very important. Digital circuits typically include an enormous number of transistors. Performance in a digital circuit is more dependent on the overall architecture than on the properties of each individual device (Chap. 6). A more automated approach must be followed towards generating the final layout. This set of approaches is termed “semi-custom”. The most common semi-custom technique is the standard cell approach. Figure 8.19 shows an ASIC designed using standard cells. A realistic ASIC will have a much larger number of rows and cells per row, but the general architecture can already be gleaned. Logic gates are arranged in rows of cells. The cells have the same height, thus they form neat rows. However, cells have different widths because they are generally not identical. Cells are usually (but not always) attached without gaps. Horizontal tracks between the rows are used for routing. In this section we will answer the following questions:

• What are standard cells and how are they designed?
• What rules must be followed in arranging and connecting standard cells?

In Sect. 8.7 we will answer the following questions:

• How do we choose which cells are used in a design?
• How do we connect the cells together?

Standard cells belong to standard cell libraries. A library is a very large, nearly exhaustive list of standard cells. When a designer wants to build their own digital circuit, they are restricted to the library. Thus, the target circuit must be


So what exactly is a standard cell? To best answer this question, we examine how a standard cell is designed. When a library is being built, first an exhaustive list of the required primitives is drawn up. Then each of these primitives has to pass through the following steps:

• The primitive is characterized at the logic level. If it is a combinational primitive, a truth table is used; if it is a sequential circuit, a state transition diagram is drawn
• A circuit simulation is performed on the standard cell. This simulation is used to characterize the preliminary delay characteristics of the cell
• The layout of the standard cell is hand crafted and optimized
• The layout is passed through parasitic extraction and LVS (Sect. 8.7). This allows the layout-related parasitic capacitances and resistances to be obtained and further refinement to be performed on the delay figures

At this point the library entry for the primitive is considered “complete”. The entry for the specific primitive in the library must include the following:

Fig. 8.19 Standard cell ASIC

broken down into a set of standard cells that belong to the library. Translating a circuit into standard cells is an automated process called synthesis. Connecting the cells to form the larger circuit is also an automated process, called placement and routing (Sect. 8.7). Standard cell libraries must include all basic logic functions such as NOT, NAND2, and NOR2. They also often support larger logic gates with more inputs. Additionally, most libraries include “primitives” for basic arithmetic, including half adders and full adders. All libraries must also include some sequential circuit primitives, at least a DFF. Practical libraries also include larger, commonly used circuits such as N-bit adders and multipliers. These larger circuits might be standard cells themselves, or they might be built up and optimized from more basic standard cells.

Table 8.1 Elements of an ASIC library, how they are obtained, and how they are used

• Logic characterization
• Basic delay information
• Layout
• Parasitics

Table 8.1 summarizes how the components of a library entry are obtained and how they are used in the overall design. Understanding how the library is used requires a visit to Sect. 8.7. But at the very basic level the information about the cell in the library is used to allow CAD tools to map the user design to the standard cells, and then to allow the cell’s contribution to the performance of the overall circuit to be estimated. Figure 8.20 shows a sample standard cell layout for a NOT gate. This gate is not optimized for area, however, its height will have a very specific significance. The layout of the standard cell has to fulfill DRC rules that apply to a general layout (Sects. 8.5 and 8.6). But additionally, the following restrictions must also be observed:

| Element of library | Obtained from | Used in |
|---|---|---|
| Truth table/state transition table | Logic characterization | Translating user design to standard cells (synthesis) |
| Delay information | Circuit simulation | Adding delay information to synthesis. Post-synthesis simulation |
| Layout | Hand design or combining smaller layouts | Forming the overall circuit layout. Place and route |
| Parasitics | Parasitic extraction | Post-layout simulation |
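A library entry of the kind summarized in Table 8.1 can be sketched as a small data structure. All field names and numbers below are invented for illustration; real library formats (such as Liberty) differ considerably:

```python
# Hypothetical library entry mirroring the four components in Table 8.1.
not_gate = {
    "name": "NOT",
    "truth_table": {(0,): 1, (1,): 0},            # logic characterization
    "delay_ps": {"tPHL": 18, "tPLH": 22},         # from circuit simulation (assumed values)
    "layout": "not_gate.gds",                     # hand-crafted layout (assumed file name)
    "parasitics": {"C_in_fF": 1.2, "C_out_fF": 0.9},  # from parasitic extraction
}

def evaluate(cell, inputs):
    """Use the logic characterization to map an input tuple to an output."""
    return cell["truth_table"][inputs]

print(evaluate(not_gate, (0,)))   # 1
print(evaluate(not_gate, (1,)))   # 0
```

Synthesis tools use the truth table to map a design onto cells, while the delay and parasitic fields feed timing estimation, as described in Sect. 8.7.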


Fig. 8.21 NOR2 standard cell layout. A and B are the gate inputs

Fig. 8.20 NOT standard cell layout. Input and output must be in Metal1. Vdd, GND, and well must run through very specific locations

• The cardinal rule of standard cells is that their height is fixed. All standard cells in a library must have the same height; this height is called the “pitch”. A more complicated cell can be wide, in fact as wide as it needs to be, but it can never exceed the pitch in height. The pitch is usually dictated by the height needed for simpler cells, such as the NOT gate
• Metal1 lines of a preset width must run horizontally along the bottom and top of the cell. Their location and size must be fixed, and they have to run across the entire width of the cell. The top line will be used to supply Vdd; the bottom line will be used to supply ground
• The N-well has to run through the top part of the cell and has to have a specific height
• Use poly and Metal1 to route within the cell. If absolutely necessary, use Metal2 for internal routing, but never use higher metal layers
• Inputs and outputs have to be in the Metal1 layer
• Try to keep poly and diffusion lines perpendicular to each other. For example, if polys are running vertically, run diffusions horizontally. This is a guideline as opposed to a hard rule

Stick diagrams are often a good starting point for drawing standard cell layouts. If the Euler path approach is used, possibly suboptimal cell layouts are obtained, but the poly and diffusion orthogonality guidelines are observed. Figure 8.21 shows the standard cell layout of a NOR2 gate. The gate has the same height as NOT but is much wider. In general, cells can be as wide as necessary but must fulfill all the guidelines for standard cell layouts listed above. Figure 8.22 shows a very simplified library entry for the NOT gate. This is only an illustrative sample. As stated

above, there are four main components to every library entry: the logic characterization, the delay model, the layout, and the parasitic model. The logic characterization of the cell is shown in the top left of Fig. 8.22. In this case, we show a very simple truth table for the combinational circuit. More realistic library entries will include a longer truth table with entries for the output in cases where the input is metastable, high impedance, contended, or weakly driven (see Sect. 9.3). The section numbered 2 in Fig. 8.22 is the circuit model, or the delay model. We show a very simple entry with intrinsic high-to-low and low-to-high delays (Chap. 3). A more realistic library entry will also include information about the logic effort of the cell driving an external output (Chap. 4). It will also list the delay performance for different “corners”. The fabrication process is not perfect, and it produces variation in its results. This affects the mobility of electrons and holes in MOSFET devices. A good delay model will list delay values for “corners” of the design where the NMOS and PMOS devices are fast, slow, or typical (Sect. 6.9). Parts 3 and 4 of Fig. 8.22 are tightly related. They show the layout and the parasitics extracted from the layout. Sections 8.5 and 8.6 will illustrate how layouts are drawn. Standard cell layouts follow these standard rules as well as the additional rules described earlier. Section 8.7 will discuss how parasitics are extracted in more detail. Figure 8.23 shows the justification for some of the standard cell rules discussed above. Standard cells have to have the same pitch because they are arranged in rows. The cells in a row must have the same height, and each row must also be of equal height. The space between rows is used for routing channels. The constant pitch is also necessary when combined with the top and bottom Vdd and GND lines. When arranged in a row, the cells will have a continuous Vdd and GND strip that


Fig. 8.24 Standard cell row with a gap. There are two options, either complete the well and power lines through the gap, or make sure to route properly from both sides

Fig. 8.22 Simplified library entry for NOT gate

Fig. 8.23 Rows of standard cells with abutted power lines and wells

runs across the row. This only happens if the cells have the same pitch. Enough space is left around the ASIC to allow pads to be created for connection with pins (see Sect. 14.6). The horizontal spaces between rows contain Metal1 tracks for horizontal routing. In some cases, the place and route tool (Sect. 8.7) will need to leave a space or clearance open in a standard cell row for routing. This is shown in Fig. 8.24. In these cases the tool must intervene to complete the power lines in the row. Because standard cells have identical pitch and well locations, when they are arranged next to each other their wells abut. If an opening is left to allow routing, the tool must either intervene to complete the well or verify that the two created wells do not violate design rules (Sect. 8.6).
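The pitch and power-rail rules can be sketched as a toy consistency check. The cell representation and the pitch value are assumptions for illustration, not a real library checker:

```python
PITCH = 12  # assumed library pitch, in layout grid units

def check_cell(cell):
    """Return a list of rule violations for a candidate standard cell."""
    errors = []
    if cell["height"] != PITCH:
        errors.append("height != pitch")
    for rail in ("vdd_rail", "gnd_rail"):
        x1, x2 = cell[rail]            # horizontal extent of the rail
        if not (x1 <= 0 and x2 >= cell["width"]):
            errors.append(rail + " does not span cell width")
    return errors

inv = {"height": 12, "width": 8, "vdd_rail": (0, 8), "gnd_rail": (0, 8)}
bad = {"height": 14, "width": 20, "vdd_rail": (2, 20), "gnd_rail": (0, 20)}
print(check_cell(inv))   # []
print(check_cell(bad))   # ['height != pitch', 'vdd_rail does not span cell width']
```

Because every cell passes such checks, abutting cells automatically form the continuous rails and wells described above.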

Once the cells are arranged, their inputs and outputs must be connected to realize the overall function of the circuit. These connections fall into two categories: local and long-range. Table 8.2 lists routing strategies for the different layers, which will be discussed in detail below. Local connections are made between cells that are very close together in the row (typically directly adjacent). The placement tool tries to place cells that connect to each other close together, so local connections are very plentiful. Figure 8.25 shows how local connections are implemented. These connections access the I/O ports of the cells through Metal1. Metal1 is combined with another layer that can then be used to complete the local connections across two cells. We need more than one layer to avoid creating unwanted connections within the same layer. Recall that if two tracks from the same layer intersect, that is an automatic connection. Poly can be used in conjunction with Metal1 in processes with limited metal layers. In processes with plenty of metal layers, poly is avoided for routing and Metal2 is used in conjunction with Metal1 for same-row routing. Polysilicon has a higher resistance than metal and thus has worse delay properties (Sect. 13.3). When long-range connections need to be made, the horizontal wiring tracks that run between the rows are used. This is shown in Fig. 8.26. These tracks are made of Metal1. They are accessed by the cells through Metal2 or higher. The tracks allow a signal to travel very long distances horizontally, so they can be used to connect cells in the same row over long distances, or when routing them using the approach in Fig. 8.25 becomes too complicated. The standard cells have to access the horizontal tracks using Metal2 or higher to avoid creating unwanted connections with Metal1 wires within the cell, or with horizontal tracks they are supposed to cross over.
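The layer choices just described can be condensed into a small helper. This is only an illustrative sketch of the decision logic in Table 8.2; the function names and the three-metal cutoff are assumptions drawn from the text, not any real tool's API.

```python
def local_routing_layers(num_metal_layers: int) -> tuple:
    """Pick the layer pair for same-row (local) connections.

    With few metals we fall back on poly despite its higher resistance;
    with plenty of metals, poly is avoided and Metal2 assists Metal1.
    """
    if num_metal_layers < 3:
        return ("Metal1", "Poly")
    return ("Metal1", "Metal2")


def long_range_layer(num_metal_layers: int, reserved_top: bool = True) -> str:
    """Highest general-purpose routing layer.

    The top metal is often reserved for special signals (power, clock),
    so generalized routing stops one layer short of it.
    """
    top = num_metal_layers - 1 if reserved_top else num_metal_layers
    return f"Metal{top}"
```

For example, in a hypothetical two-metal process local routing would pair Metal1 with poly, while a six-metal process would route locally in Metal1/Metal2 and do long-range routing no higher than Metal5.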
This is also why the outputs of a standard cell have to be in Metal1: routing is done mainly through the higher metals. If the outputs of the cell were in poly or active, we would need more than a single via to

Table 8.2 When and how different layers can be used in routing. Most routing takes place through the metal layers, but special cases are not uncommon

Layer: Use in routing

Diffusion: Can be used to create a shared diffusion strip for multiple transistors (Sect. 8.2). It is also used to route grounds in NOR ROMs (Sect. 12.2), and to route long control signals in DRAMs (Sect. 12.8). Otherwise it should never be used for routing due to its prohibitive resistance

Poly: Often used within a cell to route between transistor gates. This saves the overhead of creating a contact to metal. Sometimes used in same-row inter-cell routing in processes with less than 3 metal layers

Metal1: Supply, ground, in-cell routing, horizontal routing channels

Metal2: Local connections in-row when more than 3 metals are available. Vertical connections inter-row when less than 3 metals are available

Metal3 through Metal(N-1): Generalized routing, usually long-range. The higher the metal layer, the longer the range

MetalN: Often reserved for special signals (power rails, clocks, etc.)

Fig. 8.25 Poly/Metal1 used for row (local) connections (above). Metal1/Metal2 used for row connections (below). Two layers are needed to perform crossovers with in-cell wires

Fig. 8.26 Signal jumpers performing horizontal connection, and performing cross track connections. Use Metal2 in a 3-metal process. Higher metals otherwise

access Metal2 for routing. Note that every horizontal track supports only a single connection. If a vertical connection is needed between two horizontal track sets, then a dedicated high-metal track is used to make the connection. Which metal layer is used to make which connection is a matter of choice. Any choice that achieves the design constraints and satisfies DRC is "correct". Metal2 can be used for vertical connections in processes with 3 metal layers. However, in processes with more metal layers,

Fig. 8.27 Power line scaling. If the row is longer, the power and ground lines are made thicker

Metal2 is used for in-row communication, and thus cannot be used for vertical routing. Power signals are considered special cases; they cannot be treated like logic signals. As shown in Fig. 8.23, power and ground are routed horizontally across every row. However, Fig. 8.27 shows a refinement of this approach. In this figure we see that the Vdd and GND lines in the cells are made wider as the row of standard cells grows longer. A longer row of cascaded cells means more cells drawing more current from supply and sinking it to ground. The higher current will lead to higher drops along the metal line. Even though the metal line has a low resistivity, the large drawn currents can cause significant supply variations. The ground and supply variations are troubling in two ways: they vary with time, normally being much worse during critical switching phases; and they vary with location, since cells further away from the supply see a longer track and thus a higher drop. Widening the track for longer cascades lowers its resistance, allowing supply perturbations to be controlled. Figure 8.28 shows how supply and ground can be provided in a very simple ASIC. Very thick power and ground lines driven from the input pads run horizontally along the top and the bottom in the Metal1 layer. Alternating power and


Fig. 8.28 Simple power distribution. Horizontal routing is done through Metal1. Vertical routing is through Metal2

ground lines run vertically through the columns of blocks of standard cells. These vertical lines must be connected to the main horizontal power rails using plenty of vias to reduce the drop over the contact resistance. The vertical lines are in Metal2. Each row of standard cells derives its ground and supply from the vertical Metal2 lines, using Metal1 to supply the abutted cells. Supply in Fig. 8.28 runs from left to right while ground runs from right to left. The leftmost cells in the topmost blocks of standard cell rows will see the worst ground bounce, while the rightmost cells in the bottom blocks will see the worst supply drop. A large ASIC cannot tolerate drops over the entire width of the chip. Figure 8.29 shows how drops can be reduced. There are supply and ground rails both at the top and the bottom. The vertical Metal2 lines are contacted to both the top and the bottom rail. This allows supply and ground to be driven from both the top and the bottom. The maximum drop is reduced almost by half and is seen in the middle block of standard cells. Figure 8.30 takes this approach even further. Ground and supply form complete rings around the chip area. They are driven to the standard cell blocks vertically through Metal2 and horizontally through Metal3. The drops are significantly reduced and are observed in the center-most cells. Practical technologies have dedicated layers for supply and GND. For example, the process may stipulate that Metal6 is used only for Vdd. In this case, an extreme approach can be used for supply distribution: the entire Metal6 plane can be driven from the supply pin. This layer can then be contacted to each individual cell row, significantly reducing drops. One might ask whether the Metal6 layer itself will not see drops. It will; however, higher metal layers

Fig. 8.29 Two-sided supply and ground. Only one via is shown, but as many vias as will fit must be used

Fig. 8.30 Ring power supply


8 Design Flow

• Impurities. An unclean environment will lead to contamination, which can cause significant changes in the electrical properties of silicon. It can also lead to the creation of traps in the oxide, which will exacerbate the hot carrier effect (Sect. 10.6) and lead to changes in threshold voltage
• Temperature variations. These can impact almost all steps in photolithography, including implantation, deposition, oxidation, and resist baking
• Variations in ion implantation. These lead to variation in resistivity, but can also cause the depletion region and diffusion capacitance to become unpredictable
• Resist shrinkage and tearing. As the resist dries and shrinks, it may leave some areas uncovered. This can lead to unpredictable etching
• Mask misalignment. This is critical and requires a lot of management (Sect. 7.5)
• Oxide thickness variations. If the thin oxide varies, threshold voltage will vary (Sect. 1.17). If the field oxide thickness is uneven, then metal PVD can lead to metal line breakage along corners
• Via faults. A via may not open at all, may open too little, or may open too much, creating a short. This is due to variations in etching

are typically much thicker than lower layers, reducing their sheet resistance significantly. Clock signals are another special category of signals. Clocks in a synchronous design have to be routed to a huge number of cells. The enormous load, combined with variable delay to different cells, can introduce significant complications to timing. Designing a clock network is the topic of Sect. 13.7. External inputs and outputs are also treated differently from internal signals. Because they have to handle the extreme loads of off-chip capacitances, they are routed and driven very carefully. I/O management is the topic of Sects. 13.5 and 14.6.
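The supply-drop argument above (wider rails for longer rows, and feeding rails from both ends) can be sketched with a toy resistive model. Everything here is illustrative: the cells are modeled as equal current sinks tapping a rail of equal segment resistances, which is a deliberate simplification rather than a real power-grid analysis.

```python
def rail_drop(n_cells, i_cell, r_seg, fed_both_ends=False):
    """Worst-case IR drop along a supply rail feeding n_cells equal
    current sinks through equal per-segment resistances (toy model)."""
    if not fed_both_ends:
        drop = 0.0
        worst = 0.0
        for k in range(n_cells):
            # Segment k nearest the supply carries the current of all
            # downstream cells; the drop accumulates toward the far end.
            drop += (n_cells - k) * i_cell * r_seg
            worst = max(worst, drop)
        return worst
    # Fed from both ends: by symmetry each half behaves like a
    # single-sided rail serving half the cells.
    return rail_drop((n_cells + 1) // 2, i_cell, r_seg)
```

Doubling the rail width halves each segment resistance and thus halves the worst drop, while feeding from both ends cuts the worst case even more aggressively in this toy model, consistent with the qualitative argument in the text.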

8.4

Design Rules: Foundations

1. Understand that the fabrication process is not ideal
2. Connect the need for design rules with non-idealities in fabrication
3. Understand the four categories of design rules: width, separation, enclosure, and special
4. Discuss the minimum density rule
5. Discuss the antenna effect and the antenna rule.

In this section, we want to examine how we can take a design from a circuit diagram to a layout. We will assume that the layout will be drawn manually, although in the next few sections we will realize that this is far from practical. However, assuming the design is drawn manually helps expose one very important aspect of layouts: design rules. The manufacturing process uses the masks obtained from the layout to expose the wafer through repeated photolithographic iterations (Chap. 7). However, this manufacturing process is never ideal. There are defects (Chap. 14) that can arise during manufacturing, leading to an imperfect translation of the layout. The primary causes of defects in manufacturing are listed in the bullets above.

Table 8.3 Process variations

All the manufacturing defects can be traced back to one of the variations in the photolithographic process listed in Table 8.3. To address these issues, design rules are introduced. Design rules are restrictions on the layout. There are four categories of design rules:
• Minimum width. This is the most obvious set of rules. A minimum width is imposed on all layers. This minimum width may be much higher than the technology parameter (minimum channel length). It guarantees that etching is successful, that deposition has enough space to take effect, and that corners due to oxide thickness variations are not too thin
• Minimum separation. This can be a minimum separation between two tracks in the same layer, or between two

Process variation: effect on layer; effect on circuit

Mask misalignment: all layers, but particularly active; can cause variations in drain/source area, and can cause metals to short or not to connect

CVD variation: particularly affects oxide; oxide thickness variations (in thin oxide this affects threshold, otherwise it can affect other layers, creating unintended opens)

PVD variation: metal layers; leads to variations in resistance, and combined with uneven oxide can lead to opens and corners that suffer from electromigration

Etch variation: vias mostly, but also metal; in vias can cause incomplete, absent, or excessive opening, and in metal layers can cause wrong tracks for Al and variations in resistance for Cu


tracks in different layers. These rules can be directly related to mask alignment faults
• Minimum enclosure. This means that one layer must be enclosed within another by a minimum distance. This is particularly important for contacts and vias, where the enclosure ensures that etch variations will not affect the proper realization of the contact (see Table 8.3)
• Special rules. These are rules that do not fall under any of the above three categories. Later in this section, we will discuss two such rules, namely minimum density rules for metal layers, and antenna rules for metal/poly areas
The aim of these design rules is to optimize the yield (Sect. 14.1) of the process. The rules introduce safety margins to ensure that variations in the manufacturing process lead to predictable and limited chip failures. These limited failures then translate into an acceptable yield. Lenient design rules allow denser features to be realized, which lowers the cost per manufactured millimeter. However, the more lenient the design rules, the more faulty chips result from manufacturing, and the lower the yield. Conservative design rules introduce more empty space and larger dimensions. This ensures a higher yield. However, it also causes designs to occupy more area. There is clearly a tradeoff between yield and area for different sets of design rules. There is no "best" design rule set that everyone has to use; the choice depends on the importance of yield versus area. Some design rules are "special" in that they cannot be categorized as describing minimum dimensions, minimum separation, or minimum enclosure. One example is the minimum density rule. This rule applies in particular to the metal layers. It stipulates that metals throughout the wafer, or at least in certain parts of the wafer, have to be packed together with a minimum density.
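The first three rule categories (width, separation, enclosure) are purely geometric checks over layout shapes. A minimal sketch over axis-aligned rectangles follows; the function names and the rectangle representation are illustrative assumptions, not the format of any real DRC tool.

```python
from collections import namedtuple

# Axis-aligned rectangle with x1 < x2 and y1 < y2.
Rect = namedtuple("Rect", "x1 y1 x2 y2")


def width_ok(r, min_width):
    """Width rule: the narrow dimension must reach the minimum."""
    return min(r.x2 - r.x1, r.y2 - r.y1) >= min_width


def spacing(a, b):
    """Edge-to-edge separation of two rectangles (0 if they touch or overlap)."""
    dx = max(a.x1 - b.x2, b.x1 - a.x2, 0)
    dy = max(a.y1 - b.y2, b.y1 - a.y2, 0)
    return (dx * dx + dy * dy) ** 0.5


def enclosure_ok(inner, outer, min_enc):
    """Enclosure rule: inner must sit inside outer with margin on all sides."""
    return (inner.x1 - outer.x1 >= min_enc and outer.x2 - inner.x2 >= min_enc
            and inner.y1 - outer.y1 >= min_enc and outer.y2 - inner.y2 >= min_enc)
```

A separation rule would be checked as `spacing(a, b) >= min_sep` for every pair of shapes on the relevant layer or pair of layers; real tools use far more efficient geometry engines, but the checks themselves reduce to comparisons like these.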
While most design rules insist on a minimum distance between two features, the minimum density rule stipulates the opposite: wires have to be packed at a certain minimum density. The minimum density rule is particularly important in damascene patterning technologies (Sect. 7.7), but it also has a role in older technologies. The rule has its roots in the interplay between CMP and oxide deposition and is necessary to achieve planarity. Figure 8.31 shows a situation where a metal layer is patterned with a large distance between the wires. After patterning Metal1, we deposit oxide in preparation for the Metal2 mask. The oxide will deposit on the wafer; however, it will not be uniform. Because the Metal1 wires rise above the surface, the oxide will form hills above them. If the metal lines are very widely spaced, there will be very deep troughs or valleys between the hills formed above the metal lines.
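Unlike the per-shape rules, a density check operates on windows of the layout rather than on individual features. The following is a toy sketch of a windowed density computation; the function name and the assumption of non-overlapping rectangles are illustrative, not how a production tool is structured.

```python
def window_density(rects, wx1, wy1, wx2, wy2):
    """Fraction of a checking window covered by metal rectangles.

    rects: (x1, y1, x2, y2) tuples, assumed not to overlap one another.
    """
    win_area = (wx2 - wx1) * (wy2 - wy1)
    covered = 0.0
    for (x1, y1, x2, y2) in rects:
        # Clip each rectangle to the window and accumulate its area.
        ox = max(0, min(x2, wx2) - max(x1, wx1))
        oy = max(0, min(y2, wy2) - max(y1, wy1))
        covered += ox * oy
    return covered / win_area


# Two 1-unit-wide lines at opposite edges of a 10 x 10 window cover
# only 20% of it; a hypothetical 30% density rule would force us to
# add fill between them.
lines = [(0, 0, 1, 10), (9, 0, 10, 10)]
density = window_density(lines, 0, 0, 10, 10)
```

Real rule decks slide such windows across the whole metal layer and flag any window below the threshold.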

Fig. 8.31 Oxide growth and planarization with a density rules violation. (1) CVD leads to trough due to uneven oxide deposition. (2) When planarized some of the trough is left remaining. (3) When oxide is grown and a higher-level metal is deposited, the trough is also filled with metal. (5) Patterning the higher metal layer will lead to unwanted metal remnants that can lead to shorts and failure

When CMP is used to planarize the wafer, the metal lines will be reached before the oxide can be fully planarized. Planarization is important for all metal layers, but is especially critical for copper patterning, where CMP is an integral part of realizing the pattern (see Sect. 7.7). As seen in sub-Fig. 2 of Fig. 8.31, we are left with remnant troughs between the wires. Any more polishing and we would start cannibalizing the metal lines. When we deposit Metal2 as in sub-Fig. 3, additional Metal2 tracks are deposited in the troughs. These tracks will not be properly etched and can create catastrophic shorts. Minimum density rules stipulate that metal lines cannot be too widely spaced. If necessary, we have to introduce dummy metal rectangles as fillers to satisfy the rule. This is shown in Fig. 8.32, where a third metal line is introduced to increase the packing density. The tightly packed metal lines in Fig. 8.32 cause oxide to deposit in very large hills. There are no wide spaces where deep oxide valleys can form. Once CMP is applied (sub-Fig. 2 of Fig. 8.32), a perfectly horizontal oxide surface is achieved. This allows us to deposit and pattern Metal2 without issues. Notice that the minimum density rule is more critical for processes with more metal layers. The higher the metal layer, the more magnified the effect of uneven oxide in a deeper layer. Another special form of design rule is the antenna rule. The antenna rule is rather complicated. It involves the interaction between metal and poly layers. The antenna rule might initially seem unwarranted. To understand how


Fig. 8.34 Cross section of two inverter connections in Fig. 8.33 through a single metal line. Dn is the drain of the NMOS in the first inverter, Gn is the gate of the NMOS in the second inverter

Fig. 8.35 Cross section of finished connection using two metal layers

Fig. 8.32 Oxide growth and planarization without density rules violation. The dummy wire in the middle is drawn just to prevent the formation of a trough and can be left unconnected

antenna rules function, we first have to understand why they are needed. The antenna rule is added to fight the “antenna effect”. The antenna effect is something that you would never notice in a finished chip. It affects wafers only during the manufacturing process, however, it can have a devastating effect. It is also a misnomer since we are not really dealing with electromagnetic radiation or reception. The antenna effect has to do with the accumulation of electrostatic charge. Consider the circuit in Fig. 8.33, the drains of the first inverter are connected to the gates in the second inverter. This connection will be through a metal layer (see above and Chap. 7). Thus, we need to contact the poly layer of the gates in inverter 2 and contact the active layer of the drains of inverter 1 and then connect the two through metal layers. Figure 8.34 shows part of this connection with one drain from inverter 1 and one gate from inverter 2.

Fig. 8.33 Circuit under fabrication that may suffer from antenna damage

The situation becomes more complicated if the two nodes are connected through more than one metal layer. This will happen especially if the two modules are placed in distant locations, and is almost guaranteed to be a common occurrence in modern chips. Recall that higher metal layers can only be contacted by the metal layers below them through vias. If we are using two metal layers for the connection, then the finished chip will look like Fig. 8.35. The antenna effect asks the question: what would happen if "a lot" of electrostatic charge accumulates over the Metal1 and Metal2 lines in Fig. 8.35? The accumulated charge will cause a large electric field to form over Gn2. Note that Gn2 is separated from the substrate by a thin oxide. If the metal lines are long enough, and the accumulated charge is large enough, the built-up field can be big enough to cause this thin oxide to break down. This would lead to irreversible failure of the second inverter in Fig. 8.33. In reality, the situation is not so dire. Notice that the metal lines in Fig. 8.35 are also connected to the drains of the first inverter, or Dn1 in Fig. 8.33. If a channel is formed in the transistors of the first inverter, then any accumulated charge over the metal lines will simply leak through the low-impedance NMOS channel to ground. But even if the NMOS transistor in the first inverter is cut off, there is a reverse-biased PN junction between the drain and the substrate. This reverse PN junction will break down at a much lower charge than the gate dielectric, shorting out any excess charge. Reverse PN junction breakdown is reversible, and thus this process is non-catastrophic. Therefore, the antenna effect is a non-issue in a finished wafer/chip as in Fig. 8.35. This is because every metal line in a CMOS circuit will be connected to at least one transistor drain, allowing any excess charge to leak safely. So why bother introducing a special rule for the antenna effect?


Fig. 8.36 Metal1 realized, now entering Metal2 photolithography. A large stretch of Metal1 is left connected only to a MOSFET gate

The problem occurs during fabrication, and happens because the metal layers are fabricated from the lowest to the highest. Figure 8.36 shows such a situation: Metal1 has been deposited and patterned, and we are moving to the next step, where Via1 and then Metal2 will be realized. The problem now is that there is a large stretch of Metal1 connected only to the gate of transistor 2. This stretch of metal might accumulate enough charge to break down the gate dielectric. It is not connected to any sources or drains through which the charge can leak by breaking down the reverse junction. But how do we know that there will be electrostatic charge on the gate? Maybe if we properly ground the wafer, this effect would be negligible. Unfortunately, we can guarantee that metal lines (especially aluminum) will accumulate enormous electrostatic charges. Dry etching (Sect. 7.5) is used to pattern some metals. The plasma-phase etchant used in dry etching is highly charged by definition, and will necessarily leave all stretches of metal with a lot of charge until the etching process concludes. Antenna rules stipulate a maximum ratio of total connected metal area to total poly area for as long as the metal has not yet been connected to a junction. This might sound complicated at first, but it is much simpler once you get into the details:
• Calculate the total area of all metal layers that will connect to a MOSFET gate before the full connection to a source or drain is realized. For example, if we use 4 metal layers to complete the connection, then Metal1 through Metal3 will all contact the poly before the whole connection is completed through Metal4
• Calculate the total area of the contacted poly
• If the ratio of the first figure to the second exceeds the design rule, then you are in violation of antenna rules
So what does this mean? If a MOSFET gate is contacted by metal, then that metal must be routed somewhere else.
Since we are dealing mostly with CMOS, MOSFET gates are inputs. CMOS inputs come from CMOS outputs, which are always

MOSFET drains. Thus, the MOSFET gate is connected to MOSFET drains through metal lines. Depending on the routing (Sect. 8.7), the number of metal layers used can vary. Assume K metal layers are used. Metal layers are realized from bottom to top. Thus, only when the Kth metal layer is realized would we end up with a situation like Fig. 8.35, where the connected PN junction at the drain makes us "safe" from the antenna effect. Thus, the total area of metal layers from Metal1 through MetalK-1 connected to the MOSFET gate will accumulate charge. The amount of charge accumulated is proportional to the total area of all these metal layers. But would this cause the dielectric to break down? Not necessarily. If the poly connected to the MOSFET gate has a large area, then it can accommodate the charge without creating a large field. Thus, the metric of concern should not be the metal area, but rather the ratio of the metal area to the poly area. The antenna rule must be checked for each and every MOSFET gate. If the ratio of areas exceeds a certain threshold, then an "antenna rule violation" is declared. These violations must be fixed; otherwise, the circuit will fail during manufacturing. At a very basic level, an antenna violation indicates that our layout is "wrong" and must be revised. There are three systematic ways to do this. It is critical that the solution be systematic, because there is a huge number of transistor gates to check and modify, so the process has to be automated. The first approach works only in processes that allow stacked vias. Stacking vias means that different via layers can be drawn and realized one over the other. Essentially, this means that any metal layer can access all other metal layers without going through the intermediate layers. Stacked-via processes do not suffer from the antenna effect. As shown in Fig. 8.37, stacking vias means only a minimal total metal area is accumulated over any gate.
This guarantees that violations will be very rare. The second approach works for processes that do not support via stacking. The approach is shown in Fig. 8.38. In


Fig. 8.37 Stacked vias to highest metal. Top, finished connection cross-section. Bottom, maximum charge accumulation

the figure, three metal layers are used for routing. Normally we would go up from Metal1 to Metal2, then from Metal2 to Metal3, and then use Metal3 to perform the majority of routing. Figure 8.38 shows a slight modification of this. On the MOSFET gate-side of the routing, we go up to Metal1 then to Metal2 then to Metal3. We then go back down to Metal2 and back up to Metal3 to finish the remainder of the routing. The bottom of Fig. 8.38 shows the situation just before Via2 and Metal3 are realized. The top of the figure shows the situation after routing has been completed. In the bottom sub-figure, we note that this approach minimizes the area of Metal2 connected to the MOSFET gate by making it go up to Metal3 as soon as possible. Notice we are not overruling routing decisions because we immediately go back down to Metal2, allowing the rest of routing to proceed as usual. The simplest solution for the antenna effect is shown in Fig. 8.39. The bottom sub-figure shows the situation with maximum electrostatic build up. An extra PN junction is created near the gate by creating an active area. In the finished circuit this active area will not affect functionality due to the reverse bias. But during fabrication, the reverse junction to the substrate or well provides a path for electrostatic charge to leak without destroying the gate oxide.

This solution is simple, but the area overhead of the additional diffusions can add up.
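The per-gate ratio check described in the bullets above can be sketched in a few lines. The data layout (a dict per gate with a poly area and a list of pre-completion metal areas) is an illustrative assumption, not the format of any real antenna-checking deck.

```python
def antenna_violations(gates, max_ratio):
    """Flag gates whose cumulative metal-to-poly area ratio is too high.

    gates: list of dicts with 'poly_area' and 'metal_areas', where
    'metal_areas' holds the areas of Metal1..Metal(K-1) connected to the
    gate before the final layer completes the path to a drain.
    Returns (gate_index, ratio) pairs for every violation.
    """
    bad = []
    for i, g in enumerate(gates):
        ratio = sum(g["metal_areas"]) / g["poly_area"]
        if ratio > max_ratio:
            bad.append((i, ratio))
    return bad
```

For example, a gate with a small poly area under a very long Metal1/Metal2 run would be flagged, while the same metal area over a large poly region would pass, matching the argument that the ratio, not the raw metal area, is the metric of concern.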

8.5

Design Rules—Sample

1. Contrast micrometer-based design rules with scalable design rules
2. Understand the conventions in describing design rules
3. Examine a sample design rule set.

There are two types of design rules:
1-Micrometer-based design rules: These constitute the majority of industrial-scale design rules. In this category, the rules (dimensions, separations, etc.) are stated in absolute terms, i.e., in micrometers. This makes the rules unscalable: a set of rules that works for a 65 nm technology will not work for a 45 nm technology. However, this kind of rule set is usually very area efficient. It restricts dimensions, separations, and enclosures by as little as absolutely necessary. Thus, designs that use micrometer-based rules tend to be smaller, faster, and consume less power.


Fig. 8.38 Going to higher metal then immediately coming down. Top: The finished cross section. Bottom, maximum antenna charge accumulation

Fig. 8.39 Extra junction added gate-side to allow charge leakage. Top, finished cross-section. Bottom, maximum charge accumulation

2-Scalable (k-based) design rules: These rules are not stated in absolute dimensions. Instead, they are stated as multiples of a minimum feature k. This minimum feature represents the minimum possible resolution in the process.

Thus, all features must be drawn on a grid that uses k as increments. Any dimension, separation, or enclosure must be integer multiples of k. k is half the minimum channel length. This type of design rule tends to be very conservative. Rules


Fig. 8.42 Poly design rules from Table 8.6

Fig. 8.40 N-well design rules from Table 8.4

Fig. 8.41 Active design rules from Table 8.5

are strict, requiring larger dimensions, separations, and enclosures. Thus k-based designs are less efficient than micrometer-based designs in all dimensions of design characterization (area, power, and speed). However, k rules are scalable: a layout can be migrated to a different technology simply by plugging in the correct value of k. To tape out the design in a 90 nm technology, for example, k would be 45 nm. This type of rule set cannot be used for extremely deep submicron technologies, and it tends to find application only in academic settings. In this section, we will present a sample k-based rule set. The rules are derived from the MOSIS scalable design rules.
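Retargeting a k-based layout is literally a multiplication: since k is half the minimum channel length, every rule stated in k units becomes an absolute dimension once the process is chosen. The helper and rule names below are illustrative, taking two entries from the sample rule set in this section.

```python
def to_micrometers(rules_in_k, min_channel_length_um):
    """Convert k-based rule sizes to absolute micrometers.

    k is defined as half the minimum channel length of the target process.
    """
    k = min_channel_length_um / 2
    return {name: size_k * k for name, size_k in rules_in_k.items()}


# Rules 1.1 (well minimum width, 12k) and 3.1 (poly minimum width, 2k)
# retargeted to a 90 nm process, where k = 45 nm = 0.045 um.
sample = {"well_min_width": 12, "poly_min_width": 2}
scaled = to_micrometers(sample, 0.09)
```

The same dictionary retargeted with a different `min_channel_length_um` yields a legal layout grid for that process, which is exactly the portability the text describes (and exactly what micrometer-based rules cannot do).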

We will only show the important layers that correspond to masks used in Chap. 7. Design rule sets also support other optional layers that may be used in specialty processes. These rules may allow exotic features such as a second polysilicon layer for linear capacitors. Section 12.8 discusses an example where such a non-traditional layer is used. The design rules are divided into a number of tables. Each table is numbered according to its layer's order. The layers are ordered from bottom to top rather than in the order in which they are fabricated. Rules within the same layer are also numbered. This allows easy reference to rules by number. For example, rule x.y is rule number y for layer x.


Fig. 8.45 Metal1 rules

Fig. 8.43 Select design rules from Table 8.7

Fig. 8.46 Via1 rules from Table 8.11

1-Well
These rules control the N-well (Fig. 8.40). If the process is double-well, the same rules apply to the P-well. Note that the minimum dimension of a well (rule 1.1) is rather large. This is not an inconvenience, since the minimum well area dictated by rule 1.1 is barely enough to contain one transistor. Spacing between two wells depends on the well types and their potentials. If the process is twin-well, then p and n wells can be abutted, but wells of the same type need to be significantly spaced. The spacing of wells depends on whether they are connected to the same potential (Table 8.4).
2-Active

Fig. 8.44 Active and poly contact rules from Table 8.8

The active layer defines areas where thin oxide is realized (Table 8.5). Rules 2.3 and 2.4 dictate a minimum enclosure

Table 8.4 Design rules for layer 1, the N-well

Number  Rule  Size (k)
1.1  Minimum width  12
1.2  Minimum spacing—different potential  18
1.3  Minimum spacing—same potential  6
1.4  Minimum spacing—different types  0

Table 8.5 Design rules for layer 2, active

Number  Rule  Size (k)
2.1  Minimum width  3
2.2  Minimum spacing  3
2.3  Source/drain to well edge  6
2.4  Substrate/well contact to well edge  3
2.5  Separation of n+ and p+ active  4

Table 8.6 Design rules for layer 3, poly

Number  Rule  Size (k)
3.1  Minimum width  2
3.2  Minimum spacing over thin oxide  4
3.2.a  Minimum spacing over field oxide  3
3.3  Minimum gate clearing of active  3
3.4  Minimum active clearing of gate  4
3.5  Minimum poly to active separation  1

Table 8.7 Design rules for layer 4, the select masks

Number  Rule  Size (k)
4.1  Minimum spacing from enclosed channel active  3
4.2  Minimum overlap of active other than channel  2
4.3  Minimum overlap of contact  1
4.4  Spacing (only same type) and width  4

for active within the well, with a distinction between MOSFET active and substrate/well contact active. It also defines the distance of actives outside the well to the edge of the well, as shown in the figure. Note that defining which type of active we are talking about requires knowledge of the select mask. Rule 2.5 requires n and p active regions to be separated by a minimum distance (Fig. 8.41).
3-Poly
This layer together with layer 2 is critical because their intersection forms the MOSFET. Layer 3 describes MOSFET gates. Design rule 3.1 in Table 8.6 dictates the minimum channel length. Rule 2.1 dictates the minimum channel width. Rule 3.2.a allows polys to be brought closer together for routing outside the active. Rules such as 3.3, 3.4, and 3.5

Size (k)

guarantees that misalignment between different layers will not lead to loss of transistor action (Fig. 8.42). 4-Select The select layer describes both n and p select. Note that different selects can abut, in which case they can share the same active region. However, the non-MOSFET select must clear the channel as dictated by rule 4.1 in Table 8.7. If two selects of the same type are drawn they must be separated by 4.4 (Fig. 8.43). 5-Contact to active Contacts have a fixed size, not a minimum size. They must be 2  2, they cannot be larger. Rule 5.4 in Table 8.8

8.5 Design Rules—Sample

297

Requires minimum distance between a contact to active and any nearby poly (Fig. 8.44). 6-Contact to poly Contact to poly rules are nearly identical to contact to active rules (Table 8.9).
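Rules like these are exactly what a DRC engine checks mechanically. A toy Python sketch using the N-well numbers from Table 8.4 (the checker structure and the geometry values are invented for illustration):

```python
# Rule values in lambda, taken from Table 8.4 (N-well).
LAMBDA_RULES = {
    "well_min_width": 12,        # rule 1.1
    "well_space_same_pot": 6,    # rule 1.3
}

def check_well(width, spacing_same_potential):
    """Toy DRC check for two N-well rules; dimensions in lambda."""
    errors = []
    if width < LAMBDA_RULES["well_min_width"]:
        errors.append("rule 1.1 violated: well too narrow")
    if spacing_same_potential < LAMBDA_RULES["well_space_same_pot"]:
        errors.append("rule 1.3 violated: wells too close")
    return errors

assert check_well(12, 6) == []        # layout exactly at minimum: clean
assert len(check_well(10, 5)) == 2    # both rules violated
```

A real DRC engine does the same comparisons over every polygon in the layout database, but the principle is just this table lookup.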

7-Metal1 Metal1 lines can be as narrow as active lines (rule 7.1), but must be wider than poly lines (Fig. 8.45). Creating a contact has a significant area overhead because the contact has an exact size (rule 5.1), requires enclosure by both layers on all sides (rules 4.2, 5.2, and 7.3), and must clear the channel by a minimum distance (rules 5.4 and 6.4). Wide metal lines are used to carry critical signals or to route over longer distances. Wider Metal1 lines have a different separation rule from thinner lines (rule 7.4) (Table 8.10).

8-Via1 Vias tend to be larger than contacts. Rules 8.4 and 8.5 in Table 8.11 can be ignored for technologies that allow contact/via stacking. In such technologies the contact and Via1 can be drawn right over each other, allowing active or poly to reach Metal2 directly. In such a case Via1 may be drawn over active, poly, and contact (Fig. 8.46). Vias have an exact size, just like contacts. If the area provided by the via is too resistive, we can achieve a lower resistance by drawing multiple vias, but we cannot draw a larger via. Like contacts, vias are area-consuming because they are large and require enclosure by both layers on all sides.

9-Metal2 15-Metal3 22-Metal4 26-Metal5 All these higher-level metal layers have rules very similar to Metal1, with the exception of rule x.2, where the separation is 4 instead of 3. Thus, use the same table as Metal1, replacing the second-row entry with 4. These higher metal layers are usually used to perform inter-cell routing; the higher the metal layer, the farther apart the cells being routed. See Sect. 8.3 for the traditional use of metal layers. The number of available metal layers depends on the technology (Table 8.12).

30-Metal6 This is the highest metal layer supported by the MOSIS rules. It is a thick metal layer that is often used to route power and ground. Because this is the highest metal layer, it has a wider minimum width. A higher metal layer has to be thicker and wider: the accumulation of irregularities in oxide deposition after each step of photolithography would otherwise create significant valleys and hills in the higher metal layers, and proper metal patterning requires these wires to be thick and wide (Table 8.13).

Table 8.8 Design rules for layer 5, contacts
  Number  Rule                                   Size (λ)
  5.1     Exact size                             2λ × 2λ
  5.2     Minimum enclosure by active            1
  5.3     Minimum spacing                        4
  5.4     Minimum distance to MOSFET channel     2

Table 8.9 Design rules for layer 6, contact to poly
  Number  Rule                                   Size (λ)
  6.1     Exact size                             2λ × 2λ
  6.2     Minimum enclosure by poly              1
  6.3     Minimum spacing                        4
  6.4     Minimum spacing to MOSFET channel      2

Table 8.10 Design rules for layer 7, Metal1
  Number  Rule                                                        Size (λ)
  7.1     Minimum width                                               3
  7.2     Minimum separation                                          3
  7.3     Minimum overlap of contact                                  1
  7.4     Minimum spacing when either metal line is wider than 10λ    6


Table 8.11 Design rules for layer 8, Via1
  Number  Rule                                                                                      Size (λ)
  8.1     Exact size                                                                                3λ × 3λ
  8.2     Minimum spacing                                                                           3
  8.3     Minimum overlap by Metal1                                                                 1
  8.4     Minimum spacing to contact                                                                2
  8.5     Minimum spacing to active or poly edge (only technologies that do not allow via stacking) 2

Table 8.12 Design rules for layers 9, 15, 22, and 26: higher metal layers
  Number  Rule                                                        Size (λ)
  x.1     Minimum width                                               3
  x.2     Minimum separation                                          4
  x.3     Minimum overlap of contact                                  1
  x.4     Minimum spacing when either metal line is wider than 10λ    8

14-Via2 21-Via3 25-Via4 MOSIS assumes stacking above Via1 is permissible, and thus introduces no separation rules for successive vias. All the rules match those for Via1. Note that ViaX connects MetalX to Metal(X+1) (Table 8.14).

29-Via5 This layer requires larger vias with more separation, for the same reason that Metal6 has a larger minimum width. The oxide irregularities at such a high layer require wider vias; otherwise, etching may fail to open the via. Additionally, Metal6 is expected to carry sizable currents, and narrow vias would quickly degrade due to electromigration (Table 8.15).

Table 8.13 Design rules for layer 30, Metal6
  Number  Rule                                                        Size (λ)
  30.1    Minimum width                                               5
  30.2    Minimum separation                                          5
  30.3    Minimum overlap of contact                                  2
  30.4    Minimum spacing when either metal line is wider than 10λ    10

Table 8.14 Design rules for layers 14, 21, and 25: higher-level vias
  Number  Rule                                    Size (λ)
  x.1     Exact size                              3λ × 3λ
  x.2     Minimum spacing                         3
  x.3     Minimum overlap by MetalX               1

Table 8.15 Design rules for layer 29, Via5
  Number  Rule                                    Size (λ)
  29.1    Exact size                              4λ × 4λ
  29.2    Minimum spacing                         4
  29.3    Minimum overlap by Metal5 and Metal6    1

8.6 Fixed-Point Simulation

1. Examine the ASIC design flow
2. Contrast floating-point and fixed-point numbers
3. Model fixed-point numbers as binary integers
4. Realize the role of fixed-point simulation in the design flow
5. Model fixed-point operations using pseudocode
6. Understand how the performance of fixed-point systems can be quantified.

Figure 8.47 shows a typical ASIC design flow. The flow is recursive and may require multiple passes. The end result of the design flow is the set of masks used to fabricate the ASIC. The masks can be directly extracted from the layout; thus, the layout can also be considered a valid endpoint for the design.

The flow starts with a high-level description. This description might be as general as a verbal discussion of the algorithm that needs to be implemented, or as specific as a simulation in a high-level computer language. In all cases, the high-level description is far removed from a physical implementation and needs to pass through multiple steps before it is realized as a layout. These steps are summarized in Table 8.16. Synthesis and place & route will be discussed in detail in Sect. 8.7. In this section, we are concerned with the fixed-point modeling step.

The high-level algorithm description can easily be converted into a simulation. This simulation is typically performed by non-circuit designers and is thus concerned with metrics that are far removed from hardware implementation. Typically, a systems designer will be concerned with achieving functionality, and perhaps with the number of arithmetic operations performed, but never with the speed, area, or power of the ultimate hardware that will implement the algorithm.

At such a high level, the simulation is often floating-point. This means that both the numbers and the arithmetic operations on them are floating-point. Floating-point numbers are represented in the following form:

Number = significand × base^exponent

Floating-point can represent a very wide range of numbers due to the use of the exponent. The system is termed floating-point because the position of the binary point is not pre-defined and can be placed anywhere depending on the scale that needs to be used. System-level simulations are performed on general-purpose processors with floating-point units. This allows the designer to focus on modeling the algorithm instead of thinking about how numbers are represented or worrying about an inaccessible programming paradigm.

However, floating-point operations are complicated, and hardware that uses floating-point numbers is inefficient. The majority of ASICs use fixed-point numbers instead. Fixed-point systems differ from floating-point in three significant ways:

• Fixed-point arithmetic operations are very efficient when implemented in hardware. In Chap. 11 we will consider how arithmetic translates into hardware; all the circuits we will discuss assume fixed-point operands
• The width of operands in fixed-point systems is usually much smaller than in floating-point systems. This is both a result of and a reason for the efficiency of operations
• By definition, the position of the binary point in a fixed-point system is fixed

When we say that the position of the binary point is fixed, we mean that it lies in the same location for all registers. This divides any word into an integer component and a fractional component, with each occupying the same number of bits in every register. Figure 8.48 shows an 8-bit register in a fixed-point system where the binary point position is chosen so that 6 of the 8 bits form the integer part.

The first step in the design flow in Fig. 8.47 involves transforming the floating-point model into a fixed-point model. This is necessary to provide a reference point for the hardware implementation. The fixed-point simulation is used to determine the sizes of the registers used in operations, which in turn determines the size and design requirements of the arithmetic units used. Thus, we move a step closer to hardware implementation. If the hardware implementation matches the sizes in the fixed-point model, then the results of the hardware must also match those of the fixed-point simulation. Thus, the fixed-point model can provide a "gold standard" against which to measure hardware functionality during testing (Sect. 14.1).

The fixed-point model provides a bit-accurate model for the hardware: if the implementation is proper, the output of the hardware should match the output of the fixed-point simulation bit by bit. However, the fixed-point simulation provides absolutely no information on the sequence of outputs or on timing. For a discussion of simulations that carry timing information, see Sect. 8.7.

There are two questions about fixed-point modeling:

• How floating-point numbers and operations can be transformed into fixed-point numbers and operations
• How the decision can be made about the sizing of registers and operators

Below, we will develop a simple but powerful method for fixed-point simulation. Although we will always represent

Fig. 8.47 Simplified ASIC design flow

Table 8.16 Main steps in the design process with their input and main output. For more information on synthesis and place & route see Sect. 8.7

  Step                  Input                       Output
  Fixed-point modeling  Floating-point algorithm    Bit-accurate fixed-point algorithm
  Synthesis             Behavioral description      Netlist
  Place and route       Netlist                     Layout

Fig. 8.48 Registers in an 8-bit, 6-bit integer system. The fractional part of the word is 2 bits long
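The register format of Fig. 8.48 is easy to model in software. A small Python sketch (the helper function and the example bit pattern are mine, not the book's):

```python
def interpret(word: int, int_bits: int, frac_bits: int) -> float:
    """Interpret a raw register value in a fixed-point format with
    int_bits integer bits and frac_bits fractional bits."""
    assert 0 <= word < 2 ** (int_bits + frac_bits)
    # The binary point sits frac_bits positions from the LSB,
    # so the real value is the raw integer scaled by 2^-frac_bits.
    return word / (2 ** frac_bits)

# Figure 8.48's format: 8-bit register, 6 integer bits, 2 fractional
# bits. The raw pattern 0b10110110 (= 182) represents 101101.10.
assert interpret(0b10110110, 6, 2) == 45.5
```

The same raw bits mean different values under different binary-point positions, which is exactly why the position must be agreed upon for all registers.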

numbers as pure integers, the method is agnostic to the position of the binary point. The advantage of this system is that it allows the use of floating-point processors to perform fixed-point simulations.

Modeling a fixed-point number on a floating-point platform
All simulation environments run on floating-point processors. To build a fixed-point model, one has to build a library of functions that force the processor to mimic the behavior of a fixed-point unit of a certain width. The model we will develop assumes that all numbers are integers; that is to say, the binary point in all fixed-point registers lies to the right of the LSB, as in Fig. 8.49.

Fig. 8.49 Pure integer register

This might seem highly restrictive. After all, we are preventing ourselves from modeling any numbers with fractional parts. However, notice that addition and multiplication preserve the location of the binary point in fractional numbers. Thus, performing these operations on pure integers is equivalent to performing them on mixed integer-fraction numbers, up to a scaling factor. This is best illustrated by an example.

Consider the addition of the two integers 13 + 11 = 24. In binary, this operation is "1101" + "1011" = "11000". If we add the two numbers 6.5 + 5.5 = 12, then the operation in binary is 110.1 + 101.1 = 1100.0. Adding 3.25 + 2.75 = 6 is 11.01 + 10.11 = 110.00. The operation 1.625 + 1.375 = 3 is 1.101 + 1.011 = 11.000. The operation 0.8125 + 0.6875 = 1.5 is 0.1101 + 0.1011 = 1.1000.

The above shows that all the operations are essentially identical. The operands and the results contain the same binary string, while the position of the binary point varies. The position of the binary point can be easily predicted from the operands. In fact, each result is a shifted version of the one above it, which in decimal is equivalent to division by 2. Thus, the last result is equivalent to the first result divided by 2^4, which is a 4-bit shift in binary.

We can now use these observations to generate fixed-point numbers in a floating-point simulation environment. Assume we generate a random number "a" drawn from a uniform distribution between 0 and 1, and that this number is stored in a floating-point register. If this number is multiplied by 2^p − 1, then its range changes from 0 to 1 into 0 to 2^p − 1. However, this number is still floating: it contains a fractional component that can be of significant length. If this number is then truncated (or rounded), it becomes a pure integer. Any number generated using the pseudocode shown below is thus an integer in the range 0 to 2^p − 1. This is equivalent to a fixed-point register of width "p". There is no loss of generality in assuming the number is an integer, as seen in the above discussion. This approach allows us to always consider only the integer parts and ignore any fractions, which greatly simplifies simulation.

Pseudocode to generate fixed-point "a":
a = rand;
a = a × (2^p − 1);
a = floor(a);
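The pseudocode above runs unchanged on any floating-point platform. A minimal Python sketch (the function name and the p = 8 example are mine, not the book's):

```python
import math
import random

def gen_fixed(p: int) -> int:
    """Generate a random fixed-point number modeled as a pure
    integer in the range 0 .. 2^p - 1 (a register of width p)."""
    a = random.random()        # uniform float in [0, 1)
    a = a * (2 ** p - 1)       # rescale to the register's range
    return math.floor(a)       # truncate: the result is a pure integer

# Every generated value fits in a p-bit register.
samples = [gen_fixed(8) for _ in range(1000)]
assert all(0 <= s <= 255 for s in samples)
```

The floor (truncation) step is what turns the floating-point processor into a model of a p-bit register; everything downstream then manipulates only integers.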

Performing fixed-point addition
If we add the fixed-point numbers modeled above in a floating-point simulator, the result will be floating-point. To model the effect that fixed-point adders have on numbers, we recall that, in general, addition adds one bit to the result width compared to the operands (Chap. 11).


This additional bit can be dealt with in three ways:

• The result can be stored in N + 1 bits, the guard bit allowing noiseless storage of the result
• The MSB can be dropped and the remaining N bits stored. This opens the chance of overflow but allows non-overflow results to be stored noiselessly. This is almost never used in ASICs
• The LSB can be dropped. In this case, the result will never be totally "wrong". However, some results will carry a "noise" due to the effect of truncation

These options are best illustrated through an example. Consider adding the two numbers 10 + 13 = 23; in binary, assuming 4-bit inputs, 1010 + 1101 = 10111. If the result is stored in a 5-bit register and left as it is, this is equivalent to the first option in the list above. If the MSB is dropped, the result becomes 0111 = 7. This is obviously a wrong result, an overflow. Note that this will always happen when the carry-out of the last bit is one, and will never happen when the carry-out is zero.

The last option in the list above is equivalent to storing 1011 instead of 10111. This can be modeled in binary by shifting the number to the right, then removing all bits in the fractional part: 10111 -> 1011.1 -> 1011. In decimal, this is equivalent to dividing by 2 and then truncating the fractional part: 23 -> 23/2 = 11.5 -> 11.

Storing 11 instead of 23 might seem as "wrong" as storing 7 instead of 23. Note, however, that the information lost while storing 11 was lost from the LSB. The number 11 exists on a different scale from 23; thus, to compare the two, we return 11 to the original scale by multiplying by 2. The stored number is therefore equivalent to 22. This is not equal to 23, but it is not far off. The difference 23 − 22 = 1 is a noise component. This noise is due to quantization and is thus called quantization noise. When truncating only one bit, quantization noise will always exist for odd results and never for even results.
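The three options can be sketched in a few lines of Python for the 4-bit example above (the function names are mine):

```python
def add_guard(a, b, n):
    """Option 1: keep all n+1 bits (guard bit); noiseless."""
    return a + b

def add_drop_msb(a, b, n):
    """Option 2: drop the MSB; overflows whenever the final
    carry-out is 1. Almost never used in ASICs."""
    return (a + b) & ((1 << n) - 1)

def add_drop_lsb(a, b, n):
    """Option 3: drop the LSB (divide by 2 and truncate);
    introduces at most 1 LSB of quantization noise."""
    return (a + b) >> 1

# The 10 + 13 example with 4-bit inputs:
assert add_guard(10, 13, 4) == 23       # 10111 stored in 5 bits
assert add_drop_msb(10, 13, 4) == 7     # 0111: an overflow
assert add_drop_lsb(10, 13, 4) == 11    # 1011: rescaled 22, noise = 1
```

Note that option 3's result lives on a coarser scale; multiplying it back by 2 (giving 22) is what makes it comparable to the noiseless 23.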

Performing fixed-point multiplication
Fixed-point multiplication can be modeled along very similar lines to fixed-point addition above. However, multiplication yields much wider results than addition: when two N-bit numbers are multiplied, the result is 2N bits wide. This width is called the noiseless wordlength. Storing the noiseless result of a multiplication is generally unacceptable. This is not because the registers needed to store these results are large, but rather because most algorithms implemented in hardware require a sequence of operations rather than a single isolated operation. Thus, the result of one multiplication is often the input to another multiplication. If each multiplication in the cascade is allowed to double the width of its inputs, the later multipliers will be enormous. Thus, the outputs of multipliers are always truncated. This is shown in Fig. 8.50. The result is "noiseless" when stored in a register whose width equals the sum of the widths of the inputs; a number of bits are truncated to strike an acceptable balance, and the truncated bits represent quantization noise for the multiplier.

The pseudocode for modeling truncation at a multiplier output is shown below. The length of operand "a" is N, the length of operand "b" is M, and the result "r" is truncated by P bits.

Pseudocode to perform fixed-point multiplication:
r = a × b;
r = r / 2^P;
r = floor(r);

Quantifying performance of fixed-point systems
The main question that faces a designer when migrating a design to fixed-point is the size of the registers to use. Consider Fig. 8.50: how do we choose P, the number of bits to truncate? If P is large, then the register used to store the result is small; even better, subsequent arithmetic operators will be small. On the other hand, the large truncated portion introduces a large quantization noise, causing the result to deviate further from the "clean" floating-point results. Larger registers can significantly reduce quantization noise, but will also lead to a cascade of ever-larger hardware.

Fig. 8.50 Multiplier with truncated output. The noiseless result of multiplying two operands of lengths N and M is N + M bits long. If P bits are truncated, the stored output is N + M − P bits long, with the truncated bits becoming quantization noise
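The truncating multiplier of Fig. 8.50 follows the same pseudocode pattern. A Python sketch under the pure-integer model (the 13 × 11 worked example is mine):

```python
import math

def fixed_mult(a: int, b: int, p: int) -> int:
    """Multiply two fixed-point operands and truncate the P LSBs,
    as in the multiplier pseudocode."""
    r = a * b
    r = r / (2 ** p)
    return math.floor(r)

# 4-bit operands: the noiseless product 13 * 11 = 143 needs 8 bits.
# Truncating P = 3 bits stores floor(143 / 8) = 17.
r = fixed_mult(13, 11, 3)
assert r == 17
# Rescaling back (17 * 2^3 = 136) reveals the quantization noise:
assert 143 - r * 2 ** 3 == 7
```

As with addition, the stored result lives on a coarser scale, and the rescaled difference from the noiseless product is the quantization noise the truncation introduced.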


Making a decision on register sizes is a very complicated process, without clear right or wrong answers. The designer has to be very familiar with the requirements of the application. In some applications, the noise introduced by quantization can be tolerated without affecting the user experience; in others, quantization noise becomes the limiting factor and must be managed carefully. The decision-making process uses the fixed-point simulation to find out whether the current setup is "good enough". But what metric determines what is "good"?

The most direct metric is the Quantization Signal-to-Noise Ratio, or QSNR for short. QSNR is defined as:

QSNR = E[X_floating^2] / E[(X_floating − X_fixed)^2]

where X_fixed is the output of the fixed-point arithmetic unit, and X_floating is the output of the corresponding floating-point operation. Note that the two numbers must be on the same scale, i.e., the floating-point result must be brought to the same scale as the fixed-point result before subtracting. X_floating − X_fixed is the quantization noise, and QSNR is thus the ratio of the floating-point signal power to the quantization noise power. QSNR, when measured in dB, indicates how much the signal has been degraded by truncation. We can generally assume that each additional output bit adds 6 dB to the QSNR at the output.

QSNR loses its meaning for long algorithms with complex sequences of operations. After all, what is the acceptable level of QSNR for a radio transceiver? Which signal would we choose to measure QSNR on in such a complicated system? The ultimate judgment on fixed-point performance is always made by measuring the highest-level output of the system against its floating-point counterpart. This allows us to estimate degradation in terms that relate to the application at hand. For example, in a radio transceiver, the ultimate measure of performance for system designers is the bit error rate, or BER. Thus, an appropriate measure of how fixed-point truncation impacts the design is the shift it imposes on BER: to achieve the same BER as the floating-point model, how much higher would the signal-to-noise ratio have to be in the fixed-point system? In other applications, BER may not be the best metric, or there may not be a BER at all; the ultimate judgment must be made on an application-by-application basis.

This illustrates the need for clear communication between engineers at different levels of the design. The system designer is usually most familiar with acceptable performance metrics. Hardware designers are more likely to make decisions on register sizes. Thus, there must be a way for specifications to be communicated clearly between the two levels.

8.7 Physical Design

1. Understand the detailed ASIC design flow
2. Recognize the simulation information available at each level of the design flow
3. Understand how the library is used at every step of the design flow
4. Realize the iterative nature of place & route
5. Recognize the impact of constraints on place & route
6. Contrast DRC and LVS.

In Sect. 8.3 we introduced the standard cell architecture as a major implementation approach for ASICs, and in Sect. 8.6 we introduced the design flow for ASICs. In this section we will discuss the design flow in more detail, giving each of the steps more attention. We will pay particular attention to the new information discovered in every step of the design flow, as well as to the ways, and the need, to trace back to an earlier step when a problem is found.

The design flow is drawn in more detail in Fig. 8.51. At every step, we define the necessary inputs as well as the expected outputs. Table 8.17 lists the simulations that can be performed after each step in the flow; we will discuss each in more detail later. We need to perform multiple simulations because every design step uncovers new information, and this information can expose wrong behaviors that were not visible earlier. Figure 8.51 also illustrates one important aspect of ASIC design: it is iterative. At every step of the flow, if a problem is discovered, it is often necessary to go back to an earlier design step and make changes. Thus, understanding how the design steps interact with each other is critically important.

RTL modeling
Input: Fixed-point high-level model
Output: Portable, synthesizable HDL of the design
Simulation: Behavioral simulation

In this step, the fixed-point model is re-written in the form of a Hardware Description Language (HDL). HDL languages are very flexible and allow users to write in a programming-like syntax. However, this can be a disadvantage: while writing HDL, the designer must be aware of how the code will be translated into hardware. The mindset should be describing a hardware model rather than writing a sequence of operations (see Chap. 9).


Fig. 8.51 Design flow in more detail. The output of every step is shown. Each step also allows a new simulation to be performed. At every step, we have the chance to iterate back up to any previous design step

Most hardware implementations use a synchronous pipeline approach (Chap. 6), or at the very least very few clock domains, well separated by interface circuits (Sect. 13.9). The single-clock, register-only approach simplifies the remainder of the design flow and makes the resulting hardware predictable. In HDL, the transfer of data between registers is fully captured. This means that data will flow between registers at the correct clock edge.

The RTL model allows a "behavioral simulation" to be performed. The behavioral simulation captures all the fixed-point effects of the hardware. The results can then be compared to the results of the fixed-point simulation. The

Table 8.17 Different levels of simulation offer different levels of information. Lower levels always inherit all the information from the levels above, adding more information of their own. Thus, the post-extraction simulation includes the maximum amount of information of any simulation

  Simulation            Information available from simulation
  Floating-point        Function only. We can confirm that the arithmetic functions performed lead to the results expected. This is usually performed by a systems engineer with little input from hardware designers
  Fixed-point           Function, bit-accurate. We can confirm that the results of operations fit the expected results for the chosen register sizes. A fixed-point simulation can be done by either systems engineers or hardware designers, but best results are achieved with input from both
  Behavioral            Function, bit-accurate, cycle-accurate. We can confirm that correct outputs are produced in the clock cycle expected
  Post-synthesis        Function, bit-accurate, cycle-accurate, gate delays. We can confirm that there are no setup-time violations (Chap. 6) due to combinational and sequential logic delay
  Post place and route  Function, bit-accurate, cycle-accurate, gate delays, routing delays. We can confirm that the additional wire delays (Chap. 13) do not create new setup or hold-time violations
  Post-extraction       Function, bit-accurate, cycle-accurate, gate delays, routing delays, parasitics. We can confirm that the additional parasitic capacitances and resistances due to the layout do not degrade performance beyond our specs

results should be perfectly bit-matched. If a bit mismatch is found, the designer must go back and examine either or both of the RTL and fixed-point models.

The behavioral simulation is an event-driven simulation. That is to say, it recalculates the output whenever an event happens at the input. The event can be either a change in one of the inputs or a change in the clock. Thus, in a properly designed synchronous pipeline, the behavioral simulation will provide information about when (in terms of the number of clock cycles) data becomes available, not just what that output is. The output of the behavioral simulation can be represented as a waveform showing the outputs and inputs relative to the clock signal.

Notice that this level of simulation does not contain "timing" data in the strict sense. We do have information about which cycle data becomes available in, but there is no information on propagation delay through any components. In other words, you could use an arbitrarily small clock period and the simulation would still work perfectly. The behavioral simulation tests the fixed-point performance of the circuit, its functionality, and the alignment of shift registers in the pipeline. It does not test either gate or routing delays, it does not capture setup or hold-time violations, and it does not contain any information about the particular technology used. Good RTL should be technology-independent and portable.

The main output of RTL modeling is a design written in HDL. The HDL must be synthesizable because it will be the main input to synthesis.
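The cycle-accurate-but-delay-free nature of behavioral simulation can be mimicked in ordinary software. A toy two-stage pipeline in Python (the pipeline function and its stages are invented for illustration):

```python
def behavioral_sim(samples):
    """Cycle-accurate (but delay-free) model of a toy two-stage
    pipeline: stage 1 squares the input, stage 2 adds one.
    Registers update only on the implicit clock edge (one loop
    iteration = one clock cycle)."""
    reg1 = reg2 = 0              # pipeline registers, reset to 0
    outputs = []
    for x in samples:
        outputs.append(reg2)     # sample the output at the clock edge
        reg2, reg1 = reg1 + 1, x * x   # both registers clock together
    return outputs

# Each result (x^2 + 1) appears exactly two cycles after its input,
# regardless of any physical gate delay: the model is cycle-accurate
# but carries no timing information.
assert behavioral_sim([2, 3, 4, 0, 0]) == [0, 1, 5, 10, 17]
```

The simultaneous tuple assignment mirrors how all registers in a synchronous design update on the same clock edge; that is the property behavioral simulation verifies, and gate delay is nowhere in the model.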

Synthesis
Inputs: HDL design files, library, some simple constraints
Outputs: Netlist, timing report including only gate delay information
Simulation: Post-synthesis simulation

Synthesis is the core of the design flow. It is a critical step in transforming a behavioral description into something that physically exists on a target platform. Synthesis is target-specific, meaning that it is fully cognizant of the target platform and that everything it does is informed by the target.

There are two main inputs to synthesis. The first is the HDL written in the RTL modeling step. If this HDL is written by an experienced coder, then it will be "synthesizable". The concept of synthesizability is difficult to explain briefly and will be discussed in more detail in Chap. 9. The second input to synthesis is the library. The library of standard cells was discussed in detail in Sect. 8.3; because libraries are vendor-specific, synthesis is also vendor-specific.

Synthesis is the process of mapping the RTL into a shopping list of components drawn only from the library. RTL is written liberally within the restrictions of syntax and good coding practice, while the library contains a finite number of possible standard cells. Synthesis asks how the liberally written RTL code can be implemented using only those elements that belong to the library.
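The mapping idea can be illustrated with a toy two-cell library (the cells, delays, and the AND decomposition example are invented, not taken from any real library):

```python
# Toy "library": the only cells synthesis may use, with per-cell
# gate delay in ns. All values are invented for illustration.
LIBRARY = {"NAND2": 0.10, "INV": 0.05}

def map_and2():
    """An AND gate is not in the library, so the synthesizer must
    expand it into cells that are: AND = NAND2 followed by INV."""
    netlist = ["NAND2", "INV"]
    delay = sum(LIBRARY[cell] for cell in netlist)
    return netlist, delay

netlist, delay = map_and2()
assert netlist == ["NAND2", "INV"]
assert abs(delay - 0.15) < 1e-12   # library delays already give a timing estimate
```

Real synthesizers do this over entire logic expressions with heavy optimization, but the essence is the same: every construct in the RTL must be rewritten as a combination of library cells, and the library's delay annotations are what make the timing report possible.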


Well-written RTL should be portable, meaning it should be applicable to multiple implementation platforms. This increases the weight of the task that synthesis performs, because synthesis is the first bridge between platform-agnostic code and the actual technology used to implement it.

A standard cell library entry (Fig. 8.22) typically includes some form of logic characterization of the cell. The synthesizer tries to break down the HDL design into logic expressions that fit into these standard cells. Optimization algorithms are used to perform this efficiently. The synthesizer will often try to simplify logic expressions in the design file to implement them in as few standard cells as possible; sometimes, the tool will instead expand expressions if this makes them fit more elegantly into the available cells. Most libraries include lists of very basic cells that can be used to build larger circuits. However, modern synthesizers are also adept at detecting large arithmetic blocks such as N-bit adders and full multipliers, and libraries are often designed with such large macro-blocks already handcrafted and optimized.

When the synthesizer is done mapping the design to standard cells, it produces a list of the cells needed. This list is called the netlist and is the main output of synthesis. The netlist is an HDL file that (hopefully) implements the exact same function as the original (behavioral) RTL model. However, it is written using only primitives that belong to the library.

Each library entry also includes delay information for the outputs of the cell. This allows the synthesizer to form an idea about the performance of the circuit: we know the cells used to implement the design, and we know the delay of each cell, so the synthesizer can sum the total combinational delay between each pair of registers. Thus, the synthesizer can calculate all path delays.
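This bookkeeping is straightforward to sketch: sum the gate delays along each register-to-register path and take the worst case (a simplified Python illustration; the path names and delay values are invented):

```python
# Each register-to-register path is a list of gate delays (ns)
# annotated from the library. All values are invented.
paths = {
    "regA->regB": [0.12, 0.30, 0.25],        # three gates
    "regB->regC": [0.12, 0.40, 0.40, 0.25],  # four gates
    "regA->regC": [0.50],                    # one slow gate
}

# Total combinational delay of every path.
path_delay = {name: sum(gates) for name, gates in paths.items()}

# The critical path has the maximum total delay; it bounds the
# clock period. Note this is gate delay only: no routing delay yet.
critical = max(path_delay, key=path_delay.get)
assert critical == "regB->regC"
assert abs(path_delay[critical] - 1.17) < 1e-9
```

Note how the path with the most gates wins here even though another path contains the single slowest gate; it is the sum along the path that matters, which is why breaking a long combinational path with a pipeline register improves the achievable clock.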
By finding the maximum path delay, the synthesizer can identify the critical path and its delay. The word "delay" above refers exclusively to gate delay. After synthesis, we know which gates will implement the design, but we are still clueless about where the cells will be placed and how they will be connected to each other. Thus, routing delay is missing from the synthesis timing report. Routing delay can be a huge component of total delay, especially in deep submicron (Chap. 13).

However, the operating frequency that synthesis suggests is very useful. It represents a sort of optimistic projection: if things go really well in the rest of the design flow, this is how fast we can operate. If the designer is not satisfied with this, they have to go back to RTL modeling. We now know where the critical path is, and we roughly know where to break it to generate enough slack to improve performance. Once the pipeline is properly broken we can


resynthesize and confirm that performance indeed achieves our target.

After synthesis, we can perform the post-synthesis simulation. This simulation inherits everything from the behavioral simulation; thus, it is cycle-accurate and bit-accurate. However, it also adds delay information, so an arbitrarily small clock period will no longer yield a correct simulation output. In fact, any clock period shorter than that in the timing report will produce erroneous results. The gold standard for the post-synthesis simulation can be either the results of the behavioral simulation, if those have already been verified, or the results of the fixed-point simulation. In some cases, the post-synthesis simulation fails to produce correct results even when the clock period from the timing report is respected and the behavioral simulation functions properly. Such cases arise when there are initialization problems or when the RTL is written with bad coding practices.

The main output of synthesis is the netlist. It is used by placement and routing to generate the final layout. But first, the netlist goes through a couple of optional steps that can sometimes help the rest of the design flow.

Partitioning
Partitioning and floorplanning are closely knit steps; partitioning is sometimes seen as part of synthesis. In partitioning, we define large parts of the design as modules called partitions. A design is usually divided into at most a few partitions. Partitions are most useful when they represent a functionally connected unit. For example, in a microprocessor we might partition the design into an ALU, a register file, and a controller; in a radio chip we might partition the design into a transmitter and a receiver. The decision on what constitutes a partition is qualitative and requires designer involvement. Partitioning allows us to regroup the netlist into sub-netlists.
This can be useful when combined with floorplanning because it often speeds up and simplifies placement and routing. However, partitioning is optional. One might choose not to partition a design because it is small, because the design does not immediately lend itself to partitioning, or because one suspects that not partitioning will give better performance. An unpartitioned design is also called a flat or flattened design.

Floorplanning
There are two inputs to floorplanning: the partitions of the design and the dimensions of the target ASIC. Floorplanning then has to assign each partition to a specific portion of the ASIC real estate. This usually also involves defining the positions of specific pin pads.

8.7 Physical Design

Floorplanning is closely tied to partitioning. If the design is flattened, then floorplanning becomes moot. At this point, it is important to explicitly compare the advantages and disadvantages of flattened and partitioned designs. We will shortly observe that placement and routing is a difficult, iterative optimization process. It is very time-consuming, and it often fails. When it fails, it is sometimes not clear whether the failure is inevitable or whether the tool or the designer could help nudge it along. In an intelligently partitioned and floorplanned design, we hand-place units that communicate frequently closer together. This gives the design tool a good starting point, nudging it towards a smart solution. It significantly reduces the required optimization effort, helping placement and routing finish much quicker. A flat design, on the other hand, forces the placement and routing tool to approach the problem blindly. This often makes the process take longer, but it sometimes allows the tool to find a better solution. Notice that floorplanning acts as an additional constraint on placement and routing. So should we or should we not partition? There is obviously no single right answer. Designer involvement, experience, and even trial and error are inevitable. In general, in medium- to large-scale designs, partitioning should at least be attempted. It is very rare for flat designs to find a solution with significantly more slack than intelligently partitioned ones.

Place and Route (PAR)
Placement and routing are inextricably related and are often abbreviated as PAR. The reason the two cannot be separated is that they are always done in an iterative optimization flow. The inputs to PAR are:

• Partitioned or flat netlist from synthesis
• Library
• Floorplan (if any)
• Constraints

The placement tool translates the cells of the synthesis netlist into their corresponding layouts. These mini-layouts are obtained from the library. All we need to do now is arrange these mini-layouts on the floorplan and connect them to realize the overall design. Arranging the layouts is the placement step. Connecting them is the routing step. However, we cannot haphazardly place the cells. We have to stick to the standard cell architecture shown in Fig. 8.52. According to Sect. 8.3, all cells have layouts with the same pitch. This allows them to be neatly arranged into rows connected through horizontal and vertical metal tracks.


Fig. 8.52 Standard cell implementation of ASIC

Let us consider the very simple (unconstrained) PAR task shown in Fig. 8.53. The figure shows 30 standard cells placed on a rectangular chip. Metal1 horizontal tracks are used to perform horizontal routing; Metal2 is used to perform vertical routing. Review Sect. 8.3 for more detail on routing strategies. The routing tool needs to perform the following routings: 1–16, 2–17, 3–28, and 4–30. Figure 8.53 shows the tool having performed all the routings except the last. There are no horizontal tracks available to route from cell 4; all tracks are occupied by other routes. Additionally, we cannot route from cell 4 to cell 30 using Metal2 because the track from cell 28 is in the way. In a two-metal process, the routing tool has to give up. However, the routing tool will not report a failure to the user just yet. Instead, it will indicate the inability to route to the placement tool. The placement tool will try to re-place the cells on the paths that failed, and the routing tool will try again, checking that no new impossible routings were created. For example, if the placement tool simply exchanges cells 4 and 28, the situation is resolved completely: with cell 4 placed in the same row as cell 30, it can use the last row of horizontal tracks to route.
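The track-contention scenario above can be sketched with a toy greedy track assigner. All net names, spans, and the track count below are invented for illustration; real routers work on far richer models.

```python
# Toy model of horizontal-track contention as in Fig. 8.53: each net needs
# a free horizontal track over its column span.

def assign_tracks(nets, n_tracks):
    """Greedily assign each net (name, left, right) to the first horizontal
    track where its span does not overlap an already-routed net."""
    tracks = [[] for _ in range(n_tracks)]
    placed, failed = {}, []
    for name, lo, hi in nets:
        for t, spans in enumerate(tracks):
            if all(hi < a or lo > b for a, b in spans):
                spans.append((lo, hi))
                placed[name] = t
                break
        else:
            failed.append(name)  # no free track: placement must re-place cells
    return placed, failed

placed, failed = assign_tracks(
    [("1-16", 0, 5), ("2-17", 1, 6), ("3-28", 2, 7), ("4-30", 3, 8)],
    n_tracks=3,
)
```

With four mutually overlapping nets and only three tracks, the last net fails, mirroring how the router in the figure must hand the problem back to placement.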

Fig. 8.53 Standard cell ASIC with routing contention. Cannot route cell 4 to 30


The above is an artificially easy problem. There is plenty of space on the chip, and the design is unconstrained. One of the main inputs to PAR is a file listing the constraints on the design. Constraints are additional restrictions placed on the tool: it now needs to finish routing while also satisfying them. We have already seen constraints in the form of chip dimensions while floorplanning. Other common constraints include:

• Pin placement: The user requires inputs and outputs to the chip to be at certain locations. This constrains the routing tool when it routes inputs and outputs between the pads and the core. This is often covered while floorplanning
• Area: The user can ask the tool to fit everything into a maximum given area, and often to fit the design into a given aspect ratio
• Speed: The user will usually ask the tool to make the design work at a certain clock frequency without setup or hold-time violations

For example, an area-constrained design will tell the PAR tool to fit the design in a given total area and a given aspect ratio. The tool will then try to obtain the best possible delay by placing and routing in the given area. A design constrained only by delay will achieve the required delay, then try to minimize the area on which the design is realized. In a realistic situation, chip area and aspect ratio are often dictated by cost and the vendor, while delay is dictated by system specifications. So we often have to deal with a design that is constrained by both area and delay. In such a case the PAR tool has to fit the design in a certain area while having the circuit operate at a minimum given speed. Once routing has been done, we know the length of the metal tracks in all connections, and timing information can be obtained about interconnect delay.
Since gate delay has already been obtained during synthesis, a nearly complete picture of timing can be formed. Once PAR is done, a simulation called post-PAR simulation can be performed. This simulation includes said complete timing information and gives a much more realistic view of circuit performance than post-synthesis simulation. This is particularly true for highly scaled technologies where routing can be a major part of delay (Chap. 13). The timing information obtained from routing is also used within the PAR process to guide optimization within the constraints. Consider the situation shown in Fig. 8.54. Assume the design is constrained to a delay of 7.5 ns. Path 1 does not satisfy this constraint. The gate delays of the two cells in the path add up to 6 ns, but the routing delay between them is 2 ns, for a total path delay of 8 ns. Thus, although the routing tool found a way to connect all

Fig. 8.54 ASIC with failed post PAR delay. Cell delays are shown in the cells. Interconnect delays are shown below the pathname. Each path consists of two vertical Metal2 sections and one horizontal Metal1 section. Total path delays are in order: 8 ns, 7.5 ns, 7 ns, and 7 ns

Fig. 8.55 ASIC with resolved PAR delay failure. The bottom right cell is moved to the first row reducing the interconnect delay for path1 to 1 ns and the total delay for path 1 to 7 ns. Timing closure is achieved for the given constraints

cells, that way is still “wrong” because it violates one of the constraints. Note that synthesis results would not have indicated this problem, because they lack routing delay information. From the point of view of synthesis, the “delay” in path 1 is only 6 ns. Figure 8.55 shows how the situation in Fig. 8.54 can be resolved. The routing tool falls back on the placement tool, which moves one of the two cells so that they are much closer. This reduces their routing delay to 1 ns, reducing the total path delay to 7 ns and leading to “closure”. Closure is a situation where all constraints are satisfied and a solution has been found. In a large design with millions of paths, the tool will obviously need to perform many such iterations before it finds closure. The user usually specifies a certain “optimization effort” to the tool. This indicates how hard the tool should work on optimizing delay, and how long (how many iterations) it should try before giving up.
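The post-PAR timing check can be sketched as follows: total path delay is gate delay plus interconnect delay, and slack is measured against the constraint. The path totals follow Fig. 8.54, but the split between gate and routing delay for paths 2 through 4 is an assumption made for illustration.

```python
# Sketch of a post-PAR timing check. Synthesis alone would only see the
# gate-delay column; PAR adds the routing delay that exposes path1.

CONSTRAINT_NS = 7.5

paths = {  # (sum of gate delays, routing delay) in ns
    "path1": (6.0, 2.0),   # total 8.0 ns, as in Fig. 8.54
    "path2": (6.0, 1.5),   # total 7.5 ns
    "path3": (5.5, 1.5),   # total 7.0 ns
    "path4": (6.0, 1.0),   # total 7.0 ns
}

def slack(gate_ns, route_ns):
    return CONSTRAINT_NS - (gate_ns + route_ns)

violations = {p: slack(*d) for p, d in paths.items() if slack(*d) < 0}
closed = not violations            # closure = no negative slack anywhere

# Placement moves one cell of path1 closer (Fig. 8.55), cutting its
# routing delay to 1 ns and achieving closure:
paths["path1"] = (6.0, 1.0)
closed_after = all(slack(*d) >= 0 for d in paths.values())
```

Note that path2 closes with exactly zero slack, making it the critical path after the fix.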


If the PAR tool tries for too long and cannot find any solution that achieves the required delay within the given area, it will report a failure. This failure will include a list of setup-time violations: the paths that failed to fit within the delay constraint. These paths would cause setup-time violations on the receiving register if the constrained clock were used. To allow the designer to understand the extent of the problem, the negative slack in each offending path is also reported. The designer normally starts PAR by giving constraints obtained from the synthesis area and delay estimates. If the PAR tool closes (finishes successfully), the designer can either accept the result or constrain it further to see if they can achieve even better performance. If the tool does not achieve closure, the designer can either relax the constraints or revisit the pipeline design. These options are summarized in Table 8.18. Sometimes the PAR tool will also report hold-time violations. In Sects. 6.8 and 6.9 we extensively discussed why the tool might fail to resolve hold-time violations, and how in reality unresolved hold-time violations can be re-expressed as setup-time violations.

Once PAR is done, the layout is ready. The layout is the result of placing the standard cell mini-layouts and routing them using metal tracks. This layout is now nearly ready for fabrication. However, a few more steps are necessary, chief among them parasitic extraction. Parasitic extraction allows us to perform the ultimate simulation.

Parasitic extraction
In the extraction step, the parasitic capacitances and resistances are extracted from the layout. Delay information during synthesis and PAR is obtained from typical values for the technology. The parasitics allow us to refine these figures by encoding specific information about the layout. For example, we might obtain more information about the impact of the resistance of source and drain active regions, or the effect of drawn well parasitic capacitance. Information obtained from parasitic extraction is encoded in a standard form called SDF, or standard delay format. This file is then fed back or “back-annotated” to the PAR simulation, where it significantly refines delay information. The resulting simulation is called post-extraction or post-layout simulation. It contains the maximum amount of timing information and is the simulation that most closely resembles the finished chip. If this simulation passes, the finished chip is likely to work. If the simulation does not pass, the designer must trace the violating path. The designer then has the option of going back to PAR, where they might try to extract more slack or relax the constraints. The designer can also trace back all the way up to HDL modeling, where they can break the offending path, creating enough slack to absorb the parasitic delay. Once post-layout simulation passes, the layout is considered “ready”. However, this readiness does not mean the layout can be “taped out”. Tape-out is the action of sending the mask descriptions to the fabrication facility. The etymology of the expression “tape-out” is controversial. A popular explanation traces the expression to the historical practice of mailing finished masks to fabs on magnetic (cassette) tape. However, the true origin is even more archaic: it traces back to the practice of making photomasks by manually attaching dark adhesive tape to a transparent mask.

Table 8.18 Results of PAR and corresponding designer action

Result of PAR: No negative slacks, one or more zero slacks
Interpretation: The tool just managed to close. It found a critical path
Designer action: Accept the result and move to implementation. Constrain the design further and see if it closes. Increase optimization effort and see if it extracts more slack

Result of PAR: Negative slacks
Interpretation: The tool did not achieve timing closure
Designer action: Increase optimization effort (often does not work for large negative slacks). Revisit floorplanning. Relax constraints. Revisit the HDL model and cut the paths with negative slacks

Result of PAR: No negative slacks, no zero slacks
Interpretation: The tool found plenty of slack
Designer action: Accept the result and operate at the frequency suggested by the slacks. Constrain the delay further and see if it gets better performance. Constrain area more and see if closure can still be achieved
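The decision logic of Table 8.18 can be condensed into a small classifier over the reported path slacks. The helper name and verdict strings below are illustrative, not actual tool output.

```python
# Hypothetical sketch of Table 8.18: classify the PAR result from the
# list of reported path slacks (in ns).

def par_verdict(slacks):
    if any(s < 0 for s in slacks):
        return "no closure"          # negative slack: relax or revisit HDL
    if any(s == 0 for s in slacks):
        return "closed"              # just managed: a critical path exists
    return "closed with margin"      # plenty of slack: can constrain further

verdict = par_verdict([0.5, 0.0, 1.2])
```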


The user-generated layout is not the file that the fabrication facility uses to extract the masks. Instead, an intermediary format must be created first. Figure 8.51 shows the output file of the design process as a GDSII file. GDSII is a popular, though not unique, format for communicating masks from designers to fabs. GDSII is an acronym for Graphic Design System II. It is a database file format used to describe ASIC masks. The GDSII file is fairly simple. It includes standard headers and trailers. The body of the file is a set of records containing numerical coordinates. The coordinates indicate the positions of vertices of polygons (most often rectangles) in different layers. The polygons describe the drawn layout. Thus the GDSII file is a machine-readable analog of the graphical representation of layouts. In fact, very simple automatic tools can generate the GDSII file from the layout. Tools are also available to extract the layout from GDSII. We can even generate a 3D model of the finished chip from the GDSII file.

DRC and LVS
There are two error checking steps that should be performed before generating the GDSII file. Strictly speaking, these two steps are “optional”. However, there is no practical situation where a designer will tape out without doing them both first. If these steps are skipped, we can almost guarantee that the designer will not receive working chips from the fabrication facility. The two error checking steps are:

• Design Rule Check (DRC, Sects. 8.4 and 8.5)
• Layout versus schematic (LVS)

Design rule checks are vendor-specific. The designer must pass the layout through a design rule checker, and any design rule violations must be fixed. Many fabs will not accept design files that contain design rule violations, and if they do accept them, they cannot guarantee the results. If a design passes DRC, then we can guarantee that the circuit drawn on the layout will be realized in silicon with a minimum yield (Sect. 14.1).
This does not mean that every chip we get back will be defect-free; rather, a minimum percentage will be realized correctly. All simulations performed through the design flow verify that the netlist extracted by synthesis will work properly and will meet timing requirements. DRC confirms that features drawn on the layout will be realized in silicon. However, at no step do we confirm that the circuit realized by the layout performs the same function as the netlist. LVS does this. LVS asks the question: does the layout match the schematic? Or in simpler words,


does the layout perform the function we intend it to perform? To answer this, the LVS tool performs the inverse of the design flow, without assuming any knowledge of the function implemented:

1. LVS examines the layout. Devices are detected (e.g. every active-poly intersection is a MOSFET). Connections through metal or poly layers are recognized. The tool can accept either the layout or the GDSII file as an input, depending on the particular tool used. This extraction step transforms the layout into a set of transistors with connections.
2. The extracted MOSFET network is translated into a set of logic gates.
3. The logic gates are translated into logic functions, and these logic functions are simplified.
4. A given netlist is used as the “schematic” against which we compare the layout. This is usually the synthesis netlist. This netlist is translated into a logic expression, and the logic expression is simplified.
5. The LVS tool compares the functionality of the extracted layout with the simplified “schematic” netlist.
6. If the two netlists have matching logic outputs, then LVS is said to pass. If not, the tool will generate a list of areas where it detects differences between the extracted and given netlists. A good LVS tool will try to translate these differences into layout terms or logic terms depending on the context.

The faults typically found by LVS and how to correct them are listed in Table 8.19.

At this point, we will examine the library again. We will look at the contents of a library entry and how each is used during the design flow. This will help us reiterate the design flow and how each step helps in verifying the design. Figure 8.56 shows a very simple library. The library contains only seven cells. For each cell, the library entry contains four elements: logic characterization, circuit model, layout, and parasitics (Sect. 8.3). The logic characterizations in Fig. 8.56 are marked L1 through L7. Cells 1 through 3 as well as 5 through 7 are combinational cells.
Cell 4, marked DFF, is a D flip-flop. For combinational cells, the logic characterization will typically be a truth table. For the DFF, the logic characterization is a state transition table. The “circuit model” in the library is a model of the delay of the cell. These are marked C1 through C7. For combinational cells, the circuit model contains information about propagation delays. This will typically include low-to-high and high-to-low delays for different input transition combinations (Sect. 3.9) and design corners (Sect. 6.9). For the DFF, CDFF includes values for setup, propagation, and hold times.

Table 8.19 LVS faults and required actions

Fault: Shorts. These are signal shorts in the layout that do not correspond to connections in the schematic
Action by designer: The shorts are typically found in a metal layer, less often in poly. The designer can go in and examine how the short can be cut without violating DRC

Fault: Opens. Open-circuited tracks in the layout that should correspond to signal connections in the schematic
Action by designer: The opens typically exist at the metal or poly layers. Fixing them involves designer intervention in the layout

Fault: Wrong components. The layout contains the wrong cell compared to the schematic
Action by designer: This normally requires going up to and constraining the synthesizer

Fault: Missing components. The schematic contains components missing from the layout
Action by designer: Examine the synthesizer and HDL for segments that might have been erroneously trimmed. To further understand trimming see Chap. 9
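The comparison at the heart of LVS can be sketched as exhaustive Boolean equivalence checking. The functions and the modeled “open” fault below are invented examples, and brute-force enumeration is only feasible for small blocks; real tools rely on structural matching rather than raw truth tables.

```python
# Sketch of the final LVS comparison: both the extracted layout and the
# schematic netlist reduce to Boolean functions, compared over all inputs.
from itertools import product

def extracted_layout(a, b, c):       # function recovered from the layout
    return (a and b) or c

def schematic_netlist(a, b, c):      # same logic, written differently
    return not (not (a and b) and not c)

def lvs_compare(f, g, n_inputs):
    """Return the list of input vectors where the two functions disagree.
    An empty list means LVS passes."""
    return [v for v in product([0, 1], repeat=n_inputs)
            if bool(f(*v)) != bool(g(*v))]

mismatches = lvs_compare(extracted_layout, schematic_netlist, 3)

# Model an "open" fault: the net carrying c never reaches the gate, so the
# layout behaves as if c were absent. LVS now reports the disagreements.
def layout_with_open(a, b, c):
    return a and b

open_fault = lvs_compare(layout_with_open, schematic_netlist, 3)
```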

Fig. 8.56 Library components in use. The logic model is used by the synthesizer to break the HDL into a netlist. The rest of the library (physical library) is used to transform the netlist into a characterized layout. The circuit model gives information about gate delays, allowing the post-synthesis simulation to be performed. Layouts are used by PAR to form the all-chip layout. Parasitics are used in parasitic extraction to form the final delay model

The library also includes the layouts of each of the cells, marked Y1 through Y7. All cell layouts have the same height. The library also includes information on layout-related parasitics, marked P1 through P7. In Fig. 8.56, the design in HDL is fed to the synthesizer. The synthesizer breaks down the HDL into a netlist of cells from the library. To do this, the synthesizer simplifies logic from the HDL and uses the logic characterizations from the library to complete the mapping. The synthesizer produces a netlist containing five flip-flops and four combinational cells. The synthesizer is also aware of how the cells must be connected. This allows it to deduce the paths in the circuit; there are specifically three paths: T1, T2, and T3. The synthesizer can then use circuit models from the library to extract the critical path. In Fig. 8.56, the delay in path T1 can be

calculated from the delays of the DFF and cell 1. Similarly, C7 and CDFF can be used to calculate the delay in path T2. Delay in path T3 can be calculated from CDFF, C3, and C5. By finding the maximum delay among T1, T2, and T3, the synthesizer can pinpoint the critical path and calculate its delay. The synthesizer also records all delays, allowing us to perform a post-synthesis simulation containing all gate delays. The place and route tool will try to transform the netlist into a layout. Layouts for all the cells used in the netlist are available in the library as Y1 through Y7. This allows the PAR tool to form the overall layout from these mini-layouts. Standard cell layouts are of the same height and can thus be arranged in neat rows. Metal layers are used to route the standard cells according to the strategies discussed in Sect. 8.3. The PAR tool can calculate total path delays


including routing delay. This allows the tool to determine whether closure can be achieved under the given constraints. Exceptions to timing constraints are as important as the constraints themselves. They sometimes offer enough leeway for the PAR tool to achieve closure when it would otherwise be impossible. At other times they can even change the critical path of the circuit. Exceptions fall under one of two categories:

False paths: These are paths between registers that will never actually be triggered, or whose values, if triggered, we would never be concerned about. This is the case, for example, for registers that are written only once and can take as much time as needed to be written, such as configuration registers. It is also true of the intermediate results of synchronizers when passing between clock domains (Sect. 13.9). Such intermediate results are very likely to cause hold-time violations, but these violations should be ignored.

Multi-cycle paths: These are paths whose outputs are significant, but which do not have to be written in one cycle. In other words, the inputs to the combinational logic remain constant for more than one cycle, while the output is only sampled every few cycles. Timing for these paths can be relaxed by the number of cycles available.
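The effect of these exceptions on the timing check can be sketched as follows. The clock period, path names, and delays are all invented for illustration: false paths are simply skipped, and an N-cycle multi-cycle path gets N clock periods of budget.

```python
# Sketch of timing exceptions: false paths are never checked, multi-cycle
# paths get a multiplied delay budget. All values are hypothetical.

CLOCK_NS = 10.0

paths = {"p_alu": 9.0, "p_cfg": 35.0, "p_mac": 18.0, "p_sync": 12.0}
false_paths = {"p_cfg", "p_sync"}   # e.g. config registers, synchronizers
multicycle = {"p_mac": 2}           # output only sampled every 2 cycles

def violates(name, delay_ns):
    if name in false_paths:
        return False                            # exception: never checked
    budget = CLOCK_NS * multicycle.get(name, 1)
    return delay_ns > budget

failing = [p for p, d in paths.items() if violates(p, d)]

# Without the exceptions, three of the four paths would appear to fail:
naive = [p for p, d in paths.items() if d > CLOCK_NS]
```

With the exceptions applied, the design closes at 10 ns even though three paths are longer than one clock period.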

If the PAR tool achieves closure, it will report this and calculate all slacks. A post-PAR simulation can then be performed. If closure is not found, setup and/or hold-time violations are reported, allowing the designer to trace back and solve them. Once the layout is ready, parasitics can be calculated for the layout using the library of parasitic elements P1 through P7.

Fig. 8.57 Cost-performance of different platforms. Performance can be measured as energy, PDP, or MOPS/mW. Cost is often total cost in dollars. Flexibility is measured in terms of ease of modification or man-hours to finalize the design

8.8 FPGAs

1. Compare hardware platforms in terms of cost, flexibility, and performance
2. Understand the general architecture and design paradigm of FPGAs
3. Recognize the architecture of a simple FPGA logic block
4. Compare FPGA and ASIC design flows
5. Recognize the peculiarities of FPGA library entries
6. Understand routing in FPGAs

FPGA stands for Field-Programmable Gate Array. FPGAs are the most successful class of circuits in a bigger category of array-like implementation platforms. These array-like architectures allow designers to implement their designs by reconfiguring or “programming” hardware in the field, thus the name. FPGAs are significantly less efficient than full-custom ASICs. However, modern FPGAs can approach the performance of standard cell ASICs. Figure 8.57 and Table 8.20 show the tradeoff between cost and ease of design on the one hand and performance, measured as power, delay, or energy, on the other. Full-custom ASICs are the most efficient, most costly, and most difficult to design. General-purpose microprocessors allow very easy and cheap programming, but they are extremely inefficient. It is clear that ease of design comes from the generality of the architecture. But generality also leads to inefficiency; the two go hand in hand. A general-purpose processor has to have an architecture that supports anything

Table 8.20 Implementation platforms, granularity of control, and limitations on freedom of implemented design. Granularity means the level at which we can control what hardware is realized; in other words, how much direct control we have over the hardware

Platform: Full custom
Granularity of control over hardware: Full control. The designer determines all aspects of hardware including gate layout, gate location, and routing
Limitations on design freedom: None

Platform: Standard cell
Granularity of control over hardware: Designer has control over which standard cells are used, how many, where they are placed, and how they are routed
Limitations on design freedom: Cells must be in rows. Can only choose layouts from the library. Routing through jumpers and specific layers

Platform: FPGA
Granularity of control over hardware: Configuration of cells, their location, and routing
Limitations on design freedom: All cells are the same. Their architecture is fixed. Routing networks are fixed

Platform: Processors
Granularity of control over hardware: None
Limitations on design freedom: Cannot control hardware except through a slow, general-purpose fetch-execute cycle

Fig. 8.58 Illustrative logic cell. Alternatively known as a logic block

from a technical simulation to a video game. Thus, when the processor is used for a certain application, the majority of the architecture imposes a large but unavoidable overhead. FPGAs try to bridge the gap between the two extremes of the ASIC and the general-purpose processor. Note that a standard cell ASIC looks very regular (see Sect. 8.3). It has rows of cells with the same height separated by identical routing channels. This regular architecture is what inspired FPGAs. In FPGAs the cells in the row not only have the same height, they are also identical! How can we use these rows of identical cells to implement any design? The answer is that each of these cells has some degree of programmability built into it. The programmability allows the cells to be reconfigured, which lets us map any hardware design to the FPGA. This is best illustrated by first considering the architecture of these mysterious “cells”. Figure 8.58 shows an illustrative example of an FPGA cell. This cell is extremely simple; realistic FPGA cells are a lot more complex and versatile. But this example is very useful in understanding how FPGAs work. The sample cell in Fig. 8.58 consists of a four-input Look-Up Table (LUT), a Full Adder (FA), and a DFF. It also has three multiplexers. The inputs to the cell are A, B, C, D, b, Cin, S1, S2, S3, and CLK. Its outputs are Cout and Dout. The LUT consists of a 16-location register file with an address decoder. This allows very versatile implementation

of any four-input function. For example, Table 8.21 shows four examples where the LUT is used to implement four different combinational functions. The inputs ABCD are used as the address of the LUT, causing the output to correspond to the addressed location in the LUT. The FA can accept one of its inputs from the output of the LUT, but because this input is multiplexed it can also accept it directly from the external input C. Thus, the FA can have outputs that are either dependent on or independent of the LUT. The DFF can accept as input the sum bit of the FA, or the external input D directly through MUX2. It can also effectively register the output of the LUT if Cin and b to the FA are null while the FA takes its input from the LUT. The cell’s two outputs can thus be used in many configurations. For example, Dout can be a registered version of D. It can be a registered version of the function in the LUT. It can be a registered version of the output of the FA, which can, in turn, be dependent on or independent of the output of the LUT. Dout can also be the unregistered version of any of the above. What determines the logic function observed at Dout? The answer is obviously the MUX select lines S1, S2, and S3. Thus, by specifying these bits, the designer can reconfigure the block. Figure 8.59 shows the design flow for FPGA. It is very similar to the standard ASIC design flow. This is an

Table 8.21 Four-input LUT used to implement four different functions. Note the LUT cannot implement all four at once. Each column other than the first represents values stored in the LUT. The ABCD column is fed to the LUT address

ABCD   AND   OR   AB+CD   XOR
0000    0    0      0      0
0001    0    1      0      1
0010    0    1      0      1
0011    0    1      1      0
0100    0    1      0      1
0101    0    1      0      0
0110    0    1      0      0
0111    0    1      1      1
1000    0    1      0      1
1001    0    1      0      0
1010    0    1      0      0
1011    0    1      1      1
1100    0    1      1      0
1101    0    1      1      1
1110    0    1      1      1
1111    1    1      1      0

(AND, OR, and XOR denote the four-input AND, OR, and XOR of ABCD; AB+CD is AB OR CD.)
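How the cell of Fig. 8.58 and the LUT programming of Table 8.21 fit together can be sketched with a small behavioral model. The exact mux wiring, function names, and argument names below are assumptions made for illustration; real FPGA cells are considerably more complex.

```python
# Behavioral sketch of the illustrative cell of Fig. 8.58, plus LUT
# programming as in Table 8.21. Mux wiring is an illustrative assumption.

def program_lut(fn):
    """Return the 16 bits stored in a 4-input LUT for Boolean function fn."""
    return [int(fn((i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1))
            for i in range(16)]

def cell(lut, a, b, c, d, fa_b, cin, s1, s2, s3, dff_state):
    """One evaluation of the cell: LUT -> FA -> DFF with muxes S1-S3."""
    lut_out = lut[(a << 3) | (b << 2) | (c << 1) | d]  # ABCD is the address
    fa_a = lut_out if s1 else c        # MUX1: FA input from LUT or pin C
    fa_sum = fa_a ^ fa_b ^ cin
    cout = (fa_a & fa_b) | (fa_a & cin) | (fa_b & cin)
    dff_d = fa_sum if s2 else d        # MUX2: DFF input from FA or pin D
    dout = dff_state if s3 else dff_d  # MUX3: registered or combinational
    return cout, dout, dff_d           # dff_d is latched on the next clock

lut_and = program_lut(lambda a, b, c, d: a & b & c & d)       # AND column
lut_abcd = program_lut(lambda a, b, c, d: (a & b) | (c & d))  # AB+CD column

# With b = Cin = 0, the FA passes the LUT output through, so the cell acts
# as an unregistered four-input AND of ABCD:
_, dout, _ = cell(lut_and, 1, 1, 1, 1, fa_b=0, cin=0, s1=1, s2=1, s3=0,
                  dff_state=0)
```

Reprogramming the 16 stored bits, not rewiring, is what changes the implemented function.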

Fig. 8.59 Design flow for FPGA

indication of their similar architectures, design philosophies, and performance. The first few steps in the design flow are, in fact, identical to ASIC design.

The same process is used to transform the design into a fixed-point model (Sect. 8.6). The HDL behavioral model is also nearly identical to that used for ASIC if good coding


Fig. 8.60 FPGA library entry. Left, the structure of the entry. Right, an entry for an unregistered FA

practices are used (Sect. 9.22). In fact, the behavioral simulation should produce identical results regardless of the target implementation platform. Synthesis for FPGA is similar but not identical to synthesis for ASIC. Figure 8.60 shows a (very) simplified library entry for FPGA. As with ASIC processes, different FPGA vendors have different libraries. These libraries mainly contain the different ways in which cells can be programmed to realize different logic functions. FPGA libraries are thus at once similar to and very distinct from ASIC libraries. The library entry includes the logic model for the component, its gate delay model, and the programming bits necessary to configure a logic cell into that specific library entry. The first two components of the library entry are very similar to the ASIC library. The third component is different: it simply describes the MUX select lines necessary to configure the cell. It is analogous to the mini-layout in the ASIC library entry. The right of Fig. 8.60 illustrates this with a library entry for an unregistered full adder, including its truth table. Because many of the cell inputs are unused in this configuration, they appear as don’t cares in the truth table. The delay model will likewise include some delay figures and a lot of don’t cares for unused transitions. The configuration part lists the MUX selects that would bypass the DFF and LUT in Fig. 8.58 to create an unregistered FA. The synthesizer maps the HDL model into a netlist derived from the library, showing the required number of cells and how these cells should be programmed. It will try to shoehorn the logic dictated by the HDL into the cells made available by the library. When the netlist is determined, the synthesizer can gather the programming information for all the netlist cells into a larger file. This file will form the nucleus of the FPGA programming file. The synthesizer can also obtain timing information about the gates used in the netlist.
It can use these to calculate the delays in the circuit. This allows the synthesizer to estimate the critical path delay and define the logic path that corresponds to this critical path. The timing information is saved in a timing file, allowing post-synthesis simulation. The synthesizer will produce five outputs:


• The netlist containing the cells used, with their configurations
• The programming model for the cells used in the netlist
• A gate delay model that can be used in post-synthesis simulations
• A timing report showing the different delays in the circuit, with particular focus on the critical path. This will include information about gate delay only
• A utilization report showing the number of cells used to implement the design

Figure 8.61 shows how routing is performed in an FPGA. The cells are arranged into larger blocks called islands. Communication between cells within the same island is through local switching networks. If communication is necessary between different islands, then external routing channels have to be used. These channels run vertically and horizontally between the islands. At intersections, global switch networks are used to connect a certain row to a certain column. All the connections discussed above are programmable. In global switches, transmission gates at intersections can connect vertical and horizontal lines in a programmable manner. Local switch networks are composed internally of multiplexers, which can also be programmed. The placement and routing (PAR) tool for the FPGA will choose to place specific cells from the synthesis netlist in specific locations in the FPGA. It will then route these cells together using local and global connections. FPGA PAR can be constrained the same way ASIC PAR is (Sect. 8.7). However, FPGA PAR tools have a much harder time achieving a feasible set of connections that realizes the circuit functionality, let alone achieving a certain operating frequency. Additionally, once you use an FPGA, you might as well use all its available resources; thus area constraints are usually not as important for FPGA as they are for ASIC. Instead, PAR constraints are usually limited to clock specifications and pin placement.
The PAR tool will produce five outputs:

• The programming model for the routing networks
• An assignment of the programming model obtained from synthesis to specific cells in the FPGA
• A delay model that includes estimates for routing delay
• A timing report showing the critical path re-estimated with routing
• A utilization report showing more accurate cell and routing utilization

The PAR tool takes the programming file produced by synthesis and adds all the programming bits for local and global switches. Thus, when PAR is finished, the final programming file is ready. This file can then be used to program the FPGA.


8 Design Flow

Fig. 8.61 FPGA routing

This binary programming file is normally held in an EEPROM outside the FPGA. Upon powering up, the FPGA reads the program and configures its cells and interconnects according to the programming file, thus realizing the designer's model in hardware. The PAR area and timing reports are improved versions of the post-synthesis reports. The PAR timing model contains information about routing delay, and thus allows very accurate post-PAR simulations to be carried out. The PAR utilization report can show higher utilization than post-synthesis due to the additional resources used for routing. Why are FPGAs less efficient than ASICs? The answer is that it is nearly impossible to fully utilize even those cells that are counted as "used". For example, some cells will be used only for their FA, others only for their LUT, and yet others only for their DFF. In each case, the rest of the cell cannot be used for another element of the netlist and is thus wasted. However, there are two trends that mitigate this significant limitation of FPGAs:

• The design of cells and libraries allows for much more efficient mapping. More intelligently designed cells allow a larger fraction of each cell to be used. Bigger and better libraries allow multiple statements from the HDL model to be mapped to one cell
• The most troubling blocks to map are macro-blocks. Consider, for example, a 16-bit full multiplier. The multiplier will require 240 full adders (Chap. 11), and thus 240 cells of which only the FA is used. Another huge resource drain is large shift registers and memories. These will utilize only the DFF and waste the rest of the cell. Thus, modern FPGAs include a number of specialty cells that can be used to implement these large blocks more efficiently. There are usually a limited number of such specialty cells compared to general-purpose cells, but their impact on performance can be significant. Table 8.22 lists some of the "specialty" cells available in modern FPGAs.

Table 8.22 Availability of resources on a typical FPGA

Resource | Availability | Use
Logic cells | Thousands to tens of thousands | These are the general-purpose cells discussed in this chapter. They are used to implement most random combinational and sequential logic
Multiplier cells | Few dozen to low hundreds | Implementing large full multipliers. A single multiplier cell has inputs in the range of 10–20 bits but can be combined to form larger multipliers
Memory blocks | Few dozen to low hundreds | To form memories or large registers and FIFOs. Each block has one or two read/write ports. Most often each block holds a few hundred Kbits. Blocks can easily be combined into larger memories
I/O pins | Few dozen to hundreds | To allow external I/Os to be driven into the FPGA core. This can be a limitation for large designs
Clock buffers | Few to low teens | To drive clocks. Each independent clock needs a buffer. Some FPGAs also have built-in DLLs to derive related clocks

9 HDL

9.1 Design Philosophy

1. Distinguish the concurrent nature of VHDL
2. Recognize the danger of treating VHDL as a programming language
3. Understand a properly designed wrapper
4. Understand the concept of synthesizability
5. Categorize VHDL code as synthesizable or unsynthesizable.

In this chapter, we will use VHDL as an example of HDLs. Other HDLs have similar design strategies, precautions, and pitfalls; thus the conclusions of this chapter can be applied to a wider set of languages. HDL stands for Hardware Description Language. The name is a very important indicator of how these languages should be written: they are simply descriptions of hardware. The description can be of the behavior of the hardware, or it can be a declaration of components and their connections. Usually, a design mixes both: descriptions of behavior and descriptions of component connections. HDLs are also sometimes called RTL. RTL stands for Register Transfer Language. This is an indication of a very important facet of HDLs, namely that they often deal with how data is handled as it is transferred between registers. VHDL has a lot of programming-like syntax. This is for two reasons:

• VHDL syntax is designed to have natural-language-like properties. This is also the philosophy behind a lot of programming languages. In other words, VHDL did not evolve from programming languages, but it resembles them because both evolved under similar constraints. As a result, a lot of designers mistake familiar programming strategies for acceptable practices in HDL
• VHDL contains a lot of statements that are only used in testing the finished design in simulation. Since these statements will never have to be synthesized (Sect. 8.7), they are written very much like programming languages

The distinction between VHDL and programming is best expressed by a snippet of code. Consider Snippets 9.1 and 9.2. The two codes look very similar. This similarity is misleading. Figure 9.1 shows how each snippet is interpreted. For the programming snippet in 9.2, the sequence of operations is the well-known variable swap. As shown in the right sub-figure of Fig. 9.1, the result of the code will be the exchange of the values of variables A and B. This happens in the following sequence: the value of variable A is stored in variable T; the value of variable B is stored in A, overwriting it; finally, variable B takes the value that was stored in variable T, which is the original value of variable A. The sequence in Snippet 9.1, on the other hand, describes hardware connections. Thus it describes three nodes and how they are connected. As shown in the left sub-figure of Fig. 9.1, node A is connected to node T. Node B is connected to node A. And node T is connected to node B, which is redundant since all three nodes are shorted after the second statement. Why do the two snippets lead to such different results? The difference is that the VHDL snippet is interpreted as a parallel or concurrent implementation. This comes from the fact that it describes hardware connections. When describing such connections, the sequence in which they are described is unimportant. For example, in Fig. 9.2, the three components I, II, and III are connected according to a port connection list. We can describe which ports are connected to which in whatever order and we would be describing the exact same connections. Thus we must think of the order of statements as completely irrelevant and consider the functionality to happen in parallel. Snippet 9.2, on the other hand, describes a sequence of operations. Exchanging any two statements in the snippet will yield a completely different result. The functionality realized is dependent on the sequence in which the statements are written.

© Springer Nature Switzerland AG 2020 K. Abbas, Handbook of Digital CMOS Technology, Circuits, and Systems, https://doi.org/10.1007/978-3-030-37195-1_9




Snippet. 9.1 VHDL connections that yield a short circuit between three nodes
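The snippet itself is not reproduced in this text. Based on the discussion above, it consists of three concurrent signal assignments along these lines (signal names A, B, and T follow the discussion; this is a sketch, written as it would appear inside an architecture body, not the book's exact code):

```vhdl
-- Concurrent assignments: these describe wiring, not a sequence of steps.
-- After elaboration, nodes A, B, and T are all shorted together.
T <= A;
A <= B;
B <= T;  -- redundant: the three nodes are already connected by the first two lines
```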

Fig. 9.1 VHDL and programming interpretation. To the left, concurrent assignment is a description of node connections. To the right, programming is a sequence of operations

Fig. 9.2 These 2-port connection lists will lead to the exact same circuit despite the order being different in each

There are some cases where we do need hardware to implement sequences. The clearest example is when we need to implement sequential logic. An exception has to be made to the concurrent nature of VHDL in these cases. Section 9.9 introduces how this can be done.

VHDL is also a deeply hierarchical language. Circuits are declared and described. They can then be used as components to build higher-level circuits. These higher-level circuits can then be used as components at an even higher level. As shown in Fig. 9.3, this can be done to as many levels as the designer needs. At the highest level of the design there should be one final wrapper that combines all components together into one final circuit. A wrapper is shown in Fig. 9.4. A proper wrapper should only contain connections between large subsystems and connections to input/output ports. Any additional logic in the wrapper is called "glue logic". This is normally interface logic, and it is undesirable. Glue logic can either be combined into its own component(s), or it can be rolled into one of the existing blocks. VHDL code blocks can be divided into two categories:

• Synthesizable: These are code blocks that will pass successfully through the synthesis and place and route tools (Sect. 8.7). Ideally, they should also produce predictable results from synthesis. All code up to and including the wrapper must be synthesizable. The definition of synthesizable code exceeds simply passing through design tools. Some code can pass synthesis without producing errors or even warnings, yet the resulting hardware is unpredictable and its performance unreliable. Such code should not strictly be considered synthesizable
• Non-synthesizable: These are statements that cannot successfully pass synthesis or that would produce unpredictable results. These statements may be used; however, they should only be used in the testbench (Sects. 9.17 through 9.20).

Snippet. 9.2 A programming sequence used to exchange the contents of two variables
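As described in the text, this is the classic three-step swap. In VHDL, such sequential behavior would use variables inside a process; the following is a sketch of the sequence, not the book's exact code:

```vhdl
-- Sequential semantics: order matters. Executed top to bottom,
-- these statements exchange the values of variables A and B.
T := A;  -- save the original value of A
A := B;  -- overwrite A with B
B := T;  -- B receives the original value of A
```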


Fig. 9.3 Hierarchical design. Component types are labeled by lowercase letters. Instances of the same component type are distinguished by numerics after the colon. Section 9.6 further expands on the difference between entities and instances

9.2 The Entity

1. Examine the syntax of an entity declaration
2. Understand that entities do not describe component construction
3. Understand that entities do not describe component instances
4. Recognize that an entity only describes the I/Os and name of a component
5. Distinguish keywords from non-keywords in VHDL syntax.

Snippet 9.3 shows an entity declaration. The entity is the basic building block of VHDL. In all code snippets, keywords are written in red. Some of the black text is chosen by the designer as labels; other parts of the black text are obtained from the libraries (Sect. 9.3). The entity is a description of the inputs and outputs of a component, without any concern for what lies within it. Thus the inside of the entity remains a black box at this point.

Snippet. 9.3 Entity declaration
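The entity described in the surrounding text (its name, port names, directions, and widths are all given there) would read approximately as follows; a sketch of the book's snippet:

```vhdl
entity sixteen_bit_adder is
  port (
    input_a    : in  std_logic_vector(15 downto 0);  -- 16-bit operand
    input_b    : in  std_logic_vector(15 downto 0);  -- 16-bit operand
    add_output : out std_logic_vector(16 downto 0)   -- 17 bits: sum plus carry
  );
end sixteen_bit_adder;
```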


Figure 9.5 shows how the entity in Snippet 9.3 is interpreted. The entity declaration must include a valid name for the entity. This name, sixteen_bit_adder in this case, is associated with the component as shown in Fig. 9.5. The entity does not concern itself with instances of the component, nor with how it is implemented. Thus Snippet 9.3 says there is a component called sixteen_bit_adder. However, it does not consider how many such adders are used, where they are used, what they do, how they are constructed on the inside, or how they are connected to the outside. The second compulsory part of an entity declaration is the port statement. The port statement declares all input and output ports of the entity and their wordlengths in bits. In this example, we know there are two input ports. Their names are input_a and input_b. We know they are inputs because of the keyword "in". Each of these ports has bits indexed from 15 down to 0; thus, each port is 16 bits long. We could also declare their widths in the form (0 to 15), but declaring busses using the keyword "downto" is more common. There is also one output port named add_output that is 17 bits long. We use the keyword "out" to indicate its nature. VHDL also allows the declaration of bidirectional ports through the keyword "inout". However, these ports require very special treatment. We will consider what "std_logic_vector" means in Sect. 9.3. All other parts of the entity declaration should be clear by now, as should how Snippet 9.3 corresponds to Fig. 9.5.

9.3 IEEE Library and std_logic

1. Understand the definition of a library
2. Recognize the work and IEEE libraries as essential to all designs



Fig. 9.4 Left, the wrapper combines three subsystems into the final design. Random and small logic is left over, termed glue logic. This is not a desirable design strategy. Right, the design is rearranged. If sensible, glue logic is rolled into one of the existing subsystems. Glue logic that does not belong anywhere is included in its own additional subsystem

Fig. 9.5 Sixteen bit adder black box

3. List the most important packages in the IEEE library
4. Understand the bit and bit_vector types
5. Understand the std_logic and std_logic_vector types
6. Realize when values of the std_logic type can be seen in simulation.

Every VHDL design must use at least one library. A library is a place where all important parts of the design are collected. This includes commonly used packages (see Sect. 9.21). These packages will contain common functions (Sect. 9.18) and data types (Sect. 9.4) that the design uses. The library will also contain components (Sect. 9.6) that will be used in the hierarchical design. The most important library in any design is the "work" library. This is the library that contains the entities (Sect. 9.2) defined by the user. This library allows different entities to use other entities as building blocks (components) as long as they all reside in the same path. The work library is used without the need for explicit declaration. The second most important library is the IEEE VHDL library. This is a library designed by IEEE to support some commonly used functions and types. Without the IEEE library, writing VHDL would be very hard. The library allows a wide variety of operators to be used over a wide variety of data types. The IEEE library is declared using the syntax in Snippet 9.4. It should be the first thing a designer writes in a VHDL file. Each library consists of a number of packages (Sect. 9.21). Packages can contain custom-defined types, functions, and components. Once a library has been declared, the designer has to define which packages or components within the library they intend to use. The syntax of the second line in Snippet 9.4 shows how this can be done. The general form of this line is: use library_name.package_name.package_component. In Snippet 9.4 the std_logic_1164 package from the IEEE library is declared as being used. The last part of the line indicates that "all" parts of the package will be used.



Snippet. 9.4 The IEEE library is declared in the first line. A specific package from the library is used in the second line
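The two lines described in the text would read as follows (standard VHDL; a sketch of the book's snippet):

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;  -- "all" parts of the std_logic_1164 package
```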

The following are commonly used packages from the IEEE library:

• std_logic_1164
• std_logic_arith
• std_logic_unsigned
• std_logic_signed
• std_logic_textio.

The last package std_logic_textio will be discussed in detail in Sect. 9.20. The rest of the packages will be discussed in this section. In VHDL, the main carrier of data is called a signal. A signal can usually be approximated as a node in the digital circuit. The description of a circuit usually involves connecting these signals through different components. Each signal must have a “type”. Types describe the set of values that a signal can take. The default data type for signals in VHDL is the bit type. Snippet 9.5 shows how a signal of type bit can be declared. Type “bit” indicates that a signal can take one of two possible values “1” or “0”. A one-dimensional vector of bits is declared using the bit_vector type as shown in the second line in Snippet 9.5. Vectors are very useful since they allow the designer to declare busses with a common name. The size of the bus is indicated by its indices through “(3 downto 0)”. Thus data_bus in Snippet 9.5 is 4 bits long. Specific bits in the vector can be referenced by indices, for example data_bus(2) refers to the third bit in the bus. Ranges of busses can also be referenced, for example data_bus(1 downto 0) is a 2-bit sub-signal of data_bus. The bit type is sufficient to represent a signal whose value can only be either “0” or “1”. However, typical digital nodes are also electrical nodes. They can take values that lie somewhere between “0” and “1” depending on circuit conditions. Some simulators have the capacity to simulate more than just “0” and “1” values. Most designers need to see more than just “0” and “1”. Thus, a data type is needed which allows signals to be more than just these two values.

Snippet. 9.5 Bit and bit_vector signal declarations
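The declarations described in the text would look roughly like this (the name of the single-bit signal is not given in the text and is a placeholder; data_bus is from the discussion):

```vhdl
signal single_flag : bit;                     -- can take '0' or '1'
signal data_bus    : bit_vector(3 downto 0);  -- a 4-bit bus, indexed 3 down to 0
```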

std_logic_1164 is an IEEE package that declares the std_logic data type. This package should be used in every entity. All digital signals should be of the std_logic or std_logic_vector type instead of the bit or bit_vector types. The standard logic type expands the possible values a bit can take to those listed in Table 9.1. It includes "0" and "1" as well as seven other values that can indicate critical information in simulation. "U", "X", and "Z" are particularly important. When using std_logic signals, it is easy to forget that they can be more than just "1" or "0". After all, in "normal" or "correct" operation, one and zero are the values we would typically see. However, not taking into account the possibility of other values can lead to unwanted synthesis results. Section 9.12 discusses this phenomenon in detail. The three packages std_logic_arith, std_logic_signed, and std_logic_unsigned are related. Typically, the designer should declare std_logic_arith combined with one of the other two packages. std_logic_arith opens up two sets of tools:

• Arithmetic and comparison operators. These are discussed in more detail in Sect. 9.4
• Conversion functions. These functions allow the designer to convert from one data type to another data type

The arith package alone is under-defined and can lead to simulation and synthesis issues. It defines arithmetic operations on integer signals of the std_logic_vector type. Thus, the statement A+B, where both A and B are std_logic_vectors of the same size, would be interpreted as an N-bit adder. However, it is sometimes important to know whether the numbers are signed (two's complement) or unsigned before an arithmetic operation can be interpreted properly. Declaring the use of std_logic_signed or std_logic_unsigned will redefine the arithmetic operators for either signed or unsigned numbers. Note that the signed and unsigned packages are mutually exclusive.
Declaring the use of one implies that all registers are either signed or unsigned for arithmetic purposes. The

322

9 HDL

Table 9.1 Possible values for a std_logic single-bit signal

Value | Meaning | When used
'0' | Strong zero | A "normal" zero, well driven by an unopposed low impedance to ground
'1' | Strong one | A "normal" one, well driven by an unopposed low impedance to supply. "0" and "1" together should be the most commonly seen values in all simulations
'X' | Cannot resolve | The node's value cannot be determined. Typically, the node is contended: two strong drivers are accessing the same node. This is usually a failure of the circuit, and many synthesizers will refuse to let such a description pass in the first place
'U' | Undefined | The signal has not been driven to any value. This is typical when the simulation has just started and the node is uninitialized
'Z' | High impedance | The node is in high-impedance mode. Commonly seen in tri-state buffers
'-' | Don't care | The user indicates they do not care which value the signal takes
'H' | Weak one | There is a logic one but it is not properly driven. This is typical in wired OR. The "1" is not at supply but is well above the logic threshold
'L' | Weak zero | There is a logic zero but it is not properly driven, typical in wired AND. The "0" is not at ground but is well below the logic threshold
'W' | Weak undefined | The node is weakly driven, but its value can be resolved as neither one nor zero; it is close to the metastable range

declaration of the other package becomes blocked. If signed and unsigned numbers must be mixed in operations, then conversion functions have to be used (Sect. 9.18).

9.4 Types, Attributes, and Operators

1. Understand the basic data types in VHDL
2. Declare and use custom types and subtypes
3. List the most common operators in VHDL
4. Define an attribute
5. Recognize the most common predefined attributes in VHDL
6. Understand how user-defined attributes are declared and when they are used.

In addition to bit and std_logic discussed in Sect. 9.3, VHDL allows signals to have the following types:

• Integer: This is typically reserved for variables
• Real: This requires the use of the package math_real. These should only be used in the testbench since they are not synthesizable. In some cases, real types can be used

to evaluate generics and constants in synthesizable constructs, but this requires care
• Boolean: This is similar to the "bit" type except that the 1 and 0 represent true and false rather than logic values. This can be used to evaluate conditional statements. The Boolean type does not do anything that the std_logic type cannot do. In conditional statements, std_logic types are automatically converted to Boolean; thus, explicit declaration of Boolean types is rare.

More interestingly, VHDL allows the designer to define her own data types. These user-defined data types can be very useful. Snippet 9.6 shows two very important user-defined types: the enumeration and the array. The first line in Snippet 9.6 is an enumeration. An enumeration declares a list of labels as the possible values of the data type. Signals declared with the type my_list can take one of the values within the parentheses. Note that the values in the enumerated list are only labels, and thus do not have to be logically connected to each other. Enumerations are typically used in state machines; this is discussed in detail in Sect. 9.16. When not used with state machines, special consideration has to be given to how the enumeration list

Snippet. 9.6 Enumeration type (first line) and array type (second and third lines)
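The type names my_list, array_length, and entry_size come from the surrounding discussion; the enumeration labels below are placeholders, since the book's actual list is not reproduced here:

```vhdl
-- Enumeration: signals of type my_list take exactly one of these labels
type my_list is (idle, fetch, decode, execute);

-- Array: array_length entries, each a std_logic_vector of entry_size bits
type my_array is array (0 to array_length - 1)
  of std_logic_vector(entry_size - 1 downto 0);
```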



will be translated in hardware. If left unconstrained, the synthesizer will translate enumerations into std_logic_vectors of appropriate size. The second line in Snippet 9.6 shows the type statement used to declare an array. An array is a vector, each entry of which can be anything, including another vector. Thus the second line declares an array of length array_length. Each entry in the array is a std_logic_vector of width entry_size. Arrays are typically used in shift registers (Sect. 9.13) and memories (Sect. 9.14). VHDL also allows users to define subtypes. Subtypes are more restrictive subsets of bigger types. Snippet 9.7 shows subtypes in use. The first line declares a new subtype called short_vector. This is a std_logic_vector whose length is fixed at 4. The second line declares a subtype which is an integer that can only lie in the range 0–255. Signals declared using subtypes can be used in operations with their parent type without needing conversions. Thus a signal of the short_vector type can be readily combined with a signal of the std_logic_vector type. Using user-defined types and subtypes might initially look confusing. However, it is considered good practice and is recommended. Synthesizers can deal with enumerations, arrays, and subtypes very efficiently, and their judicious use can make the code more readable. VHDL has a large set of operators that can be used on signals, constants, and variables. Most of these operators are self-explanatory and are similar to familiar operators in programming and simulation languages. However, one has to be careful when using these operators in VHDL. Above all, the designer has to be cognizant of what hardware the operator will entail. For example, while a programmer can use a multiplication operation without a second thought, a hardware designer must be aware of its impact on area, power, and delay (Chap. 11). Table 9.2 lists five arithmetic operators.
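The subtype declarations of Snippet 9.7, described above, would read roughly as follows (the name of the integer subtype is not given in the text; small_int is a placeholder):

```vhdl
subtype short_vector is std_logic_vector(3 downto 0);  -- length fixed at 4
subtype small_int    is integer range 0 to 255;        -- restricted integer range
```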
Both + and − count as two operators each, depending on whether they accept one or two operands. These operators act on the bit_vector or std_logic_vector types by default. Once the std_logic_arith package is declared, its definitions of these operators supersede the defaults and the operators are defined for std_logic types. When combined with the std_logic_signed or std_logic_unsigned package, these operators are restricted to either signed or unsigned numbers. Arithmetic operators can always be used with integer operands regardless of the packages used. Arithmetic operators return values of the same type as their operands.

Table 9.2 Arithmetic operators in the std_logic_arith package. Operands are marked as op1 and op2. The "+" and "−" operators can accept two or one operands

Table 9.3 lists comparison operators. These operators are redefined by the std_logic_arith package and the signed and unsigned packages in the same way as the operators in Table 9.2. They can also act on integers. The only condition is that the operands of a comparison operator have to be of the same type. All comparison operators return Boolean results. They are most often used in conditional statements. Table 9.4 defines two shifting operators which are also redefined by the std_logic_arith package. The shift function cannot accept an integer operand. It accepts a second argument which is the number of bit positions to shift. Shift operators can lead to unexpected results when used with std_logic types. Table 9.5 lists the remaining VHDL operators. These operators can usually act on any data type. Most also accept two operands. Operator priority follows the same patterns as in programming and arithmetic, and can be overridden using parentheses. The concatenation operator "&" is particularly versatile and important. It is used to concatenate two strings together to form one longer string. When applied to std_logic_vector operands, it allows busses or parts of busses to be combined into another bus. See Sect. 9.13 for an example of how this can be used to write very efficient shift register code. The shift operators sll, srl, sra, sla, ror, and rol can be confusing both to the designer and to the synthesizer. A shift by a constant number of bit positions is best applied using the concatenation operator. Particular attention has to be given when using arithmetic operators. These operations will be translated by the synthesizer into hefty arithmetic circuits. Thus all arithmetic has to be used very carefully and within limits. As a general guideline:

• Addition can be used liberally
• Multiplication should be used carefully
• Exponentiation, division, quotient, and remainder should be avoided unless they are only used to evaluate constants.
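As an illustration of shifting by concatenation rather than by a shift operator, a constant one-bit left shift of a 16-bit bus might be written as follows (signal names are placeholders, not from the book):

```vhdl
-- Shift data_in left by one bit position, filling with '0':
-- drop the MSB, append a zero at the LSB end.
data_out <= data_in(14 downto 0) & '0';
```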
Attributes are tools that allow the designer to extract additional information about a named item in a design. They can be applied to types, signals, files, variables, functions, and components. Attributes are either predefined or user defined. Although the two categories are very similar in their syntax and definition, their use is substantially different.
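As a small illustration of predefined attributes (this example is not from the book, but the attribute names are standard VHDL):

```vhdl
signal data_bus : std_logic_vector(7 downto 0);
-- data_bus'length  is 8           (number of bits)
-- data_bus'high    is 7           (highest index)
-- data_bus'low     is 0           (lowest index)
-- data_bus'range   is 7 downto 0  (usable in loops and declarations)
```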

Operator | Number of operands | Meaning
op1 + op2, + op1 | 2 or 1 | Addition or unary positive
op1 − op2, − op1 | 2 or 1 | Subtraction or unary negation
op1 * op2 | 2 | Multiplication

Table 9.3 Comparison operators defined by the std_logic_arith package. Operands are marked as op1 and op2

Table 9.4 Shift operators defined by std_logic_arith

Table 9.5 Miscellaneous VHDL operators. Operands are marked as op1 and op2

Operator | Number of operands | Meaning
op1 > op2 | 2 | Greater than
op1 < op2 | 2 | Less than
op1 >= op2 | 2 | Greater than or equal

The signal mapped to a port must be a real signal at the current level of the hierarchy, so it must either be a port of the current entity, or one of its internal signals. The port mappings shown in Snippet 9.14



Snippet. 9.11 The circuit seventeen_bit_adder

Snippet. 9.12 The entity declaration for layered_adder

achieve the port connections shown in Fig. 9.7. The order in which the ports are listed is unimportant. Snippet 9.15 shows an alternative method for port mapping. In this method, only the signal names are indicated. The port names are not explicitly stated. In this case the order of assignment is very critical. The ports are assigned in their order in the entity (and component) declaration. Thus in Snippet 9.15 the signals external_inputa, external_inputb, internal_inputa are assigned, respectively, to the ports in the component declaration for sixteen_bit_adder in Snippet

9.13. This method of port mapping is highly unrecommended. It yields less readable code, it requires the designer to know the order of ports for all entities, and it opens up the chance of port misassignment. All ports must be mapped regardless of the port mapping method used. Even if one of the outputs of a component is to be left unconnected, or one of its inputs is trivial, we must explicitly map said ports. Snippet 9.16 shows how special port mappings can be performed. This can be used to explicitly map unread output

9.6 Structural Connections


Snippet 9.13 Declaring components for use in a structural architecture

ports or don’t care input ports. These connections are shown in Fig. 9.8. Each port offers an interesting use of syntax and/or functions. input_a in adder1 in Snippet 9.16 is mapped to all zeros. This can be done in two ways. First, the port can be explicitly mapped to a zeros vector of the appropriate size. In such a case the mapping would be in the form: input_a => “0000000000000000”. This is tedious and confusing. It requires the designer to be aware of the bus size, to write the correct number of zeros, and to give up any attempt at making the design scalable. The last point will become much

clearer in Sect. 9.7, where the use of generics and constants will allow us to write designs free from explicit sizes. The other way to fill a bus with zeros is shown in Snippet 9.16. The syntax input_a => (others => ‘0’) indicates that all unconnected bits in the port are connected to zeros. Since none of the bits of the port are connected individually, all bits end up connected to ‘0’. input_b in adder1 is formed by the concatenation of two buses. One is a 9-bit section of the bus external_inputb. The other is an all-“1”s vector. Using the syntax (others => ‘1’) allows the designer to avoid calculating the difference


Snippet 9.14 Component instantiations

Fig. 9.7 Layered adder. The circuit is internally built using three instances of two types of components


Fig. 9.8 Special port connections in Snippet 9.16

Snippet 9.15 Alternative, strongly discouraged, port mapping

Snippet 9.16 Special port mappings

between the size of the port input_b and the signal external_inputb(8 downto 0). This makes the design easier to write, read, and modify. The output add_output in adder1 is left open. This is shown here only to illustrate that even if a port is left unconnected, it must be explicitly mapped to open; “open” is a keyword in VHDL. Note that when all outputs of an instance are left open, that is an indication that the instance will not be used. This allows the synthesizer to trim such a circuit out of the netlist. The instance adder1 in Snippet 9.16 will be completely removed by the synthesizer, since its only output is unused.
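Pulling the above together, a minimal sketch of such special mappings might look as follows. This is an illustrative reconstruction, not the book’s actual Snippet 9.16: the signal names are assumptions, and the all-ones part of the concatenation is written here as an explicit 7-bit literal, whereas the book’s (others => ‘1’) shorthand avoids having to count those bits.

```vhdl
-- Fragment of an architecture body; names are illustrative assumptions.
adder1: sixteen_bit_adder
  port map (
    input_a    => (others => '0'),  -- every bit of the port tied to '0', size-independent
    input_b    => "1111111" & external_inputb(8 downto 0),  -- 7 ones & a 9-bit slice = 16 bits
    add_output => open              -- unused output, still explicitly mapped
  );
```

Because add_output is the only output and it is left open, a synthesizer would trim this instance entirely, as described above.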

Declaring open ports is sometimes necessary, and very useful in designs with multiple output ports. For example, in Fig. 9.9 the design has three outputs. Internally, these three outputs flow through independent paths. If output3 is left open when instantiating, then all logic on which output3, and only output3, depends can be removed by the synthesizer. This is marked in gray in Fig. 9.9 as a “trimmed circuit”. In general, any path on which the overall output of the highest level of the design does not depend can be trimmed. Thus in Fig. 9.10 the highest level output is independent of several logic components at several levels in the design,


Fig. 9.9 Component instance with multiple outputs and one output left open

Fig. 9.10 Multilevel components and trimming. Paths on which only unconnected outputs depend can be removed

these components can be trimmed. Trimming is a major and easy way for synthesizers to reduce resource usage. Thus, leaving unused output ports open will allow the

Snippet 9.17 For-generate statement used to instantiate circuits


design to be minimized without the designer having to manually modify the architecture. Similarly, connecting “don’t care” inputs to constant strings (typically all ‘0’ or all ‘1’) will cause the outputs of some blocks to become trivial, which allows the synthesizer to significantly minimize the design. This is clearly illustrated by adder2 in Snippet 9.16.

Port input_a in adder2 in Snippet 9.16 is connected to the output of a function called conv_std_logic_vector. (For more information on functions and their proper use see Sect. 9.18.) This function accepts two integers as inputs. The first integer is a number; the second is a bus size. The output of the function is a std_logic_vector whose size is the second argument and whose contents are the binary equivalent of the first argument. The use of conv_std_logic_vector allows designers to write very flexible binary constants, because both the size and the content of the vector can be expressed in terms of integer expressions. Thus input_a in adder2 in Snippet 9.16 would be assigned the std_logic_vector “0000000000001001”.

Adder2 in Snippet 9.16 has two constant inputs: one is 9 and the other is all “1”s. Thus its output will also be a constant, equal to the sum of these two constants. The synthesizer can then replace the adder with a constant stored in the signal internal_inputb. This is another method by which synthesizers simplify logic. Similar to trimming, the synthesizer may replace parts or all of a design with constants depending on the inputs.

When declaring a very large number of instances of the same entity, the for-generate syntax in Snippet 9.17 is particularly useful. This is especially the case when the inputs to and outputs from the instances follow some regular pattern. The for-generate block starts with a compulsory label. The label is necessary to allow the instances created to be uniquely identified. The index of the generate loop,


generation_index in Snippet 9.17, indicates how many times the instance is to be declared. The generate loop must end with the keywords “end generate”. The body of the generate loop includes the instantiation to be repeated. The instance may or may not be labeled. The inputs and outputs in the port map can be a function of the generate index.

Entities can have multiple architectures. As discussed in Sect. 9.5, the compiler/synthesizer will always assume the last written architecture is the default one to use. However, as Snippet 9.18 shows, we can bind different instances to different architectures at instantiation time. The instantiations in Snippet 9.18 differ only in the first line, which has the general format: label: entity library.entity_name(architecture_choice). Library is the name of the library where the entity being instantiated resides. In the overwhelming majority of cases, this is the work library (Sect. 9.3). The entity_name(architecture_name) clause binds the specific entity to one of its architectures. Configurations allow a more systematic approach to architecture binding. Configurations are discussed in Sect. 9.21.

At this point, it is worth examining the difference between a design, an entity, an architecture, a component, and an instance. These terms are closely related but fundamentally different, so they sometimes lead to confusion. An entity is a way to declare the existence of a design. We indicate the name of the design as well as all the details of its ports. We do not discuss how the design is made, where it is used, to what it is connected, nor how many times it is used. An architecture is a way to describe how an entity is made. It explicitly states the operations and building blocks inside the entity. There can be multiple architectures for each entity, but there has to be at least one. An entity–architecture pair is a design.
A design fully describes a circuit: its name, what it looks like from the outside, and how it functions on the inside. However, it still does not define how many times


the design is used or how it is connected to the outside world. A component is an entity used as a building block for another entity. The component declaration simply indicates our intention to use the building block in the current architecture. It does not indicate how, or how many times, the component is used, nor does it limit us to one architecture of the component. Finally, an instance is the “for real” place where an entity is used. An instance indicates that a design is declared and used once in a particular place. Everything is specified, concrete, and tangible once the instance is declared. Thus, we have to bind the used entity to a specific architecture if it has multiple architectures. We also have to connect all its ports to real signals.
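The direct-instantiation syntax from Snippet 9.18 can be sketched roughly as follows. The instance, entity, and architecture names follow the snippet’s caption, while the signals in the port maps are assumptions:

```vhdl
-- Binding two instances of the same entity to two different architectures.
-- Signal names (external_inputa, sum1, etc.) are illustrative assumptions.
Add1: entity work.sixteen_bit_add(behavioral1)
  port map ( input_a    => external_inputa,
             input_b    => external_inputb,
             add_output => sum1 );

Add2: entity work.sixteen_bit_add(behavioral2)
  port map ( input_a    => external_inputc,
             input_b    => external_inputd,
             add_output => sum2 );
```

Only the parenthesized architecture name differs between the two instantiations; everything else about the entity, including its port list, is shared.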

9.7 Generics and Constants

1. Recognize the value of using constants
2. Realize that constants cannot be used until declared
3. Contrast generics and constants
4. Understand the syntax for generic mapping
5. Recognize the inheritance of generics
6. Write perfectly scalable code using generics.

The entity declaration in Sect. 9.2 describes a 16-bit input adder. The output is 17 bits because the adder adds an additional carry-out bit (Chap. 11). In many cases, it is very useful to declare and describe building blocks that are scalable, that is, ones whose size can be determined by the designer at implementation time. For example, consider the case where we need to declare three different adders: a 14-bit input adder, a 15-bit input adder, and a 16-bit input adder. In this case, we will need to

Snippet 9.18 Architecture–entity binding. All entities are assumed to reside in the work library. The instance Add1 binds entity sixteen_bit_add to architecture behavioral1. The instance Add2 binds entity sixteen_bit_add to architecture behavioral2


declare three different entities with three different architectures. This is despite the fact that all three architecture bodies will be identical, consisting of a single addition statement. In such cases the use of constants, and particularly generics, can be a very elegant tool. Constants and generics are very similar; the main difference is where they are declared and used. Constants are easier to understand, so we will discuss them first.

A constant is declared and used inside an architecture. It is declared alongside internal signals, and it is used exactly like a number. Constants can be of any data type, but the majority of useful constants are integers. They are most appropriately used to indicate bus widths or bit positions, but they can be used in any situation where a number is used. In Snippet 9.19, a section of an input bus is extracted and becomes the internal signal extracted_bus. The length of this extracted section is indicated by the constant extracted_part. Note that a constant can be used as soon as it is declared. Thus it can be used to indicate the length of extracted_bus in the signal declaration. However, this would have produced an error if the constant declaration came after the signal declaration. The constant can also be used freely within the architecture.

Generics are declared in the entity. They can then be used anywhere in the entity or the architecture. This, combined with the fact that generics can be inherited by lower levels, differentiates generics from constants and allows us to use generics to write very scalable code. Snippet 9.20 shows an entity declaration with generics. Generics are declared before, and in a similar manner to, ports. The generic keyword indicates the start of the generic declarations. The generics are then listed with their names, types, and default values. In this example, we have a single generic named input_bus_width. It is an integer whose default value is 16.
Declaring a default value is optional but encouraged. The default value will be used if no numerical value is provided for the generic when the entity is


instantiated. If no numerical value or default value is available at instantiation, synthesis will fail. Once the generic is declared, it can be used in the rest of the entity. In this case, it is used to declare the widths of all the inputs and outputs. The architecture associated with the entity in Snippet 9.20 is shown in Snippet 9.21. Combined, the two describe an input_bus_width-bit adder whose output is input_bus_width+1 bits. Thus we can instantiate any number of adders with any input width from this single entity.

The architecture in Snippet 9.21 does not show how generics can be used inside an architecture. To examine this, consider the example shown in Snippet 9.22. This example duplicates the bus extraction code in Snippet 9.19; however, in this case a generic is used. This allows us to decide how many bits are extracted at implementation (or more accurately, instantiation) time.

Snippet 9.23 shows how generics are determined at instantiation time. This is the same example from Sect. 9.6. In Sect. 9.6 we used two types of components, one for 16-bit adders and one for 17-bit adders. In Snippet 9.23 we use only one component: the n_bit_adder whose entity is shown in Snippet 9.20 and whose architecture is shown in Snippet 9.21. The component from Snippet 9.20 is instantiated three times. Each instantiation not only binds ports to actual signals through a port map, but also binds generics to actual values through a generic map. Thus, using a single component, the circuit in Fig. 9.7 can be designed by instantiating three instances of the same component. This is shown in Snippet 9.23. By simply binding each generic to the size of the adder we want, we can determine the adder size at instantiation time. Notice that the generic map has a very similar syntax to the port map. The generic map must assign a value of the proper data type to the generic. If a generic is not bound to a value in the generic map, then the default value is used.
If a default value is not available either, then an error message is produced and synthesis fails.
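Combining the descriptions of Snippets 9.20 and 9.21, a hedged sketch of the scalable adder might read as follows. The book’s snippets are not reproduced here, so the package choice and the exact addition expression are assumptions:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;  -- assumed arithmetic package

entity n_bit_adder is
  generic ( input_bus_width : integer := 16 );  -- optional default value
  port ( input_a, input_b : in  std_logic_vector(input_bus_width - 1 downto 0);
         add_output       : out std_logic_vector(input_bus_width downto 0) );
end entity n_bit_adder;

architecture behavioral of n_bit_adder is
begin
  -- zero-extend the operands so the sum keeps the carry-out bit
  add_output <= ('0' & input_a) + ('0' & input_b);
end architecture behavioral;
```

An instantiation then fixes the width through a generic map, for example generic map ( input_bus_width => 14 ); omitting the map falls back to the default of 16.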

Snippet 9.19 Declaration and use of a constant. Note that to use the constant in the signal declaration, the constant has to be declared first


Snippet 9.20 Entity declaration with generics. The generic has a name, type, and an optional, though highly recommended, default value

Snippet 9.21 Architecture with generics

Snippet 9.22 The extracted part of the bus can be decided at instantiation time because it is defined in terms of a generic rather than a constant


Snippet 9.23 Two different-size adders instantiated through one entity. This is an architecture body

The real role of generics lies in the possibility of passing them down hierarchical levels. Snippet 9.24 shows a modification of Snippet 9.23 that avoids declaring bus widths in terms of explicit integers. Instead, the bus width of the layered adder itself is defined as a generic, and then the

widths of the instances of n_bit_adder are defined, through their generics, in terms of the generic high_level_bus_width. This allows us to keep declaring generics in terms of generics from a higher level, collecting them all at the top level of the design. This strategy allows for perfectly


Snippet 9.24 Generics assigned through inherited generics. The generic input_bus_width is mapped to high_level_bus_width at instantiation. Note that although high_level_bus_width is a generic for layered_adder, it “appears” as a number for instantiations within its architecture

scalable designs. Note that the generic high_level_bus_width at the layered adder level appears as a constant at the n_bit_adder instantiation level within the architecture. Thus it can be used in place of explicit integers. The for-generate structure described in Sect. 9.6 is particularly flexible when combined with generics. Snippet 9.25 shows how for-generate can create a 20-bit shift register. The

component DFF is a D flip-flop with a data input D and a data output Q. The clock is fed to the input port clk. The internal architecture of the component DFF is discussed in detail in Sect. 9.13. Snippet 9.25 shows how the DFF component can be used to construct a shift register. Since the shift register is very long, manually writing 20 component instantiations would be too cumbersome. The for-generate statement uses the fact


Snippet 9.25 For-generate statement used to declare 20 flip-flops



Snippet 9.26 For-generate combined with generics to design a variable-length shift register

that the instantiations are all of the same component to greatly reduce the amount of code. The generate statement must be labeled. This is followed by the for-generate line, in which an index is defined; the body of the generate statement is replicated once for each value of the index. The label, combined with the index, becomes the label of each individual instance.

Each DFF takes its input from the output of the preceding DFF and feeds its output to the one following. Snippet 9.26 shows how the generate statement can be combined with generics. In this case, a shift register of variable length can be written using very efficient code. This allows us to write a single piece of code and reuse it for a large variety of shift registers. The limit of the for-generate loop is


not defined as a number, but is instead a generic of the entity. This allows the length of the shift register to be defined at instantiation time.
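A sketch of what such a generic-length shift register could look like (cf. Snippets 9.25 and 9.26). The DFF port names follow the text; the entity name, generic name, and internal signal are assumptions:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity shift_register is
  generic ( reg_length : integer := 20 );  -- length fixed at instantiation time
  port ( clk, serial_in : in  std_logic;
         serial_out     : out std_logic );
end entity shift_register;

architecture structural of shift_register is
  component DFF is
    port ( clk, D : in std_logic; Q : out std_logic );
  end component;
  signal taps : std_logic_vector(0 to reg_length);  -- taps(0) is the input, taps(reg_length) the output
begin
  taps(0) <= serial_in;
  gen_stages: for i in 0 to reg_length - 1 generate  -- compulsory label on the generate
    stage: DFF port map ( clk => clk, D => taps(i), Q => taps(i + 1) );
  end generate;
  serial_out <= taps(reg_length);
end architecture structural;
```

Because the loop bound and the internal bus width are both expressed in terms of the generic, a single instantiation line with a different generic map yields a shift register of any length.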

9.8 Multiplexing and Choice

1. Recognize the syntax for declaring multiplexers
2. Understand the concurrent nature of multiplexing
3. Realize conditions in multiplexing must be mutually exclusive
4. Realize that leaving a select case undefined will lead to errors.

Multiplexing is a very common structure in VHDL. Snippets 9.27 and 9.28 show two different ways in which multiplexers are declared. The two are equivalent and can be used interchangeably. Note that both structures describe concurrent implementations. Thus all conditions are evaluated in parallel and their statements are ready to execute in parallel. This is the proper operation of a multiplexer. However, it means that the selector must have mutually exclusive conditions for every statement. Consider, for example, Snippet 9.27. The sel signal can either be “00”, “01”, “10”, or something else. None of these conditions can be true simultaneously. If two conditions could be true simultaneously, the output node would suffer contention, with two sources trying to write to it. This will lead to a synthesizer error.

Snippet 9.27 When-else used for multiplexing

Snippet 9.28 “With X select” used for multiplexing

Note also that the last condition in both cases is an expansive, inclusive “else”. In Snippet 9.27 we say else without specifying else what. In Snippet 9.28, the last condition is when others. Note that in neither case did we specify the only apparently remaining case of sel, which is “11”. This is because sel is most likely of a std_logic-based type, and thus it can take values other than ‘1’ and ‘0’. These must be covered by the last condition. See Sect. 9.3 for further discussion of the possible values of std_logic. If the last statement covered only the “11” condition, the synthesizer would raise an error because it would not know what to do if the selector is “ZZ”, “XX”, “UU”, “X0”, “X1”, etc.
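The two equivalent forms can be sketched as follows for a 4-to-1 multiplexer. All signal names are assumed: mux_out, mux_out2, and in0 through in3 are std_logic, and sel is a std_logic_vector(1 downto 0):

```vhdl
-- Fragments of an architecture body; names are illustrative assumptions.

-- When-else form (cf. Snippet 9.27): the trailing 'else' is the catch-all.
mux_out <= in0 when sel = "00" else
           in1 when sel = "01" else
           in2 when sel = "10" else
           in3;

-- With-select form (cf. Snippet 9.28): 'when others' covers "11" as well
-- as all metavalue combinations such as "XX", "ZZ", "UU".
with sel select
  mux_out2 <= in0 when "00",
              in1 when "01",
              in2 when "10",
              in3 when others;
```

Two separate output signals are used here because, as noted above, two concurrent statements driving the same node would cause contention.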

9.9 The Process Statement

1. Understand the need to execute sequential statements
2. Study the syntax of the process statement
3. Recognize the difference between signal assignment scheduling and execution
4. Realize the role of the sensitivity list
5. List the proper methods of using processes
6. List syntax allowed only within processes.

As explained in Sect. 9.1, VHDL is by definition a concurrent language. And it has to be: most circuits are combinational circuits that execute in parallel. Thus they require a language that describes their behavior and connections rather than a sequence of operations, as is the case for most programming languages.


Snippet 9.29 Statements that execute sequentially

However, in some cases we do need to execute sequences of operations. The process statement is our way to do this. The process statement marks a block of code that is treated in a special way by the simulator and the synthesizer. The process statement has two major features that make it interesting:

• All statements inside are executed sequentially. This opens up a whole class of circuits that could not otherwise be implemented
• VHDL reserves a lot of highly flexible syntax for use exclusively within processes. This class of syntax can cause issues in synthesis, but when used properly it can greatly simplify hardware description and testing.

Processes can be confusing. Their behavior in simulation, and the behavior of signals within them, can lead to a lot of misunderstandings. Thus, it is good design practice to reserve the use of processes for the following purposes only:

• Clocked elements such as counters (Sect. 9.15)
• Storage elements such as registers (Sect. 9.13), latches (Sect. 9.12), and memories (Sect. 9.14)
• Selection and multiplexing (Sect. 9.11)
• State machines (Sect. 9.16)
• Testbenches and non-synthesized structures (Sects. 9.17 through 9.20)

Snippet 9.29 shows the general syntax of a process. The process begins with a label, in this case process_label. The label allows us to identify the process among other processes. Process labels are optional and it is not uncommon to omit them. The process keyword indicates a process has been started. The space between the process keyword and the begin keyword is reserved for the declaration of local variables (Sect. 9.10). The space between the begin keyword and the end process keywords is the body of the process.
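The general shape just described can be sketched as follows. All names are placeholders; this is not the book’s Snippet 9.29:

```vhdl
process_label: process (input_signal)    -- optional label; sensitivity list in parentheses
  -- between 'process' and 'begin': local variable declarations (Sect. 9.10)
  variable call_count : integer := 0;
begin
  -- between 'begin' and 'end process': the body, executed sequentially
  call_count := call_count + 1;          -- variable assignment takes effect immediately
  output_signal <= input_signal;         -- signal assignment is only scheduled here
end process process_label;
```

The distinction flagged in the comments, immediate variable assignment versus scheduled signal assignment, is exactly the scheduling behavior examined with Table 9.11 below.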

In Snippet 9.29 the three statements in the body of the process are executed in sequence. Thus, we are not describing node connections. In this case, the values of A and B will be exchanged through an intermediate signal T. Obviously, this snippet, when synthesized, must include a memory element to temporarily store the value T. Sections 9.12 and 9.13 explore, in detail, how systematic and predictable memory elements can be written using the process statement.

There is one common point of confusion about processes: when and how often they are executed. A process is a sequence of statements, as opposed to a description of connections. Thus, we have to be able to define when the process is “called”. A process always executes once at the beginning of a simulation. For the process in Snippet 9.29 to execute again after that initial time, one of two tools has to be used to “sensitize” it. The first is the wait statement, which is not always synthesizable and is discussed in detail in Sect. 9.19. The other way to control when the process is called is the sensitivity list.

Snippet 9.30 shows a process with a sensitivity list. The sensitivity list is a list of signals in parentheses after the process keyword. These signals must have been used as inputs inside the body of the process. In Snippet 9.30 only signal R0 exists in the sensitivity list. Thus the process is called initially when the simulation starts, and it runs again whenever there is an event on any signal in the sensitivity list. Thus any “event” on R0 will cause the process to run again. “Event” in this case means any kind of actual transition.

Assume a simulation of the process in Snippet 9.30. Assume also that all signals R0 through R3 are initialized to 0, but that R0 increments every 100 ns. The behavior of the four signals will be as shown in Table 9.11.
The changes on R1 through R3 in the table are not immediately obvious and must be explored in more detail.
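A sketch consistent with the description of Snippet 9.30 and Table 9.11 (the declarations of R0 through R3 as integer signals are assumed):

```vhdl
-- Only R0 is in the sensitivity list, so the process runs once at time zero
-- and then again on every event on R0.
incomplete_list: process (R0)
begin
  R1 <= R0;  -- scheduled with R0's new value
  R2 <= R1;  -- scheduled with R1's OLD value, not the one scheduled above
  R3 <= R2;  -- one step further behind, reproducing Table 9.11
end process incomplete_list;
```

Each activation therefore shifts the sampled value one stage down the chain, which is why R2 and R3 lag R1 by one and two activations respectively in the table.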


Snippet 9.30 Process with an “incomplete” sensitivity list. This process will run every time R0 actually changes

Table 9.11 Behavior of incomplete sensitivity list in Snippet 9.30

Time (ns)    0    100    200    300    400    500
R0           0    1      2      3      4      5
R1           0    1      2      3      4      5
R2           0    0      1      2      3      4
R3           0    0      0      1      2      3

Whenever R0 changes in Table 9.11, the process is called and executed. Thus, when R0 becomes 1 at 100 ns, the process is called. The process causes R1 to assume the value of R0. Since the process was only called because R0 became 1, R1 will be assigned the value 1. R2 assumes the value of R1; however, it will not assume the value 1. Instead, it sees the old value of R1, which is 0.

Consider what happens at 500 ns. R0 changes from 4 to 5, calling the process. The first statement will cause R1 to be scheduled to assume the value of R0, which is 5. The signal R1 will not immediately assume the value 5. Instead, all changes on signals are scheduled as the statements of the process execute, and the scheduled changes do not become actual changes until the end of the process. Thus at statement R2